Client Background

The client is a FinTech company that gives underserved populations with limited credit history in East Africa and Colombia access to microloans of 5 to 200 USD, sized to their needs.

Business Objective

The client is looking to expand into new markets using the Credit platform, which is powered by ML models built on alternative data. The ML models make the first cut on customer eligibility based on a risk score. Rule-based business logic then sets a credit limit that can be tracked in real time. In short, eligibility is a function of the model score, while the credit limit is a function of recent transaction patterns.
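
The two-stage decision described above (the model score gates eligibility, then a rule sets the limit) can be sketched as follows. This is a minimal illustration, not the client's logic: the score cut-off of 600, the 25% share of average weekly wallet inflow, and the names `decide` and `CreditDecision` are hypothetical. Only the 5 to 200 USD loan range comes from this case study.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical policy parameters: the case study does not publish its cut-off or limit rules.
SCORE_CUTOFF = 600            # assumed minimum model score for eligibility
MIN_LIMIT_USD = 5             # product floor (loans start at 5 USD)
MAX_LIMIT_USD = 200           # product ceiling (loans go up to 200 USD)
LIMIT_SHARE_OF_INFLOW = 0.25  # assumed share of average weekly wallet inflow offered as credit


@dataclass
class CreditDecision:
    eligible: bool
    limit_usd: int


def decide(score: float, weekly_inflows_usd: list[float]) -> CreditDecision:
    """Stage 1: the model score gates eligibility. Stage 2: a rule sets the limit."""
    if score < SCORE_CUTOFF or not weekly_inflows_usd:
        return CreditDecision(eligible=False, limit_usd=0)
    # The limit follows recent transaction activity (here: mean weekly inflow).
    raw_limit = LIMIT_SHARE_OF_INFLOW * mean(weekly_inflows_usd)
    # Clamp to the product range and truncate to whole dollars.
    limit = int(min(MAX_LIMIT_USD, max(MIN_LIMIT_USD, raw_limit)))
    return CreditDecision(eligible=True, limit_usd=limit)


print(decide(650, [100, 200, 150, 150]))  # CreditDecision(eligible=True, limit_usd=37)
```

Keeping the limit rule outside the model means the limit can be recomputed whenever new transactions arrive, without rescoring the customer.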

Solution

The solution development journey was divided into the following phases:

  1. Problem Discovery & Understanding
    1. During this two-week phase, our team of data scientists and data engineers worked with the client’s team to identify and explore the datasets relevant to the study: customer demographics, wallet transaction data, past loan history and repayment data. We also worked with the business to define the “risky” customer that the models would predict. Based on past loan history and wallet transaction patterns, the customer base was divided into four segments, which we decided to model separately because their risk characteristics differed.
  2. Model Development & Training
    1. Together with the client’s team, we built a comprehensive list of candidate features expected to distinguish risky from non-risky borrowers. These were based on transaction patterns over several time windows (past 2 weeks, 4 weeks, etc.), time of day, consistency and volatility of transactions, percentage and index growth in transactions, and past loan repayment behavior.
    2. We applied feature reduction techniques, including Information Value (IV) and Variance Inflation Factor (VIF), to reduce the model to a set of 30 features.
    3. Based on the nature of the decision boundary, we chose tree-based algorithms. We compared XGBoost with Random Forest, Decision Tree and GBM on AUC, accuracy, rank-ordering and KS, selected XGBoost as the best model, and tuned its hyperparameters. Out-of-time validation was performed on the 3 months of data following the training window cut-off.
    4. We built a normalized scoring framework by fixing a common factor and offset for the four models developed for the different segments, so that their scores share one scale.
  3. Model Deployment & Post-Implementation
    1. We implemented PSI (Population Stability Index) and CSI (Characteristic Stability Index) to compare each week's score and feature distributions with those of the development sample. A scheduled job emails the PSI and CSI results as an Excel attachment.
    2. We used MLflow to deploy the model trained in Python for scoring in PySpark, and Apache NiFi jobs for the data pipeline.
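
The windowed transaction features listed under Model Development & Training (trailing 2- and 4-week windows, volatility, growth and time of day) can be sketched in plain Python as below. The feature names, the growth definition (last 2 weeks versus the 2 weeks before) and the 22:00 to 06:00 night window are assumptions for illustration, and transaction amounts are assumed to be positive.

```python
from datetime import datetime, timedelta
from statistics import pstdev


def window_stats(txns, as_of, weeks):
    """Count, sum and volatility of one customer's transactions in the trailing window."""
    start = as_of - timedelta(weeks=weeks)
    amounts = [amount for ts, amount in txns if start <= ts < as_of]
    return {
        f"txn_count_{weeks}w": len(amounts),
        f"txn_sum_{weeks}w": sum(amounts),
        # Population std dev as a simple volatility measure; 0.0 when the window is empty.
        f"txn_std_{weeks}w": pstdev(amounts) if amounts else 0.0,
    }


def build_features(txns, as_of):
    """txns: list of (timestamp, amount) tuples for a single customer."""
    feats = {}
    for weeks in (2, 4):
        feats.update(window_stats(txns, as_of, weeks))
    # Growth of the last 2 weeks over the 2 weeks before them (assumed definition).
    recent = feats["txn_sum_2w"]
    prior = feats["txn_sum_4w"] - recent
    feats["txn_growth_pct_2w"] = (recent - prior) / prior * 100 if prior else None
    # Time-of-day feature: share of 4-week transactions made at night (22:00-05:59, assumed).
    start = as_of - timedelta(weeks=4)
    hours = [ts.hour for ts, _ in txns if start <= ts < as_of]
    night = sum(1 for h in hours if h >= 22 or h < 6)
    feats["night_txn_share_4w"] = night / len(hours) if hours else 0.0
    return feats


as_of = datetime(2023, 1, 29)
txns = [
    (datetime(2023, 1, 28, 10), 50.0),
    (datetime(2023, 1, 20, 23), 30.0),
    (datetime(2023, 1, 10, 12), 20.0),
    (datetime(2022, 12, 20, 12), 100.0),  # outside both windows
]
print(build_features(txns, as_of))
```

At scale, the same per-customer aggregates would typically be computed with grouped window operations in the data platform rather than row by row.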
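
The IV and VIF screening used to reduce the feature set can be sketched as follows, assuming features have already been binned for IV. The cut-offs used here (drop features with IV below 0.02, then repeatedly drop the feature with the highest VIF until all VIFs are at most 5) are common rules of thumb rather than this project's thresholds, and `select_features` is a hypothetical helper.

```python
import math
import numpy as np


def information_value(bin_counts, eps=0.5):
    """IV of one binned feature. bin_counts: list of (n_good, n_bad) per bin."""
    total_good = sum(g for g, _ in bin_counts)
    total_bad = sum(b for _, b in bin_counts)
    k = len(bin_counts)
    iv = 0.0
    for good, bad in bin_counts:
        # Additive smoothing (eps) keeps WoE finite when a bin has no goods or no bads.
        pct_good = (good + eps) / (total_good + eps * k)
        pct_bad = (bad + eps) / (total_bad + eps * k)
        woe = math.log(pct_good / pct_bad)
        iv += (pct_good - pct_bad) * woe
    return iv


def vif(X):
    """VIF of each column of X (n_samples x n_features): diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))


def select_features(X, names, ivs, iv_min=0.02, vif_max=5.0):
    """Drop weak features by IV, then drop the highest-VIF feature until all VIFs <= vif_max."""
    keep = [i for i, name in enumerate(names) if ivs[name] >= iv_min]
    while len(keep) > 1:
        v = vif(X[:, keep])
        worst = int(np.argmax(v))
        if v[worst] <= vif_max:
            break
        keep.pop(worst)
    return [names[i] for i in keep]
```

Dropping one feature at a time matters because removing a single member of a collinear group usually brings the VIFs of the others back down.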
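
Two of the metrics used to compare XGBoost with Random Forest, Decision Tree and GBM, AUC and KS, can be computed directly from labels and scores. In this sketch `y` is 1 for a “risky” customer and `risk_score` is higher for riskier customers. It uses the Mann-Whitney form of AUC and evaluates KS only at distinct score cut-offs so that tied scores are handled correctly.

```python
import numpy as np


def auc(y, risk_score):
    """Mann-Whitney AUC: P(score of a random risky customer > score of a random non-risky one)."""
    y = np.asarray(y)
    s = np.asarray(risk_score, dtype=float)
    pos, neg = s[y == 1], s[y == 0]
    diff = pos[:, None] - neg[None, :]  # all risky/non-risky pairs; O(n_pos * n_neg) memory
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


def ks_statistic(y, risk_score):
    """Max gap between cumulative risky and non-risky shares over all score cut-offs."""
    y = np.asarray(y)
    s = np.asarray(risk_score, dtype=float)
    order = np.argsort(-s, kind="mergesort")  # riskiest first
    y_sorted, s_sorted = y[order], s[order]
    cum_bad = np.cumsum(y_sorted == 1) / np.sum(y == 1)
    cum_good = np.cumsum(y_sorted == 0) / np.sum(y == 0)
    # Only evaluate at the last row of each tied score, i.e. at real cut-offs.
    is_cutoff = np.r_[s_sorted[1:] != s_sorted[:-1], True]
    return float(np.max(np.abs(cum_bad - cum_good)[is_cutoff]))


print(auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]), ks_statistic([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))
# 0.75 0.5
```

The pairwise AUC is quadratic in memory and is only meant for small samples; a rank-based or library implementation would be used on full data.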
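
A normalized scoring framework with a fixed factor and offset usually takes the standard scorecard-scaling form score = offset + factor × ln(odds), with factor = PDO / ln 2 and offset = base score - factor × ln(base odds). The sketch below assumes a base score of 600 at good:bad odds of 50:1 and 20 points to double the odds (PDO); these are illustrative values, not the project's. Applying the same `FACTOR` and `OFFSET` to all four segment models gives a score the same odds interpretation in every segment, provided each model's probabilities are reasonably calibrated.

```python
import math

# Illustrative calibration targets (assumptions, not values from this engagement).
BASE_SCORE = 600.0  # score assigned at the base odds
BASE_ODDS = 50.0    # good:bad odds of 50:1 at BASE_SCORE
PDO = 20.0          # points to double the odds

FACTOR = PDO / math.log(2)
OFFSET = BASE_SCORE - FACTOR * math.log(BASE_ODDS)


def to_score(p_risky: float) -> float:
    """Map any segment model's P(risky) onto the shared scale: offset + factor * ln(odds)."""
    p = min(max(p_risky, 1e-9), 1 - 1e-9)  # keep ln(odds) finite
    odds_good = (1 - p) / p
    return OFFSET + FACTOR * math.log(odds_good)


for p in (1 / 51, 1 / 101, 0.5):
    print(round(to_score(p), 1))
# prints 600.0, then 620.0, then 487.1
```

Doubling the odds from 50:1 to 100:1 adds exactly PDO points, which is what makes scores from different segments comparable.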
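
The weekly PSI and CSI checks can be sketched as below. PSI compares this week's score distribution with the development sample over bins fixed by development-sample deciles, as the sum of (actual% - expected%) × ln(actual% / expected%); CSI applies the same formula to each input feature. The decile binning and the `eps` floor for empty bins are assumptions, and the email and Excel reporting step is not shown.

```python
import numpy as np


def psi(expected, actual, n_bins=10, eps=1e-4):
    """Population Stability Index of `actual` (e.g. this week) against `expected` (development)."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    # Interior cut points at development-sample quantiles; the outer bins are open-ended.
    cuts = np.unique(np.quantile(expected, np.linspace(0, 1, n_bins + 1)[1:-1]))

    def shares(x):
        idx = np.searchsorted(cuts, x, side="right")
        return np.bincount(idx, minlength=len(cuts) + 1) / len(x)

    # Floor empty bins at eps so the log term stays finite.
    e, a = np.clip(shares(expected), eps, None), np.clip(shares(actual), eps, None)
    return float(np.sum((a - e) * np.log(a / e)))


def csi(dev_features, week_features, n_bins=10):
    """CSI: the same formula applied to each input feature instead of the score."""
    return {name: psi(dev_features[name], week_features[name], n_bins) for name in dev_features}
```

A commonly quoted rule of thumb reads PSI below 0.1 as stable, 0.1 to 0.25 as a moderate shift and above 0.25 as a significant shift.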

Outcome
  1. The trained models produced a lift of 60-70% within the first three deciles for the 4 customer segments in out-of-time validation.
  2. Automating credit risk scoring and limit assignment removed subjectivity and reduced the turnaround time (TAT) for loan applications to a few minutes, improving the overall customer experience.
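
The decile result above can be checked with a gains table. The sketch below sorts customers from riskiest to safest, splits them into ten equal groups, and reports the cumulative share of risky customers captured and the cumulative lift. Reading “lift of 60-70% within the first three deciles” as a capture rate of 0.6 to 0.7 at the third decile is an assumption, and `cumulative_capture` is a hypothetical helper.

```python
import numpy as np


def cumulative_capture(y, risk_score, n_groups=10):
    """Cumulative share of risky customers captured, and cumulative lift, by score decile."""
    y = np.asarray(y)
    order = np.argsort(-np.asarray(risk_score, dtype=float), kind="mergesort")  # riskiest first
    groups = np.array_split(y[order], n_groups)
    bads_per_group = np.array([g.sum() for g in groups])
    pop_per_group = np.array([len(g) for g in groups])
    capture = np.cumsum(bads_per_group) / y.sum()
    # Lift = share of risky customers captured / share of population reviewed.
    lift = capture / (np.cumsum(pop_per_group) / len(y))
    return capture, lift
```

Under that reading, `capture[2]` for each segment's out-of-time sample would fall around 0.6 to 0.7.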