Digital credit in tech companies: creation of credit risk scoring models

written by October Italia

Due to the uncertainty generated by the spread of Covid-19, the number of loan requests grew enormously over the last 12 months. The main challenge for lenders in this situation was assessing the real risk rating of every borrower, a task made very difficult by the economic crisis linked to the pandemic.

For this reason, it is increasingly important today to build risk scoring models that can respond rapidly to customers and adapt to the situation at a given moment, supporting access to credit, which has become the most pressing need for most enterprises and SMEs.

Technology and digitalization are assets that have revolutionized companies’ business models. Alternative financing is the expression of this new framework: SMEs appreciate instant offers and a best-in-class customer experience.

Fintech is growing fast and, in a fintech company focused on digital credit such as October, the heart of the technology is the credit risk scoring model.

According to Tejas Shektar, October's Head of Data, a credit risk scoring model is basically a set of rules used to quantify the risk involved in extending credit to a borrower. These rules, and the data fed to them, determine the nature, complexity and performance of the model.

Rating models focus mainly on predicting the creditworthiness of the borrower, whereas scoring models can predict both creditworthiness and potential default. The October Data Science Team focuses on building scoring models by means of predictive analytics.

FOCUS ON MACHINE LEARNING, ASSESSMENT, IMPLEMENTATION, VALIDATION

To start building a model, you need a clear view of the question to be answered. Following the behind-the-scenes tour with Dr. Shektar, October’s question was: how can we process loan applications in a fast, scalable and secure manner in order to help as many borrowers as possible while keeping our default risk low?

Here, we are dealing with a binary (default vs non-default) classification problem.
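To make the binary framing concrete, here is a minimal sketch of a default classifier: a logistic regression fitted by gradient descent that outputs a probability of default. It is an illustration only, not October's model. The two features (debt_to_equity, months_of_cash), the toy data and the training settings are all hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def default_probability(x, weights, bias):
    """Predicted probability that a borrower with features x defaults."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

def train_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = default_probability(x, weights, bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Hypothetical toy data: [debt_to_equity, months_of_cash]; label 1 = default.
rows = [[0.5, 6.0], [0.8, 5.0], [1.0, 4.0], [2.5, 1.0], [3.0, 0.5], [2.8, 1.5]]
labels = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic(rows, labels)
safe = default_probability([0.6, 5.5], weights, bias)
risky = default_probability([2.9, 0.8], weights, bias)
print(f"low-leverage borrower:  {safe:.3f}")
print(f"high-leverage borrower: {risky:.3f}")
```

The output is a probability rather than a hard label, so a lender can choose the cut-off that balances approvals against default risk.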

We start by gathering data from our data lake (a data store built in-house with enforced ACID properties: Atomicity, Consistency, Isolation and Durability), which includes existing companies in our portfolio and their repayment behaviour, all historical loan requests and their associated financials, bank transaction data and default flags.

This is typically followed by a data cleaning step, where we look at the distribution of all data points related to historical loan requests, to treat outliers and missing values. The purpose of this exercise is to understand our population and build a representative dataset on which we can train our model.
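As an illustration of this cleaning step, the sketch below fills missing values with the median and clips outliers to empirical percentiles (winsorizing). Both techniques are common choices assumed here for the example; the article does not say which treatments October applies. The revenue figures and percentile bounds are hypothetical.

```python
import statistics

def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

def winsorize(values, lower_pct=0.05, upper_pct=0.95):
    """Clip every value to the chosen lower and upper empirical percentiles."""
    ordered = sorted(values)
    last = len(ordered) - 1
    low = ordered[int(lower_pct * last)]
    high = ordered[int(upper_pct * last)]
    return [min(max(v, low), high) for v in values]

# Hypothetical monthly revenue (in thousands) from historical loan requests,
# with two missing values and one obvious outlier.
revenue = [120, 95, None, 110, 105, 10000, 98, None, 102, 101]

cleaned = winsorize(impute_median(revenue))
print(cleaned)
# [120, 95, 103.5, 110, 105, 120, 98, 103.5, 102, 101]
```

Imputing before clipping keeps the sample size intact, and the median is used because, unlike the mean, it is barely affected by the outlier.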

At October, we use both linear and non-linear models trained on this representative dataset. Non-linear models are often considered to behave like a black box, but we make use of SHAP (SHapley Additive exPlanations) to make non-linear models fully explainable.
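SHAP is built on Shapley values: each feature receives a share of the difference between the model's output for one borrower and its output for a reference input. The sketch below computes exact Shapley values by brute force over all feature orderings, which is only feasible for a handful of features; the shap library relies on faster algorithms for real models. The toy model, its coefficients, the feature names and the convention of replacing "absent" features with a baseline value are assumptions made for this example.

```python
from itertools import permutations

def shapley_values(model, x, baseline):
    """Exact Shapley attribution of model(x) - model(baseline) to each feature.

    A feature counts as 'absent' when it is held at its baseline value.
    """
    n = len(x)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = model(current)
        for i in order:
            current[i] = x[i]            # switch feature i from baseline to actual
            value = model(current)
            totals[i] += value - previous  # marginal contribution in this ordering
            previous = value
    return [t / len(orderings) for t in totals]

def toy_risk_model(features):
    """Hypothetical non-linear risk score with a leverage/cash interaction."""
    leverage, cash_months, late_payments = features
    return 0.1 * leverage + 0.05 * late_payments + 0.2 * leverage / (1.0 + cash_months)

borrower = [2.5, 1.0, 3.0]   # hypothetical applicant
baseline = [1.0, 4.0, 0.0]   # hypothetical portfolio-average profile

phi = shapley_values(toy_risk_model, borrower, baseline)
for name, contribution in zip(["leverage", "cash_months", "late_payments"], phi):
    print(f"{name:>14}: {contribution:+.4f}")
```

The contributions always sum to the gap between the borrower's score and the baseline score, which is what lets an analyst read a non-linear model's decision feature by feature.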

THE LIFECYCLE OF A RISK SCORING MODEL

After the model is trained and deployed in production, October monitors the data points used for scoring in new loan requests over a period of time (usually 3-6 months).

If the statistical properties of these new data points have changed significantly compared with the data used for the last model training, it is likely we will re-train the model and deploy an improved iteration of it in production. But this is not something to be done lightly: we need to understand what changed in the population and the biases that were introduced.
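One widely used way to quantify such a shift is the Population Stability Index (PSI), which compares how a feature is distributed across bins at training time and in recent requests. The sketch below is an assumption about how this check could be done, not a description of October's monitoring. The sample data, the number of bins and the 0.25 alert level (a commonly cited rule of thumb) are all illustrative.

```python
import math

def population_stability_index(expected, actual, n_bins=5):
    """PSI between a training-time sample and a recent sample of one feature.

    Bins are equal-width over the training-time range; recent values outside
    that range fall into the first or last bin.
    """
    low, high = min(expected), max(expected)
    width = (high - low) / n_bins

    def bin_shares(sample):
        counts = [0] * n_bins
        for v in sample:
            idx = int((v - low) / width) if width > 0 else 0
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # Floor each share at a tiny value so the logarithm is always defined.
        return [max(c / len(sample), 1e-6) for c in counts]

    e, a = bin_shares(expected), bin_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Hypothetical feature values at training time and in recent loan requests.
training = [i / 100 for i in range(100)]        # spread evenly over 0.00 to 0.99
recent = [0.5 + i / 200 for i in range(100)]    # concentrated in the upper half

psi_same = population_stability_index(training, training)
psi_shifted = population_stability_index(training, recent)
print(f"PSI, unchanged population: {psi_same:.3f}")
print(f"PSI, shifted population:   {psi_shifted:.3f}")
if psi_shifted > 0.25:
    print("Large shift: investigate the population before re-training.")
```

A high PSI only signals that the population moved; it does not explain why, which is the reason the investigation step described above still has to come before any re-training.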

We are also on the lookout for new data points, either newly engineered from existing data or obtained from suppliers, that could improve the performance of our model.

Through this deep and continuous work, October can analyze data instantly, allowing for a safer and faster offer: this is the future of commercial credit and how a fintech like October can close the technological gap in the market.