Exploring Alternative Datasets for Credit Scoring of Thin-File Consumers: A Comprehensive Review

Jun 8, 2025 | By: Deepa Shukla | Pages: 1-8 | Open

SUMMARY

Credit scoring is a fundamental component of financial decision-making: it enables institutions to evaluate the creditworthiness of individuals and to manage risk. Traditional credit scoring models, however, rely heavily on historical credit data and therefore often exclude thin-file consumers (individuals with little or no formal credit history), which limits financial inclusion. This paper reviews alternative datasets and machine learning (ML) techniques as responses to this challenge. Alternative data sources, such as social media activity, web-browsing behaviour, digital footprints, and telecom usage, as well as hybrid approaches that combine them, offer a broader view of consumer behaviour and financial reliability. When combined with ML algorithms, including neural networks, support vector machines, ensemble methods, and hybrid models, these datasets can improve predictive performance by addressing data sparsity and capturing complex patterns in consumer behaviour. The findings underscore the potential of hybrid models that combine multiple datasets to achieve superior performance in credit risk assessment. The review also highlights data privacy, bias mitigation, and model interpretability as significant barriers to the widespread adoption of alternative datasets and ML models. By synthesizing insights from over 75 studies published between 2000 and 2023, it identifies key trends, evaluates the effectiveness of the various approaches, and offers actionable recommendations for future work. The implications extend to financial institutions seeking to expand credit access to underserved populations, improve decision-making accuracy, and promote financial inclusion. The review further calls for the development of fairness-aware and transparent algorithms to ensure ethical and equitable credit scoring practices.
Future research should focus on integrating emerging datasets, such as geolocation and behavioural analytics, and conducting longitudinal studies to validate the real-world impact of these advanced credit scoring methodologies.