Automatic classification of bank transactions

We helped Belvo move from a rule-based system to a machine learning system for categorizing its customers' bank transactions across multiple countries and regions.


In transactional banking, data quality is a major challenge. Transaction records are often incomplete or corrupted and lack a consistent syntactic structure, yet they feed critical processes such as spend categorization, risk underwriting, and fraud detection. The traditional ways of handling this data, mainly rule-based and keyword-matching categorization, are proving inadequate: they require continual manual updates, which become cumbersome and unsustainable as data volumes grow. What is needed is a more efficient, scalable, and accurate way to manage and use transactional banking data.


To address these challenges, we proposed a machine learning (ML) model, combined with natural language processing (NLP) techniques, for categorizing banking transactions. Rather than relying on hand-written rules, the model learns from training data to recognize the many different forms in which the same kind of transaction can appear. As it is trained on new examples, it adjusts its categorization criteria without manual rule maintenance, so it stays accurate as new transaction patterns emerge. The result is a scalable, adaptable, and efficient way to handle transactional banking data.
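The case study does not describe the architecture of Belvo's model. To make the idea of "learning categories from data instead of rules" concrete, here is a minimal sketch that swaps in a simple, well-known technique: text normalization followed by a multinomial Naive Bayes classifier with add-one smoothing, written in plain Python. The function `normalize`, the class `NaiveBayesCategorizer`, the category names, and the sample descriptions are all hypothetical and made up for illustration; a production system would likely use a richer model and far more data.

```python
from __future__ import annotations

import math
import re
from collections import Counter, defaultdict


def normalize(description: str) -> list[str]:
    """Hypothetical NLP step: lowercase, drop digits/punctuation, keep tokens longer than 1 char."""
    text = re.sub(r"[^a-z\s]", " ", description.lower())  # strips card masks, dates, store numbers
    return [tok for tok in text.split() if len(tok) > 1]


class NaiveBayesCategorizer:
    """Illustrative multinomial Naive Bayes over description tokens (not Belvo's actual model)."""

    def __init__(self) -> None:
        self.class_counts = Counter()             # labeled transactions per category
        self.token_counts = defaultdict(Counter)  # token frequencies per category
        self.vocab = set()

    def update(self, description: str, category: str) -> None:
        """Learn from one labeled transaction; can be called again whenever new labels arrive."""
        tokens = normalize(description)
        self.class_counts[category] += 1
        self.token_counts[category].update(tokens)
        self.vocab.update(tokens)

    def fit(self, examples: list[tuple[str, str]]) -> "NaiveBayesCategorizer":
        for description, category in examples:
            self.update(description, category)
        return self

    def predict(self, description: str) -> str | None:
        """Return the category with the highest log-posterior (None if nothing was learned)."""
        tokens = normalize(description)
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_category, best_score = None, -math.inf
        for category, n_docs in self.class_counts.items():
            counts = self.token_counts[category]
            total_tokens = sum(counts.values())
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                # Add-one (Laplace) smoothing so unseen tokens do not zero out a category.
                score += math.log((counts[tok] + 1) / (total_tokens + vocab_size))
            if score > best_score:
                best_category, best_score = category, score
        return best_category


# Toy, made-up examples for illustration only (not real customer data).
TRAINING_DATA = [
    ("UBER *TRIP 8841 SAO PAULO", "transport"),
    ("Uber Trip 12/03", "transport"),
    ("METRO CARD RELOAD #552", "transport"),
    ("NETFLIX.COM 866-579", "subscriptions"),
    ("Spotify P0A1B2C3 Stockholm", "subscriptions"),
    ("NETFLIX MONTHLY PLAN", "subscriptions"),
    ("WALMART SUPERCENTER #1234", "groceries"),
    ("walmart store 0099", "groceries"),
    ("CARREFOUR MARKET 77", "groceries"),
]

model = NaiveBayesCategorizer().fit(TRAINING_DATA)
print(model.predict("UBER* TRIP 9921 RIO"))     # transport
print(model.predict("WALMART 5521 MONTERREY"))  # groceries

# A new merchant and category appear: two labeled examples are enough, no rule edits.
model.update("RAPPI RESTAURANTE 10", "food_delivery")
model.update("Rappi pedido 2291", "food_delivery")
print(model.predict("RAPPI*PEDIDO 443"))        # food_delivery
```

Note how differently formatted variants ("UBER *TRIP 8841 SAO PAULO", "Uber Trip 12/03") collapse to the same tokens after normalization, and how `update` lets the model absorb a new pattern by adjusting learned counts rather than by someone writing a new keyword rule.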


The new ML model substantially changed how transactional banking data is managed and processed. API responses became five times faster, greatly improving operational efficiency, and categorization accuracy rose by 20% compared with the previous system. Beyond making transaction categorization more reliable, this gain in accuracy also strengthened the downstream processes that depend on it, such as risk underwriting and fraud detection.

The ML model also sharply reduced the cost and effort of maintenance. Whereas the previous rule-based system required constant manual updates, the model keeps adapting as it learns from new transaction data, becoming more precise over time. It therefore stays current and effective without frequent manual intervention, making it a sustainable, cost-effective way to handle complex banking data.