Detecting Financial Fraud

Challenge

Financial institutions process high volumes of transactions every day, and large human-labeled datasets are rare and expensive to produce, so training supervised fraud-detection models from scratch is often impractical. Unsupervised anomaly detection methods can instead flag a small percentage of outlier transactions (typically < 1%) for automated action or human review.

Solution

We hypothesized that high volumes of withdrawals, or large withdrawal-to-balance ratios over short time frames, might signal fraud. Using an open-source dataset of roughly 1 million real, anonymized bank transactions, we applied nearest-neighbor and isolation-based methods to detect outliers in those distributions. An isolation-based algorithm was best suited to flagging the small percentage of data points with high-volume transactions, measured by total value and frequency, as outliers.
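As a rough illustration of the isolation-based approach (not the original analysis), the sketch below uses scikit-learn's `IsolationForest` on synthetic data. The two features (total withdrawal value and withdrawal count per account), the generated values, and the 1% contamination setting are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic per-account features (hypothetical, not the case-study data):
# column 0 = total withdrawal value, column 1 = number of withdrawals.
normal = np.column_stack([
    rng.lognormal(mean=6.0, sigma=0.5, size=5000),
    rng.poisson(lam=5, size=5000),
])
heavy = np.column_stack([
    rng.lognormal(mean=9.0, sigma=0.3, size=25),
    rng.poisson(lam=40, size=25),
])
X = np.vstack([normal, heavy]).astype(float)

# contamination is the assumed share of outliers (here 1%).
model = IsolationForest(n_estimators=200, contamination=0.01, random_state=0)
labels = model.fit_predict(X)      # -1 = outlier, 1 = inlier
scores = model.score_samples(X)    # lower score = more anomalous

flagged = np.flatnonzero(labels == -1)
print(f"Flagged {flagged.size} of {len(X)} accounts for review")
```

Isolation forests score a point by how few random splits are needed to separate it from the rest, which tends to suit sparse, extreme points such as accounts with unusually high volume and frequency.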

When examining the weekly average withdrawal-to-balance ratio, we determined that a nearest-neighbors approach better captured large outliers in the ratio, especially when the account balance was negative due to overdrafts.
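One way a nearest-neighbors outlier score might be computed is sketched below, again on synthetic data. It assumes the score is the mean distance to the k nearest neighbors and that the top 1% of scores are flagged; the value of `k`, the threshold, and the simulated overdrawn accounts are illustrative choices, not details from the original work.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)

# Synthetic weekly averages (hypothetical, not the case-study data).
withdrawals = rng.lognormal(mean=5.0, sigma=0.4, size=3000)
balances = rng.normal(loc=5000.0, scale=800.0, size=3000)
# Simulate a few overdrawn accounts with small negative balances.
balances[:15] = -rng.uniform(50.0, 200.0, size=15)

# Withdrawal-to-balance ratio; negative when the account is overdrawn.
ratio = (withdrawals / balances).reshape(-1, 1)

k = 10  # assumed neighborhood size
nn = NearestNeighbors(n_neighbors=k + 1).fit(ratio)
dist, _ = nn.kneighbors(ratio)
# Column 0 is each point's zero distance to itself, so skip it.
knn_score = dist[:, 1:].mean(axis=1)

# Flag the 1% of accounts farthest from their neighbors.
threshold = np.quantile(knn_score, 0.99)
flagged = np.flatnonzero(knn_score > threshold)
print(f"Flagged {flagged.size} of {len(ratio)} accounts for review")
```

Because the score depends only on distance to neighboring points, ratios that land far from the bulk of the data, including negative ratios from overdrawn accounts, receive high scores regardless of sign.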

Outcome

In general, the preferable method depends on the distribution of the data, but these techniques allow real-time detection of potential fraud without requiring labeled training data from day one. If a human decision follows the anomaly detection, the resulting labeled data can serve as a basis for training supervised fraud-detection models. Unsupervised anomaly detection can also augment existing supervised fraud-detection models, which might fail to detect new types of fraud.

