Machine Learning Alerts: How to Increase Operational Efficiency with Predictive Scoring

June 9, 2022

An overwhelming majority of today’s Risk & Fraud teams suffer from a growing backlog of alerts. Unfortunately, what they have yet to uncover is that the majority of those alerts are false positives.

With growing alert volumes, regulatory updates, and ever-changing markets, Risk & Fraud teams are busier than ever. However, being busy doesn’t necessarily equate to being productive. 

On the contrary, teams that burn through valuable resources chasing phantom fraudsters instead of addressing real threats end up missing regulatory deadlines, which is bad for business.

To help solve this, Unit21 created Alert Scores to focus investigator time on the alerts that matter. Here, we’ll cover how alert scoring works, the machine learning model we’ve deployed and why we chose it, and how alert scoring can make your risk and compliance program more effective.

What are Alert Scores?

Unit21’s “Alert Scores” is a machine learning model that helps teams prioritize alerts based on the likelihood that those alerts will yield a SAR (Suspicious Activity Report) requiring a case to be investigated.

The alert score can then be used to triage alerts using the Unit21 queueing system, ensuring severe alerts are handled promptly and by the correct investigator.

Our machine learning model processes each alert generated by a customer’s rules in the Unit21 system to produce an Alert Score. This score, ranging from 0 to 100, indicates how likely the alert is to result in a case requiring investigation. The score is a relative ranking, not a percentage.
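
For illustration only, here is one way a model’s predicted probability could be mapped onto such a rank-style 0 to 100 score. The percentile-rank scaling below is a simplified, hypothetical sketch, not our production formula:

```python
import numpy as np

def to_alert_score(prob: float, reference_probs: np.ndarray) -> int:
    """Hypothetical mapping: score an alert by the percentile rank of its
    predicted probability among recent alerts, so the score orders alerts
    by risk without being a literal probability."""
    return int(round(100 * np.mean(reference_probs <= prob)))

# Stand-in predicted probabilities for 1,000 recent alerts
recent = np.random.RandomState(0).beta(2, 5, size=1000)
print(to_alert_score(0.60, recent))  # ~96: riskier than ~96% of recent alerts
```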

The score is also represented visually in the UI using red and blue colors (more on this below).

Alert Scores enable agents to investigate riskier alerts first. In addition, agents can automatically dismiss alerts with low scores or send them to newer agents for educational purposes.

Alert Scores improve the efficiency of Risk & Fraud teams by reducing investigation time and triage efforts.

What is Machine Learning?

Machine learning (ML) allows software applications to become more accurate at predicting outcomes without being explicitly programmed to do so. Instead, ML uses various algorithms that iteratively learn from data to improve, describe, and predict outcomes.

A machine learning model is the output generated when you train a machine learning algorithm with data. As the algorithm ingests more training data, it produces more precise models based on that data.

After training, the algorithm yields a predictive classifier model: give the model an input, and it returns an output. Essentially, models are used to make predictions or classifications.

In the case of Unit21, the Alert Score model is used to classify alerts by severity, based on your organization’s typical alert outcomes (the likelihood of a SAR filing and/or case investigation).

How Did We Build the ML Model?

Unit21’s machine learning algorithm is a random forest classifier, built with scikit-learn and written in Python.

While we initially considered other algorithms such as logistic regression, XGBoost, and recurrent neural networks (RNNs), we chose random forests because they train quickly, perform well, and have been applied successfully across many industries.
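
For readers who want to see the shape of this setup, here is a minimal scikit-learn sketch on synthetic data. The features, labels, and hyperparameters are placeholders, not our production pipeline:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(42)
X = rng.rand(5000, 20)                  # stand-in for curated alert features
y = (rng.rand(5000) < 0.1).astype(int)  # 1 = alert led to a case/SAR filing

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

clf = RandomForestClassifier(n_estimators=200, random_state=42)
clf.fit(X_train, y_train)

# Probability of the positive ("SAR-worthy") class for each test alert
probs = clf.predict_proba(X_test)[:, 1]
```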

How Do Random Forest Algorithms Work?

Random forest algorithms are known for their fast training time and performance. They consist of many individual decision trees that operate as an ensemble. 

In a single decision tree, the data is split at each node on feature values that best separate the examples into their correct classes. Each individual tree in a random forest is grown on a different random subset of the data and features and produces its own class prediction.

As the individual trees in the forest may produce different class predictions, the class with the most votes becomes the model’s prediction.

For example, consider a random forest model trained to recognize different types of fruit (apples, bananas, strawberries, pears, and pineapples). It classifies an input instance as an apple after majority voting across the n decision trees.
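
To make the voting concrete, the sketch below reuses the fitted clf from the earlier example and tallies each tree’s vote on a single input:

```python
import numpy as np

x = X_test[:1]  # a single input instance

# Each fitted tree is exposed via clf.estimators_ and casts its own vote
tree_votes = np.array([tree.predict(x)[0] for tree in clf.estimators_])
values, counts = np.unique(tree_votes, return_counts=True)
majority = int(values[np.argmax(counts)])

print(f"{counts.max()} of {len(tree_votes)} trees voted for class {majority}")
print("forest prediction:", clf.predict(x)[0])
```

Strictly speaking, scikit-learn’s random forests average per-tree probabilities (soft voting) rather than counting hard votes, but the two usually agree.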

How We Train an Alert Score Model

To train an Alert Score model, we use data from your organization’s past alerts whose outcomes are known, i.e., whether or not they produced a case or SAR. Because each organization is distinct, we generate a unique model for each customer.

Unit21 carefully gathers input data from past alerts and investigations, including customer and transactional information. There are thousands of data points to choose from; however, some are more important than others. For example, a person’s ‘first name’ is unlikely to be beneficial for training a model, but the ‘country of origin’ might be helpful. These data points are referred to as features.

We curate hundreds of features, including ‘credit card type,’ ‘number of transactions,’ ‘transaction velocity,’ the ‘age of the customer’s account,’ ‘email address,’ ‘IP address,’ ‘time between transactions’ and more to train models.

Features are frequently re-evaluated and updated based on their measured importance to the classification and their effect on model performance.
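
As an illustration, a fitted scikit-learn forest exposes impurity-based importance scores that can be used to rank features. The sketch below continues the earlier example, with placeholder feature names rather than our actual feature set:

```python
import numpy as np

feature_names = [f"feature_{i}" for i in range(X.shape[1])]
importances = clf.feature_importances_

# Print the five most important features, highest first
for idx in np.argsort(importances)[::-1][:5]:
    print(f"{feature_names[idx]}: {importances[idx]:.3f}")
```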

Here is a sample set of the most important features for some organizations in the Unit21 system.

Over the lifetime of your account, your model’s performance is continually monitored. As features change importance and new features emerge, Unit21 re-trains your model using your latest data.

In summary, Unit21 pulls the relevant customer attributes and transaction data, applies it to the Alert Score model, and generates a score representing the likelihood that a SAR will be generated from the alert. 

The Alert Score model can also be optimized to detect the different types of undesirable behavior observed in previously reviewed alerts.

How Accurate is Our Model?

An inaccurate model is futile. In the ML space, a common way to measure the accuracy of models making a binary prediction is ROC-AUC.

A ROC (Receiver Operating Characteristic) curve is a standard tool for measuring the performance of a classifier model. The ROC curve plots the rate of true positives (TP) against the rate of false positives (FP), highlighting the sensitivity of the classifier model.

The ROC curve is applied to the random forest’s predicted probability. The curve arises from sweeping all possible thresholds over the probability space and plotting the associated true positive rate (TPR) and false-positive rate (FPR) values for each threshold.  

An ideal classifier has a ROC curve that reaches a TPR of 100% at an FPR of 0%.

Area Under the Curve (AUC) is one of the most popular metrics for model evaluation. AUC measures the two-dimensional area underneath the entire ROC curve. The AUC of a classifier equals the probability that the classifier will rank a randomly chosen positive example higher than a randomly chosen negative example.

An excellent model has an AUC close to 1, indicating a good measure of separability (i.e., how well the model can distinguish between classes). A poorly performing model, on the other hand, has an AUC close to 0.5, indicating it does no better than random guessing.
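
Continuing the earlier sketch, the ROC curve and AUC can be computed directly from the predicted probabilities with scikit-learn. On the synthetic data above, expect an AUC near 0.5, since the stand-in labels are random:

```python
from sklearn.metrics import roc_curve, roc_auc_score

# Sweep all possible thresholds over the predicted probabilities
fpr, tpr, thresholds = roc_curve(y_test, probs)
auc = roc_auc_score(y_test, probs)
print(f"AUC: {auc:.4f}")
```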

Here is a sample set of ROC curves for some organizations in the Unit21 system.

A specific cutoff is chosen on the random forest’s probability to flag the highest risk alerts. 

Predicted probabilities greater than this cutoff are judged high risk, which is how we determine which Alert Scores to color blue and red in the UI. High-risk alerts are colored red, while all other alerts are colored blue.

There is a trade-off regarding where to place the cutoff value. As the cutoff point decreases, we get more true positives (our sensitivity, or TPR, increases) but also more false positives (our specificity, or TNR, suffers).

Youden’s J statistic, which gives equal weight to sensitivity and specificity, is a common way of choosing the optimal cutoff that maximizes the number of correctly classified cases. Formally, J = sensitivity + specificity − 1 = TPR − FPR.

Youden’s cutoff occurs when the vertical distance between the ROC curve and the diagonal chance line is maximized. 
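
Because J = TPR − FPR at each threshold, finding the optimal cutoff takes only a few lines, reusing the arrays from the ROC sketch above:

```python
import numpy as np

# Youden's J is maximized where TPR - FPR is largest
j = tpr - fpr
best = int(np.argmax(j))
cutoff = thresholds[best]
print(f"Youden cutoff: {cutoff:.3f} (TPR={tpr[best]:.2f}, FPR={fpr[best]:.2f})")

# Alerts scoring above the cutoff would be flagged high risk (red in the UI)
high_risk = probs >= cutoff
```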

For example, in the figure above, the first org has an AUC of 0.9146, and the midnight blue vertical bar shows that Youden’s J chooses a cutoff where the true positive rate is already approaching 90%. 

Note in all figures that increasing the cutoff beyond Youden’s J has diminishing returns on improving the TPR while also significantly increasing the FPR.

Why Predictive Alert Scoring Matters: Final Thoughts

The goal of this feature is not to replace agents but to surface alerts that are more likely to be fraudulent, increasing organizational efficiency. However, as Unit21 is a flag-and-review system, rules still need to be in place to generate alerts, and agents must still investigate and resolve (disposition) them.

These dispositions are also required to train the models and maintain their accuracy. Moreover, the typical age-based alert ordering used for investigation triage leads to delays in filing SARs and reduces overall effectiveness.

By producing an Alert Score for each alert, agents can:

  • Make decisions quickly
  • Reduce overall risk exposure by prioritizing alerts that identify criminal behavior faster
  • Reduce average investigation effort

Our model helps Fraud & Risk teams:

  • Triage alerts automatically
  • Look for signals not seen by rules alone
  • Increase investigation efficiency 
  • Reduce false positives
  • Create new workflows
  • Comply with confidence

Alert scoring is just the beginning of the machine learning elements of the Unit21 system. Digital fraud is moving faster than ever, and machine learning is the only way to keep pace. If you would like a demo, get in touch!

This post was written by Julien Pierret and JJ Lee from the Unit21 Machine Learning team.
