Fraud Detection 101 Q&A With Zach Pierce
This Q&A is part of the Fraud Fighters Manual, a collective set of stories from Fintech fraud fighters. Read Zach Pierce’s chapter on Fraud Detection 101 here, and download your copy of the Fraud Fighters Manual here to read the full version.
What is a rules engine? What are some of the advantages of using a rules engine for fraud detection?
A rules engine is an application that allows fraud detection agents to define rules over a number of data points, including user activity, metadata, and self-reported user information. Rules engines help agents identify and respond to threats and, when implemented correctly, let fraud teams build out a labeled database that can be used to continually improve the fraud detection system and move from a reactive to a proactive approach.
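To make the idea concrete, here is a minimal sketch of a rules engine in Python. It is an illustration only: the event fields (`account_age_days`, `amount_usd`, `ip_country`, `billing_country`), the rule names, and the thresholds are all hypothetical and do not come from any particular product.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    """A named check over a single event (hypothetical shape)."""
    name: str
    predicate: Callable[[dict], bool]


def evaluate(rules, event):
    """Return the names of every rule the event triggers."""
    return [rule.name for rule in rules if rule.predicate(event)]


# Illustrative rules; the thresholds are assumptions, not recommendations.
RULES = [
    Rule(
        "new_account_large_payment",
        lambda e: e["account_age_days"] < 7 and e["amount_usd"] >= 500,
    ),
    Rule(
        "ip_country_mismatch",
        lambda e: e["ip_country"] != e["billing_country"],
    ),
]

event = {
    "account_age_days": 2,
    "amount_usd": 900,
    "ip_country": "US",
    "billing_country": "US",
}

print(evaluate(RULES, event))  # ['new_account_large_payment']
```

Storing the triggered rule names next to the eventual review outcome is one way such a system could start accumulating the labeled data mentioned above.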
How do you use machine learning for fraud detection? What are the advantages and disadvantages of using machine learning?
You build machine learning models with the help of training data. This is data that can help the algorithm understand the difference between legitimate and fraudulent transactions. As for advantages, machine learning models can handle large volumes of data and identify new threats. But a drawback is that they are more time-intensive and complex to set up and maintain than simple rules. It all comes down to how you have trained the algorithm and how you continue to optimize it.
After deployment, ML models can react quickly to fraud patterns and offer a greater level of flexibility than rules since rules need to be explicitly defined.
When it comes to fraud detection, do you think using a rules engine is a good idea, or do you think that machine learning is a better alternative?
I think it is best to take a hybrid approach. Rules engines are nice in a few different scenarios, and when you’re first starting out, they’re typically all you’ve got. And they can be what you use to build the data set that lets you then move on to building a machine learning model. They can also be useful as a backstop because if you train machine learning models on a bunch of different features, you know, they’re going to look for what they’re going to look for, but there’s no guarantee that they will catch potential catastrophic loss. So, if you want to set some kind of rule that’s like, ‘any $10,000 payout gets looked at by a person,’ you could do that through a rules engine.
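The backstop idea can be sketched in a few lines of Python. This is a hedged illustration rather than a description of any real system: the $10,000 review threshold comes from the example above, while the model score, the 0.9 blocking threshold, and the decision labels are assumptions made for the sketch.

```python
REVIEW_PAYOUT_USD = 10_000     # from the example: large payouts go to a person
MODEL_BLOCK_THRESHOLD = 0.9    # hypothetical cutoff for the model's fraud score


def decide(amount_usd: float, model_score: float) -> str:
    """Combine a hard backstop rule with a model score (0.0 to 1.0)."""
    # The backstop runs first, so a large payout is always reviewed by a
    # person, no matter how confident the model is that it is legitimate.
    if amount_usd >= REVIEW_PAYOUT_USD:
        return "manual_review"
    if model_score >= MODEL_BLOCK_THRESHOLD:
        return "block"
    return "approve"


print(decide(12_000, 0.01))  # manual_review
print(decide(50, 0.95))      # block
print(decide(50, 0.10))      # approve
```

Checking the rule before the model is a deliberate choice in this sketch: it caps the worst-case loss even when the model has never seen a similar pattern.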
Do organizations necessarily have to use a rules engine first and then move on to machine learning?
I’ve heard there are vendors out there that have sort of out-of-the-box-type models that might help you if you’re dealing with credit card transactions. They share their machine learning models, or rather, the findings of their machine learning models, so it’s not necessarily true that you need rules to start.
But using a rules engine can help you build a database of information if you set it up correctly and label your data with a good schema. You can then use that data to train your machine learning model. Using a rules engine to build your dataset is not a requirement, but it is usually the easiest way to start. Just set some very basic rules, measure how well they’re doing, and then start a feedback loop of making them better and better.
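One simple way to measure how well rules are doing is per-rule precision: of the alerts a rule raised, what fraction turned out to be fraud after review. The sketch below assumes a hypothetical labeled dataset of `(rule_name, is_fraud)` pairs; the rule names and labels are made up for illustration.

```python
from collections import defaultdict


def rule_precision(labeled_alerts):
    """Return {rule_name: fraction of its reviewed alerts confirmed as fraud}."""
    confirmed = defaultdict(int)
    total = defaultdict(int)
    for rule_name, is_fraud in labeled_alerts:
        total[rule_name] += 1
        confirmed[rule_name] += int(is_fraud)
    return {name: confirmed[name] / total[name] for name in total}


# Hypothetical review outcomes: (rule that fired, reviewer's label).
labeled_alerts = [
    ("velocity", True),
    ("velocity", True),
    ("velocity", False),
    ("geo_mismatch", False),
    ("geo_mismatch", False),
    ("geo_mismatch", True),
    ("geo_mismatch", False),
]

precision = rule_precision(labeled_alerts)
print(precision["geo_mismatch"])  # 0.25
```

A rule whose precision stays low is a candidate for tightening, and the same labeled pairs can later serve as training data for a model.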
What are data enrichment tools? What are the pros and cons of using data enrichment tools?
Working with data enrichment tools can be really helpful as they can give you information about a user that you wouldn’t have access to otherwise. For example, if a user provides their phone number, you are not necessarily going to know who the carrier is, but that is something that you can get by using data enrichment from a third-party provider.
I think you can get a lot of value out of them. For example, you can geolocate IP addresses, find out which banks are associated with which routing numbers and which carriers are associated with which phone numbers, or tell whether someone is using Voice over IP, a cell phone, or a landline. Those are all very useful.
But integrating with a new tool and building a process where information is shared between the two systems requires time, investment, and trust.
I haven’t personally worked directly with data enrichment tools, but I have had the experience of working within organizations where we had access to a lot of data as well as places where data was limited, and let’s just say it made my life a lot easier when we had lots of accurate data to work with.
What is consortium data? What are the pros and cons of using consortium data?
A consortium is a group of organizations that share data with each other for fraud prevention. I’ve honestly never really used this outside of the stuff that you kind of have to use when you’re dealing with credit cards, like Mastercard’s MATCH database. I see the value, in general, of having a list of merchants that have been terminated at your disposal to check against for making onboarding decisions. But I haven’t personally gone out and used any sort of consortium data. I feel like not everyone would want to share information about fraudulent activity in their organization, especially since your competitors are also part of the consortium.
How have different techniques of fraud detection and prevention evolved in the last few years? Have you seen any major progress in this field in general?
I don’t think I have been around long enough to comment with authority on this. But I think that the one change that I’ve seen is that technology is a lot more accessible now than it was seven years ago. When I was at Stripe, and we were building machine learning models, we had a team full of people with PhDs, and now you don’t necessarily need that anymore. Seven years ago, we had to build a fraud detection system from the ground up. Now there are out-of-the-box solutions out there that you can use. Detection and prevention tools have become much more accessible.
Which is the most commonly used technique to build a robust fraud detection system? Is it machine learning or rule-based fraud detection?
I can only really speak to the places I’ve worked, and they definitely used both. I think rules engines are a great place to start and grow. When you see brand new activity, and you need to respond to it immediately, a rules engine is more useful because it’s much quicker to create a new rule than to train and deploy a new ML model.
How do you see fraudsters most frequently defeating fraud detection?
I think the most sophisticated fraudsters are very professional. They’re running a business, and it’s their job to find who the most vulnerable company is and who they could defraud. So, they’re constantly finding new targets, testing new methods, and using new data that’s become available—like new stolen identities and new dumped credentials.
So, I think it’s that sort of relentlessness and continuing to test people that is how they ultimately get in. We’re kind of in a unique position at Lithic because our customers are Fintech companies, and we see fraudulent actors hit one company, and then move to the next, and then go somewhere else. So, I think it’s that persistence and the fact that they are professionals.
Sometimes it is amateurs in their bedroom trying to make fraudulent transactions, but a lot of the time, it’s professional criminals, and they also have an understanding of how these systems work, how machine learning is trying to catch on to them, and so on. They write guides and share them on the dark web. You can read them, and a lot of times, they get stuff right. Sometimes they get stuff wrong; they think that you’re doing things that you’re not, which is actually kind of useful. Like if they think you’re checking consumer credit scores or something, but you actually aren’t, it’s useful because it makes the identities they have to acquire more expensive.
How does an organization move from fraud detection to fraud prevention?
Fraud detection is reactive, and it’s typically where you start off because you’re learning about your product and how it can be abused. I think fraud prevention is when you shift to being more proactive, so you know what you’re looking for, and you can look for the early indicators of it.
Also, as I mentioned, fraudsters are running a business. One thing I like to do is figure out the most expensive part of their business model and then make it more expensive. It makes you much more unattractive to them. So, if the most expensive part is using stolen credit cards, for example, I’d think of ways to get them to burn more credit cards than usual before they realize that their transactions are unsuccessful.
Do you think companies should build their own transaction monitoring tool or get it from a vendor?
I think it really depends on what kind of business you are in because, from what I’ve seen of the transaction monitoring vendors, a lot of them have a particular customer archetype in mind. If you have a straightforward use case where you are issuing single cards or accounts to consumers or SMBs, a vendor solution can work great because that is the customer they tend to have in mind.
But if you are a Fintech company with a very complex use case, the vendor solutions can start to break down. For instance, if you’re like Lithic, an infrastructure company that caters to Fintech companies, the assumptions that many vendors make around their data models and how their product works don’t always translate because Lithic's customers are more complex than the ones the vendors build their products to serve.
How do you adjust your strategy if false positive rates are high?
Typically, with transaction monitoring, you have a certain amount of human bandwidth to review transactions, and you’re trying to maximize the ROI of that bandwidth. So, you’re trying to have it focus on the higher-performing rules that catch the most interesting things.
If false positives are high, it could be because your rule criteria are too broad, or perhaps your rules are working fine, but the people making the decisions are not trained properly or don’t have access to the right pieces of data that would enable them to make the correct decisions.
However, if your rules are too narrow, your system might not generate any alerts, which is also a problem.
It’s really just a matter of having the right visibility into how everything is performing so that you can notice where things are going wrong. You should be measuring all of these possible causes to the best of your ability so that when something like this comes up, it is easy to diagnose and adjust for.
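As a rough illustration of focusing review bandwidth on the higher-performing rules, the sketch below fills a fixed daily review capacity starting with the rules whose alerts are most often confirmed as fraud. The rule names, alert volumes, precision figures, and the capacity of 100 reviews are all hypothetical.

```python
def allocate_reviews(rule_stats, capacity):
    """Assign review slots to rules, highest precision first.

    rule_stats maps rule name -> (daily_alerts, precision).
    Returns {rule_name: number_of_alerts_to_review}.
    """
    plan = {}
    remaining = capacity
    ranked = sorted(rule_stats.items(), key=lambda item: item[1][1], reverse=True)
    for name, (daily_alerts, _precision) in ranked:
        slots = min(daily_alerts, remaining)
        if slots > 0:
            plan[name] = slots
        remaining -= slots
    return plan


# Hypothetical numbers for illustration only.
rule_stats = {
    "velocity": (40, 0.60),
    "geo_mismatch": (200, 0.05),
    "large_payout": (30, 0.30),
}

print(allocate_reviews(rule_stats, capacity=100))
# {'velocity': 40, 'large_payout': 30, 'geo_mismatch': 30}
```

In this example the broad `geo_mismatch` rule produces most of the alerts but gets only the leftover capacity, which is also a hint that its criteria may need narrowing.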
What are the essential components of risk assessment? On a practical level, what should companies do to assess their risk better?
I think the things that are the highest risk to me are any places where money moves. For example, if you are a neobank, your points of risk are anywhere money is coming in from bank accounts or other sources and then where money is going out in the form of payments or bank transfers. So, learn about how your business operates, how different types of transactions work, how loans or credit cards are approved, how applications are processed, and so on. That will help you understand how different areas can be abused by bad actors.
The less obvious things to think about, though, are the things that cause you to incur costs, and whether they can be abused. For example, many companies verify bank accounts using micro-deposits in cases where authentication through Plaid or someone similar isn’t available. The company will send a small amount of money, usually less than $2, and ask the user to confirm the amount. But there are fraudsters out there who will exploit this and go through hundreds or thousands of micro-deposit authentication attempts in order to steal money from you. There are similar vectors to think about around network fees from declines and things of that nature.
I think it’s helpful to have some knowledge about the kind of risks the business faces and vulnerabilities that could be abused. So, that combination of knowing your business, knowing where the money moves, how costs are incurred, and then the different sort of fraud archetypes helps you in risk assessment.
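The micro-deposit abuse described above is one case where a simple per-user cap can limit the cost. The sketch below is a sliding-window attempt limiter; the class name, the limit of three attempts, and the 24-hour window are assumptions chosen for illustration, not a description of how any company actually handles this.

```python
from collections import defaultdict, deque


class AttemptLimiter:
    """Allow at most max_attempts per key within a sliding time window."""

    def __init__(self, max_attempts: int, window_seconds: float):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self._attempts = defaultdict(deque)  # key -> timestamps of allowed attempts

    def allow(self, key: str, now: float) -> bool:
        attempts = self._attempts[key]
        # Drop attempts that have aged out of the window.
        while attempts and now - attempts[0] >= self.window_seconds:
            attempts.popleft()
        if len(attempts) >= self.max_attempts:
            return False
        attempts.append(now)
        return True


# Hypothetical policy: three micro-deposit verifications per user per day.
limiter = AttemptLimiter(max_attempts=3, window_seconds=24 * 60 * 60)
results = [limiter.allow("user_123", now=t) for t in (0, 10, 20, 30)]
print(results)  # [True, True, True, False]
```

In practice the key might also be a device, an IP address, or a funding account, since a determined fraudster can spread attempts across many user accounts.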
How do you see fraud detection and fraud prevention moving forward? Will AI play a role in this area?
I don’t really see too many immediate changes. In theory, you can use ChatGPT to generate a phishing email, but I don’t think it’s going to have any impact on how we detect fraud.
I do think, down the road, AI image generation will become pretty concerning, especially if you can generate very convincing images of people holding up their driver’s licenses. One of the most challenging forms of authentication to beat is a selfie of someone holding up their driver’s license. If AI can be used to fake that, especially to the point where it isn’t easily detectable, we may need tools that haven’t even been developed yet to counter it.
What fraud detection technologies make you the most excited?
This isn’t necessarily just restricted to fraud detection technologies, but there are a lot of low-code and no-code solutions that help non-engineers build internal tools and systems. These no-code and low-code tools have helped us move really fast in situations that’d otherwise need engineers. The way I see it, low-code tools will continue to empower fraud detection professionals in the future.
Do you see any specific challenges regarding fraud detection and prevention that organizations struggle with today?
I think the proliferation of neobanks and crypto exchanges was pretty big over the last few years. Acquiring a bank account used to be pretty high friction, but with neobanks, it became a very low-friction process, which meant it was much easier to create a bunch of bad accounts.
Likewise, I think crypto exchanges became a very attractive target for bad actors to exfiltrate funds. But I think that both of those are kind of waning a little bit. So, right now, there’s nothing brand new in terms of challenges that organizations struggle with today, but the things I mentioned are definitely very much in play.