If you follow these posts (there will be more than one) and attend The Practical AI Bootcamp, you will be able to build, simulate, detect, and understand a fraud scheme using only ChatGPT-like prompts. At the heart of the detection is a graph working together with RAG (Retrieval Augmented Generation), a combination we call Graph RAG.
There are a lot of database systems out there. While this blog series is both a demonstration and a proof of concept that can be completed with only ChatGPT-like prompts, it highlights the value of Oracle's graph query language, PGQL (Property Graph Query Language), which is part of Oracle 26ai. Since we're dealing with high-concurrency monetary transactions, you need a relational database with robust concurrency control, backup and recovery, and security: all the enterprise features. Oracle 26ai handles this perfectly.
This post focuses on how the fraudulent scheme works and how distributed financial systems and legal requirements allow the fraud to occur.
In future posts, I'll show you how to generate the data, prepare it for analysis, discover the fraud, and visualize the scheme.
Most Oracle databases are islands, tracking transactions within one domain. But business reality spans multiple systems, and gaining insights often requires reasoning across disconnected systems that have no direct links. Think of it like multiple relational tables that support a larger system with absolutely no foreign keys. Not nice.
This is exactly where Graph RAG shines. It can discover relationships across systems that were never designed to be linked.
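To make that concrete, here is a minimal, conceptual sketch in plain Python (using networkx) of the idea: records that share no foreign keys get tied together in a graph by the attributes they happen to have in common, and suspicious clusters fall out of the graph structure. The record layouts and field names below are hypothetical illustrations of mine, not the schemas we build later in the series, and in the real solution PGQL does this declaratively inside the database.

```python
# Conceptual sketch only: link records from disconnected systems by shared
# attributes and look for suspicious clusters. All field names are hypothetical.
import networkx as nx

# Three disconnected "systems", each with its own record shape and no foreign keys.
authorizations = [{"auth_id": "A1", "card": "C100", "ts": "09:47"},
                  {"auth_id": "A2", "card": "C100", "ts": "09:47"},
                  {"auth_id": "A3", "card": "C100", "ts": "09:47"}]
atm_withdrawals = [{"txn_id": "T1", "card": "C100", "atm": "ATM-7",  "amount": 60},
                   {"txn_id": "T2", "card": "C100", "atm": "ATM-12", "amount": 60},
                   {"txn_id": "T3", "card": "C100", "atm": "ATM-31", "amount": 80}]
accounts = [{"card": "C100", "balance": 100}]

g = nx.Graph()
# Link every record to the card it mentions: the "foreign key" nobody designed.
for rec in authorizations:
    g.add_edge(("card", rec["card"]), ("auth", rec["auth_id"]))
for rec in atm_withdrawals:
    g.add_edge(("card", rec["card"]), ("txn", rec["txn_id"]))
for rec in accounts:
    g.add_edge(("card", rec["card"]), ("acct", rec["card"]))

# A card connected to more near-simultaneous withdrawals than its balance
# supports is worth a closer look.
for node in [n for n in g if n[0] == "card"]:
    txns = [n for n in g.neighbors(node) if n[0] == "txn"]
    print(node[1], "is linked to", len(txns), "withdrawals across different ATMs")
```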
You may be wondering why an organization would design a system like that. To protect you and me, there are legal requirements that force certain types of data to be physically separated. Many financial systems have authorization, transactional, and bank account number components that must be separated.
Break that rule, and you may have Slick Eddie and his buddies as cellmates.
This legal separation allows creative bad guys like Slick Eddie to make a living. Let me show you how.
Here is the scheme I made up, based on conversations with people who have to actually deal with stuff like this.
Slick Eddie buys a $100 gift card and activates it. Then, he creates four clones of the card and personally hands them to four of his buddies. The next morning, all five head to the same city, each hitting a different ATM. At exactly 9:47 AM, when the bank systems are super active, they each attempt to make a single withdrawal of $60 or $80.
That's it. Simple, elegant, and it can work.
Here's why the architecture matters.
Each of the distributed systems has extremely tight SLAs. If an SLA is breached, the company is charged a penalty, and the transaction is routed to a competitor.
To avoid a breach, as the SLA limit approaches, the system can choose to skip some of the validation steps. The company must balance the cost of a possible SLA breach against the cost of a financial transaction that was not fully validated. It's a tightrope to walk.
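Here is a minimal sketch of that trade-off, with entirely hypothetical timings, thresholds, and validator names: once the elapsed time gets close to the SLA budget, the optional (and expensive) checks are shed so the transaction still returns in time.

```python
# Hypothetical SLA-pressure validation: shed optional checks as the budget runs out.
import time

SLA_BUDGET_MS = 200          # hypothetical contractual limit
SKIP_THRESHOLD_MS = 150      # start shedding optional checks past this point

def validate_withdrawal(txn, started_at):
    elapsed_ms = (time.monotonic() - started_at) * 1000
    checks = [check_card_active, check_balance]          # mandatory
    optional = [check_velocity, check_duplicate_device]  # skippable under pressure
    if elapsed_ms < SKIP_THRESHOLD_MS:
        checks += optional   # only run the expensive checks when there is time
    return all(check(txn) for check in checks)

# Stub validators so the sketch runs on its own.
def check_card_active(txn):      return True
def check_balance(txn):          return txn["amount"] <= txn["balance"]
def check_velocity(txn):         return txn["withdrawals_last_minute"] <= 1
def check_duplicate_device(txn): return True

txn = {"amount": 60, "balance": 100, "withdrawals_last_minute": 5}
print(validate_withdrawal(txn, started_at=time.monotonic()))
```

With time to spare, the velocity check above flags the burst of withdrawals; under SLA pressure it never runs, and the same transaction sails through.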
For this particular scheme, because all the withdrawals occur in the same city, they flow through the same systems. Because those systems are distributed and under SLA pressure, high transaction volumes can expose bugs in the serialization controls (the application design, database row locks, and in-memory structures). That leaves open the possibility that some of the withdrawals succeed.
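To see how such a serialization gap plays out, here is a purely illustrative Python toy model of mine, not any real banking code: several concurrent withdrawals read the same stale balance before any of them writes it back, so far more money is dispensed than the card ever held.

```python
# Toy lost-update race: concurrent withdrawals against one unsynchronized balance.
import threading, time

balance = 100        # the single $100 loaded on the cloned card
approved = []

def withdraw(amount):
    global balance
    snapshot = balance               # read the balance
    time.sleep(0.01)                 # simulated processing delay under load
    if snapshot >= amount:           # decide on the stale snapshot
        balance = snapshot - amount  # write back, clobbering the other writers
        approved.append(amount)

threads = [threading.Thread(target=withdraw, args=(60,)) for _ in range(5)]
for t in threads: t.start()
for t in threads: t.join()

print("approved withdrawals:", approved)   # typically all five succeed
print("total dispensed:", sum(approved), "from a $100 card")
```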
If only two of the five withdrawals succeed, the fraud is profitable (two $60 withdrawals return $120 against the $100 Eddie paid for the card), and Slick Eddie can feed his kids for another day.
Our objective is to detect that fraud occurred, determine how it happened, and quantify the financial impact.
Our objective is not to prevent the fraud, but what we learn can help others build systems that reduce or prevent similar fraud in the future.
Before we can do any fancy mathematics, we need the raw data. This is not as simple as most of us might expect. Again, it has to do with the fact that multiple distributed systems are involved. So, any data we create needs to make sense for each individual system and also as a combined whole.
LLMs are very good at generating data with specific constraints. By their nature, LLMs are a bit fuzzy, which is usually what real-life data looks like too.
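Here is a minimal sketch of what "coordinated" means in practice: each withdrawal event is generated once, then projected into each system's own record shape, so the datasets agree on card, time, and amount without ever sharing a schema. The system names and fields are hypothetical stand-ins, not the schemas we will actually prompt for.

```python
# Hypothetical sketch: generate one event, project it into per-system records
# so the systems stay individually well-formed yet jointly consistent.
import random, uuid
from datetime import datetime, timedelta

def generate_withdrawal_event(card_id, city):
    ts = datetime(2024, 5, 1, 9, 47) + timedelta(seconds=random.randint(0, 59))
    return {"card_id": card_id, "city": city,
            "atm_id": f"ATM-{random.randint(1, 40)}",
            "amount": random.choice([60, 80]), "ts": ts}

def project(event):
    # Each "system" keeps only the fields it would realistically hold.
    auth_rec = {"auth_id": str(uuid.uuid4()), "card_id": event["card_id"],
                "ts": event["ts"].isoformat()}
    txn_rec  = {"txn_id": str(uuid.uuid4()), "atm_id": event["atm_id"],
                "city": event["city"], "amount": event["amount"],
                "ts": event["ts"].isoformat()}
    acct_rec = {"card_id": event["card_id"], "delta": -event["amount"]}
    return auth_rec, txn_rec, acct_rec

events = [generate_withdrawal_event("C100", "Springfield") for _ in range(5)]
auths, txns, acct_moves = zip(*(project(e) for e in events))
print(len(auths), "authorizations,", len(txns), "ATM records,", len(acct_moves), "ledger moves")
```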
In my next post, we will generate the data for each of the three distributed systems.
So there you have it. How Slick Eddie and his gang make their living (or at least part of it), why we design systems that allow for fraud, and the tightrope every company faces: cost vs benefit.
In the next post, I'll detail how we create the distributed yet coordinated multi-system data we need to understand what Slick Eddie and his gang are up to. All of this using ChatGPT-like prompts.
All the best in your Oracle AI work,
Craig.