Most people don’t think much about data trust, and when they do, their focus often centers on concerns about deepfakes. But it’s not just deepfake pictures and videos and consumer use cases that people need to look at. It’s critical to use these very visible examples to think about the invisible threat that is growing in the digital economy: fake data of all kinds threaten to disrupt our businesses, which today run entirely on data. And with generative AI now put to pretty much every use under the sun, the calculus has turned decisively in favor of the attacker: they can generate and fake data much faster than the good guys can process and verify it. Here we’ll delve into how to avoid paying the price of untrustworthy data.
If you want evidence, I’ll give you 122 million reasons why. A man stole $122 million from Google and Facebook simply by sending them fake invoices, which the tech giants paid.
Eventually, authorities caught the man and sent him to prison, but the point is that he did it, and he could have gotten away with it. This may sound like a one-time occurrence, but it is not. The volume of data we consume these days, coupled with the many different ways we receive it, makes it very hard to check everything.
Fake invoices and untrustworthy data are a large and growing problem
Once again, this very visible problem of invoice fraud is just the tip of a very large iceberg. Payment Diversion Fraud (amusingly referred to as PDF) hits UK businesses hundreds of times a month, to the tune of hundreds of millions in losses. And exploiting manual invoice processing is not the only way to scam a business or cause harm: any kind of false documentation can lead to a bad decision. Deceptions like this are becoming increasingly easy to pull off in our data-driven, multiparty, highly connected world, in which businesses and just about everything else operate at lightning speed.
With a few clicks in a GenAI tool, anyone can create a really convincing picture of an invoice from a fake company with a fake website.
If hackers can do all of that with just the click of a button, what is a business expected to do to defend itself?
People and manual approaches can’t address the scale of this challenge
The BBC has responded to this challenge by establishing BBC Verify, an operation consisting of 60 journalists who verify video, counter disinformation and analyze data in pursuit of the truth.
Of course, most businesses can’t afford to dedicate 60 employees to look after data integrity. Even if they could, trying to address data trust by adding more and more people and processes doesn’t scale. If you try to tackle data integrity that way, it will slow your business down.
In other areas of industry, we used to rely on slow, manual processes for assessing new software updates or operating data in our businesses, but these can no longer keep pace with the number of updates we face.
The tools and processes people have used in the past were fine; they’re just no longer a fit for our current context. Businesses need to move with the times, just as they did in the early days of computers by implementing passwords and then later adding two-factor authentication.
Organizations today require a better trust model
What’s needed now is a better model of trust – a model that’s powered by provenance and authenticity. The Internet Engineering Task Force’s Supply Chain Integrity, Transparency and Trust working group, which I co-chair, is working on a standard to enable this model.
The standard lays out three steps to define and manage provenance.
First, you need strong identification of the data source, and you must know that the digital asset that you received is the same one that was sent. This helps to avoid data corruption, modification and tampering and works to prevent fraud in the supply chain.
Second, you need to ensure that supply chain partners can’t rescind what they said in the past about data. Provenance information in this new model of trust must be immutable so everybody in the data supply chain can keep and check provenance on data forever. This becomes extremely important when something goes wrong and a business wants to check what happened, and why, to remediate the problem. Businesses in such situations today often run into brick walls because, when problems arise, things tend to shut down and they can’t get the answers they need from supply chain partners.
Third, you need to avoid split histories. A big part of building trust is knowing that others are consistent. Mistrust occurs when others tell one party one thing and another something else. With an appropriate level of transparency for the supply chain, you can avoid split histories.
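To make the three steps above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the format defined by the IETF working group's standard. All names (`ProvenanceLog`, `asset_matches`, `same_history`, the demo issuer and key) are hypothetical. It also assumes a shared-secret HMAC as a stand-in for the public-key signatures a real deployment would likely use. The sketch signs a digest of each asset (step one), chains entries by hash so earlier statements can't be quietly rewritten (step two), and lets two parties compare log heads to detect a split history (step three).

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def sign(key: bytes, message: bytes) -> str:
    # Assumption: HMAC with a shared key stands in for a real signature scheme.
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def _payload(entry: dict) -> bytes:
    # Canonical bytes of the statement fields that are signed and chained.
    fields = {k: entry[k] for k in ("issuer", "asset_digest", "prev")}
    return json.dumps(fields, sort_keys=True).encode()


class ProvenanceLog:
    """Hypothetical append-only, hash-chained log of provenance statements."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def head(self) -> str:
        return self.entries[-1]["entry_hash"] if self.entries else GENESIS

    def append(self, issuer: str, key: bytes, asset: bytes) -> dict:
        entry = {"issuer": issuer, "asset_digest": digest(asset), "prev": self.head()}
        entry["signature"] = sign(key, _payload(entry))
        entry["entry_hash"] = digest(_payload(entry) + entry["signature"].encode())
        self.entries.append(entry)
        return entry

    def chain_is_intact(self) -> bool:
        # Step two: any edit to an earlier entry breaks the hash chain.
        prev = GENESIS
        for entry in self.entries:
            expected = digest(_payload(entry) + entry["signature"].encode())
            if entry["prev"] != prev or entry["entry_hash"] != expected:
                return False
            prev = entry["entry_hash"]
        return True


def asset_matches(entry: dict, key: bytes, asset: bytes) -> bool:
    # Step one: the statement comes from the key holder and covers this exact asset.
    signature_ok = hmac.compare_digest(entry["signature"], sign(key, _payload(entry)))
    digest_ok = hmac.compare_digest(entry["asset_digest"], digest(asset))
    return signature_ok and digest_ok


def same_history(head_a: str, head_b: str) -> bool:
    # Step three: parties shown the same log must see the same head hash.
    return hmac.compare_digest(head_a, head_b)


# Demo with made-up values: register an invoice, then check a received copy.
log = ProvenanceLog()
key = b"supplier-demo-key"
invoice = b"Invoice 1001: pay 5,000 to account 12345678"
entry = log.append("supplier.example", key, invoice)
received_ok = asset_matches(entry, key, invoice)
```

In this sketch, a recipient would refuse to pay unless `asset_matches` succeeds for the invoice it actually received and `chain_is_intact` still holds, which is the verify-then-trust order rather than the reverse.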
Addressing data trust by embracing this approach, rather than continuing to trust potentially untrustworthy data or attempting to solve this massive problem manually, enables businesses to move to a data-centric verify-then-trust model based on transparency. That gives businesses the upper hand against hackers and supply chain risks, empowering organizations to recognize and reject fake invoices and otherwise avoid paying the price of using untrustworthy data.