As Europe Legislates To Make AI Trustworthy, Synthetic Data Prepares To Play A Crucial Role

5 Jan 2023

By Steve Harris, CEO, Mindtech Global

AI is increasingly coming under the spotlight from governments and individuals alike

There’s no getting away from it: tougher times are in prospect for developers of visual AI systems. While training and testing networks and moving them into production has until now been a fairly smooth process, emerging legislation aimed at making AI more trustworthy is going to make life more challenging for data scientists and ML engineers.

At issue here is the European Union’s upcoming Artificial Intelligence Act, which aims to regulate any AI technology in which data is used to make a decision, prediction or recommendation affecting citizens of the EU’s 27 member states. The US, China and the UK are said to be not far behind in developing their own legal frameworks for trustworthy AI, possibly heavily influenced by the EU’s measures.

Broadly, the AI Act will impose legal obligations on the producers, distributors and deployers of AI-based technologies on an escalating scale, depending on which of four risk levels an application falls into: unacceptable risk, high risk, limited risk, or minimal/no risk.

Using AI to sort job applications automatically raises serious fairness and bias issues

What the EU deems unacceptable, and will therefore prohibit, are AI-based systems that threaten human rights, safety and livelihoods, such as “social credit scoring” systems, under which individuals have been blacklisted from jobs, travel, hotels and schools for a range of alleged societal slights.

What the bloc regards as high risk, however, includes many areas where visual AI plays significant roles: critical infrastructure, industrial safety, commercial security, law enforcement and live remote biometric ID, for instance. Machine vision developers working in such fields will need to apply elevated risk assessments and mitigation measures if they are to comply.

As a result, there is much lobbying going on in Brussels at the moment over exactly where the borderline will fall between ‘high risk’ and the more generally acceptable ‘limited risk’. The commercial benefit of having an application reclassified from high risk to limited risk is going to be very large, so interested parties are trying to narrow the definition of high risk.

While this is being thrashed out, however, one thing is already very clear: companies will face a major requirement to audit their machine learning network designs and to ensure quality and lack of bias in the datasets they plan to use for training and testing, and to do so from the very start of development, as soon as proof of concept is completed.
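
A first-pass dataset audit need not be elaborate. The Python sketch below counts how images are distributed across a sensitive attribute and flags bins that stray too far from a uniform share; the CSV layout, the column names and the 20% tolerance are illustrative assumptions on our part, not anything mandated by the Act.

```python
import pandas as pd

# Hypothetical annotation manifest: one row per labelled image.
# Column names are assumptions for illustration, not a real schema:
# image_id, label, skin_tone (Monk scale bin, 1-10).
df = pd.read_csv("annotations.csv")

def audit_attribute(frame: pd.DataFrame, attribute: str,
                    tolerance: float = 0.2) -> pd.Series:
    """Flag attribute values whose share of the dataset deviates from a
    uniform distribution by more than `tolerance` (relative deviation)."""
    shares = frame[attribute].value_counts(normalize=True)
    expected = 1.0 / len(shares)
    deviation = (shares - expected).abs() / expected
    return deviation[deviation > tolerance]

# Which skin-tone bins are over- or under-represented?
flagged = audit_attribute(df, "skin_tone")
if not flagged.empty:
    print("Coverage imbalance detected:")
    print(flagged)
```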

In many ways, complying with the new law will mean redoubling auditing efforts already undertaken by many players in this space. We have already seen machine vision projects drop open-source networks because of, for instance, strong biases in their crowdsourced tagging. Users are also unlikely to have access to the data those open-source networks were trained on, so once the AI Act is in force, taking such systems into production could hit regulatory hurdles.

Accurate labeling of diverse populations is difficult, and essential for bias-free AI

Interestingly, ethics groups and legal departments in visual AI companies won’t necessarily have to perform all this compliance checking themselves: startups are already springing up to offer independent AI dataset and model auditing services, and that’s a great thing.

Such an audit may well find that the network’s training or test dataset coverage falls short of what’s needed to handle all eventualities and corner cases, meaning the system simply doesn’t work as intended. But all is not lost: patching those shortfalls is a perfect task for synthetic data. Using a photorealistic 3D modelling platform like Mindtech’s Chameleon system, developers can generate a virtually unlimited variety of privacy-protected, bias-free, auto-annotated synthetic images.
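
What might patching a shortfall look like in practice? One plausible workflow is to enumerate the scenario combinations the system must handle, count real-image coverage for each, and commission synthetic images for the gaps. In the sketch below, the scenario axes, the 500-image target and the request_synthetic_images function are all hypothetical stand-ins; Chameleon’s actual job-submission interface is not described here.

```python
from collections import Counter
from itertools import product

# Scenario axes the system must cover (illustrative values only).
LIGHTING = ["day", "dusk", "night"]
WEATHER = ["clear", "rain", "fog"]
POSE = ["standing", "walking", "occluded"]
TARGET_PER_COMBO = 500  # assumed coverage policy, not a legal requirement

def find_gaps(manifest: list[dict]) -> dict[tuple, int]:
    """Return the image shortfall for every scenario combination."""
    counts = Counter((m["lighting"], m["weather"], m["pose"]) for m in manifest)
    return {
        combo: TARGET_PER_COMBO - counts[combo]
        for combo in product(LIGHTING, WEATHER, POSE)
        if counts[combo] < TARGET_PER_COMBO
    }

def request_synthetic_images(combo: tuple, count: int) -> None:
    # Hypothetical stand-in for a synthetic-data platform's job interface.
    print(f"Requesting {count} synthetic images for scenario {combo}")

# Toy manifest: 600 easy daytime images and nothing else.
manifest = [{"lighting": "day", "weather": "clear", "pose": "standing"}] * 600

for combo, shortfall in find_gaps(manifest).items():
    request_synthetic_images(combo, shortfall)
```

Counting against explicit scenario axes has a side benefit: the coverage target itself becomes an auditable artefact, which is exactly the kind of evidence a regulator is likely to ask for.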

Mindtech’s Chameleon can generate unlimited photorealistic actors across a broad range of shapes, sizes and (Monk-scale) skin tones, helping to increase diversity and reduce bias in a PII-compliant manner

And as the EU has already made clear that the AI Act’s risk levels can be raised by future amendments at any time, many companies will probably set a very high bar for conformance, erring on the side of caution with significantly larger datasets to cover corner cases and ensure safety. Again, that is a perfect use case for synthetic images.

Still another way synthetic data will aid machine vision deployers in the new, regulated AI world is in what the EU is calling the “post-market monitoring” phase: essentially a requirement to keep testing high-risk systems even once they are deployed. And again, when such tests identify a corner case, a photorealistic 3D platform can synthesize as many images as are needed to plug it, almost instantly.
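
In engineering terms, post-market monitoring can amount to a simple feedback loop: log the deployed model’s low-confidence predictions, treat recurring ones as candidate corner cases, and queue their scene descriptions for synthetic generation and retesting. A minimal sketch, with the confidence threshold and the record format assumed purely for illustration:

```python
import json
from dataclasses import dataclass, asdict

# Assumed trigger for review; the Act does not prescribe a threshold.
CONFIDENCE_THRESHOLD = 0.5

@dataclass
class Detection:
    image_id: str
    label: str
    confidence: float
    scene_tags: dict  # e.g. {"lighting": "night", "weather": "fog"}

def log_corner_cases(detections: list[Detection],
                     queue_path: str = "corner_cases.jsonl") -> None:
    """Append low-confidence detections to a review queue. Each entry
    doubles as a scenario descriptor a synthetic-data pipeline can pick up."""
    with open(queue_path, "a") as queue:
        for det in detections:
            if det.confidence < CONFIDENCE_THRESHOLD:
                queue.write(json.dumps(asdict(det)) + "\n")

# Example: a fog-bound, night-time pedestrian the model barely detected.
log_corner_cases([Detection("cam03_000127", "pedestrian", 0.31,
                            {"lighting": "night", "weather": "fog"})])
```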

These are just a few of the ways synthetic data can help make AI more trustworthy. Although much politicking between the tech sector and lawmakers remains before the AI Act comes into force, one thing is certain: AI will only benefit all of us if the public’s deep concerns about its biases and accuracy are addressed. At Mindtech we’re committed to ensuring synthetic data plays the strongest possible role in doing that.
