Last Updated on 31 August 2022 by PantherMedia
Why every AI developer should be seriously worried about privacy in AI datasets
by Michael Osterrieder of vAIsual on AI datasets
Artificial Intelligence is highly transformative
Few people would disagree that Artificial Intelligence and AI datasets are on the cusp of transforming most industries as we know them. The photography business is one of the earliest to be impacted by AI. There are several AI tools that speed up workflows, enhance quality and expand image output.
Open source code plays a key role
Many of these tools have been developed with the help of open source code (MIT licensed) released by OpenAI, a research and development company co-founded by Elon Musk. In January 2021 it released DALL-E, a neural network designed to convert text to images. It was a branch of this code that we used at vAIsual to begin developing synthetic humans for stock media licensing.
OpenAI’s newest version, GLIDE, with a remarkable change
More recently, in December 2021, OpenAI released GLIDE, the successor to DALL-E. It uses a different architecture with a quarter of the parameters, and it is getting favorable reviews for improved quality. In his video review of GLIDE, Edan Meyer points out that “It doesn’t allow you to make human-like objects, they did some filtering on the dataset”. To me, this represents a remarkable change.
Privacy, copyright and ethical considerations
Although we can only speculate as to why this limitation on generating human images was introduced in the dataset between DALL-E and GLIDE, the most likely reasoning is privacy, copyright and ethical considerations. In particular, legal personality rights (upheld by laws such as the GDPR in Europe) pose an intrinsic risk when the human datasets used to train the AI are not legally clean.
The importance of the GDPR for datasets
This is important because, although the models will never see themselves directly in the output, the heavy hand of GDPR compliance means that any person whose data has been used by a company has the right to require that company to remove their data from its servers. We only need to look as far as the recent controversy over Facebook considering shutting down in Europe due to GDPR compliance issues to see that this is no small matter.
The risks of non-compliant datasets
When it comes to training an AI model, this means it would take just one person filing a complaint for the entire dataset to need re-editing and, potentially, for expensively created AI models to be retrained. This could run into tens of millions of dollars and could bankrupt many of the startups vying for a place in the market.
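To make the mechanics of this risk concrete, here is a minimal sketch of what honoring a GDPR erasure request against a training dataset might look like. All names and fields (`TrainingImage`, `subject_id`, `purge_subject`) are hypothetical illustrations, not vAIsual's actual pipeline; the key point is that a single withdrawal of consent forces both a dataset purge and a retraining flag.

```python
# Hypothetical sketch: handling a GDPR right-to-erasure request
# against an image training dataset. Field names are illustrative only.

from dataclasses import dataclass


@dataclass
class TrainingImage:
    path: str
    subject_id: str   # links the image to the person who signed the release
    has_release: bool  # True if a GDPR-compliant biometric release is on file


def purge_subject(dataset, subject_id):
    """Remove every image tied to a subject who withdrew consent.

    Returns the cleaned dataset plus a flag: if anything was removed,
    any model trained on the old dataset must be retrained.
    """
    cleaned = [img for img in dataset if img.subject_id != subject_id]
    retrain_needed = len(cleaned) != len(dataset)
    return cleaned, retrain_needed


dataset = [
    TrainingImage("imgs/001.jpg", "model-A", True),
    TrainingImage("imgs/002.jpg", "model-B", True),
    TrainingImage("imgs/003.jpg", "model-A", True),
]

cleaned, retrain = purge_subject(dataset, "model-A")
print(len(cleaned), retrain)  # → 1 True
```

Even in this toy version, one complaint invalidates every model trained on the original data, which is why the cost scales with the model, not with the single record removed.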
Extensive stock photography compliance activities
What many AI developers may not realize is that the IP stock industry is one of the most ardently monitored for copyright and privacy. Companies spend hundreds of millions of dollars a year solving IP licensing issues with content they use for marketing and advertising. For commercial use of images to be headache-free (and therefore attractive to the marketplace), each human used to train the AI needs to have signed a biometric release that is GDPR compliant.
Big players like TikTok have already responded
This fact is not lost on the C-suite of TikTok, which just changed its privacy policy to state that it “may collect biometric identifiers and biometric information” from its US-based users’ content.
vAIsual has a clear policy
At vAIsual, we have seen this as a fundamental aspect to get right. The AI that we are training uses hundreds of thousands of images taken of models we have photographed in our own studios. Each model has signed a biometric model release that authorizes us to utilize these photographs for training our AI.
Impact of dataset security still underestimated
While we are seeing all sorts of AI generated images appearing in blogs, minted NFTs and otherwise shared online, the real impact of copyright, privacy and ethics is only starting to be understood when it comes to datasets and AI image generation.
vAIsual’s commitment to its customers
For now, and into the future, vAIsual is committed to staying on the right side of the law and providing legally clean datasets for professional use by the IP stock image market.
Check out vAIsual’s content on PantherMedia now:
See and license vAIsual’s Synthetic Humans Collection here!
The following links could also be of interest:
AI-generated images: Brand new at PantherMedia
USA travel is back – 360° photos
Future technology in styles and designs
Cool looks straight from nature
You can find our FAQ here.