Real world data sources, “NC” license use for commercial entities

1 Mar 2023

Are these images suitable for training AI systems? Commercially? For research? (Deliberately blurred here)

I’m really struggling here, so if there are any #legal experts around, especially ones with an understanding of #AI, I’d love their opinion.

I posted on LinkedIn a few weeks ago about the difficulty of finding real-world #datasets suitable for commercial use, especially for #machinelearning training.

I came across the Flickr-Faces #FFHQ dataset, which NVIDIA AI scraped from Flickr: https://github.com/NVlabs/ffhq-dataset

They quite clearly state that the dataset includes images licensed under https://creativecommons.org/licenses/by-nc/2.0/ (CC BY-NC 2.0).
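For anyone wanting to see how much of such a dataset carries an NC license before deciding what to do with it, here is a minimal sketch in Python. It assumes each image record has a `license_url` field holding a Creative Commons URL; the field name, the record layout, and the helper names `is_noncommercial` and `split_by_license` are my own illustration, not something taken from the FFHQ repository. It only sorts records by license and does nothing to answer the legal questions in this post.

```python
from urllib.parse import urlparse


def is_noncommercial(license_url: str) -> bool:
    """Return True if a Creative Commons license URL carries the NC element."""
    # CC license URLs look like /licenses/by-nc/2.0/ ; the element codes
    # (by, nc, nd, sa) are hyphen-joined in the segment after "licenses".
    parts = [p for p in urlparse(license_url).path.split("/") if p]
    if "licenses" not in parts:
        return False  # e.g. public-domain dedications under /publicdomain/
    idx = parts.index("licenses")
    if idx + 1 >= len(parts):
        return False
    return "nc" in parts[idx + 1].lower().split("-")


def split_by_license(records):
    """Partition records into (NC-licensed, everything else)."""
    nc, other = [], []
    for rec in records:
        (nc if is_noncommercial(rec["license_url"]) else other).append(rec)
    return nc, other


# Hypothetical records, for illustration only.
records = [
    {"id": 0, "license_url": "https://creativecommons.org/licenses/by-nc/2.0/"},
    {"id": 1, "license_url": "https://creativecommons.org/licenses/by/2.0/"},
    {"id": 2, "license_url": "https://creativecommons.org/publicdomain/zero/1.0/"},
]
nc, other = split_by_license(records)
print(len(nc), "NC-licensed;", len(other), "other")
```

Matching on the license element, not on the substring "nc", avoids false positives from unrelated parts of a URL.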

There are a few questions here:

1) My assumption has always been that if you are part of a commercial (especially for-profit) organization, then anything marked NC is off limits, even if it is only used for research within that company and not commercialized directly, because ultimately any work you do contributes to the profitability of the company. Is there a difference between research and actual product deployment, as far as usage rights are concerned?

2) Can we assume that just because an image appears on a website with a permissive license, such as Pixabay, Unsplash etc., we can use it for ML training? Further, have the identifiable people in images on those sites really signed model releases? And do those model releases cover use for AI purposes?

3) I assume that for images of minors/children, the parents should theoretically have given their permission. Is that permission binding?

I’m interested in opinions, as I may be being overly sensitive and unnecessarily impeding our research teams by not allowing the use of NC images for research, and by exercising caution even with images that do allow commercial use.

Are there any model releases written specifically for AI work? For example, releases that permit the use of images to help train networks, but on the condition that the images are never published as test cases?

Of course, these legal issues are exactly why we want to further the use of synthetic data: https://mindtech.global

Real world data sources, “NC” license use for commercial entities was originally published in MindtechGlobal on Medium, where people are continuing the conversation by highlighting and responding to this story.