Reply from IBM about my online photos scraped without consent

Diversity in Faces (DiF)

Following my post about facial recognition's "dirty little secret", the millions of online photos scraped without consent, I got replies from Flickr and IBM's Diversity in Faces project.
First, Flickr's automated email…

Hi ian,

Thanks for reaching out to us!

We’ve received your message and will be responding as quickly as possible. In the meantime, do visit the Flickr Help Forum and our Help Center as the answer to your question may be found there.

We look forward to connecting and will be in touch soon.

Cheerfully,
The Flickr Team

Already Pro? Then expect a response shortly, because you are already in our VIP queue! (Make sure to write to us using the email address on your Pro account.)

Then IBM's reply…

Dear Ian,
Thank you for your email.
The Diversity in Faces (DiF) project, referenced in your request below, is a non-commercial, research initiative. The DiF dataset includes a list of URLs (but not the images themselves), linking to publicly available images on Flickr under certain creative commons licenses, along with associated annotations. We have taken great care to ensure that the DiF dataset does not include Flickr IDs or any other Flickr identifiers of individuals.
In order to respond to your request, we will need to locate the URLs in the DiF dataset that are linked to your Flickr ID (if any). To do this, we will need your Flickr ID, along with your express consent to use it for the sole purpose of locating such URLs and responding to your request.  Separately, if you would like us to, we can remove any URLs of images linked to your Flickr ID from the DiF dataset.  Please confirm this by reply.
After conducting our search, we will delete your Flickr ID from our records, and if you so request, we will also remove any URLs and associated annotations from the DiF dataset connected to your Flickr ID. We will confirm when this process has been completed.
With respect to your request to access your personal data processed by IBM outside the DiF project, you will be contacted separately by the IBM Data Subject Rights Operations Team (Email at ibm.com) to proceed with your request.
Let us know if you have any questions or how we can further assist you with your request.
IBM Research DiF Team
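
To make IBM's description a bit more concrete, here is a minimal sketch of what such a lookup and removal might look like. It is purely illustrative and not IBM's actual process: the record layout, the sample URLs and the function names are all my own assumptions. It relies on one thing I do know, which is that a Flickr static image URL carries the numeric photo ID at the start of the filename but not the account ID. That would explain why IBM needs my Flickr ID before it can search. The sketch assumes the photo IDs for an account have already been obtained some other way.

```python
import re
from urllib.parse import urlparse

# Hypothetical dataset: one record per image URL plus its annotations.
# The URLs and annotation fields below are invented for illustration.
SAMPLE_DATASET = [
    {"url": "https://farm1.staticflickr.com/1234/1111111111_abcdef1234.jpg",
     "annotations": {"note": "example"}},
    {"url": "https://farm2.staticflickr.com/5678/2222222222_0123456789_b.jpg",
     "annotations": {"note": "example"}},
    {"url": "https://farm3.staticflickr.com/9012/3333333333_fedcba9876.jpg",
     "annotations": {"note": "example"}},
]


def photo_id_from_url(url):
    """Return the numeric photo ID from a Flickr static image URL, or None."""
    filename = urlparse(url).path.rsplit("/", 1)[-1]
    match = re.match(r"(\d+)_", filename)
    return match.group(1) if match else None


def find_entries(dataset, photo_ids):
    """Return the records whose URL points at one of the given photo IDs."""
    return [rec for rec in dataset if photo_id_from_url(rec["url"]) in photo_ids]


def remove_entries(dataset, photo_ids):
    """Return a copy of the dataset without the given photo IDs.

    Dropping the whole record removes the URL and its annotations together.
    """
    return [rec for rec in dataset if photo_id_from_url(rec["url"]) not in photo_ids]


# Assumed input: photo IDs belonging to one account, obtained elsewhere.
my_photo_ids = {"1111111111", "3333333333"}
matches = find_entries(SAMPLE_DATASET, my_photo_ids)
remaining = remove_entries(SAMPLE_DATASET, my_photo_ids)
print(len(matches), "matching records;", len(remaining), "left after removal")
```

The point of the sketch is that the dataset itself cannot tell you whose photo a URL belongs to. The link between an account and its photos has to come from outside, which is exactly the step IBM is asking my consent for.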

Facial recognition’s ‘dirty little secret’: Millions of online photos scraped without consent

By Olivia Solon

Facial recognition can log you into your iPhone, track criminals through crowds and identify loyal customers in stores.

The technology — which is imperfect but improving rapidly — is based on algorithms that learn how to recognize human faces and the hundreds of ways in which each one is unique.

To do this well, the algorithms must be fed hundreds of thousands of images of a diverse array of faces. Increasingly, those photos are coming from the internet, where they’re swept up by the millions without the knowledge of the people who posted them, categorized by age, gender, skin tone and dozens of other metrics, and shared with researchers at universities and companies.

When I first heard about this story I was annoyed but didn't think too much about it. Then, further down the story, it's clear they used Creative Commons Flickr photos.

“This is the dirty little secret of AI training sets. Researchers often just grab whatever images are available in the wild,” said NYU School of Law professor Jason Schultz.

The latest company to enter this territory was IBM, which in January released a collection of nearly a million photos that were taken from the photo hosting site Flickr and coded to describe the subjects’ appearance. IBM promoted the collection to researchers as a progressive step toward reducing bias in facial recognition.

But some of the photographers whose images were included in IBM’s dataset were surprised and disconcerted when NBC News told them that their photographs had been annotated with details including facial geometry and skin tone and may be used to develop facial recognition algorithms. (NBC News obtained IBM’s dataset from a source after the company declined to share it, saying it could be used only by academic or corporate research groups.)

There is also a checker to see whether your photos were used to teach the machines. After typing in my username, I found out I have 207 photos in the IBM dataset. This is one of them:

Not my choice of photo, just the one which comes up when using the website

Georg Holzer uploaded his photos to Flickr to remember great moments with his family and friends, and he used Creative Commons licenses to allow nonprofits and artists to use his photos for free. He did not expect more than 700 of his images to be swept up to study facial recognition technology.

“I know about the harm such a technology can cause,” he said over Skype, after NBC News told him his photos were in IBM’s dataset. “Of course, you can never forget about the good uses of image recognition such as finding family pictures faster, but it can also be used to restrict fundamental rights and privacy. I can never approve or accept the widespread use of such a technology.”

I have a similar view to Georg. I publish almost all my Flickr photos under a Creative Commons non-commercial share-alike licence, and I swear that licence has been broken. I'm also not sure whether the pictures are all private or not, but I'm going to find out thanks to GDPR.

There may, however, be legal recourse in some jurisdictions thanks to the rise of privacy laws acknowledging the unique value of photos of people’s faces. Under Europe’s General Data Protection Regulation, photos are considered “sensitive personal information” if they are used to confirm an individual’s identity. Residents of Europe who don’t want their data included can ask IBM to delete it. If IBM doesn’t comply, they can complain to their country’s data protection authority, which, if the particular photos fall under the definition of “sensitive personal information,” can levy fines against companies that violate the law.

Expect a GDPR request soon, IBM! I'll do anything I can to send the message that I'm not happy with this.

Is the BBC Backstage podcast the first CC licenced piece from the BBC?

Michela Ledwidge asks the question, and we racked our brains and did a lot of searching. I think it might be, but I can't say for sure. If that's not a first, using Blip.tv certainly is. And to be honest, if it wasn't for the ability to…

  1. Set the license (Creative Commons Attribution 2.5 in this case)
  2. Pipe content to Archive.org for permanent storage and to the benefit of generations to come

We would never have considered it. Maybe we've been drinking too much of Lessig's Kool-Aid. I was a little worried about the Blip.tv EULA, but Mike at Blip says,

As far as the EULA, we don't own all the rights. Don't want them. We need to find a way to make that even clearer. When you upload you give us the rights to create derivative works (for thumbnails and transcoding) and to distribute (i.e. make available for download). Those rights go away when you delete the content from blip.

This is another reason why the Archive.org angle is very important. If Blip.tv ever pulled a Yahoo/Flickr thing on its users, you could pipe all your content to Archive.org and remove it from Blip, metadata and all.


Code v.2.0 launched today

Code 2.0 book

Lawrence Lessig launched Code version 2.0 today and, best of all, he released it under a Creative Commons Attribution-ShareAlike 2.5 License.

So Code v2 is officially launched today. Some may remember Code and Other Laws of Cyberspace, published in 1999. Code v2 is a revision to that book — not so much a new book, as a translation of (in Internet time) a very old book. Part of the update was done on a Wiki. The Wiki was governed by a Creative Commons Attribution-ShareAlike license. So too is Code v2.

Thus, at http://codev2.cc, you can download the book. Soon, you can update it further (we're still moving it into a new wiki). You can also learn a bit more about the history of the book, and aim of the revision. And finally, there are links to buy the book — more cheaply than you likely can print it yourself.

Lessig is already asking for remixes, which is great because I'm certainly going to convert it so it works on my phone soon.
