IBM DIF project removes my flickr urls


Hopefully this is the final follow-up to my post about facial recognition’s dirty little secret, millions of online photos scraped without consent, and the update.

Thank you for your prompt response. We confirm that we have deleted from the DiF dataset all the URLs linked to your Flickr ID and associated annotations. We have also deleted your Flickr ID from our records. IBM will require our research partners to comply with your deletion request and provide IBM with confirmation of compliance.

Best regards,

IBM Research DiF team

End of the matter, although part of me wants to contact everybody in the photos and tell them what happened. I’m not sure what that would achieve, however.

Reply from IBM about my online photos scraped without consent

Diversity in Faces(DiF)

Following my post about facial recognition’s dirty little secret, millions of online photos scraped without consent, I got a reply from Flickr and IBM’s Diversity in Faces project.
First Flickr’s automated email…

Hi ian,

Thanks for reaching out to us!

We’ve received your message and will be responding as quickly as possible. In the meantime, do visit the Flickr Help Forum and our Help Center as the answer to your question may be found there.

We look forward to connecting and will be in touch soon.

Cheerfully,
The Flickr Team

Already Pro? Then expect a response shortly, because you are already in our VIP queue! (Make sure to write to us using the email address on your Pro account.)

Then the reply from the IBM Research DiF team…

Dear Ian,
Thank you for your email.
The Diversity in Faces (DiF) project, referenced in your request below, is a non-commercial, research initiative. The DiF dataset includes a list of URLs (but not the images themselves), linking to publicly available images on Flickr under certain creative commons licenses, along with associated annotations. We have taken great care to ensure that the DiF dataset does not include Flickr IDs or any other Flickr identifiers of individuals.
In order to respond to your request, we will need to locate the URLs in the DiF dataset that are linked to your Flickr ID (if any). To do this, we will need your Flickr ID, along with your express consent to use it for the sole purpose of locating such URLs and responding to your request.  Separately, if you would like us to, we can remove any URLs of images linked to your Flickr ID from the DiF dataset.  Please confirm this by reply.
After conducting our search, we will delete your Flickr ID from our records, and if you so request, we will also remove any URLs and associated annotations from the DiF dataset connected to your Flickr ID. We will confirm when this process has been completed.
With respect to your request to access your personal data processed by IBM outside the DiF project, you will be contacted separately by the IBM Data Subject Rights Operations Team (Email at ibm.com) to proceed with your request.
Let us know if you have any questions or how we can further assist you with your request.
IBM Research DiF Team

Facial recognition’s ‘dirty little secret’: Millions of online photos scraped without consent

By Olivia Solon

Facial recognition can log you into your iPhone, track criminals through crowds and identify loyal customers in stores.

The technology — which is imperfect but improving rapidly — is based on algorithms that learn how to recognize human faces and the hundreds of ways in which each one is unique.

To do this well, the algorithms must be fed hundreds of thousands of images of a diverse array of faces. Increasingly, those photos are coming from the internet, where they’re swept up by the millions without the knowledge of the people who posted them, categorized by age, gender, skin tone and dozens of other metrics, and shared with researchers at universities and companies.

When I first heard about this story I was annoyed but didn’t think too much about it. Then, further down the story, it’s clear they used Creative Commons Flickr photos.

“This is the dirty little secret of AI training sets. Researchers often just grab whatever images are available in the wild,” said NYU School of Law professor Jason Schultz.

The latest company to enter this territory was IBM, which in January released a collection of nearly a million photos that were taken from the photo hosting site Flickr and coded to describe the subjects’ appearance. IBM promoted the collection to researchers as a progressive step toward reducing bias in facial recognition.

But some of the photographers whose images were included in IBM’s dataset were surprised and disconcerted when NBC News told them that their photographs had been annotated with details including facial geometry and skin tone and may be used to develop facial recognition algorithms. (NBC News obtained IBM’s dataset from a source after the company declined to share it, saying it could be used only by academic or corporate research groups.)

And then there is a checker to see if your photos were used in the teaching of machines. After typing my username, I found out I have 207 photo(s) in the IBM dataset. This is one of them:

Not my choice of photo, just the one which comes up when using the website

Georg Holzer uploaded his photos to Flickr to remember great moments with his family and friends, and he used Creative Commons licenses to allow nonprofits and artists to use his photos for free. He did not expect more than 700 of his images to be swept up to study facial recognition technology.

“I know about the harm such a technology can cause,” he said over Skype, after NBC News told him his photos were in IBM’s dataset. “Of course, you can never forget about the good uses of image recognition such as finding family pictures faster, but it can also be used to restrict fundamental rights and privacy. I can never approve or accept the widespread use of such a technology.”

I have a similar view to Georg. I publish almost all my Flickr photos under a Creative Commons non-commercial share-alike licence, and I swear this has been broken. I’m also not sure if the pictures are all private or not, but I’m going to find out thanks to GDPR.

There may, however, be legal recourse in some jurisdictions thanks to the rise of privacy laws acknowledging the unique value of photos of people’s faces. Under Europe’s General Data Protection Regulation, photos are considered “sensitive personal information” if they are used to confirm an individual’s identity. Residents of Europe who don’t want their data included can ask IBM to delete it. If IBM doesn’t comply, they can complain to their country’s data protection authority, which, if the particular photos fall under the definition of “sensitive personal information,” can levy fines against companies that violate the law.

Expect a GDPR request soon, IBM! I’ll do anything I can to send a message that I wasn’t happy with this.

Follow up from MyHeritage GDPR request

Shadow profile
I got this from MyHeritage today… after submitting my GDPR request to them to find out the history of my account.
We apologize for this breach and the fact that your email address might have been part of it. The email addresses were included in the breach along with a hashed password – not the actual password (which has been expired and can no longer be used to access the account on MyHeritage). Other than this, there has not been a violation of the data. See our official statement here and an updated statement here.

Please be advised that this incident does not affect the privacy of any sensitive information you have on your online family site, including DNA information and family trees. Only hashed versions of passwords were stolen, which means they cannot be used to log in to your private account on MyHeritage.

There has been no evidence that the stolen information was ever used by the perpetrators. Since Oct 26, 2017 (the date of the breach) and the present we have not seen any activity indicating that any MyHeritage accounts had been compromised.

The privacy and the security of your information is our highest priority and we continually assess our procedures and policies and seek the best methods to secure information. The work on adding two-factor authentication to MyHeritage is completed and you can read the full explanation about this feature here.

In addition to that, I have carried out a search within our system, and I was not able to locate an account using your email address: **********************************

If you had an account using this email address and the account was deleted, we currently do not retain any information from your registered account and therefore, I cannot provide you with any information regarding it as it no longer exists.

However, if you registered to MyHeritage using another email address, please let me know with which so I will be able to locate it. In addition to that, as an extra security measure, if you still have access to this email address would you please be so kind to send us an email using that address?

If you run into any further issues, by all means, please don’t hesitate to reply. I’m here for you.

MyHeritage Support team

Maybe I deleted my account too soon, unfortunately giving them an easy out. I should have made the GDPR request first and deleted my account afterwards! I was looking forward to seeing proof the account was a shadow profile.

GDPR dating information update

Hackers movie

With GDPR in force, I sent out emails to OKCupid, Plenty of Fish, Tinder and others. So far I’ve only gotten responses from POF and OkCupid, which means Tinder and the others have about a day or so to get back to me with everything before I can start to throw down some fire.

Before I headed on holiday, I got a message from POF, then OKCupid a day later, saying they need the request to come from the email address which is on the account. Fair enough, so I forwarded each email to that address and then replied-all, to myself and to them, from that account.

A few days later I got emails, first from POF and then OKCupid.

You have recently requested a copy of your PlentyofFish (“POF”) personal data, and we’re happy to report that we have now verified your identity.

We are attaching a copy of your personal data contained in or associated with your POF account.  The password to access the personal data will be sent in a separate email.

By downloading this data, you consent to the extraction of your data from POF, and assume all risk and liability for such downloaded data. We encourage you to keep it secure and take precautions when storing or sharing it.

The information contained in this archive may vary depending on the way you have used POF. In general, this information includes content and photos you have provided us, whether directly or through your social media accounts, messages you have sent and other data you would expect to see from a social media service like POF.

Please note that there is some information we cannot release to you including information that would likely reveal personal information about other users. Those notably include messages you received on POF, which are not provided out of concern for the privacy of the senders.

Sincerely,

POF Privacy Team

Then something similar from OkCupid, which makes sense as they are really the same company.

Dear Mr. Forrester:

You have recently requested a copy of your OkCupid personal data, and we’re happy to report that we have now verified your identity.

We are attaching a copy of your personal data contained in or associated with your OkCupid account. The password to access the personal data will be sent in a separate email.

By downloading this data, you consent to the extraction of your data from OkCupid, and assume all risk and liability for such downloaded data. We encourage you to keep it secure and take precautions when storing or sharing it.

The information contained in this archive may vary depending on the way you have used OkCupid. In general, this information includes content and photos you have provided us, whether directly or through your social media accounts, messages you have sent and other data you would expect to see from a social media service like OkCupid.

Please note that there is some information we cannot release to you including information that would likely reveal personal information about other users. Those notably include messages you received on OkCupid, which are not provided out of concern for the privacy of the senders.

Sincerely,

OkCupid Privacy Team

So on my train journey from Stockholm to Copenhagen, I had a look inside the zip files shared with me. They are quite different; it’d be interesting to see what others will do.

  • Forrester, I – POF Records.zip
    • UserData.json | 6.2 KB
    • UserData.pdf | 40.5 KB
    • Profile_7.jpg | 30.1 KB
    • Profile_6.jpg | 25.0 KB
    • Profile_5.jpg | 17.4 KB
    • Profile_4.jpg | 18.8 KB
    • Profile_3.jpg | 26.6 KB
    • Profile_2.jpg | 11.7 KB
    • Profile_1.jpg | 30.7 KB
  • OkCupid_Records_-Forrester__I.zip
    • Ian Forrester_JSN.txt | 3.8 MB
    • Ian Forrester_html.html | 6.6 MB

As you can see, they are quite different. Interestingly there are no photos in the OKCupid data dump, not even the ones I shared as part of my profile. In the POF dump the PDF is a copy of the JSON file, which is silly really.

So the JSON files are the most interesting parts…

Plenty of Fish

POF don’t have much interesting data: basically a copy of my profile data in JSON, including Firstvisit and FirstvisitA through FirstvisitE, complete with my IP address. I can also confirm I started my profile on 2012-01-25.

Then there is my BasicSearchData and AdvancedSearchData, which include the usual stuff plus when I last searched (LastSearch) and from which IP address.

Nothing else… no messages

OkCupid

OkCupid has a ton more useful information in its JSON. Some interesting parts: I have logged into OKCupid a total of 24157 times! My status is Active? My job is Technology? The geolocation_history is pretty spot on and the login_history goes from July 2007 to the current year, complete with IP and time.
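
A minimal sketch of how the login_history could be summarised per year. Only the name login_history comes from the dump; the `time` and `ip` field names inside each entry are assumptions, as is the timestamp format (borrowed from the `uploaded` field seen elsewhere in the file). The sample entries are made up.

```python
from collections import Counter


def logins_per_year(login_history):
    """Count login entries per calendar year.

    Assumes each entry is a dict with a "time" string that starts with a
    four-digit year (for example "2017-08-08 19:16:20"). The "time" key is
    a guess and has not been checked against the real export.
    """
    years = Counter()
    for entry in login_history:
        timestamp = entry.get("time", "")
        if len(timestamp) >= 4 and timestamp[:4].isdigit():
            years[int(timestamp[:4])] += 1
    return dict(sorted(years.items()))


# Made-up sample entries, not taken from the real data dump.
sample = [
    {"time": "2007-07-14 09:30:00", "ip": "192.0.2.1"},
    {"time": "2017-08-08 19:16:20", "ip": "192.0.2.2"},
    {"time": "2017-12-01 22:05:41", "ip": "192.0.2.3"},
]
print(logins_per_year(sample))
```

With the real file, the list would come from something like `json.load(...)["login_history"]`, assuming that key sits at the top level.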

The messages are really interesting! They decided to share only one side of the messages, so only the ones you sent rather than the ones you received. As the messages are not like emails, you don’t get the quoted reply, just the sent message. Each item includes who it is from (me) and the time/date. There are some which are obviously instant messenger conversations, which look odd reading them now. In those, there are also fields for peer, peer_joined, time and type. It’s also clear where changes have happened, for example when you used to be able to add some formatting to the message and when you used to have subject lines.

Some which stick out include, Allergic to smoking?, insomnia, ENTP and where next, The Future somewhat answered, So lazy you’ve only done 40 something questions, Dyslexia is an advantage, But would you lie in return? No bad jokes, gotland and further a field, Ok obvious question, etc.

Next come the photos (my photos, no one else’s)

"caption": "OkCupid's removal of visitors is so transparent, I don't know why they bothered to lie to us all?", 
"photo": "https://k1.okccdn.com/php/load_okc_image.php/images/6623162030294614734", 
"status": "Album Picture Active", 
"uploaded": "2017-08-08 19:16:20"

Of course the images are publicly available via the url, so I could pull them all down with a quick wget/curl. Not sure what to make about this idea of making them public. Security through obscurity anyone?

Stop screwing with OKCupid
As long as you can see the picture above, OKCupid is making my profile pictures public

Now the image strings seem to be random, but I don’t think this is a good idea at all! I’m wondering how it sits with GDPR too, and whether they will remove them after a period of time. Hence if the image above is broken, then you know what happened.

Then we are on to the purchases section. It details when I once tried an A-list subscription and when I cancelled it, how I paid (PayPal), how much, address, date, etc… It’s funny reading about when I cancelled it…
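
The "pull them all down" idea could start with something like this sketch, which only collects the URLs. The `photo` field name comes from the excerpt quoted earlier; the top-level `photos` key is an assumption, and the caption in the sample is a placeholder.

```python
import json


def photo_urls(dump):
    """Return the URL of every photo entry that has a "photo" field.

    Assumes the photo entries sit in a list under a top-level "photos"
    key; that key name is a guess about the layout of the export.
    """
    return [p["photo"] for p in dump.get("photos", []) if "photo" in p]


# Small stand-in for the real export, reusing the URL quoted in the post.
sample_dump = json.loads("""
{
  "photos": [
    {
      "caption": "example caption",
      "photo": "https://k1.okccdn.com/php/load_okc_image.php/images/6623162030294614734",
      "status": "Album Picture Active",
      "uploaded": "2017-08-08 19:16:20"
    }
  ]
}
""")

for url in photo_urls(sample_dump):
    print(url)
```

Each URL could then be handed to wget, curl or `urllib.request.urlretrieve`; the download step is left out here so the sketch runs offline.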

"comments": "userid = 7367007913453081320 was downgraded to amateur", 
"transaction": "lost subscription",

The big question I always had was the question data. Don’t worry they are all there! For example here’s just one of mine.

{
"answer_choices": {
"1": "Yes", 
"2": "No"
}, 
"prompt": "Are you racist?", 
"question_id": 7847, 
"user_acceptable_answers": [
"No"
], 
"user_answer": "No", 
"user_answered_publicly": "no", 
"user_importance": "mandatory"
},
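
As a rough sketch, entries shaped like the one above could be tallied by importance and by whether they were answered publicly. The field names are taken from the excerpt; the assumption that a public answer is stored as "yes" is mine, since only "no" appears in the sample.

```python
from collections import Counter


def summarise_questions(questions):
    """Tally question entries by importance and by public/private answer."""
    by_importance = Counter(
        q.get("user_importance", "unknown") for q in questions
    )
    # Assumption: public answers are marked "yes"; only "no" is seen above.
    public = sum(1 for q in questions if q.get("user_answered_publicly") == "yes")
    return {
        "by_importance": dict(by_importance),
        "public": public,
        "private": len(questions) - public,
    }


# The single entry quoted in the post, as a Python structure.
sample_questions = [
    {
        "answer_choices": {"1": "Yes", "2": "No"},
        "prompt": "Are you racist?",
        "question_id": 7847,
        "user_acceptable_answers": ["No"],
        "user_answer": "No",
        "user_answered_publicly": "no",
        "user_importance": "mandatory",
    }
]
print(summarise_questions(sample_questions))
```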

After all those questions, there’s a bunch of stuff about the user_devices I’ve used to log into OkCupid over the years, going right back, plus stuff about preferences for searches, etc.

Going to need some time to digest everything, but the OKCupid data dump is full of interesting things. I might convert the lot to XML just to make it easier for me to get an overview.
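
One possible shape for that JSON to XML conversion, sketched with Python's standard library. The element names (`object`, `list`, `value`) and the `key` attribute are arbitrary choices for illustration, not anything OkCupid provides. Keys go into an attribute rather than a tag name because keys such as "1" in answer_choices are not valid XML element names.

```python
import json
import xml.etree.ElementTree as ET


def to_xml(value, key=None):
    """Recursively convert parsed JSON into an ElementTree element."""
    if isinstance(value, dict):
        element = ET.Element("object")
        for child_key, child_value in value.items():
            element.append(to_xml(child_value, key=child_key))
    elif isinstance(value, list):
        element = ET.Element("list")
        for child_value in value:
            element.append(to_xml(child_value))
    else:
        # Scalars (strings, numbers, booleans, null) become text nodes.
        element = ET.Element("value")
        element.text = "" if value is None else str(value)
    if key is not None:
        element.set("key", str(key))
    return element


# A trimmed version of the question entry quoted in the post.
record = json.loads(
    '{"prompt": "Are you racist?", "question_id": 7847, '
    '"user_acceptable_answers": ["No"]}'
)
xml_text = ET.tostring(to_xml(record), encoding="unicode")
print(xml_text)
```

For the whole dump, `record` would be replaced by the result of `json.load` on the exported file.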

OKcupid responds to my GDPR request

OkCupid no Match protest

I mentioned how I emailed a load of dating sites for my data and then some… under GDPR. So far I’ve got something from POF, and now OKCupid has finally got back to me, after my request finally made it to supportconsole@okcupid.com.

Hello,

OkCupid has received your recent request for a copy of the personal data we hold about you.

For your protection and the protection of all of our users, we cannot release any personal data without first obtaining proof of identity.

In order for us to verify your identity, we kindly ask you to:

1. Respond to this email from the email address associated with your OkCupid account and provide us the username of your OkCupid account.

2. In your response to this email, please include a copy of a government-issued ID document such as your passport or driving license. Also, we ask you to please cover up any personal information other than your name, photo and date of birth from the document as that is the only information we need.

We may require further verification of your identity, for example, if the materials you provide us do not establish your identity as being linked to the account in question.

Please note that if you previously closed your account, your data may be unavailable for extraction as we proceed to its deletion or anonymization in accordance with our privacy policy. Even if data is still available for extraction, there is some information we cannot release to you including information that would likely reveal personal information about other users. Those notably include messages you received on OkCupid, which are not provided out of concern for the privacy of the senders.

Best,

OkCupid Privacy Team

Pretty much the same as the POF reply.

POF first to respond to my GDPR request

Plenty of Fish

I mentioned how I emailed a load of dating sites for my data and then some… under GDPR. So far I’ve been bounced around a little, but POF’s is the first positive email I’ve gotten…

PlentyofFish (“POF”) has received your recent request for a copy of the personal data we hold about you.

For your protection and the protection of all of our users, we cannot release any personal data without first obtaining proof of identity.

In order for us to verify your identity, we kindly ask you to:

1. Respond to this email from the email address associated with your POF account and provide us the username of your POF account.

2. In your response to this email, please include a copy of a government-issued ID document such as your passport or driving license. Also, we ask you to please cover up any personal information other than your name, photo and date of birth from the document as that is the only information we need.

We may require further verification of your identity, for example, if the materials you provide us do not establish your identity as being linked to the account in question.

Please note that if you previously closed your account, your data may be unavailable for extraction as we proceed to its deletion or anonymization in accordance with our privacy policy. Even if data is still available for extraction, there is some information we cannot release to you including information that would likely reveal personal information about other users. Those notably include messages you received on POF, which are not provided out of concern for the privacy of the senders.

Best,

POF Privacy Team

Well, I guess they are being careful at least, but I’ll be interested to see what other questions they ask me.

Still wondering when the rest will get in touch?

Data portability and GDPR, been waiting a long time for this

EU GDPR 2018

One of the things I always wanted, but could never see happening without the goodwill of companies, was real portability of my own data.

Google, Facebook and others do provide a data dump but I found it really interesting to see the difference in my Facebook dump/zip/archive. I request it every year or when something changes. This year I did one while Facebook struggled to deal with the impact of Cambridge Analytica and the new GDPR changes.

In 2017 my zip was 31.4 MB (31,425,658 bytes)
In 2018 my zip was 171.3 MB (171,267,617 bytes)

Unlike previously, FB included ALL the media in the messages I’ve exchanged with friends. All those gifs and videos friends have shared are now in the dump. I find it interesting they were not included previously, which always raises the question of ownership, something we (the dataportability group) talked about a lot.

I’m so looking forward to similar with other services… Although I’m still unsure whether you can legally create services which import the data exports or not. It should be possible, as it’s your data.

I have already crafted an email to send to OKCupid, POF, Bumble, Tinder and some other dating sites, similar to when the journalist requested every bit of data they had on her. It’s set to send on May 25th, the day GDPR comes into effect, aka tomorrow!
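
The jump between the two archive sizes quoted above can be worked out with a couple of lines; the byte counts are the ones given in the post and MB is taken as 1,000,000 bytes, which matches the rounded figures.

```python
size_2017 = 31_425_658   # bytes, the 2017 archive
size_2018 = 171_267_617  # bytes, the 2018 archive

growth_bytes = size_2018 - size_2017
growth_factor = size_2018 / size_2017

print(f"Extra data: {growth_bytes / 1_000_000:.1f} MB")
print(f"Growth factor: {growth_factor:.2f}x")
```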

Thanks to Ubergill for much improving the email I originally drafted…

I’m looking forward to the replies!

Dear {service}

I am making this request for access to personal data pursuant to Article 15 of the General Data Protection Regulation. I am still concerned that your company’s information practices may be putting my personal information at undue risk of exposure or in fact has breached its obligation to safeguard my personal information.

I would like you to be aware at the outset, that I expect a reply to my request within one month as required under Article 12, failing which I will be forwarding my inquiry with a letter of complaint to the Information Commissioner’s Office.

Please advise as to the following:

  1. Please confirm to me whether or not my personal data is being processed. If it is, please provide me with the categories of personal data you have about me in your files and databases.
    a. In particular, please tell me what you know about me in your information systems, whether or not contained in databases, and including e-mail, documents on your networks, or voice or other media that you may store.
    b. Additionally, please advise me in which countries my personal data is stored, or accessible from. In case you make use of cloud services to store or process my data, please include the countries in which the servers are located where my data are or were (in the past 12 months) stored.
    c. Please provide me with a copy of, or access to, my personal data that you have or are processing.
  2. Please provide me with a detailed account of the specific uses that you have made, are making, or will be making of my personal data.
  3. Please provide a list of all third parties with whom you have (or may have) shared my personal data.
    a. If you cannot identify with certainty the specific third parties to whom you have disclosed my personal data, please provide a list of third parties to whom you may have disclosed my personal data.
    b. Please also identify which jurisdictions that you have identified in 1(b) above that these third parties with whom you have or may have shared my personal data, from which these third parties have stored or can access my personal data. Please also provide insight in the legal grounds for transferring my personal data to these jurisdictions. Where you have done so, or are doing so, on the basis of appropriate safeguards, please provide a copy.
    c. Additionally, I would like to know what safeguards have been put in place in relation to these third parties that you have identified in relation to the transfer of my personal data.
  4. Please advise how long you store my personal data, and if retention is based upon the category of personal data, please identify how long each category is retained.
  5. If you are additionally collecting personal data about me from any source other than me, please provide me with all information about their source, as referred to in Article 14 of the GDPR.
  6. If you are making automated decisions about me, including profiling, whether or not on the basis of Article 22 of the GDPR, please provide me with information concerning the basis for the logic in making such automated decisions, and the significance and consequences of such processing.
  7. I would like to know whether or not my personal data has been disclosed inadvertently by your company in the past, or as a result of a security or privacy breach.
    a. If so, please advise as to the following details of each and any such breach:
      i. a general description of what occurred;
      ii. the date and time of the breach (or the best possible estimate);
      iii. the date and time the breach was discovered;
      iv. the source of the breach (either your own organisation, or a third party to whom you have transferred my personal data);
      v. details of my personal data that was disclosed;
      vi. your company’s assessment of the risk of harm to myself, as a result of the breach;
      vii. a description of the measures taken or that will be taken to prevent further unauthorised access to my personal data;
      viii. contact information so that I can obtain more information and assistance in relation to such a breach, and
      ix. information and advice on what I can do to protect myself against any harms, including identity theft and fraud.
    b. If you are not able to state with any certainty whether such an exposure has taken place, through the use of appropriate technologies, please advise what mitigating steps you have taken, such as
      i. Encryption of my personal data;
      ii. Data minimisation strategies; or,
      iii. Anonymisation or pseudonymisation;
      iv. Any other means
  8. I would like to know your information policies and standards that you follow in relation to the safeguarding of my personal data, such as whether you adhere to ISO 27001 for information security, and more particularly, your practices in relation to the following:
    a. Please inform me whether you have backed up my personal data to tape, disk or other media, and where it is stored and how it is secured, including what steps you have taken to protect my personal data from loss or theft, and whether this includes encryption.
    b. Please also advise whether you have in place any technology which allows you with reasonable certainty to know whether or not my personal data has been disclosed, including but not limited to the following:
      i. Intrusion detection systems;
      ii. Firewall technologies;
      iii. Access and identity management technologies;
      iv. Database audit and/or security tools; or,
      v. Behavioural analysis tools, log analysis tools, or audit tools;
  9. In regards to employees and contractors, please advise as to the following:
    a. What technologies or business procedures do you have to ensure that individuals within your organisation will be monitored to ensure that they do not deliberately or inadvertently disclose personal data outside your company, through e-mail, web-mail or instant messaging, or otherwise.
    b. Have you had any circumstances in which employees or contractors have been dismissed, and/or been charged under criminal laws for accessing my personal data inappropriately, or if you are unable to determine this, of any customers, in the past twelve months.
    c. Please advise as to what training and awareness measures you have taken in order to ensure that employees and contractors are accessing and processing my personal data in conformity with the General Data Protection Regulation.

Thank you,

Ian
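
A small sketch of how the {service} placeholder in the letter could be filled in for each site, along with a rough one-month reply date. The list of services and the send date come from this post; the helper `add_one_month` is a hypothetical name, and treating the send date as the date of receipt is a simplification of how the Article 12 deadline is counted.

```python
import calendar
from datetime import date


def add_one_month(d):
    """Return the same day next month, clamped to that month's last day."""
    year = d.year + (1 if d.month == 12 else 0)
    month = 1 if d.month == 12 else d.month + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


GREETING = "Dear {service}"  # the placeholder used in the letter
services = ["OKCupid", "POF", "Bumble", "Tinder"]  # sites named in the post
sent_on = date(2018, 5, 25)  # the day GDPR came into effect

# Simplification: assumes the request is received the day it is sent.
reply_due = add_one_month(sent_on)
for service in services:
    print(GREETING.format(service=service), "- reply due by", reply_due.isoformat())
```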

Data portability in online dating sooner than they think?

Dating Apps make money from attention & personal data

I have written a few times about disruption in online dating; heck, it’s something which will be discussed at Mozilla Festival this year (tickets are available now).

But interestingly the EU’s General Data Protection Regulation may get in there ahead of any setup/network disruption. In the Guardian I saw a piece called Getting your data out of Tinder is really hard – but it shouldn’t be.

It’s all about getting data back from Tinder (which, remember, is part of IAC/Match Group).

…Duportail eventually got some of the rest of her data, but only on a voluntary basis, and only after she identified herself as a journalist. Her non-journalist friends who followed suit never got responses to similar requests.

Finally armed with the 800 pages she had clawed back from Tinder, Duportail wrote a story reflecting on her own relationship with her data, and the myopic view Tinder had of her love life. I feel her story helps bridge the chasm between those with information stored in the database and the architects behind it, providing much needed neutral common ground to democratically discuss power distributions in the digital economy.

Given the popularity of her story, and my overflowing inbox, I would say many agree. And indeed, you should expect more similar stories to be unearthed in the future because of the upcoming General Data Protection Regulation (GDPR). From May 2018, the new European-level regulation will come into force, claiming wider applicability – including on US-based companies, such as Tinder, processing the personal data of Europeans – and harmonising data protection and enforcement by “levelling up” protections for all European residents.

I know there is a lot of push back from the big American internet corps, but this is coming and there is no way they can wriggle out of it.

…beyond the much older right of access, the true revolution of GDPR will come in the form of a new right for all European citizens: the right to portability.

It seems like such a small thing but actually it has the potential to be extremely disruptive. Heck, it’s one of the things I wanted back in early 2011. Imagine all those new services which could act like brokers and enable choice! It could be standard to have the ability to export and import rich data sets like Attention Profiling Markup Language (APML).

I just wish we were staying in Europe, although the UK has agreed to adopt GDPR, thankfully! There was no way this would ever have come about, as it now looks like it might, if they were left on their own.