Clearview AI GDPR request submitted

Clearview AI

There is so much to say about Clearview AI. If you have never heard of them, well, put it this way…

They have amassed a database of people’s faces by illegally scraping the likes of Facebook, Twitter, Instagram, YouTube, Flickr, etc. All the companies have sent legal cease-and-desist letters, but Clearview doesn’t seem to care too much. Recently they were hacked, exposing all those pictures and training data to attackers.

Because of this and my experience with the IBM DiF project, I wanted to know if I’m in the database, and the best way to find out is to send a GDPR request. This all follows my recent GDPR request to Houseparty.

I think they have gotten serious about the EU and the UK because I didn’t need to send my usual email. I filled in the form using my junk mail and used my Estonian digital ID for verification.

Look forward to seeing what comes back. I’m expecting quite a lot.

Of course IBM, Microsoft and Amazon have backed away (for now) from their facial recognition systems because of the huge amount of bias the datasets have against black people. We will see how long they keep to this line over this year and next.

Update
In my inbox, for the two requests…

EU/UK/Switzerland Data Access Form Request
EU/UK/Switzerland Data Objection Form Request

This e-mail is to confirm that we have received your EU/UK/Switzerland Data (Access/Objection) Request. We will get back to you as soon as possible.

Sincerely,

Team Clearview

My Houseparty GDPR data dump

During the start of the Covid19 lockdown, I was convinced by friends to try Houseparty. I decided it was pretty crappy and stopped using it, as mentioned in a previous blog post.

After many emails I finally got my personal GDPR data copy. From Hotel Charlie!

I can’t tell you how much hassle it’s been… Even when they sent me a horribly long link (we are talking about 300 characters long) to the zip file on their Amazon S3 bucket, it would expire a few hours later.

<Error>
<Code>ExpiredToken</Code>
<Message>The provided token has expired.</Message>
<Token-...{very large token}...</Token>
</Error>
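If the link was a standard AWS Signature Version 4 presigned URL (an assumption, as the link itself is not shown here), its nominal lifetime can be read from the query string before you try to download. This is a minimal sketch: the example URL and the `presigned_expiry` helper are made up for illustration. An `ExpiredToken` error also suggests the link was signed with temporary credentials, in which case it can stop working earlier than this calculation says.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit


def presigned_expiry(url):
    """Return the nominal expiry time of a SigV4 presigned URL (UTC)."""
    query = parse_qs(urlsplit(url).query)
    # X-Amz-Date is when the link was signed, X-Amz-Expires its lifetime in seconds.
    signed_at = datetime.strptime(
        query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ"
    ).replace(tzinfo=timezone.utc)
    lifetime = timedelta(seconds=int(query["X-Amz-Expires"][0]))
    return signed_at + lifetime


# Hypothetical link, not the one Houseparty sent.
example_url = (
    "https://example-bucket.s3.amazonaws.com/export.zip"
    "?X-Amz-Date=20200611T120000Z&X-Amz-Expires=3600"
)
print(presigned_expiry(example_url).isoformat())
```

If the printed time is only an hour or two away, download straight away rather than leaving the email for later.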

Finally, once I got it… It was a zip file with an index.html and 5 different folders.

  • room_visits / room_visits.html
  • profiles / profile.html
  • photos / 216A9AAE57194410901F8BA7981E63AB (a png file)
  • interactions / interactions.html
  • friends / friends.html

All the .html files are horrible tables. For example, here is interactions.html:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Room Visits</title>
</head>
<style>
table {
    border-collapse: collapse;
}
table, th, td {
    border: 1px solid black;
}
</style>
<body>
<table border="1">
    <tr>
        <td>Room ID</td>
        <td>Room Visit Start Date</td>
        <td>Room Visit End Date</td>
        <td>Users</td>
    </tr>
    
    <tr>
        <td>e021116-bae-44d6-cc17-9121fbeaccc13</td>
        <td>
    2020-06-45T21:23:11Z
</td>
        <td>
    2020-06-11T21:16:41Z
</td>
        <td>
            <ul>
            
            </ul>
        </td>
    </tr>
    
</table>

</body>
</html>
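Tables like the one above are awkward to read but easy to flatten. Here is a minimal sketch using only the Python standard library, assuming every file in the dump uses plain `<tr>`/`<td>` rows like this one. The `TableRows` class, the `table_rows` helper and the sample values are made up for illustration.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # Collapse the stray newlines and indentation inside each cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_rows(html_text):
    parser = TableRows()
    parser.feed(html_text)
    parser.close()
    return parser.rows


# Made-up sample in the same shape as the dump.
sample = """
<table border="1">
    <tr>
        <td>Room ID</td>
        <td>Users</td>
    </tr>
    <tr>
        <td>
    example-room
</td>
        <td>
            <ul>
            </ul>
        </td>
    </tr>
</table>
"""
for row in table_rows(sample):
    print(row)
```

The first row returned is the header row, so the rest can be fed into `csv.writer` or a spreadsheet from there.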

The data isn’t that interesting, but I think that’s because I wasn’t using the app, just my Chromium browser. I also only friended one person, so it’s all pretty slim on data.

Not that interesting, but I’m very sure there’s lots they have on me, so I requested that my account be deleted. There is no way to delete your account from within the system, whether you are using the browser or the Android app. You have to request deletion!

My next GDPR request is for Clearview AI!

The Houseparty is over, time for the GDPR to kick in the front door?

houseparty gdpr request email

I requested my GDPR personal data from Houseparty/Epic Games over 2 months ago, when I signed up under my spam email and slight social pressure from friends. I read the privacy policy and almost spat out my tea.

However, I found I could use Houseparty in a clean browser (Chromium) at app.houseparty.com, as there was absolutely no way I was going to install the app on my Pixel phone. After trying to play a game with a friend, I found the video worked but not the actual game.

As we moved on to using boardgamearena.com, I decided I wanted to delete my account and got interested to know how much data they had collected about me in my short time on Houseparty.

Out comes my GDPR request. I sent it to data-requests@lifeonair.com and got nothing. I resent it to support@houseparty.com and got a response. Back and forth, then finally…

Houseparty Support

May 08, 2020, 20:46 +0100

Hello Ian,

Thank you for your response.

I’m glad that you’ve reached us regarding your request. We received your data request. Our team is working on pulling the data, and you will receive your data within 30 days.

Please feel free to contact us if you need any further assistance.

Regards,
Romeo Tango

As you can see, the date of May 8th was 34 days ago, and yes, I get Covid19, but I’m not expecting that much data back. Unless there is a ton coming my way?
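The dates are easy to check. This is a rough sketch that takes the 30-day window from the support email at face value and counts from the date of that email. The `days_overdue` helper is made up for illustration.

```python
from datetime import date, timedelta


def days_overdue(promised_on, today, window_days=30):
    """Return the deadline and how many days past it `today` is."""
    deadline = promised_on + timedelta(days=window_days)
    return deadline, (today - deadline).days


promised_on = date(2020, 5, 8)             # date on the support email
today = promised_on + timedelta(days=34)   # "34 days ago"
deadline, overdue = days_overdue(promised_on, today)
print(deadline.isoformat(), overdue)       # 2020-06-07 4
```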

Either way I’m annoyed at being messed around at the start and also them not taking it seriously. I’m still not convinced Romeo Tango is real to be honest.

ICO submission

So enough, I’ll let the ICO deal with it all.

 

Do I agree to Google’s new privacy terms?

Google's new privacy terms

Google is making some changes to its privacy terms and is urging us to read them.

We know it’s tempting to skip these Terms of Service, but it’s important to establish what you can expect from us as you use Google services, and what we expect from you.

I’m slowly making my way through the terms, but one thing I’m certainly going to act on is related to the location of my data in Google’s data centres.

I’m not down with this part… I understand why they would do it, but in the same way I voted to stay within a bloc of countries with hardened data privacy laws, I need to personally do something.

Because of this I’m switching away from Gmail and deleting lots of archived emails. I’m also going to start using encryption more with Google Drive. I have been a bit lazy with all this, weighing up the balance of convenience and effort. Google provides a lot of useful things to me, but I think it’s time to move some of the more critical parts away, starting with email.

So I’m torn between Protonmail and Tutanota but also been looking at others.

/e/OS: The beauty of open source

/e/os on a phone

I was quite impressed with the /e/OS project. I hadn’t really heard of it before, but it’s timely as I’m considering the balance of Google services and data in my life, especially with the plans to move UK citizens’ data/accounts outside the EU.

Taking the Android Open Source Project (AOSP) and removing all the Google parts is quite impressive. A real testament to the power of open source.

The interview with It’s FOSS is a good read, starting off with the question of what and why:

Why did you create this Eelo or /e/ project in the first place?

Gael: In 2017, I realized that using Android and iPhone, Google and many mobile apps was not compatible with my personal privacy.

A later study by a US University confirmed this: using an iPhone or and Android phone sends between 6 to 12 MB of personal data to Google servers, daily! And this doesn’t count mobile apps.

So I looked for reasonable alternatives to iPhone and Android phones but didn’t find any. Either I found options for hobbyists, like Ubuntu Touch, that were not compatible with existing apps and not fully unGoogled either. Or there were alternative ROMs with all the Google fat inside, and no associated basic online services that could be used without tweaking the system.

Therefore, an idea came to mind: why not fork Android, remove all the Google features, even low level, such as connectivity check, DNS…, replace default apps with more virtuous apps, add basic online services, and integrate all this into a consistent form that could be used by Mum and Dad and any people without tech or expert knowledge?

I’d be interested in what apps run on the operating system, as Google really have embedded Play services into everything now. When I first got my recent e-reader, it came with its own app store till you enabled Play services. That store was super small, but it doesn’t have to be that way if you look at F-Droid for example.

If I still had my Nexus 5x, I would likely give /e/os a try. I could run it on my Nexus 5 I guess but the screen is maybe too broken.

I have been thinking, following my use of Firefox Multi-Account Containers: maybe something of a mashup of Blackberry’s Android profiles (anyone remember this?) and Firefox containers.

This certainly feels like a design challenge which could be massively beneficial to many, and showcase the beauty of open source.

Public Service Internet monthly newsletter (Mar 2020)

Microphones on a desk

We live in incredible times with such possibilities; that is clear. Although it’s easily dismissed by looking at the sorry state of the UK during our EU withdrawal or the tech press panic over the coronavirus.

To quote Buckminster Fuller: “You never change things by fighting the existing reality. To change something, build a new model that makes the existing model obsolete.”

You are seeing aspects of this happening with the rise in unions and labor rights in the gig economy.


Google users in UK dropped into GDPR limbo

Ian thinks: I always thought this was going to happen, once out of the EU our data privacy laws won’t be respected by the GAFFA’s and why would they?

Signaling to the masses, leave whatsapp

Ian thinks: Signal as a behemoth is concerning, but it’s clearly made the best use of open source licenses to keep itself in check. Love the new systems which are being built on the protocol, a real opportunity for something very new.

A future without public service media?

Ian thinks: All public service is under threat, and hearing the words of the CEO of the CBC really sends the message loud and clear.

Governments who lockout their Public service broadcasters

Ian thinks: Following the previous link, a look at the sorry state of America’s public service broadcasting. The uplift in donations is good, but for how long? How sustainable are public donations?

Making the digital economy work for the 99%

Ian thinks: 3 words – Transparency, auditing, diversity.

Spotify’s plans to take over podcasting?

Ian thinks: The comparisons are spot on and it’s clear podcasting is going through a massive change right now. Spotify’s play to commodify and dominate is hard to break unless there are experiences they cannot own.

Centralising podcasting with trapping techniques

Ian thinks: The writer makes a good point about Spotify taking decentralised open media and locking it inside a closed proprietary system. Lessons to be learned for future services we use.

The utopian vision of Airbnb vs the harsh reality

Ian thinks: I like Airbnb, I’m even a host, but it’s clear there isn’t just a problem; it’s fundamentally broken and actively exploited by too many.

Could containers for web browsing benefit you too?

Ian thinks: Been using Firefox containers for the last 6-8 months and find them incredibly useful. The user experience is a mess and provides an opportunity for design disruption.

The secret ecosystem of my personal data is being prepared

Recently in the last public service internet note, I posted…

The secret ecosystem of personal data is being unfolded

Ian thinks: People are having fun with this right now, wonder how many people will actually request their data? I put my request in a few days ago, will you?

I sent my requests off a few days ago, following my GDPR dating data template. I’ve had quite a few replies from Sift in the last few weeks.

Starting with this one a day after my formal GDPR request

Thank you for reaching out to Sift. Due to recent press coverage, we are experiencing a high volume of data access requests. We are scaling our operations to accommodate all requests and appreciate your patience. Please expect a followup email to help us verify your identity so that your data does not fall into the wrong hands. Separately, we’ve answered a few commonly asked questions below.

What does Sift do?
Sift provides fraud prevention services to online businesses, e.g. e-commerce. Our goal is to make the internet a safer place so that businesses and their users (like you) don’t have to worry about fraud. You can learn more about our mission here.

We only use your data to provide fraud prevention services to our customers – we do not sell, share, or use your data for any other purpose. For more details about how our service works and what types of data we process, please see our Service Privacy Notice.

How may I access the data that Sift processes about me?
In order to process your data access request, we need to verify your identity to ensure that we are sharing your data with you and not a fraudster impersonating you. As unlikely as that sounds, it happens more often than you’d expect. Please expect a followup email with instructions on how to verify your identity.

How soon will Sift process my request?
Once we verify your identity, we will honor all requests under the European Union (EU) General Data Protection Regulation (GDPR) within ninety (90) days per Article 12(3) of the GDPR for verified EU citizens only. Please note we are extending this period by sixty (60) days due to the high volume of requests.

All other requests, including those from the United States, will take more time. We thank you for your patience as we must give priority to those requests for which our timely response is legally required.

Can you provide my score?
If you have requested your “Sift Score” or other type of consumer score, we’d like to clarify that Sift does not have a “Sift Score” for you (or any user) because we don’t score users; we score user interactions on a specific website for a specific type of fraud. We calculate the likelihood of whether actions you have taken on a Sift customer site are associated with specific types of fraud. The actions we analyze depends on the particular Sift product our customer uses.

However, these interactions do not add up into a single Sift Score about you. A single score is not an effective way of assessing fraud. Instead, the best way to predict fraud and provide users like you the best possible experience is to analyze each specific interaction. For more information on scores, please read our blog post here.

And then a few days later…

Thank you for contacting Sift support! We received your email and typically respond within one business day for questions related to Sift’s suite of products. In the meantime, you can browse our Help Center for answers to some common questions: https://support.sift.com/hc/en-us

Best,

The Sift Support Team

Finally I got the email to verify my identity, which needed to be done within 14 days of the email, using a unique link. There I needed to type in my phone number so the service could send another unique link to my phone.

Verification is done via a 3rd party service called Berbix Inc. It required me to scan my driving license or passport, then the site tells you to strike a pose and take a selfie (to prove it’s not just a photo). It’s all done on the phone using the Chrome browser rather than an app (thankfully). I had a read of their privacy policy of course, and Sift’s.

Now I’m looking forward to seeing what they send me back…

Mozfest10: 3D’s: Dating, Deception and Data-Portability (GDPR edition)

There are a number of blog posts I need to write about the last Mozilla Festival in the UK, and I have already written about the dyslexic advantage previously. So it’s time for my workshop session, the 3D’s: Dating, Deception and Data-portability, in the openness space. I added GDPR edition to the workshop title, as I did submit it last year but did so before I actually got my GDPR data back from the dating sites. I assume the lack of clarity about having the data made it tricky for privacy & security to accept it last year?

I was looking forward to this one, but on the week of Mozfest my Dell XPS laptop woke me up in the middle of the night with a bright screen. I thought it was odd to have it on, as it’s usually asleep. On closer inspection I found I couldn’t do much, so I rebooted it. On the reboot I was able to log in but not launch almost anything, so I rebooted again, only to find I was dumped into a GRUB recovery console. It’s a long story what happened next, but ultimately my plan to host the dating JSON files on my local machine with a nicer interface was never going to happen.

With all this in mind I changed the presentation (google slides are my friend) and scope of the workshop. Luckily I had redacted enough of the data in advance, and I kept a hold of my data instead of letting people rummage through like I had planned.

I focused the presentation into the 3 areas, dating, deception and data-portability. My slides are all online here.

DSC_0498

The people who came were quite vocal and engaged with everything. There were many questions about the dating and deception part, which made me think I could have done a whole bit similar to my TEDx talk a few years ago. But I really wanted to get into the meat of the workshop: beyond requesting your data and actually getting it, now what?

This is exactly what I posed as a question to people.

DSC_0499

 

The replies were quite different from what I was thinking…

  • A group said if you could get a number of data dumps over time, you could mine the data on your profile to look at positive & negative changes over a longer time scale. This would work great especially on the OKCupid questions, which you can change at any time, and I have.
  • Another group suggested something similar to Cambridge Analytica using OKCupid questions. I did suggest it’s highly likely they (OKCupid) are already doing this and it’s reflected in the people you are shown rather than your vote and the news you see. I wasn’t making light of it, just sadly saying everything is there and yes, it could be turned into a personality profile easily enough.
  • There was an interesting thought to tally up messages and changes in profile data with historic weather, moon, quantified self data and other data, to see if there is a link. I think this one might include the person who asked why I redacted the star sign data?
  • The idea of creating a dating bot of yourself was quite shocking, but the thought was that with enough of my chat transcripts you could easily train a bot to answer people in the future like I would. There was a discussion about the ethics of doing so and what happens when a bot meets another bot pretending to be human.
  • Finally, a group suggested visualisations to help make tangible the choices and things I wrote. This was good in the face of what was missing and how to expose the dirty little tricks dating companies do for profit. It’s always clear how powerful visualisation can be; you only have to look at my Twitter gender data visualisation from openhumans.
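The first group’s idea, comparing data dumps taken at different times, can be sketched in a few lines. This assumes each dump has already been reduced to a simple mapping of question to answer, which is not how the real exports are laid out. The `diff_answers` helper and the sample questions are made up for illustration.

```python
def diff_answers(older, newer):
    """Return {question: (old answer, new answer)} for every change.

    A question missing from one dump shows up with None on that side.
    """
    changes = {}
    for question in sorted(set(older) | set(newer)):
        before = older.get(question)
        after = newer.get(question)
        if before != after:
            changes[question] = (before, after)
    return changes


# Made-up answers standing in for two dumps requested a year apart.
dump_2018 = {"Do you like hiking?": "Yes", "Coffee or tea?": "Tea"}
dump_2019 = {"Do you like hiking?": "No", "Coffee or tea?": "Tea",
             "Cats or dogs?": "Cats"}

for question, (before, after) in diff_answers(dump_2018, dump_2019).items():
    print(f"{question}: {before!r} -> {after!r}")
```

With several dumps, running this pairwise in date order gives a timeline of changes that could then be fed into a visualisation.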

It’s clear the Plenty of Fish data was less interesting to people, and it would be trivial to move from OKCupid to POF based on the dataset. The other way would require a lot of user input.

Massive thanks to Fred Erse for keeping me on time and collecting the ideas together.

IMG_20191102_185108

So what happens next?

Jupyter notebook from openhumans demo

Well, I’m keen to put either the actual data or the redacted data into openhumans and try the Jupyter notebook thing. Maybe I can achieve the final group’s ideas with some fascinating visualisations.

 

User permission opt in or out? Time for HDI!

3 mobiles optin privacy
Is grey opt in or opt out?

I’m one of those people who look at terms when using services or purchasing IoT devices. I also don’t accept the cookie warnings unless I’m actually happy to use the service. This did make looking at any Oath/Yahoo site a pain for a while, as they used to have accept or nothing else (this changed).

Very sure a lot of the companies deliberately put up painful cookie notices to mislead their users. If this isn’t a dark pattern it should be?

Recently I noticed this cheeky one from the Three mobile app. You can assume the sliders when grey are not active and the purple one is on? But there’s no actual clear sign to say what is active/on and what’s inactive/off. It’s also interesting that the grey ones are the default, which you would assume are active/on in every other example you have experienced of this.

Another clear call for Human Data Interaction (HDI).

IBM DIF project removes my flickr urls


Hopefully the final follow-up from my post about facial recognition’s dirty little secret, millions of online photos scraped without consent, and the update.

Thank you for your prompt response. We confirm that we have deleted from the DiF dataset all the URLs linked to your Flickr ID and associated annotations. We have also deleted your Flickr ID from our records. IBM will require our research partners to comply with your deletion request and provide IBM with confirmation of compliance.

Best regards,

IBM Research DiF team

End of the matter, although part of me wants to contact everybody in the photos and tell them what happened. Not sure what that would achieve however?

Reply from IBM about my online photos scraped without consent

Diversity in Faces(DiF)

Following my post about facial recognition’s dirty little secret, millions of online photos scraped without consent, I got a reply from Flickr and IBM’s Diversity in Faces project.
First Flickr’s automated email…

Hi ian,

Thanks for reaching out to us!

We’ve received your message and will be responding as quickly as possible. In the meantime, do visit the Flickr Help Forum and our Help Center as the answer to your question may be found there.

We look forward to connecting and will be in touch soon.

Cheerfully,
The Flickr Team

Already Pro? Then expect a response shortly, because you are already in our VIP queue! (Make sure to write to us using the email address on your Pro account.)

Dear Ian,
Thank you for your email.
The Diversity in Faces (DiF) project, referenced in your request below, is a non-commercial, research initiative. The DiF dataset includes a list of URLs (but not the images themselves), linking to publicly available images on Flickr under certain creative commons licenses, along with associated annotations. We have taken great care to ensure that the DiF dataset does not include Flickr IDs or any other Flickr identifiers of individuals.
In order to respond to your request, we will need to locate the URLs in the DiF dataset that are linked to your Flickr ID (if any). To do this, we will need your Flickr ID, along with your express consent to use it for the sole purpose of locating such URLs and responding to your request.  Separately, if you would like us to, we can remove any URLs of images linked to your Flickr ID from the DiF dataset.  Please confirm this by reply.
After conducting our search, we will delete your Flickr ID from our records, and if you so request, we will also remove any URLs and associated annotations from the DiF dataset connected to your Flickr ID. We will confirm when this process has been completed.
With respect to your request to access your personal data processed by IBM outside the DiF project, you will be contacted separately by the IBM Data Subject Rights Operations Team (Email at ibm.com) to proceed with your request.
Let us know if you have any questions or how we can further assist you with your request.
IBM Research DiF Team

Facial recognition’s ‘dirty little secret’: Millions of online photos scraped without consent

By Olivia Solon

Facial recognition can log you into your iPhone, track criminals through crowds and identify loyal customers in stores.

The technology — which is imperfect but improving rapidly — is based on algorithms that learn how to recognize human faces and the hundreds of ways in which each one is unique.

To do this well, the algorithms must be fed hundreds of thousands of images of a diverse array of faces. Increasingly, those photos are coming from the internet, where they’re swept up by the millions without the knowledge of the people who posted them, categorized by age, gender, skin tone and dozens of other metrics, and shared with researchers at universities and companies.

When I first heard about this story I was annoyed but didn’t think too much about it. Then further down the story, it’s clear they used Creative Commons Flickr photos.

“This is the dirty little secret of AI training sets. Researchers often just grab whatever images are available in the wild,” said NYU School of Law professor Jason Schultz.

The latest company to enter this territory was IBM, which in January released a collection of nearly a million photos that were taken from the photo hosting site Flickr and coded to describe the subjects’ appearance. IBM promoted the collection to researchers as a progressive step toward reducing bias in facial recognition.

But some of the photographers whose images were included in IBM’s dataset were surprised and disconcerted when NBC News told them that their photographs had been annotated with details including facial geometry and skin tone and may be used to develop facial recognition algorithms. (NBC News obtained IBM’s dataset from a source after the company declined to share it, saying it could be used only by academic or corporate research groups.)

And then there is a checker to see if your photos were used in the teaching of machines. After typing my username, I found out I have 207 photo(s) in the IBM dataset. This is one of them:

Not my choice of photo, just the one which comes up when using the website

Georg Holzer, uploaded his photos to Flickr to remember great moments with his family and friends, and he used Creative Commons licenses to allow nonprofits and artists to use his photos for free. He did not expect more than 700 of his images to be swept up to study facial recognition technology.

“I know about the harm such a technology can cause,” he said over Skype, after NBC News told him his photos were in IBM’s dataset. “Of course, you can never forget about the good uses of image recognition such as finding family pictures faster, but it can also be used to restrict fundamental rights and privacy. I can never approve or accept the widespread use of such a technology.”

I have a similar view to Georg. I publish almost all my Flickr photos under a Creative Commons non-commercial share-alike licence. I swear this has been broken. I’m also not sure if the pictures are all private or not. But I’m going to find out thanks to GDPR.

There may, however, be legal recourse in some jurisdictions thanks to the rise of privacy laws acknowledging the unique value of photos of people’s faces. Under Europe’s General Data Protection Regulation, photos are considered “sensitive personal information” if they are used to confirm an individual’s identity. Residents of Europe who don’t want their data included can ask IBM to delete it. If IBM doesn’t comply, they can complain to their country’s data protection authority, which, if the particular photos fall under the definition of “sensitive personal information,” can levy fines against companies that violate the law.

Expect a GDPR request soon, IBM! Anything I can do to send a message that I wasn’t happy with this.

Follow up from MyHeritage GDPR request

Shadow profile
I got this from MyHeritage today… after submitting my GDPR request to them to find out the history of my account.
We apologize for this breach and the fact that your email address might have been part of it. The email addresses were included in the breach along with a hashed password – not the actual password (which has been expired and can no longer be used to access the account on MyHeritage). Other than this, there has not been a violation of the data. See our official statement here and an updated statement here.

Please be advised that this incident does not affect the privacy of any sensitive information you have on your online family site, including DNA information and family trees. Only hashed versions of passwords were stolen, which means they cannot be used to log in to your private account on MyHeritage.

There has been no evidence that the stolen information was ever used by the perpetrators. Since Oct 26, 2017 (the date of the breach) and the present we have not seen any activity indicating that any MyHeritage accounts had been compromised.

The privacy and the security of your information is our highest priority and we continually assess our procedures and policies and seek the best methods to secure information. The work on adding two-factor authentication to MyHeritage is completed and you can read the full explanation about this feature here.

In addition to that, I have carried out a search within our system, and I was not able to locate an account using your email address: **********************************

If you had an account using this email address and the account was deleted, we currently do not retain any information from your registered account and therefore, I cannot provide you with any information regarding it as it no longer exists.

However, if you registered to MyHeritage using another email address, please let me know with which so I will be able to locate it. In addition to that, as an extra security measure, if you still have access to this email address would you please be so kind to send us an email using that address?

If you run into any further issues, by all means, please don’t hesitate to reply. I’m here for you.

MyHeritage Support team

Maybe I deleted my account too soon, unfortunately giving them an easy out. I should have done the GDPR request, then deleted my account afterwards! I was looking forward to seeing proof the account was a shadow profile.

GDPR dating information update

Hackers movie

With GDPR I sent out emails to OKCupid, Plenty of Fish, Tinder and others. So far I’ve only gotten responses from POF and OKCupid. Which means Tinder and others have about a day or so to get back to me with everything before I can start to throw down some fire.

Before I headed on holiday, I got a message from POF then OKcupid a day later, saying they need the request from the email which is on the account. Fair enough, so I forwarded each email to that email address and replied all to myself and to them but from that email account address.

A few days later I got emails, first from POF and then OKCupid.

You have recently requested a copy of your PlentyofFish (“POF”) personal data, and we’re happy to report that we have now verified your identity.

We are attaching a copy of your personal data contained in or associated with your POF account.  The password to access the personal data will be sent in a separate email.

By downloading this data, you consent to the extraction of your data from POF, and assume all risk and liability for such downloaded data. We encourage you to keep it secure and take precautions when storing or sharing it.

The information contained in this archive may vary depending on the way you have used POF. In general, this information includes content and photos you have provided us, whether directly or through your social media accounts, messages you have sent and other data you would expect to see from a social media service like POF.

Please note that there is some information we cannot release to you including information that would likely reveal personal information about other users. Those notably include messages you received on POF, which are not provided out of concern for the privacy of the senders.

Sincerely,

POF Privacy Team

Then similar from OKcupid, which makes sense being the same company really.

Dear Mr. Forrester:

You have recently requested a copy of your OkCupid personal data, and we’re happy to report that we have now verified your identity.

We are attaching a copy of your personal data contained in or associated with your OkCupid account. The password to access the personal data will be sent in a separate email.

By downloading this data, you consent to the extraction of your data from OkCupid, and assume all risk and liability for such downloaded data. We encourage you to keep it secure and take precautions when storing or sharing it.

The information contained in this archive may vary depending on the way you have used OkCupid. In general, this information includes content and photos you have provided us, whether directly or through your social media accounts, messages you have sent and other data you would expect to see from a social media service like OkCupid.

Please note that there is some information we cannot release to you including information that would likely reveal personal information about other users. Those notably include messages you received on OkCupid, which are not provided out of concern for the privacy of the senders.

Sincerely,

OkCupid Privacy Team

So on my train journey from Stockholm to Copenhagen, I had a look inside the Zip files shared with me. They are quite different; it’d be interesting to see what others will do.

  • Forrester, I – POF Records.zip
    • UserData.json | 6.2 kb
    • UserData.pdf | 40.5 kb
    • Profile_7.jpg | 30.1 kb
    • Profile_6.jpg | 25.0 kb
    • Profile_5.jpg | 17.4 kb
    • Profile_4.jpg | 18.8 kb
    • Profile_3.jpg | 26.6 kb
    • Profile_2.jpg | 11.7 kb
    • Profile_1.jpg | 30.7 kb
  • OkCupid_Records_-Forrester__I.zip
    • Ian Forrester_JSN.txt | 3.8 mb
    • Ian Forrester_html.html | 6.6 mb

As you can see, they are quite different. Interestingly there are no photos in the OKCupid data dump, not even the ones I shared as part of my profile. In POF the PDF is a copy of the Json file, which is silly really.

So the Json files are the most interesting parts…

Plenty of Fish

POF doesn’t have much interesting data: basically a copy of my profile data in Json, including Firstvisit, FirstvisitA, etc. to FirstvisitE, complete with my IP address. I can also confirm I started my profile on 2012-01-25.

Then there are my BasicSearchData and AdvancedSearchData, which include the usual stuff plus when I LastSearch’ed and from which IP address.

Nothing else… no messages

OkCupid

OkCupid has a ton more useful information in its Json. Some interesting parts; I have logged into OKCupid a total of 24157 times! My status is Active? My job is Technology?  The geolocation_history is pretty spot on and the login_history goes from July 2007 to current year, complete with IP and time.

The messages section is really interesting! They decided to share only one side of the messages, so only the ones you sent rather than what you received. As the messages are not like emails, you don’t get the quoted reply, just the sent message. Each item includes who it is from (me) and the time/date. There are some which are obviously an instant messenger conversation, which look odd reading them now. In those ones, there are also fields for peer, peer_joined, time and type. It’s also clear where changes have happened, for example when you used to be able to add some formatting to the message and you used to have subject lines.

Some which stick out include, Allergic to smoking?, insomnia, ENTP and where next, The Future somewhat answered, So lazy you’ve only done 40 something questions, Dyslexia is an advantage, But would you lie in return? No bad jokes, gotland and further a field, Ok obvious question, etc.

Next come the photos (my photos, no one else’s)

"caption": "OkCupid's removal of visitors is so transparent, I don't know why they bothered to lie to us all?", 
"photo": "https://k1.okccdn.com/php/load_okc_image.php/images/6623162030294614734", 
"status": "Album Picture Active", 
"uploaded": "2017-08-08 19:16:20"

Of course the images are publicly available via the url, so I could pull them all down with a quick wget/curl. Not sure what to make about this idea of making them public. Security through obscurity anyone?
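As a rough sketch of that idea, the snippet below walks a parsed dump and collects every string stored under a "photo" key, as in the fragment above. The sample record, its "photos" wrapper key and the example.invalid URL are made up for illustration; nothing is assumed about how the real file is laid out beyond the fields already shown.

```python
import json


def find_photo_urls(node):
    """Recursively collect every string stored under a "photo" key."""
    urls = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "photo" and isinstance(value, str):
                urls.append(value)
            else:
                # Keep descending; the real nesting of the dump is unknown.
                urls.extend(find_photo_urls(value))
    elif isinstance(node, list):
        for item in node:
            urls.extend(find_photo_urls(item))
    return urls


# Hypothetical sample shaped like the fragment above (wrapper key is assumed).
sample = json.loads("""
{"photos": [{"caption": "example",
             "photo": "https://example.invalid/images/1",
             "status": "Album Picture Active",
             "uploaded": "2017-08-08 19:16:20"}]}
""")

photo_urls = find_photo_urls(sample)
print(photo_urls)
```

Each collected URL could then be handed to wget/curl, or fetched from Python with urllib.request.urlretrieve.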

Stop screwing with OKCupid
As long as you can see the picture above, OKCupid is making my profile pictures public

Now the image strings seem to be random, but I don’t think this is a good idea at all! I’m wondering how it sits with GDPR too, and also whether they will remove them after a period of time. Hence if the image above is broken, then you know what happened.

Then we are on to the purchases section. It details when I once tried an A-list subscription and when I cancelled it. How I paid (paypal), how much, address, date, etc… It’s funny reading about when I cancelled it…

"comments": "userid = 7367007913453081320 was downgraded to amateur", 
"transaction": "lost subscription",

The big question I always had was the question data. Don’t worry they are all there! For example here’s just one of mine.

{
"answer_choices": {
"1": "Yes", 
"2": "No"
}, 
"prompt": "Are you racist?", 
"question_id": 7847, 
"user_acceptable_answers": [
"No"
], 
"user_answer": "No", 
"user_answered_publicly": "no", 
"user_importance": "mandatory"
},

After all those questions, there’s a bunch of stuff about the user_devices I’ve used to log into OkCupid over the years, going right back. Stuff about preferences for searches, etc.

Going to need some time to digest everything, but the OKCupid data dump is full of interesting things. I might convert the lot to XML just to make it easier for me to get an overview.
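A minimal sketch of what that conversion might look like, using only Python’s standard library. The to_xml helper, the "question", "entry" and "item" tag names and the key attribute are all my own choices for illustration, not anything OkCupid provides; the sample is the question record shown earlier.

```python
import json
import xml.etree.ElementTree as ET


def to_xml(tag, value):
    """Turn a parsed JSON value into an ElementTree element."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            # Keys like "1" are not valid XML tag names, so the key is
            # stored as an attribute on a generic <entry> element.
            entry = to_xml("entry", child)
            entry.set("key", str(key))
            elem.append(entry)
    elif isinstance(value, list):
        for child in value:
            elem.append(to_xml("item", child))
    else:
        elem.text = "" if value is None else str(value)
    return elem


record = json.loads("""
{"answer_choices": {"1": "Yes", "2": "No"},
 "prompt": "Are you racist?",
 "question_id": 7847,
 "user_acceptable_answers": ["No"],
 "user_answer": "No",
 "user_answered_publicly": "no",
 "user_importance": "mandatory"}
""")

root = to_xml("question", record)
print(ET.tostring(root, encoding="unicode"))
```

Keeping the JSON keys in an attribute rather than as tag names means the conversion should not trip over keys that XML would reject.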

OKcupid responds to my GDPR request

OkCupid no Match protest

I mentioned how I emailed a load of dating sites for my data and then some… under GDPR. So far I’ve got something from POF, and OKcupid finally got back to me after my request finally made it to supportconsole@okcupid.com.

Hello,

OkCupid has received your recent request for a copy of the personal data we hold about you.

For your protection and the protection of all of our users, we cannot release any personal data without first obtaining proof of identity.

In order for us to verify your identity, we kindly ask you to:

1. Respond to this email from the email address associated with your OkCupid account and provide us the username of your OkCupid account.

2. In your response to this email, please include a copy of a government-issued ID document such as your passport or driving license. Also, we ask you to please cover up any personal information other than your name, photo and date of birth from the document as that is the only information we need.

We may require further verification of your identity, for example, if the materials you provide us do not establish your identity as being linked to the account in question.

Please note that if you previously closed your account, your data may be unavailable for extraction as we proceed to its deletion or anonymization in accordance with our privacy policy. Even if data is still available for extraction, there is some information we cannot release to you including information that would likely reveal personal information about other users. Those notably include messages you received on OkCupid, which are not provided out of concern for the privacy of the senders.

Best,

OkCupid Privacy Team

Pretty much the same as the POF reply.