GDPR dating information update


With GDPR in force, I sent out emails to OkCupid, Plenty of Fish, Tinder and others. So far I’ve only gotten responses from POF and OkCupid, which means Tinder and the others have about a day or so to get back to me with everything before I can start to throw down some fire.

Before I headed off on holiday, I got a message from POF, then OkCupid a day later, saying they needed the request to come from the email address on the account. Fair enough, so I forwarded each email to that address and replied-all to myself and to them, but from that account’s address.

A few days later I got emails, first from POF and then OKCupid.

You have recently requested a copy of your PlentyofFish (“POF”) personal data, and we’re happy to report that we have now verified your identity.

We are attaching a copy of your personal data contained in or associated with your POF account.  The password to access the personal data will be sent in a separate email.

By downloading this data, you consent to the extraction of your data from POF, and assume all risk and liability for such downloaded data. We encourage you to keep it secure and take precautions when storing or sharing it.

The information contained in this archive may vary depending on the way you have used POF. In general, this information includes content and photos you have provided us, whether directly or through your social media accounts, messages you have sent and other data you would expect to see from a social media service like POF.

Please note that there is some information we cannot release to you including information that would likely reveal personal information about other users. Those notably include messages you received on POF, which are not provided out of concern for the privacy of the senders.

Sincerely,

POF Privacy Team

Then a similar one from OkCupid, which makes sense as they’re really the same company.

Dear Mr. Forrester:

You have recently requested a copy of your OkCupid personal data, and we’re happy to report that we have now verified your identity.

We are attaching a copy of your personal data contained in or associated with your OkCupid account. The password to access the personal data will be sent in a separate email.

By downloading this data, you consent to the extraction of your data from OkCupid, and assume all risk and liability for such downloaded data. We encourage you to keep it secure and take precautions when storing or sharing it.

The information contained in this archive may vary depending on the way you have used OkCupid. In general, this information includes content and photos you have provided us, whether directly or through your social media accounts, messages you have sent and other data you would expect to see from a social media service like OkCupid.

Please note that there is some information we cannot release to you including information that would likely reveal personal information about other users. Those notably include messages you received on OkCupid, which are not provided out of concern for the privacy of the senders.

Sincerely,

OkCupid Privacy Team

So on my train journey from Stockholm to Copenhagen, I had a look inside the zip files shared with me. Quite different; it’d be interesting to see what others will do.

  • Forrester, I – POF Records.zip
    • UserData.json | 6.2 kb
    • UserData.pdf | 40.5 kb
    • Profile_7.jpg | 30.1 kb
    • Profile_6.jpg | 25.0 kb
    • Profile_5.jpg | 17.4 kb
    • Profile_4.jpg | 18.8 kb
    • Profile_3.jpg | 26.6 kb
    • Profile_2.jpg | 11.7 kb
    • Profile_1.jpg | 30.7 kb
  • OkCupid_Records_-Forrester__I.zip
    • Ian Forrester_JSN.txt | 3.8 mb
    • Ian Forrester_html.html | 6.6 mb

As you can see, they’re quite different. Interestingly, there are no photos in the OkCupid data dump, not even the ones I shared as part of my profile. In the POF dump, the PDF is just a copy of the JSON file, which is silly really.

So the JSON files are the most interesting parts…

Plenty of Fish

POF don’t have much interesting data: basically a copy of my profile data in JSON, including Firstvisit, FirstvisitA, etc. up to FirstvisitE, complete with my IP address. I can also confirm I started my profile on 2012-01-25.

Then there are my BasicSearchData and AdvancedSearchData, which include the usual stuff, plus when I LastSearch’ed and from which IP address.

Nothing else… no messages
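
If you want to poke at the UserData.json yourself, something like this rough sketch does the job (I’m assuming the dump is a single JSON object keyed by those field names, so adjust for whatever your own file contains):

import json

# Quick peek at the POF export. Assumption: UserData.json is one JSON
# object whose top-level keys are the fields mentioned above
# (Firstvisit, BasicSearchData, AdvancedSearchData, etc.).
with open("UserData.json", encoding="utf-8") as f:
    data = json.load(f)

for key, value in data.items():
    # Show each top-level key and a rough idea of what's under it.
    summary = f"{len(value)} entries" if isinstance(value, (list, dict)) else repr(value)
    print(f"{key}: {summary}")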

OkCupid

OkCupid has a ton more useful information in its JSON. Some interesting parts: I have logged into OkCupid a total of 24,157 times! My status is “Active”? My job is “Technology”? The geolocation_history is pretty spot on, and the login_history goes from July 2007 to the current year, complete with IP and time.

The messages are really interesting! They decided to share only one side of the messages, so just the ones you sent rather than what you received. As the messages are not like emails, you don’t get the quoted reply, just the sent message. Each item includes who it’s from (me) and the time/date. There are some which are obviously an instant-messenger conversation, which look odd reading them now. In those ones, there are also fields for peer, peer_joined, time and type. It’s also clear where changes have happened, for example when you used to be able to add some formatting to the message, and when you used to have subject lines.

Some which stick out include: “Allergic to smoking?”, “insomnia”, “ENTP and where next”, “The Future somewhat answered”, “So lazy you’ve only done 40 something questions”, “Dyslexia is an advantage”, “But would you lie in return?”, “No bad jokes”, “gotland and further a field”, “Ok obvious question”, etc.
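
As a rough sketch (and only a guess at the layout, assuming the export keeps these in a top-level "messages" list with the peer field above), you could count how many messages you sent to each person:

import json
from collections import Counter

# Count sent messages per conversation partner. Assumption: the OkCupid
# export has a top-level "messages" list whose items carry the "peer"
# field mentioned above; the real file may nest things differently.
with open("Ian Forrester_JSN.txt", encoding="utf-8") as f:
    dump = json.load(f)

peers = Counter(msg.get("peer", "unknown") for msg in dump.get("messages", []))
for peer, count in peers.most_common(10):
    print(f"{peer}: {count} messages sent")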

Next come the photos (my photos, no one else’s):

{
  "caption": "OkCupid's removal of visitors is so transparent, I don't know why they bothered to lie to us all?",
  "photo": "https://k1.okccdn.com/php/load_okc_image.php/images/6623162030294614734",
  "status": "Album Picture Active",
  "uploaded": "2017-08-08 19:16:20"
}

Of course the images are publicly available via the URL, so I could pull them all down with a quick wget/curl. Not sure what to make of this idea of making them public. Security through obscurity, anyone?

Stop screwing with OKCupid
As long as you can see the picture above, OKCupid is making my profile pictures public

Now, the image URL strings seem to be random, but I don’t think this is a good idea at all! I’m wondering how it sits with GDPR too, and also whether they will remove them after a period of time. Hence if the image above is broken, you know what happened.
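
For what it’s worth, the quick wget/curl I mentioned above looks roughly like this in Python; I’m assuming the photo records sit in a top-level "photos" list shaped like the snippet shown:

import json
import pathlib
import urllib.request

# Grab every photo listed in the export while the URLs are still
# publicly reachable. Assumption: a top-level "photos" list of objects
# with the "photo" field shown above.
with open("Ian Forrester_JSN.txt", encoding="utf-8") as f:
    dump = json.load(f)

out = pathlib.Path("okcupid_photos")
out.mkdir(exist_ok=True)

for i, entry in enumerate(dump.get("photos", [])):
    url = entry["photo"]
    target = out / f"photo_{i}.jpg"
    print(f"Fetching {url} -> {target}")
    urllib.request.urlretrieve(url, target)  # the same thing a quick wget/curl would do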

Then we are on to the purchases section. It details when I once tried the A-List subscription and when I cancelled it: how I paid (PayPal), how much, address, date, etc. It’s funny reading about when I cancelled it…

"comments": "userid = 7367007913453081320 was downgraded to amateur", 
"transaction": "lost subscription",

The big question I always had was about the question data. Don’t worry, it’s all there! For example, here’s just one of mine.

{
  "answer_choices": {
    "1": "Yes",
    "2": "No"
  },
  "prompt": "Are you racist?",
  "question_id": 7847,
  "user_acceptable_answers": [
    "No"
  ],
  "user_answer": "No",
  "user_answered_publicly": "no",
  "user_importance": "mandatory"
},
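
There are hundreds of these, so a small summary script helps; again I’m assuming they sit in a top-level "questions" list shaped like the example above:

import json
from collections import Counter

# Summarise the answered questions: how many there are, how many were
# answered publicly, and a breakdown by user_importance. Assumption: a
# top-level "questions" list shaped like the example above.
with open("Ian Forrester_JSN.txt", encoding="utf-8") as f:
    dump = json.load(f)

questions = dump.get("questions", [])
public = sum(1 for q in questions if q.get("user_answered_publicly") == "yes")
by_importance = Counter(q.get("user_importance", "unknown") for q in questions)

print(f"{len(questions)} questions answered, {public} publicly")
for importance, count in by_importance.most_common():
    print(f"  {importance}: {count}")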

After all those questions, there’s a bunch of stuff about the user_devices I’ve used to log into OkCupid over the years, going right back, plus stuff about preferences for searches, etc.

I’m going to need some time to digest everything, but the OkCupid data dump is full of interesting things. I might convert the lot to XML just to make it easier for me to get an overview.
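
If I do go the XML route, it would probably be a rough sketch like this: walk the parsed JSON and build an element tree (numeric keys like the answer_choices above aren’t valid XML names, so they get a prefix):

import json
import xml.etree.ElementTree as ET

def xml_name(key):
    # XML element names can't start with a digit, so prefix those
    # (e.g. the "1"/"2" answer_choices keys become k_1/k_2).
    key = str(key)
    return key if key[:1].isalpha() or key[:1] == "_" else "k_" + key

def to_element(name, value):
    # Recursively turn a JSON value into an XML element; list items all
    # become <item> elements, which is a simplification.
    elem = ET.Element(xml_name(name))
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(to_element(key, child))
    elif isinstance(value, list):
        for child in value:
            elem.append(to_element("item", child))
    else:
        elem.text = "" if value is None else str(value)
    return elem

with open("Ian Forrester_JSN.txt", encoding="utf-8") as f:
    dump = json.load(f)

ET.ElementTree(to_element("okcupid", dump)).write(
    "okcupid.xml", encoding="utf-8", xml_declaration=True
)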

Why no twitter import?

I said on Techgrumps 66 recently that I found it very strange that there were no Twitter dump import tools/services.

Now I have my Twitter data/dump, I want to host it myself, which is pretty easy because there is JSON and HTML (I’ve already thought about transforming it into XML for easier transformation in the future). But I don’t understand why Tent.io, Status.net, Facebook or, heck, Google don’t have an option to import your Twitter data?

When Twitter made the data available to you, it was clearly yours. Aka you own the rights to all of it, so you can do what you like with it now, including handing it over to someone else to mine and use if you so wish.

It was something we talked about back at the DataPortability group: data providers who would handle, mine and make sense of your data on your behalf.

I would consider allowing Google to mine the data, if it improved my already great Google Now experience.

I would also like to see Tent.io allow me to upload all my tweets, therefore making it a heck of a lot more interesting and useful.

Update… Imran reminds me Timehop supports importing your Twitter data; actually I’ve been looking at singly.com recently…

Microblogging dataportability at last?

Twitter data dump

Finally got the ability to download my tweets… Over 6 years of tweets in 6.8 meg of files.

It comes in a zip file, not a tar file, which is interesting because Facebook uses tars for its data dumps. The structure is interesting because it’s less of a dump and more a formal backup of your data, complete with an HTML file bringing it all together. There’s a README.txt file which reads…

# How to use your Twitter archive data
The simplest way to use your Twitter archive data is through the archive browser interface provided in this file. Just double-click `index.html` from the root folder and you can browse your entire history of Tweets from inside your browser.

In the `data` folder, your Twitter archive is present in two formats: JSON and CSV exports by month and year.

CSV is a generic format that can be imported into many data tools, spreadsheet applications, or consumed simply using a programming language.

## JSON for Developers
The JSON export contains a full representation of your Tweets as returned by v1.1 of the Twitter API. See https://dev.twitter.com/docs/api/1.1 for more information.

The JSON export is also used to power the archive browser interface (index.html).

To consume the export in a generic JSON parser in any language, strip the first and last lines of each file.

To provide feedback, ask questions, or share ideas with other Twitter developers, join the discussion forums on https://dev.twitter.com.
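
Following that advice, here’s a minimal sketch for getting the tweets into a normal JSON parser. The data/js/tweets path and the Grailbird-style assignment prefix are what my copy of the archive looks like, so treat them as assumptions; if a file doesn’t parse cleanly, fall back to the README’s literal advice of dropping the first and last lines.

import json
import pathlib

# Load every monthly tweet file from the archive. Each file under
# data/js/tweets/ is a JavaScript assignment
# ("Grailbird.data.tweets_YYYY_MM = [...]"), so drop everything up to
# the first "=" and parse the remainder as JSON. A trailing semicolon,
# if present, is stripped too.
tweets = []
for path in sorted(pathlib.Path("data/js/tweets").glob("*.js")):
    payload = path.read_text(encoding="utf-8").split("=", 1)[1].rstrip().rstrip(";")
    tweets.extend(json.loads(payload))

print(f"Loaded {len(tweets)} tweets from the archive")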

Most of the data is JSON, which bugs me a little only because I would personally have to transform it all to XML, but alas I’m sure everyone else loves it. The CSV spreadsheets are odd and could do with being XML instead of CSV, but once again I’m sure they’re useful to someone out there. The nice thing is there is tons of metadata around each microblog/tweet, including the geo-location, time and device/client. Even the URLs have some interesting things around them, because I was wondering how they were going to deal with shortened URLs, retweets and mentions…

"urls" : [ {
  "indices" : [ 69, 89 ],
  "url" : "http://t.co/GSzy55vc",
  "expanded_url" : "http://epicwerewolf.eventbrite.com/",
  "display_url" : "epicwerewolf.eventbrite.com"
} ]

It doesn’t always work, especially with URL shorteners which don’t keep the URL after a certain time period. Interestingly, internally Twitter always uses its own t.co for everything…
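
To check that across the whole archive, you can pull every expanded URL out of the entities (assuming the urls sit under entities as in the v1.1 API), using the same loading trick as above:

import json
import pathlib

# Print every expanded URL in the archive, falling back to the t.co
# link when Twitter has no expansion. Paths and the "=" trick are the
# same assumptions as in the loading sketch above.
for path in sorted(pathlib.Path("data/js/tweets").glob("*.js")):
    tweets = json.loads(path.read_text(encoding="utf-8").split("=", 1)[1].rstrip().rstrip(";"))
    for tweet in tweets:
        for url in tweet.get("entities", {}).get("urls", []):
            print(url.get("expanded_url") or url.get("url"))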

Right now I’m just interested in the period around my brush with death… A real shame there’s no reference to the mentions you’ve had, as I would have loved to have seen some of those. Guess Twitter were not going to delve into that can of worms…

I want to know why there’s no status.net importer?

CNET has an overview of how and what to do with the archive. Thanks Matt.