Mozfest10: 3D’s: Dating, Deception and Data-Portability (GDPR edition)

3D’s: Dating, Deception and Data Portability | Mozfest 2019 from Ian Forrester

There are a number of blog posts I need to write about the last Mozilla Festival in the UK, and I have already written about the dyslexic advantage. So it’s time for my workshop session, the 3D’s: Dating, Deception and Data-portability, in the openness space. I added “GDPR edition” to the title because I submitted the workshop last year too, but did so before I actually got my GDPR data back from the dating sites. I assume the lack of clarity about having the data made it tricky for privacy & security to accept it last year?

I was looking forward to this one, but in the week of Mozfest my Dell XPS laptop woke me up in the middle of the night with a bright screen. I thought it was odd for it to be on, as it’s usually asleep. On closer inspection I found I couldn’t do much, so I rebooted it. After the reboot I was able to log in but not launch almost anything, so I rebooted again, only to find myself dumped into a GRUB recovery console. It’s a long story what happened next, but ultimately my plan to host the dating JSON files on my local machine with a nicer interface was never going to happen.

With all this in mind I changed the presentation (Google Slides are my friend) and the scope of the workshop. Luckily I had redacted enough of the data in advance, and I kept hold of my data instead of letting people rummage through it like I had planned.

I focused the presentation into the 3 areas, dating, deception and data-portability. My slides are all online here.


The people who came were quite vocal and engaged with everything. There were many questions about the dating and deception part, which made me think I could have done a whole bit similar to my TEDx talk a few years ago. But I really wanted to get into the meat of the workshop: beyond requesting your data and actually getting it, now what?

This is exactly what I posed as a question to people.



The replies were quite different from what I was thinking…

  • A group said if you could get a number of data dumps over time, you could mine the data on your profile to look at positive & negative changes over a longer time scale. This would work especially well on the OkCupid questions, which you can change at any time, and I have.
  • Another group suggested something similar to Cambridge Analytica using the OkCupid questions. I did suggest it’s highly likely they (OkCupid) are already doing this, and that it’s reflected in the people you are shown rather than your vote and the news you see. I wasn’t making light of it, just sadly saying everything is there and yes, it could be turned into a personality profile easily enough.
  • There was an interesting thought to tally up messages and changes in profile data against historic weather, moon, quantified self and other data, to see if there is a link. I think this one might include the person who asked why I redacted the star sign data?
  • The idea of creating a dating bot of yourself was quite shocking, but the thought was that with enough of my chat transcripts you could easily train a bot to answer people in the future like I would. There was a discussion about the ethics of doing so and what happens when a bot meets another bot pretending to be human.
  • Finally, a group suggested visualisations to help make tangible the choices and things I wrote. This was good in the face of what was missing and how to inform people about the dirty little tricks dating companies do for profit. It’s always clear how powerful visualisation can be; you only have to look at my Twitter gender data visualisation from Open Humans.
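
The first group's idea of comparing data dumps over time can be sketched roughly as below. This is only a minimal illustration, not something built in the workshop: the real export format from the dating sites isn't shown here, so each dump is assumed to be a simple mapping of question text to the answer given, and the questions, answers and dates are all made up.

```python
from datetime import date


def diff_answers(older, newer):
    """Compare two {question: answer} snapshots and report what changed."""
    changes = {"added": {}, "removed": {}, "changed": {}}
    for question, answer in newer.items():
        if question not in older:
            changes["added"][question] = answer
        elif older[question] != answer:
            # Keep both the old and the new answer so the shift is visible.
            changes["changed"][question] = (older[question], answer)
    for question, answer in older.items():
        if question not in newer:
            changes["removed"][question] = answer
    return changes


def changes_over_time(dumps):
    """Diff each dump against the next one, in date order."""
    ordered = sorted(dumps)
    return [
        (earlier, later, diff_answers(dumps[earlier], dumps[later]))
        for earlier, later in zip(ordered, ordered[1:])
    ]


# Hypothetical snapshots keyed by the date each data dump was received.
dumps = {
    date(2018, 6, 1): {
        "Do you like scary movies?": "Yes",
        "Is astrology important?": "No",
    },
    date(2019, 6, 1): {
        "Do you like scary movies?": "No",
        "Is astrology important?": "No",
        "Could you date a smoker?": "No",
    },
}

for earlier, later, changes in changes_over_time(dumps):
    print(earlier, "->", later, changes)
```

With more than two dumps the same loop gives a change log per period, which is the kind of thing the visualisation idea could then build on.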

It’s clear the Plenty of Fish data was less interesting to people, and it would be trivial to move from OkCupid to POF based on the dataset. Going the other way would require a lot of user input.

Massive thanks to Fred Erse for keeping me on time and collecting the ideas together.


So what happens next?

Jupyter notebook from openhumans demo

Well, I’m keen to put either the actual data or the redacted data into Open Humans and try the Jupyter notebook thing. Maybe I can achieve the final group’s ideas with some fascinating visualisations.


Gender diversity on twitter?

Results of who I follow on twitter

I rarely read Twitter due to the API changes, which I’ve talked about in the past. But I saw Teknoteacher talking about changing his followers after reading about male tech CEOs’ follower accounts. I thought I’d share some things I discovered too, especially after reading this a while back.

So my results are above, using the online tool –

But a while ago I used Open Humans’ Twitter archive analyzer by Bastian Greshake Tzovaras. It was super sobering!

Here are my replies by gender from when I first started using Twitter back in 2017. As you can see there was a massive spike of conversation with males in 2012, and I also generally talk to more men than women on Twitter.

My replies & gender

Likewise, when retweeting based on gender it’s mainly males. Recently it’s a lot closer to 50%, which is great, but I wonder how my lack of Twitter use will affect things? (I have requested a new update of my Twitter data.)

My retweets & gender

Of course my instant thought is there is noise in the figures, as it’s not always clear if people are male or female for many reasons. But it’s disappointing to read Elon Musk’s tweet.

And read about others such as…

Sundar Pichai, the CEO of Google, follows 267 accounts on Twitter. Of those, 238 appear to be men. He follows nearly as many Twitter Eggs (15) as women (21).

Satya Nadella, Microsoft CEO, followed the most women (39) of any of the accounts examined by the Guardian, though that is still half the number of men he follows (78) out of a total of 165 accounts.

I’d really like to see this applied to race too, not just gender. It reminds me how I was going to learn more Python so I can create this as a Jupyter personal notebook in Open Humans.


I updated Open Humans with my latest Twitter data export and here are the results.
Once again very sobering to see. Got to make some changes.

Screenshot of replies for 2019

Worth adding this from the TwArχiv site.

The graph shows you the number of replies to Twitter users that are classified as either male or female. The classifications are predictions based on users’ first names as given in their Twitter accounts. The predictions itself are performed by the Python package gender_guesser. It uses name/gender-frequencies from a larger text corpus. mostly_male, mostly_female, andy and unknown classifications are ignored. To decrease the noise the daily values have been averaged by a daily average over a 180 day window (dataframe.rolling('180d').mean()).

Ideally these graphs would include non-binary folks. Doing this is a bit trickier. It is thus a work in progress.
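
To get a feel for what that 180 day smoothing does, here is a rough standard-library sketch of the same idea. It is not the TwArχiv code (which, per the quote above, uses pandas); the function names and the tiny reply list are made up, and the gender labels are assumed to have been assigned already rather than predicted here with gender_guesser.

```python
from collections import Counter
from datetime import date, timedelta


def daily_counts(replies, label):
    """Count replies per day for one label; every other label is ignored."""
    return Counter(day for day, gender in replies if gender == label)


def rolling_daily_mean(counts, window_days=180):
    """Average the daily counts over a trailing window ending on each day.

    Days with no replies count as zero, and the window is shortened at the
    start of the series where fewer than window_days days exist.
    """
    if not counts:
        return {}
    first, last = min(counts), max(counts)
    result = {}
    day = first
    while day <= last:
        start = max(first, day - timedelta(days=window_days - 1))
        span = (day - start).days + 1
        total = sum(counts.get(start + timedelta(days=i), 0) for i in range(span))
        result[day] = total / span
        day += timedelta(days=1)
    return result


# Hypothetical, already classified replies: (day, predicted gender).
replies = [
    (date(2019, 1, 1), "male"),
    (date(2019, 1, 1), "male"),
    (date(2019, 1, 2), "female"),
    (date(2019, 1, 3), "male"),
    (date(2019, 1, 3), "andy"),  # dropped, like the other ignored classes
]

male = rolling_daily_mean(daily_counts(replies, "male"), window_days=2)
female = rolling_daily_mean(daily_counts(replies, "female"), window_days=2)
```

A two day window is used here only so the effect is visible on three days of data; with 180 days, a single busy day barely moves the line, which is why the graphs look so smooth.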

Screenshot of retweets for 2019

Also worth mentioning…

Even more interesting than whether replying to people might be gendered can be the question which voices are being amplified. On Twitter a good indicator of amplification are retweets. These can be gender balanced or show biases, similarly to the replies to other users.

The graph shows you the number of retweets to Twitter users that are classified as either male or female. The classifications are again predictions made by the Python package gender_guesser. To decrease the noise the daily values have again been averaged by a daily average over a 180 day window (dataframe.rolling('180d').mean()).

Ideally these graphs would include non-binary folks. Doing this is a bit trickier. It is thus a work in progress.