I mentioned a while ago how
I was slowly migrating away from Mixcloud as their business model is starting to impinge on people listening to my mixes and I’m not so keen on that. I already mentioned trying to get
Funkwhale working and using cue files.
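For anyone unfamiliar, a cue file is just a plain-text index of track start times within a single long audio file. A minimal one for a mix might look something like this (the filename, artists, and titles here are made up for illustration):

```
FILE "follow-me-into-the-fading-moonlight.mp3" MP3
  TRACK 01 AUDIO
    TITLE "Opening Track"
    PERFORMER "Some Artist"
    INDEX 01 00:00:00
  TRACK 02 AUDIO
    TITLE "Second Track"
    PERFORMER "Another Artist"
    INDEX 01 05:32:00
```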
While looking through my mixes I noticed I really didn't do a great job with some of the metadata; I had spent most of my time adding metadata in Mixcloud itself (not ideal).
In Mixcloud (once logged in) there are some human-friendly URLs which I was able to grab images from. The key one is the upload edit page – https://www.mixcloud.com/upload/{username}/{mixname}/edit/ for example https://www.mixcloud.com/upload/cubicgarden/follow-me-into-the-fading-moonlight/edit/
My plan was to manually copy the times into my newly written cue files, but while talking to Jon about it, he said to give him five minutes and he could knock up a script to pull the values out of the HTML page. I had thought about doing it before with XSLT, but noticed there is a lot of JavaScript rendering, which makes things difficult.
Jon's quickly written script was just what I needed.
#!/usr/bin/env python3
import csv
import sys
from collections import namedtuple
from typing import List

import bs4
from bs4 import Tag

SongInfo = namedtuple('SongInfo', ['number', 'artist', 'title', 'time'])


def load_html(filename: str):
    with open(filename, 'r', encoding='utf-8') as fo:
        return fo.read()


def extract_song_info(song: Tag):
    try:
        number = song.find(class_='section-number').text
        artist = song.find(class_='section-artist').text
        title = song.find(class_='section-title').text
        time = song.find(class_='section-time')['value']
        result = SongInfo(number, artist, title, time)
        print(f'Extracted {result}')
        return result
    except (AttributeError, TypeError):
        # A find() that matches nothing returns None, so a missing
        # element raises AttributeError (.text) or TypeError (['value'])
        print(f'Error with item {song}')
        return None


def parse_table(input_html: str):
    soup = bs4.BeautifulSoup(input_html, features="html5lib")
    songs = soup.find_all(class_="section-row")
    return [x for x in (extract_song_info(song) for song in songs) if x is not None]


def save_to_csv(file_name: str, songs: List[SongInfo]):
    # newline='' avoids blank rows on Windows when using csv.writer
    with open(file_name, 'w', encoding='utf-8', newline='') as fo:
        writer = csv.writer(fo)
        for song in songs:
            writer.writerow(song)


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print('Usage: extractor.py [input_html_file] [output_csv_file]')
        sys.exit(1)
    html = load_html(sys.argv[1])
    songs = parse_table(html)
    save_to_csv(sys.argv[2], songs)
    print(f'Saved to {sys.argv[2]} successfully - Done!')
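Judging by the class names the script looks for, the tracklist markup on the edit page presumably resembles something like this (a reconstruction for illustration, not Mixcloud's actual HTML – note the time is an input's value attribute rather than element text):

```
<div class="section-row">
  <span class="section-number">1</span>
  <span class="section-artist">Some Artist</span>
  <span class="section-title">Some Title</span>
  <input class="section-time" value="5:32" />
</div>
```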
With the script and the HTML pages (which I almost managed to fetch with Chromedriver, again thanks to Jon, but I couldn't be bothered to sort out the cookies, etc.), I wrote a quick and dirty bash script and fired up a terminal.
#!/bin/bash
./extractor.py "$1.html" "$1.csv"
# Verify
echo "Details for $1"
I thought about modifying Jon's script to generate the cue files directly, bypassing the CSV file, but decided I should just get them all done, because I still need to get Funkwhale going.
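If I ever do go back to it, the CSV-to-cue step could be sketched roughly like this (the M:SS time format and the audio filename are assumptions on my part; cue INDEX lines want MM:SS:FF, with frames):

```python
import csv


def time_to_cue_index(time_str: str) -> str:
    """Convert an M:SS or H:MM:SS time string to cue-style MM:SS:FF."""
    parts = [int(p) for p in time_str.split(':')]
    while len(parts) < 3:
        parts.insert(0, 0)  # pad missing hours/minutes with zero
    hours, minutes, seconds = parts
    total_minutes = hours * 60 + minutes  # cue sheets use total minutes
    return f'{total_minutes:02d}:{seconds:02d}:00'


def csv_to_cue(csv_path: str, cue_path: str, audio_file: str) -> None:
    """Turn rows of (number, artist, title, time) into a cue file."""
    with open(csv_path, newline='', encoding='utf-8') as fo:
        rows = list(csv.reader(fo))
    lines = [f'FILE "{audio_file}" MP3']
    for number, artist, title, time in rows:
        lines.append(f'  TRACK {int(number):02d} AUDIO')
        lines.append(f'    TITLE "{title}"')
        lines.append(f'    PERFORMER "{artist}"')
        lines.append(f'    INDEX 01 {time_to_cue_index(time)}')
    with open(cue_path, 'w', encoding='utf-8') as fo:
        fo.write('\n'.join(lines) + '\n')
```
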
I did notice the edit page doesn't include the genre or the year of the mix, but I can live with that, for now… Scraping web pages is certainly a throwback, but it's a better solution than what I was originally thinking.
This will teach me to sort out my own house of data!