The Great Hack (2019) looks interesting.
The Cambridge Analytica scandal is examined through the roles of several affected persons.
Weirdly, it’s made by a company that is itself involved in surveillance capitalism?
Thoughts and ideas of a dyslexic designer/developer
I mean, the W3C was pushing for the semantic web: more RDF, more linked data and more XML structuring.
Down with XML, down with linked data, RDF and the very idea of the semantic web – uggghhhh! (Or something like that, I can hear you all say!)
Well, hold on. Remember how the web started? Remember the foresight which kept the web free and open: insights like SVG, when the proprietary alternative, Flash, was ruling the web. I for one really liked XML and the suite of technologies which came along with it. XHTML was a joy to use once browser vendors got on board and sorted their act out.
I was there during the fight from HTML4 to XHTML 1.0. I still remember arguing about Microformats vs RDF at BarCampLondon2, and to be fair the WHATWG was likely right at the time, but they didn’t have the foresight to look further into the future. The semantic web was a big vision, but what’s the big vision of the WHATWG now?
My fear is that handing the web over mainly to browser vendors will lead us back to where the web was during HTML 4.0: a mix of unspecified bits and bobs which rely on native browser capabilities. Who’s fighting for accessibility, i18n, l10n, old systems, etc.? My only hope is that, because the W3C only handed over control of HTML and the DOM, they will double down on CSS and ECMAScript?
I want the web to move forward, and I know there was a lot of tension between the W3C and the WHATWG, but they kept each other honest. Handing the web over will, I fear, ultimately make things worse for all?
I’ve been looking for a way to create SMIL files with an editor for a while. The main reason is to speed up creating podcasts for the Perceptive Podcast client and to make it easier for those who don’t understand markup/code.
One of the techniques we deployed during the Visual Perceptive Media project was to export Final Cut XML out of Final Cut/Premiere Pro, then transform the lot with XSL/Python/etc. into something more usable. It’s something I’ve had in mind for a long time, as you can see with this paper/presentation I wrote 12 years ago.
There was a point when we could either create an editor for our director/writer (Julius) or allow him to use tools he was familiar with (a non-linear editor like Final Cut/Premiere). Of course we chose the latter and converted the Final Cut XML (which isn’t really an official spec) into JSON using Python. We were able to use markers and zones to great effect, indicating the director’s interactive intentions within the non-linear editor. This meant those intentions could exist from the start and run right through to the very end, rather than being tacked on at the end.
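Here is a minimal sketch of that kind of conversion in Python. It assumes an FCP7-style XML export in which each marker is a marker element with name, comment, in and out children holding frame numbers. The element names, the sample document and the interactive:/zone: naming convention are all assumptions made for illustration. This is not the format or code we actually used on the project.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed FCP7-style export; real files carry far more detail.
SAMPLE_XML = """<xmeml version="5">
  <sequence>
    <name>Demo cut</name>
    <marker>
      <name>interactive:weather</name>
      <comment>Swap B-roll depending on local weather</comment>
      <in>250</in>
      <out>500</out>
    </marker>
    <marker>
      <name>zone:ending</name>
      <comment></comment>
      <in>1200</in>
      <out>-1</out>
    </marker>
  </sequence>
</xmeml>"""


def markers_to_json(xml_text: str) -> str:
    """Collect every <marker> element and serialise the lot as a JSON list."""
    root = ET.fromstring(xml_text)
    markers = []
    for marker in root.iter("marker"):
        markers.append({
            "name": marker.findtext("name", default=""),
            "comment": marker.findtext("comment", default=""),
            # In/out are frame numbers in this assumed format; -1 means "not set".
            "in": int(marker.findtext("in") or -1),
            "out": int(marker.findtext("out") or -1),
        })
    return json.dumps(markers, indent=2)


if __name__ == "__main__":
    print(markers_to_json(SAMPLE_XML))
```

Running it prints the two hypothetical markers as a JSON array that a player could then read.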
So with all that in mind, I started wondering if I could turn Audacity into an editor in a similar way. Is there a Final Cut XML format for audio? That’s when I came across this article which made perfect sense: Audacity files are just XML documents, sooo…
Structure of an empty project:
<?xml version="1.0" standalone="no" ?>
<!DOCTYPE project PUBLIC "-//audacityproject-1.3.0//DTD//EN" "http://audacity.sourceforge.net/xml/audacityproject-1.3.0.dtd" >
<project xmlns="http://audacity.sourceforge.net/xml/" projname="blank-audacity_data" version="1.3.0" audacityversion="2.2.1" sel0="0.0000000000" sel1="0.0000000000" vpos="0" h="0.0000000000" zoom="86.1328125000" rate="44100.0" snapto="off" selectionformat="hh:mm:ss + milliseconds" frequencyformat="Hz" bandwidthformat="octaves">
  <tags/>
</project>
Just the title ignited my mind. The actual content of the blog post is less interesting, but I realised I may have a free & open-source editor which runs on every platform and, with a bit of XSL magic, could be the start of the editor I was looking for? The idea of it being one pipe, which leads on to more, fits well into the bigger pipeline chain.
I also found a Git project to parse audio track times from Audacity .aup projects. It uses XSL to do the processing, so I may spend a bit of time playing with it to make something useful.
Just need to dust off my old XSL development skills… Which reminds me, what happened to XProc (XML pipeline language)?
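The Git project above does its processing in XSL. As a swapped-in alternative, here is a minimal Python sketch that pulls clip and label times out of an .aup file. It assumes Audacity 2.x element and attribute names: wavetrack with a rate, waveclip with an offset in seconds, sequence with numsamples, and labeltrack/label with t, t1 and title. Those names are my reading of the format, not a checked schema, and the sample is a trimmed, made-up project rather than a real one.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Default namespace declared on <project> in the empty project shown earlier.
NS = {"a": "http://audacity.sourceforge.net/xml/"}

# Hypothetical trimmed .aup; element/attribute names are assumptions, not a schema.
SAMPLE_AUP = """<project xmlns="http://audacity.sourceforge.net/xml/" projname="demo_data" rate="44100.0">
  <wavetrack name="Intro music" rate="44100">
    <waveclip offset="0.00000000">
      <sequence numsamples="441000"/>
    </waveclip>
  </wavetrack>
  <wavetrack name="Interview" rate="44100">
    <waveclip offset="8.50000000">
      <sequence numsamples="2646000"/>
    </waveclip>
  </wavetrack>
  <labeltrack name="Label Track">
    <label t="8.5" t1="68.5" title="chapter:interview"/>
  </labeltrack>
</project>"""


def clip_times(aup_text: str) -> List[Tuple[str, float, float]]:
    """Return (track name, start, end) in seconds for every wave clip."""
    root = ET.fromstring(aup_text)
    rows = []
    for track in root.findall("a:wavetrack", NS):
        # Fall back to the project-wide rate if the track doesn't state one.
        rate = float(track.get("rate", root.get("rate", "44100")))
        for clip in track.findall("a:waveclip", NS):
            start = float(clip.get("offset", "0"))
            seq = clip.find("a:sequence", NS)
            samples = int(seq.get("numsamples", "0")) if seq is not None else 0
            rows.append((track.get("name"), start, start + samples / rate))
    return rows


def labels(aup_text: str) -> List[Tuple[str, float, float]]:
    """Return (title, start, end) for every label on every label track."""
    root = ET.fromstring(aup_text)
    return [(lab.get("title"), float(lab.get("t")), float(lab.get("t1")))
            for lab in root.iterfind("a:labeltrack/a:label", NS)]
```

Clip end times come from dividing the sample count by the track's sample rate, so the 441000-sample intro clip ends at 10 seconds. Labels could play the same role that markers did in the Final Cut workflow.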
I recently introduced a few friends to Mastodon and tried to explain why I think it’s a step forward. Others have hinted at this too.
They face many issues, and some are highlighted in a blog post I wrote a while ago about Mastodon. But recently I had an interesting discussion about an aspect of the decentralised web I haven’t discussed for a while: the lack of censorship of dangerous, and in some places illegal, content.
This might come as quite a shock to a lot of people used to the moderation/gatekeeping of centralised platforms, especially while browsing through the list of Mastodon servers to join.
Generally, a lot of people in the Dweb (decentralised web) world understand the advantages and disadvantages of decentralised systems, including this. But it can come as a shock to others who have rarely come across anything like it. I would say it’s like the red light district in Amsterdam: it’s there if you want it, it’s better/safer for those involved, and it’s easier for law enforcement to do their job. It’s important to note that this happens regardless.
Of course it totally depends on the media, content, etc… There’s a sliding scale from stuff which is totally illegal to things which are more questionable depending on your culture, faith, etc. Mastodon has ways not just to filter but also to block and ban things. The join-an-instance model is ideal because it sets the tone and makes explicit the rules of what’s tolerated and what’s not. This gives transparency to users and should stop things like Facebook’s policy of blocking breastfeeding photos.
I do understand it’s off-putting to new Dweb users but, like the Cloudflare/Daily Stormer censorship or the British porn block, there’s a serious lesson to be learned. Let’s not kid ourselves: simply hiding it or pushing it underground will ultimately make things worse for everyone. Law enforcement works much better when there’s a cultural and societal norm against something. This is why the war on drugs has been, and always will be, an unwinnable war.
Updated 18th Feb
Mozilla’s IRL podcast has an episode along the same lines which is worth listening to.
Some people believe that decentralization is the inevitable future of the web. They believe that internet users will start to demand more privacy and authenticity of information online and that they’ll look to decentralized platforms to get those things. But would decentralization be as utopian as advocates say it could be?
If I had a little money from every person who sent me details of Tim Berners-Lee’s Solid, I would have enough to buy a cheap flight somewhere in Europe.
Solid is meant to change “the current model where users have to hand over personal data to digital giants in exchange for perceived value. As we’ve all discovered, this hasn’t been in our best interests. Solid is how we evolve the web in order to restore balance – by giving every one of us complete control over data, personal or not, in a revolutionary way.”
Solid isn’t a radical new program. Instead, “Solid is a set of modular specifications, which build on, and extend the founding technology of the world wide web (HTTP, REST, HTML). They are 100% backwards compatible with the existing web.”
The main reason people seem to be sending it my way is another open source project I’m involved in, called Databox.
For me, Solid is a personal data store; it’s like a secure vault for your data. This is good but, like 2-factor authentication over SMS, not as secure as other approaches. Put all your personal data in one place and it becomes a central point for those who want everything at once. Think about how many times you have seen leaks of databases containing credit card numbers, emails, names, etc… It’s the eggs/data in one basket problem…
This came up at MyData 2018. There was quite a lot of discussion about it throughout the conference, and it was touched on in Mikko Hyppönen’s talk.
Having the data in one place is just one aspect. Others are more about the value proposition to people and, technically, how verified claims work, as expressed in How Solid is Tim’s plan to redecentralize the web.
Many people have asked me to compare Solid and Databox, and I would certainly say Databox (regardless of its name) isn’t a place to hold all your personal data. You could use it like that, but it’s more of a privacy-aware data processing platform/unit. I remember the first time I heard about Vendor Relationship Management (VRM); it was clear to me how powerful it could be for many things. But then again, I also identified data portability as something essential while most people just didn’t see the point.
Everything will live or die not just by developer support, privacy controls, security and cleverness, but by user demand… and it feels like personal data stores are still a while off in most people’s imagination.
Maybe once enough people personally experience the rough side of personal data breaches it may change?
For example, today I received an email from Have I Been Pwned saying…
You’re one of 125,929,660 people pwned in the Apollo data breach.
In July 2018, the sales engagement startup Apollo left a database containing billions of data points publicly exposed without a password. The data was discovered by security researcher Vinny Troia who subsequently sent a subset of the data containing 126 million unique email addresses to Have I Been Pwned. The data left exposed by Apollo was used in their “revenue acceleration platform” and included personal information such as names and email addresses as well as professional information including places of employment, the roles people hold and where they’re located. Apollo stressed that the exposed data did not include sensitive information such as passwords, social security numbers or financial data.
Until this is an everyday occurrence, most people will just carry on and not care? Maybe there’s even a point where it becomes part of the furniture of the web, like the new grey?
That’s twice now I’ve heard something similar to this.
The first time was from Gregor Žavcer at MyData 2018 in Helsinki. I remember when he started saying that if you have no control over your identity, you are but a slave (paraphrased, of course). There was a bit of awe from the audience, including myself. Now, to be fair, he justified everything he said, but I didn’t note down the references he made, as he was moving quite quickly. I did note down something about no autonomy is data without self.
Then today, at the BBC Blueroom AI Society & the Media event, I heard Konstantinos Karachalios say something very similar. To be fair, I was unsure of the whole analogy when I first heard it, but there seems to be some solid grounding for it all.
This is why the very idea of self-sovereign identity (SSI), as proposed by Kaliya Young and others during MyData, speaks volumes to us all deep down. The videos and notes from that session are not up yet, but I gather it was all recorded and will be up soon. However, I found her slides from when she talked at the Decentralized Web Summit.
This looks incredible as we shift closer to the Dweb (I’m thinking there was web 1.0, then web 2.0 and now the Dweb, as web 3.0/the semantic web didn’t quite take root). There are many questions, including service/application support and the difficulty of getting one. This is certainly where I agree with Aral about the design of it all: the advantages could be so great, but if it takes extremely good technical knowledge to get one, it’s going to be stuck on the ground for a long time, regardless of the critical advantages.
I was reminded of the sad tale of what happened to OpenID; I’m really hoping this doesn’t go the same way.
It’s over 14 years since the DataPortability project was founded by a bunch of well-meaning people, including myself. It was a challenging time of vendor lock-in, walled gardens and social guilt trips; to be honest, little changed until very recently with GDPR.
Data export was good, but user-controlled data transfer is something special and was one of the dreams of the DataPortability project. Moving service to service, not because a special agreement was set up between the services but because you chose to move of your own free will, makes so much sense.
This is why I was kind of sceptical of the Google Data Transfer Project. But on a deeper look, it’s pretty good.
In 2007, a small group of engineers in our Chicago office formed the Data Liberation Front, a team that believed consumers should have better tools to put their data where they want, when they want, and even move it to a different service. This idea, called “data portability,” gives people greater control of their information, and pushes us to develop great products because we know they can pack up and leave at any time.
In 2011, we launched Takeout, a new way for Google users to download or transfer a copy of the data they store or create in a variety of industry-standard formats. Since then, we’ve continued to invest in Takeout—we now call it Download Your Data—and today, our users can download a machine-readable copy of the data they have stored in 50+ Google products, with more on the way.
Now, we’re taking our commitment to portability a step further. In tandem with Microsoft, Twitter, and Facebook we’re announcing the Data Transfer Project, an open source initiative dedicated to developing tools that will enable consumers to transfer their data directly from one service to another, without needing to download and re-upload it. Download Your Data users can already do this; they can transfer their information directly to their Dropbox, Box, MS OneDrive, and Google Drive accounts today. With this project, the development of which we mentioned in our blog post about preparations for the GDPR, we’re looking forward to working with companies across the industry to bring this type of functionality to individuals across the web.
It all sounds great, and the code is open source on GitHub for anyone to try out. The paper is worth reading too.
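Here is a toy Python sketch of the service-to-service idea. Each service exposes an exporter and an importer that speak a shared, neutral data model, so data moves directly without the user downloading and re-uploading it. The Photo model, the class names and both in-memory services are hypothetical. This is not the Data Transfer Project’s actual API.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Protocol


@dataclass(frozen=True)
class Photo:
    """Hypothetical service-neutral data model that every adapter agrees on."""
    title: str
    url: str


class Exporter(Protocol):
    def export(self, user_id: str) -> Iterable[Photo]: ...


class Importer(Protocol):
    def import_items(self, user_id: str, items: Iterable[Photo]) -> int: ...


class OldService:
    """Stands in for the service the user is leaving."""
    def __init__(self) -> None:
        self.store: Dict[str, List[Photo]] = {
            "alice": [Photo("Beach", "https://old.example/1.jpg")]
        }

    def export(self, user_id: str) -> Iterable[Photo]:
        return list(self.store.get(user_id, []))


class NewService:
    """Stands in for the service the user is moving to."""
    def __init__(self) -> None:
        self.store: Dict[str, List[Photo]] = {}

    def import_items(self, user_id: str, items: Iterable[Photo]) -> int:
        bucket = self.store.setdefault(user_id, [])
        before = len(bucket)
        bucket.extend(items)
        return len(bucket) - before  # how many items actually arrived


def transfer(source: Exporter, dest: Importer, user_id: str) -> int:
    """Move a user's data service to service; no download ever lands with the user."""
    return dest.import_items(user_id, source.export(user_id))
```

The shared model is what makes this work. It also shows the limit: the adapters can only move what each service chooses to export.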
However! The devil is in the data, or rather the lack of it. As the EFF points out, there’s no exchange of tracking data, the real crown jewels. The transfer tool is good, but if the services don’t even share the data, then what’s the point?
Under GDPR, I sent out emails to OkCupid, Plenty of Fish, Tinder and others. So far I’ve only had responses from POF and OkCupid, which means Tinder and the others have about a day or so to get back to me with everything before I can start to throw down some fire.
Before I headed on holiday, I got a message from POF, then one from OkCupid a day later, saying they needed the request to come from the email address on the account. Fair enough, so I forwarded each email to that address, then replied all (to them and to myself) from that account’s email address.
A few days later I got emails, first from POF and then OKCupid.
You have recently requested a copy of your PlentyofFish (“POF”) personal data, and we’re happy to report that we have now verified your identity.
We are attaching a copy of your personal data contained in or associated with your POF account. The password to access the personal data will be sent in a separate email.
By downloading this data, you consent to the extraction of your data from POF, and assume all risk and liability for such downloaded data. We encourage you to keep it secure and take precautions when storing or sharing it.
The information contained in this archive may vary depending on the way you have used POF. In general, this information includes content and photos you have provided us, whether directly or through your social media accounts, messages you have sent and other data you would expect to see from a social media service like POF.
Please note that there is some information we cannot release to you including information that would likely reveal personal information about other users. Those notably include messages you received on POF, which are not provided out of concern for the privacy of the senders.
Sincerely,
POF Privacy Team
Then a similar one from OkCupid, which makes sense, as they’re the same company really.
Dear Mr. Forrester:
You have recently requested a copy of your OkCupid personal data, and we’re happy to report that we have now verified your identity.
We are attaching a copy of your personal data contained in or associated with your OkCupid account. The password to access the personal data will be sent in a separate email.
By downloading this data, you consent to the extraction of your data from OkCupid, and assume all risk and liability for such downloaded data. We encourage you to keep it secure and take precautions when storing or sharing it.
The information contained in this archive may vary depending on the way you have used OkCupid. In general, this information includes content and photos you have provided us, whether directly or through your social media accounts, messages you have sent and other data you would expect to see from a social media service like OkCupid.
Please note that there is some information we cannot release to you including information that would likely reveal personal information about other users. Those notably include messages you received on OkCupid, which are not provided out of concern for the privacy of the senders.
Sincerely,
OkCupid Privacy Team
So, on my train journey from Stockholm to Copenhagen, I had a look inside the zip files shared with me. They’re quite different; it’d be interesting to see what others do.
Interestingly, there are no photos in the OkCupid data dump, not even the ones I shared as part of my profile. In the POF dump, the PDF is just a copy of the JSON file, which is silly really.
So the JSON files are the most interesting parts…
Plenty of Fish
POF doesn’t have much interesting data: basically a copy of my profile data in JSON, including Firstvisit, FirstvisitA, etc. through to FirstvisitE, complete with my IP address. I can also confirm I started my profile on 2012-01-25.
Then there is my BasicSearchData and AdvancedSearchData, which include the usual stuff plus when I last searched (LastSearch) and from which IP address.
Nothing else… no messages
OkCupid
OkCupid has a ton more useful information in its JSON. Some interesting parts: I have logged into OkCupid a total of 24,157 times! My status is Active? My job is Technology? The geolocation_history is pretty spot on, and the login_history goes from July 2007 to the current year, complete with IP and time.
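Here is a minimal sketch of digging into that login history. It assumes the export is a JSON object with a login_history list whose entries carry an ip and an ISO-style time string. The key names inside each entry and the time format are my guess, and the sample uses documentation-only IP addresses, so check against your own file.

```python
import json
from collections import Counter

# Hypothetical excerpt: the real export's entry fields and time format may differ.
SAMPLE = """{
  "login_history": [
    {"ip": "192.0.2.10", "time": "2007-07-14 21:03:11"},
    {"ip": "192.0.2.10", "time": "2007-08-02 08:15:40"},
    {"ip": "198.51.100.7", "time": "2018-05-30 19:44:02"}
  ]
}"""


def logins_per_year(export: dict) -> Counter:
    """Count login_history entries by the year prefix of their timestamp."""
    return Counter(entry["time"][:4] for entry in export.get("login_history", []))


def distinct_ips(export: dict) -> set:
    """Every IP address that appears in the login history."""
    return {entry["ip"] for entry in export.get("login_history", [])}


if __name__ == "__main__":
    data = json.loads(SAMPLE)
    for year, count in sorted(logins_per_year(data).items()):
        print(year, count)
```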
The messages are really interesting! They decided to share only one side of the messages: the ones you sent rather than what you received. As the messages are not like emails, you don’t get the quoted reply, just the sent message. Each item includes who it’s from (me) and the time/date. Some are obviously an instant messenger conversation, which look odd reading them now. In those, there are also fields for peer, peer_joined, time and type. It’s also clear where changes have happened, for example when you used to be able to add some formatting to the message, and when messages used to have subject lines.
Some which stick out include, Allergic to smoking?, insomnia, ENTP and where next, The Future somewhat answered, So lazy you’ve only done 40 something questions, Dyslexia is an advantage, But would you lie in return? No bad jokes, gotland and further a field, Ok obvious question, etc.
Next come the photos (my photos, no one else’s).
"caption": "OkCupid's removal of visitors is so transparent, I don't know why they bothered to lie to us all?", "photo": "https://k1.okccdn.com/php/load_okc_image.php/images/6623162030294614734", "status": "Album Picture Active", "uploaded": "2017-08-08 19:16:20"
Of course the images are publicly available via the URL, so I could pull them all down with a quick wget/curl. I’m not sure what to make of this idea of making them public. Security through obscurity, anyone?
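Pulling them down really is that simple. Here is a minimal sketch that assumes photos show up in the export as objects with a photo key holding the URL, as in the excerpt above. The walk doesn’t depend on where in the file they sit, and the folder name is arbitrary. Downloading needs network access and uses urllib from the standard library in place of wget/curl.

```python
import os
import urllib.request
from typing import Any, List


def photo_urls(node: Any) -> List[str]:
    """Recursively collect every string stored under a "photo" key."""
    found: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "photo" and isinstance(value, str):
                found.append(value)
            else:
                found.extend(photo_urls(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(photo_urls(item))
    return found


def download_all(urls: List[str], folder: str = "okc_photos") -> None:
    """Fetch each URL into folder, naming files after the last path segment."""
    os.makedirs(folder, exist_ok=True)
    for url in urls:
        name = url.rstrip("/").rsplit("/", 1)[-1]
        urllib.request.urlretrieve(url, os.path.join(folder, name))
```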
Now, the image strings seem to be random, but I don’t think this is a good idea at all! I’m wondering how it sits with GDPR too, and whether they will remove them after a period of time. So if the image above is broken, you know what happened.
Then we’re on to the purchases section. It details when I once tried an A-List subscription and when I cancelled it: how I paid (PayPal), how much, address, date, etc… It’s funny reading about when I cancelled it…
"comments": "userid = 7367007913453081320 was downgraded to amateur", "transaction": "lost subscription",
The big question I always had was the question data. Don’t worry they are all there! For example here’s just one of mine.
{ "answer_choices": { "1": "Yes", "2": "No" }, "prompt": "Are you racist?", "question_id": 7847, "user_acceptable_answers": [ "No" ], "user_answer": "No", "user_answered_publicly": "no", "user_importance": "mandatory" },
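Here is a small sketch for getting an overview of the question data. It assumes the export holds a list of entries shaped exactly like the one above. The list itself and where it sits in the file are assumptions, and the sample contains only that single entry.

```python
from collections import Counter
from typing import Dict, List

# The one entry copied from the export above; the real list holds many more.
QUESTIONS: List[Dict] = [
    {
        "answer_choices": {"1": "Yes", "2": "No"},
        "prompt": "Are you racist?",
        "question_id": 7847,
        "user_acceptable_answers": ["No"],
        "user_answer": "No",
        "user_answered_publicly": "no",
        "user_importance": "mandatory",
    },
]


def importance_breakdown(questions: List[Dict]) -> Counter:
    """How many questions were marked at each importance level."""
    return Counter(q.get("user_importance", "unknown") for q in questions)


def private_answers(questions: List[Dict]) -> List[str]:
    """Prompts of questions answered without making the answer public."""
    return [q["prompt"] for q in questions if q.get("user_answered_publicly") == "no"]


def would_accept_own_answer(q: Dict) -> bool:
    """True if my own answer is among the answers I'd accept from a match."""
    return q.get("user_answer") in q.get("user_acceptable_answers", [])
```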
After all those questions, there’s a bunch of stuff about the user_devices I’ve used to log into OkCupid over the years, going right back, plus stuff about search preferences, etc.
I’m going to need some time to digest everything, but the OkCupid data dump is full of interesting things. I might convert the lot to XML just to make it easier for me to get an overview.
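Here is a minimal sketch of that JSON-to-XML conversion using only the Python standard library. The mapping is my own ad hoc convention, not any standard JSON-to-XML mapping:

- objects become child elements,
- arrays become repeated item elements,
- null becomes an empty element,
- keys that aren’t valid XML names are patched with underscores.

```python
import json
import re
import xml.etree.ElementTree as ET
from typing import Any


def safe_tag(key: str) -> str:
    """Turn a JSON key into a valid XML element name (ad hoc convention)."""
    tag = re.sub(r"[^A-Za-z0-9_.-]", "_", key)
    # XML names can't start with a digit, hyphen or dot, so prefix an underscore.
    return tag if re.match(r"[A-Za-z_]", tag) else "_" + tag


def to_xml(value: Any, tag: str = "export") -> ET.Element:
    """Recursively convert parsed JSON into an ElementTree element."""
    elem = ET.Element(safe_tag(tag))
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(to_xml(child, key))       # objects: one child element per key
    elif isinstance(value, list):
        for child in value:
            elem.append(to_xml(child, "item"))    # arrays: repeated <item> elements
    elif isinstance(value, bool):
        elem.text = "true" if value else "false"  # keep JSON's spelling of booleans
    elif value is not None:
        elem.text = str(value)                    # strings and numbers become text
    return elem                                   # null is left as an empty element


if __name__ == "__main__":
    data = json.loads('{"user_answer": "No", "answer_choices": {"1": "Yes", "2": "No"}}')
    print(ET.tostring(to_xml(data), encoding="unicode"))
```

Once it’s XML, the old XSL tooling applies again.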
I mentioned how I emailed a load of dating sites, and then some, for my data under GDPR. So far I’ve had something from POF, and OkCupid finally got back to me after my request finally made it to supportconsole@okcupid.com.
Hello,
OkCupid has received your recent request for a copy of the personal data we hold about you.
For your protection and the protection of all of our users, we cannot release any personal data without first obtaining proof of identity.
In order for us to verify your identity, we kindly ask you to:
1. Respond to this email from the email address associated with your OkCupid account and provide us the username of your OkCupid account.
2. In your response to this email, please include a copy of a government-issued ID document such as your passport or driving license. Also, we ask you to please cover up any personal information other than your name, photo and date of birth from the document as that is the only information we need.
We may require further verification of your identity, for example, if the materials you provide us do not establish your identity as being linked to the account in question.
Please note that if you previously closed your account, your data may be unavailable for extraction as we proceed to its deletion or anonymization in accordance with our privacy policy. Even if data is still available for extraction, there is some information we cannot release to you including information that would likely reveal personal information about other users. Those notably include messages you received on OkCupid, which are not provided out of concern for the privacy of the senders.
Best,
OkCupid Privacy Team
Pretty much the same as the POF reply.
I mentioned how I emailed a load of dating sites, and then some, for my data under GDPR. So far I’ve been bounced around a little, but POF is the first positive email I’ve gotten…
PlentyofFish (“POF”) has received your recent request for a copy of the personal data we hold about you.
For your protection and the protection of all of our users, we cannot release any personal data without first obtaining proof of identity.
In order for us to verify your identity, we kindly ask you to:
1. Respond to this email from the email address associated with your POF account and provide us the username of your POF account.
2. In your response to this email, please include a copy of a government-issued ID document such as your passport or driving license. Also, we ask you to please cover up any personal information other than your name, photo and date of birth from the document as that is the only information we need.
We may require further verification of your identity, for example, if the materials you provide us do not establish your identity as being linked to the account in question.
Please note that if you previously closed your account, your data may be unavailable for extraction as we proceed to its deletion or anonymization in accordance with our privacy policy. Even if data is still available for extraction, there is some information we cannot release to you including information that would likely reveal personal information about other users. Those notably include messages you received on POF, which are not provided out of concern for the privacy of the senders.
Best,
POF Privacy Team
Well, I guess they are being careful at least, but I’ll be interested to see what other questions they ask me.
Still wondering when the rest will get in touch?
I have written a few times about disruption in online dating; heck, it’s something which will be discussed at Mozilla Festival this year (tickets are available now).
But interestingly the EU’s General Data Protection Regulation may get in there ahead of any setup/network disruption. In the Guardian I saw a piece called Getting your data out of Tinder is really hard – but it shouldn’t be.
It’s all about getting data back from Tinder (which, remember, is part of IAC/Match Group).
…Duportail eventually got some of the rest of her data, but only on a voluntary basis, and only after she identified herself as a journalist. Her non-journalist friends who followed suit never got responses to similar requests.
Finally armed with the 800 pages she had clawed back from Tinder, Duportail wrote a story reflecting on her own relationship with her data, and the myopic view Tinder had of her love life. I feel her story helps bridge the chasm between those with information stored in the database and the architects behind it, providing much needed neutral common ground to democratically discuss power distributions in the digital economy.
Given the popularity of her story, and my overflowing inbox, I would say many agree. And indeed, you should expect more similar stories to be unearthed in the future because of the upcoming General Data Protection Regulation (GDPR). From May 2018, the new European-level regulation will come into force, claiming wider applicability – including on US-based companies, such as Tinder, processing the personal data of Europeans – and harmonising data protection and enforcement by “levelling up” protections for all European residents.
I know there is a lot of push-back from the big American internet corps, but this is coming, and there is no way they can wriggle out of it?
…beyond the much older right of access, the true revolution of GDPR will come in the form of a new right for all European citizens: the right to portability.
It seems like such a small thing, but it actually has the potential to be extremely disruptive. Heck, it’s one of the things I wanted back in early 2011. Imagine all those new services which could act like brokers and enable choice! It could be standard to have the ability to export and import rich data sets like Attention Profile Markup Language (APML).
I just wish we were staying in Europe, although thankfully the UK has agreed to adopt GDPR! If they were left on their own, there was no way this would ever have come about the way it now looks like it might.
I’m back at the Quantified Self conference; it’s been a few years due to scheduling and other conflicts. It’s actually been a while since I talked about the Quantified Self, mainly because I feel it’s so mainstream now that few people even know what it is, although they use things like Strava, Fitbits, etc.
The line-up for the Quantified Self conference is looking very good, with plenty of good sessions for almost every palate, and I’ll be heading up this session while at the conference.
Using Your Data To Influence Your Environment
With home automation tools, it is now possible for your personal data to influence your environment. Soon, your personal data could be used to influence how a movie is shown to you! Let’s talk about the implications and ethics of data being used this way.
It’s basically centered around the notion that our presence affects the world around us, directly linking Perceptive Media and the Quantified Self together. Of course, I’m hoping to tease out some of the complexity of data ethics with people who fully understand this and have skin in the game, as it were.
I’m also looking to report back on this conference and restart the Manchester Quantified Self group, which went quiet a while ago.
Storj is an open-source, decentralized, cloud storage platform. It is based on the cryptocurrency Bitcoin’s (BTC) blockchain technology and peer-to-peer protocols. The Storj network uses its own cryptocurrency, Storjcoin X (SJCX), while its front-end software supports the use of other digital currencies such as Bitcoin and more traditional forms of payment like the dollar. Unlike traditional cloud storage providers, Storj keeps data spread across a decentralized network eliminating the problem of having a single point of failure. It also encrypts all data making it impossible for anyone, including Storj, to snoop on users’ files without having a user’s private encryption key. In return for offering storage space to the network, users are paid cryptocurrency.
Imagine storing all your private data across other people’s drives in encrypted form? Imagine getting paid to store this encrypted data?
Well, this is Storj, and it’s frankly quite an amazing concept whose time has come.
This is a very attractive setup for someone like me with many terabytes of storage and hyperfast broadband. Unlike the risks of running a Tor exit node, everything is strongly encrypted and the host has zero knowledge of what’s being stored or transferred.
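To make the zero-knowledge point concrete, here is a toy Python sketch. It is not Storj’s actual protocol, which is considerably more involved. It assumes the data has already been encrypted client-side; encryption itself is left out to stay within the standard library. The sketch splits the ciphertext into fixed-size shards keyed by their SHA-256 hash and lets the owner verify every shard that comes back. A host only ever sees opaque bytes and a hash.

```python
import hashlib
from typing import Dict, List, Tuple

SHARD_SIZE = 4  # tiny for demonstration; real systems use far larger shards


def shard(ciphertext: bytes, size: int = SHARD_SIZE) -> List[Tuple[str, bytes]]:
    """Split already-encrypted bytes into fixed-size shards keyed by their SHA-256."""
    pieces = [ciphertext[i:i + size] for i in range(0, len(ciphertext), size)]
    return [(hashlib.sha256(p).hexdigest(), p) for p in pieces]


def reassemble(manifest: List[str], hosts: Dict[str, bytes]) -> bytes:
    """Fetch shards back by hash, rejecting any that a host has altered or lost."""
    out = bytearray()
    for digest in manifest:
        piece = hosts.get(digest)
        if piece is None or hashlib.sha256(piece).hexdigest() != digest:
            raise ValueError(f"shard {digest[:8]} missing or corrupted")
        out += piece
    return bytes(out)
```

The owner keeps only the ordered list of hashes. If a host alters or drops a shard, the hash check fails and reassemble raises ValueError rather than returning corrupted data.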
I already have an account, as I’m interested to see how it works. I first heard about it on Steal This Show: How the swarm will beat the cloud.
You could write tools and editors to make the recipes have everything needed to fit the cook’s skill level, ingredients, time, allergies, preferences, party size, etc… I mean, who wouldn’t want to describe every aspect of their special dish? (I’m avoiding the copyright/licensing questions for now.)
Now that would be something, Clasen? And what better community to kick-start such a thing? Dare I bring up the BBC recipe headlines from only 6 months ago?
Seems like a no-brainer to me?
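Here is a rough sketch of what a structured recipe might look like, so that a tool could scale it to the party size and check it against allergies and the time available. The field names, the skill scale and the allergen labels are all hypothetical. Copyright and licensing are left aside, as above. Scaling quantities linearly is a simplification, since seasoning and cooking times rarely scale that neatly.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Ingredient:
    name: str
    quantity: float  # amount in `unit` for the recipe's base party size
    unit: str
    allergens: Set[str] = field(default_factory=set)


@dataclass
class Recipe:
    title: str
    serves: int
    minutes: int
    skill: str  # hypothetical scale, e.g. "beginner", "intermediate", "expert"
    ingredients: List[Ingredient]

    def scaled(self, party_size: int) -> "Recipe":
        """Return a copy with ingredient quantities scaled linearly to party_size."""
        factor = party_size / self.serves
        return Recipe(
            self.title, party_size, self.minutes, self.skill,
            [Ingredient(i.name, i.quantity * factor, i.unit, set(i.allergens))
             for i in self.ingredients],
        )

    def allergens(self) -> Set[str]:
        """Union of the allergens across all ingredients."""
        return set().union(*(i.allergens for i in self.ingredients))

    def suits(self, avoid: Set[str], max_minutes: int) -> bool:
        """True if the dish avoids the given allergens and fits the time available."""
        return not (self.allergens() & avoid) and self.minutes <= max_minutes
```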
I read about the Memento project a while ago, but it’s become a reality at the W3C recently.
The Memento protocol is a straightforward extension of HTTP that adds a time dimension to the Web. It supports integrating live web resources, resources in versioning systems, and archived resources in web archives into an interoperable, distributed, machine-accessible versioning system for the entire web. W3C finds Memento work with online reversion history extremely useful for the Web in general and practical application on its own standards to be able to illustrate how they evolve over time
It’s smart, simple and great because it works on top of HTTP, instead of creating a whole different way of doing the same thing.
I can already imagine a Memento-powered Twitter service or a Memento-powered BBC Redux service.
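Memento (RFC 7089) really is just HTTP headers. A client sends an Accept-Datetime header to a TimeGate. The archived copy it gets pointed to reports its capture time in a Memento-Datetime header. Here is a minimal Python sketch of both sides using only the standard library. The TimeGate URL is a placeholder, since each archive publishes its own, and nothing here touches the network.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Mapping
import urllib.request

# Hypothetical TimeGate endpoint; real archives publish their own.
TIMEGATE = "https://timegate.example/"


def memento_request(original_url: str, when: datetime) -> urllib.request.Request:
    """Build a TimeGate request asking for original_url as it was at `when`."""
    # Accept-Datetime uses the HTTP date format, always expressed in GMT.
    # A naive `when` is treated as local time by astimezone().
    stamp = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return urllib.request.Request(TIMEGATE + original_url,
                                  headers={"Accept-Datetime": stamp})


def memento_time(headers: Mapping[str, str]) -> datetime:
    """Read the capture time from a memento response's Memento-Datetime header."""
    return parsedate_to_datetime(headers["Memento-Datetime"])
```

urllib stores the header name as Accept-datetime. That is harmless, because HTTP header names are case-insensitive. Passing the request to urllib.request.urlopen would follow the TimeGate’s redirect to the archived copy.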