Microblogging data portability at last?

Twitter data dump

Finally got the ability to download my tweets… Over 6 years of tweets in 6.8 meg of files.

It comes as a zip file, not a tar file, which is interesting because Facebook uses tars for its data dumps. The structure is interesting because it’s less of a dump and more of a formal backup of your data, complete with an HTML file to bring it all together. There’s a README.txt file which reads…

# How to use your Twitter archive data
The simplest way to use your Twitter archive data is through the archive browser interface provided in this file. Just double-click `index.html` from the root folder and you can browse your entire history of Tweets from inside your browser.

In the `data` folder, your Twitter archive is present in two formats: JSON and CSV exports by month and year.

  • CSV is a generic format that can be imported into many data tools, spreadsheet applications, or consumed simply using a programming language.

## JSON for Developers
  • The JSON export contains a full representation of your Tweets as returned by v1.1 of the Twitter API. See https://dev.twitter.com/docs/api/1.1 for more information.
  • The JSON export is also used to power the archive browser interface (index.html).
  • To consume the export in a generic JSON parser in any language, strip the first and last lines of each file.

To provide feedback, ask questions, or share ideas with other Twitter developers, join the discussion forums on https://dev.twitter.com.
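The README's tip about stripping the first and last lines can be sketched in a few lines of Python. This is only a rough sketch: the function name `parse_archive_json` is made up for illustration, and the fallback to dropping just the first line is my own assumption in case a file ends with the JSON itself rather than a wrapper line.

```python
import json


def parse_archive_json(raw: str):
    """Parse the text of one monthly export file (hypothetical helper).

    Tries the README's advice first (drop the first and last lines),
    then falls back to dropping only the first line. The fallback is
    an assumption, not something the README promises.
    """
    lines = raw.strip().splitlines()
    candidates = [lines[1:-1], lines[1:]]
    last_error = None
    for body in candidates:
        try:
            return json.loads("\n".join(body))
        except json.JSONDecodeError as err:
            last_error = err
    raise ValueError("could not parse export as JSON") from last_error
```

To use it, read one of the files from the `data` folder as UTF-8 text and pass the contents in; what comes back is whatever JSON structure the file holds.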

Most of the data is JSON, which bugs me a little only because I would personally have to transform it all to XML, but alas I’m sure everyone loves it. The CSV spreadsheets are odd and could do with being XML instead, but once again I’m sure it’s useful to someone out there. The nice thing is there is tons of metadata around each microblog/tweet, including the geolocation, time and device/client. Even the URLs have some interesting things around them, because I was wondering how they were going to deal with shortened URLs, retweets and mentions…

"urls" : [ {
  "indices" : [ 69, 89 ],
  "url" : "http://t.co/GSzy55vc",
  "expanded_url" : "http://epicwerewolf.eventbrite.com/",
  "display_url" : "epicwerewolf.eventbrite.com"
} ]
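As a rough illustration of what that `urls` block makes possible, here is a small Python sketch that swaps each t.co link in a tweet's text for its `expanded_url`. The field names come from the snippet above; the function name `expand_urls` is invented for the example, and I'm assuming `indices` are character offsets (start, end) into the tweet text.

```python
def expand_urls(text, urls):
    """Replace each shortened URL in text with its expanded form.

    `urls` is a list of entities shaped like the snippet above.
    Assumes `indices` are [start, end) character offsets into `text`.
    """
    # Work from right to left so earlier offsets stay valid
    # after each replacement changes the length of the text.
    for entity in sorted(urls, key=lambda e: e["indices"][0], reverse=True):
        start, end = entity["indices"]
        # Fall back to the short URL if no expanded form is present.
        replacement = entity.get("expanded_url") or entity["url"]
        text = text[:start] + replacement + text[end:]
    return text
```

Going right to left is the only real design choice here: replacing the last link first means the offsets of the links before it never shift.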

It doesn’t always work… especially with URL shorteners that don’t keep the URL after a certain time period. Interestingly, internally Twitter always uses its own t.co for everything…

Right now I’m just interested in the period around my brush with death… It’s a real shame there are no references to mentions you’ve had, as I would have loved to have seen some of those. I guess Twitter were not going to delve into that can of worms…

I want to know why there’s no status.net importer?

CNET have an overview of how and what to do with the archive. Thanks Matt.

Author: Ianforrester

Senior firestarter at BBC R&D, emergent technology expert and serial social geek event organiser. Can be found at cubicgarden@mas.to, cubicgarden@twit.social and cubicgarden@blacktwitter.io