Big Data should be the word of the year

I heard Geoff Nunberg’s piece on NPR’s podcast and I’ve got to say, although I’m pretty much big data’d out from BBC Backstage (in a nice way), I’m in total agreement. Here are a few key points… Well worth listening to in audio form…

Whether it’s explicitly mentioned or not, the Big Data phenomenon has been all over the news. It’s responsible for a lot of our anxieties about intrusions on our privacy, whether from the government’s anti-terrorist data sweeps or the ads that track us as we wander around the Web. It has even turned statistics into a sexy major. So if you haven’t heard the phrase yet, there’s still time — it will be around a lot longer than “gangnam style.”

What’s new is the way data is generated and processed. It’s like dust in that regard, too. We kick up clouds of it wherever we go. Cellphones and cable boxes; Google and Amazon, Facebook and Twitter; cable boxes and the cameras at stoplights; the bar codes on milk cartons; and the RFID chip that whips you through the toll plaza — each of them captures a sliver of what we’re doing, and nowadays they’re all calling home.

It’s only when all those little chunks are aggregated that they turn into Big Data; then the software called analytics can scour it for patterns. Epidemiologists watch for blips in Google queries to localize flu outbreaks; economists use them to spot shifts in consumer confidence. Police analytics comb over crime data looking for hot zones; security agencies comb over travel and credit card records looking for possible terrorists.

It’s the amalgamation of all that personal data that makes it possible for businesses to target their customers online and tailor their sales pitches to individual consumers. You idly click on an ad for a pair of red sneakers one morning, and they’ll stalk you to the end of your days. It makes me nostalgic for the age when cyberspace promised a liberating anonymity. I think of that famous 1993 New Yorker cartoon by Peter Steiner: “On the Internet, nobody knows you’re a dog.” Now it’s more like, “On the Internet, everybody knows what brand of dog food you buy.”

Decentralised networking is hard, no really?

Sydney, January 2009

Straight out of the “No Sh*t Sherlock…” book….

Although I think it’s amazing what developers do, I can imagine how hard it must be to write decent decentralised software. The Diaspora guys spell out how difficult it is… which Adwale likes to make sure I and others fully understand.

  • If you build a decentralized application, you actually need to ship software. You need to package, test, create installers, test on a variety of platforms, write defensive code to work around misconfigurations your customers are likely to create, etc. For a centralized website, you can often edit files in place on the production server.
    Result: decentralized is 10x harder at least.
  • Somebody somewhere will run every single version of your app that you ever shipped. It will be badly out of date, full of security holes (you fixed years ago), outmoded graphics etc. It will cost you additional support, and your brand will suffer. Almost nobody upgrades to the latest and greatest within a life time it seems.
    Result: decentralized is less functional, less pretty, and less secure.
  • Decentralized software is much harder to monetize. You can’t run ads on somebody else’s installation. You can’t data mine your users (because most of them aren’t in a place that you have access to, it’s somebody else’s installation). You can’t do cross-promotions and referrals etc. You can charge those people who install your software, but there’s a reason most websites are free: much better business.
    Result: decentralized produces less money for you, so you have less investment dollars at your disposal.
  • Database migrations and the like for decentralized apps have to be fully productized, because they will be run by somebody else who does not know what to do when something fails 15 minutes into an ALTER TABLE command.
    Result: decentralized is 10x harder at least.
  • Same thing for performance optimizations and the like: it’s much easier to optimize your code for your own server farm than trying to help Joe remotely whose installation and servers you don’t have access to.
    Result: decentralized is slower, more expensive, and harder.
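
To make the migration point a bit more concrete, here is a minimal sketch of what a "productized" migration might look like: each step runs inside a transaction and is rolled back if anything fails, so the person running it is never left with a half-altered table. It uses Python's built-in sqlite3 module purely for illustration; the `posts` table, the `MIGRATIONS` list and the `migrate` helper are all made-up names, not anything from Diaspora or the quoted post.

```python
import sqlite3

# Hypothetical migration steps; table and column names are invented for illustration.
MIGRATIONS = [
    (1, ["CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)"]),
    (2, ["ALTER TABLE posts ADD COLUMN author TEXT"]),
]


def current_version(conn):
    """Read the schema version SQLite stores in the database header."""
    return conn.execute("PRAGMA user_version").fetchone()[0]


def migrate(conn):
    """Apply pending migrations one at a time, rolling back any that fail."""
    conn.isolation_level = None  # we issue BEGIN/COMMIT ourselves
    for version, statements in MIGRATIONS:
        if version <= current_version(conn):
            continue  # already applied on this installation
        try:
            conn.execute("BEGIN")
            for sql in statements:
                conn.execute(sql)
            conn.execute(f"PRAGMA user_version = {version}")
            conn.execute("COMMIT")
        except sqlite3.Error as exc:
            if conn.in_transaction:
                conn.execute("ROLLBACK")
            raise RuntimeError(
                f"migration {version} failed and was rolled back: {exc}"
            ) from exc
    return current_version(conn)
```

Calling `migrate(sqlite3.connect("app.db"))` twice is harmless, because the stored version number lets the second run skip everything already applied. A real decentralized app would need far more than this (backups, other database engines, long-running ALTERs), which is rather the point the list is making.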

Frankly, although I take the points, if you want to stand out in a clearly overcrowded field, one with a major elephant using up all the space, you need to think differently (to quote someone we all know too well).

This means doing the difficult things which no one understands and owning the platform!

Your business model should/could be charging other developers to build and be creative on top of your platform. App.net have got the right idea: charge the developers, who then create the experiences. Your focus should be on managing the platform and supporting their creativity. Anything else is greed and/or lack of focus.

What do I mean by creativity? Think about Tweetdeck

Tweetdeck innovated on top of the Twitter platform, and in the end the platform owner, Twitter, bought them (stupid move). For a lot of people Tweetdeck made Twitter usable at long last. The number of newsrooms I’ve been to and seen Tweetdeck with a million panels open is unreal. The same isn’t true now… The Tweetdeck guys innovated on top of Twitter, and instead of sharing revenue with them or something, they bought them…!

A quote which comes to mind is something like…

The train company thought they were in the railroad business, what they didn’t get was that they were actually in the transportation business.

I really like Twitter but frankly their control/greed/whatever is getting out of control. Yesterday I was on a panel at the London Transmedia Festival at Ravensbourne College, sat with Danielle from Tumblr, Bruce from Twitter, Cat from BBC and Doug Scott from Ogilvy. Although it was tempting to make a few comments about their change in stance, I passed. I did, however, say something which could be seen as slightly negative. Doug said how useful Twitter is for understanding users and I agreed, but I said,

“Well it’s important to remember Twitter is only explicit data, implicit data is the stuff people really want to get their hands on…”

Anyway, the point stands and it’s hard to see how Twitter will get into the implicit data game at this point. If they acted like a platform, maybe someone else would do the innovation for them. But back to the main point: why would you do it on someone else’s closed system?

Decentralised network systems are harder but will drive much more interesting creativity… I can see how this might be at odds with setting up a business, a startup and having investors etc… But I’m sure I could make an argument that it’s better in the long run…

The killer application for distributed social networking?

How do we make things move along quicker in the area of distributed/federated technology? Things are moving very slowly although it seems most of the components are in place.

When I wrote the blog post about RebelMouse, I found some interesting links to some distributed solutions which could see the end of the likes of Twitter and Facebook.

OStatus is an open standard for distributed status updates. The goal is to have a specification that allows different messaging hubs to route status updates between users in near-real-time. This spec took over from the OpenMicroBlogging spec of old.

I remember writing about wordpress’s distributed solution a while ago.

The weird thing is I logged into Diaspora again today and not only is it a ghost town (not like G+, but really like a ghost town) but it got me thinking: what’s different about Diaspora and G+? Now the hype has died down, it’s time to see some very cool uses of Diaspora. What have they got to lose? Dare I say it, where’s the killer application? Where’s the thing which will make people sit up and take note once again? Heck, why’s no one doing cool stuff with the API?

So what is the killer application which will tip people over? I have some thoughts but whatever it is, please let it happen soon before we’re all forced to beg Twitter, Facebook, etc. for our data back.

Love of the Self or Data sexuality?

Gary Wolf at Quantified Self 2011

When I read this article about the new breed of urban datasexuals I instantly thought maybe heck I might be one of them…?

The datasexual looks a lot like you and me, but what’s different is their preoccupation with personal data. They are relentlessly digital, they obsessively record everything about their personal lives, and they think that data is sexy. In fact, the bigger the data, the sexier it becomes. Their lives – from a data perspective, at least – are perfectly groomed.

Oh crap that sounds just like me… I find it very difficult to maintain things on paper and much prefer them in data because I can manage them much better. I assumed it was a dyslexia thing to be honest (it might still be).

The origin of the datasexual in all likelihood started with the humble infographic, which is a highly stylized and well-designed way to talk about all the data out there on the Web. The infographic trend was followed by the data visualization trend, which made it even cooler to display data in innovative new ways. These data visualization tools eventually gave us cultural artifacts like Nicholas Felton’s annual Feltron Reports, which made the obsessive recording of everyday activities seem cool. From there, it was only a small evolutionary step to the whole quantified self (QS) movement, which promises “self knowledge through numbers.” QS proponents obsessively track every single bit of data about themselves throughout the day. The QS movement eventually led us to the embrace of data by consumer-facing companies like Nike, who found a way for urban datasexuals to flaunt their obsessive data-grooming to the rest of us in a way that’s stylish and mod.

For me it stems back to my ideas of the information behind the graphics.

When I was in college, I got into XML because I loved the idea of creating graphics which are self-describing and can alter depending on the data they’re based on. Hence my love of Scalable Vector Graphics (W3C’s SVG specification). I was also a major pusher of SVG at the BBC for graphics and informational graphics, but at the time browser technology was way behind.

Maybe this also explains why the idea of online dating via numbers, maths and science intrigues me so deeply?

The link up between the Quantified Self, personal data and sharing is so tight and also asks many questions. Questions which the early adopters and hackers are answering for themselves right now.

I remember my previous manager, Miles Metcalfe, talking about the intangibles of the semantic web… Some things cannot be quantified, at least in the ways we’re going about it right now. I would agree, but we’ll have a good old go trying to do so. And from doing so we’ll have lots of fun; it’s when it stops being fun that it becomes a problem…

I’ll say it now… Data is sexy, no two ways about it… but the term datasexual does worry me a little, along with quite a few of the commenters. It’s data love, but with the understanding that not everything can be defined or captured yet.

Updated

After the debate on Techgrumps 60 yesterday (listen to the last 10mins to understand) with Tom Morris, who compares data sexuality to something else which is certainly not pretty or nice, the point is taken: what has this got to do with sexuality? Spicing it up with sexuality just confuses the whole thing and maybe makes those who love data into something they’re not. Data love is much better as an overall idea.

Is it possible to match people with science?

This has got to be one of the eternal questions. Maths or science has solved so many of our questions, but can it be used for working out the compatibility of humans?

That was one of the things which really intrigued me about a year of making love. I assume you’ve seen how it turned in on itself since the production team totally screwed up the process and kept us all in the dark about it. And if you want further evidence do check out the tweets for #yearofmakinglove and #yoml

However, because of the total screwup most people are saying it’s a total failure (maybe very true) but also that science, or rather maths, was never going to work… I can’t disagree, especially after the experience we all had yesterday. However, basing any judgments off the back of yesterday’s experience would be a mistake.

So do I personally think maths/science can match humans? Maybe… (yes, what a cop-out) but to be honest no one knows for sure. And that’s the point of the experiment.

At the very start of the day (ordeal) we were introduced to the professor who devised the test/questions and the matching algorithm. I remember tweeting this

As Michael replied a far…

And he’s right…

In my own experience to date, the matching algorithm over at OkCupid.com has been pretty darn good (not perfect!) (OKCupid’s OK Trends are legendary – check out the biggest lies people tell each other on dating sites and How race affects the messages you get). But I had to train it to be good. To date I’ve answered about 700+ questions, and they’re not just questions. They’re detailed, so you have to answer each one, then specify how important it is to you and what answer your ideal match would pick. This makes for many more dimensions in the answer criteria and ultimately the algorithm. Aka the algorithm is only as good as the dataset it’s working on.

You’ve got to put in the data/time if you want it to be good… Otherwise you’re going to get crappy results.
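
To show why the three-part answer (my answer, how important it is, what I'd accept from a match) gives an algorithm so much more to work with, here is a rough sketch of a match score. It is loosely modelled on how OkCupid has publicly described its match percentage, but it is not their actual code: the importance weights, the `Answer` class and the function names are all assumptions for illustration.

```python
from dataclasses import dataclass
from math import sqrt

# Assumed importance weights; the real site's values may well differ.
IMPORTANCE = {"irrelevant": 0, "a little": 1, "somewhat": 10, "very": 50}


@dataclass
class Answer:
    mine: str         # the answer I gave
    acceptable: set   # answers I would accept from my ideal match
    importance: str   # key into IMPORTANCE


def satisfaction(asker, other):
    """Fraction of the asker's importance points that the other person earns."""
    earned = possible = 0
    for question, answer in asker.items():
        if question not in other:
            continue  # only questions both people answered count
        weight = IMPORTANCE[answer.importance]
        possible += weight
        if other[question].mine in answer.acceptable:
            earned += weight
    return earned / possible if possible else 0.0


def match_score(a, b):
    """Geometric mean of how well each person satisfies the other."""
    return sqrt(satisfaction(a, b) * satisfaction(b, a))
```

Each person is a dict mapping a question id to an `Answer`, and `match_score(alice, bob)` returns a number between 0 and 1. With only a handful of shared questions, a single mismatch swings the score wildly, which is one way to see why more answers make for a better result.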

This makes the 50 questions answered for the year of making love look like a pop quiz (hotshot), to be honest.

So back to the original question, slightly modified: can an algorithm match people in the interest of love? I think so, to a certain extent. But it’s not the complete picture. Chemistry is a big deal which is very difficult to understand. It’s not found by answering questions but by watching the interaction between people. It’s a different type of algorithm… Situation can cause chemistry, aka the reason why everyone came together on the coaches home (or to the wrong city as some of them seemed to do) is because there was a social situation which we could all share/talk about. (cue talk about social objects/places) Chemistry was in full effect?

I hope people don’t give up on science as a way to find their ideal partner just because of the terrible experience they had at The year of making love… is I guess what I’m saying…

Public 2.0: The era of personal data openness

I was in London on Thursday for the Public 2.0 conference, which the guys behind the Data Art project put together. It was a nicely put together conference with a mix of speakers and topics.
I kicked off the day with my presentation titled The era of personal data openness.
When I was approached about doing a presentation for the data art conference, I wasn’t sure which angle to take. After a few thoughts, I decided to contact the data art guys and see what exactly they were after. After a brief chat, I decided to take the more interesting path in this presentation.
The premise of the presentation is that open data from organisations like the government and companies is interesting, and the movement around this has finally sunk in. There wasn’t a single government proposed agenda last year which didn’t include something about releasing more open data. And every startup and online business is building APIs, so they can take advantage of the overwhelming power of the rich ecosystem of developers, hackers and early adopters. But I’ve noticed an increase in tools and systems to take advantage of our own data and the data we generate every day.
I was tracking this very much from the sidelines and had not found a decent way of explaining the topic of self documentation. That was till I had lunch with Rain Ashford.
We talked through a bunch of stuff but got talking about my presentation, which I was due to give the next day. And after I described the premise like I am now, she said it sounds a lot like Quantified Self.
Bingo! Having never heard of the movement, it instantly made sense and further research clarified everything.
Quantified Self is the Era of personal Data Openness….
It’s also worth noting Walter De Brouwer’s presentation at Thinking Digital also had some influence but I forgot to mention it. Two links from that session, http://curetogether.com and www.patientslikeme.com, both fit perfectly…

It’s all about the Top10?



Been meaning to blog about Harry Jones’s next venture for a while…

I was shown a beta of it a while ago, and I’ve got to say upfront that I’m a close friend of Harry’s and yes, I did teach him at university (Ravensbourne College of Design and Communication). At the time he was doing lots of Flash stuff and singing from the Adobe/Macromedia song book, but I slowly broke that down. I kind of remember one day he turned to me and said he gets it (referring to XHTML): not the actual technology (Harry was very smart and picked it up instantly) but the concept of the web being readable not just by humans but by machines and devices.

Anyway, a while ago he launched Top10.co, which is a place for all those crazy top 10s. Top 10 makes it super easy to create your own list, but the magic comes when you remix someone else’s. What it does is create an instance of the top 10 someone else did and give you complete control over that list. So in the video above you can see some artificial but funny disagreement over cheesy 80s action films between Harry and the other co-founder (don’t give up your day jobs to be actors, guys…). But as it’s an instance, it’s still linked to the original, so you can see an aggregated view of everyone’s top 10.

For example I set up a top 10 after my blog post about the films you should have seen in 2010.
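
I don't know how Top10 builds its aggregated view, but the remix idea (a copy you fully control that stays linked to the original) plus a master list is easy to sketch. The version below is a guess, not their implementation: it scores items with a simple Borda-style count, and the `TopList` class, `master_list` function and example film titles are all invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TopList:
    owner: str
    items: list
    # The list this one was remixed from (None for an original).
    parent: Optional["TopList"] = field(default=None, repr=False, compare=False)
    remixes: list = field(default_factory=list, repr=False, compare=False)

    def remix(self, owner):
        """Create an independent copy that still remembers its original."""
        child = TopList(owner, list(self.items), parent=self)
        self.remixes.append(child)
        return child


def family(root):
    """Yield the original list and every remix descended from it."""
    yield root
    for child in root.remixes:
        yield from family(child)


def master_list(root, size=10):
    """Borda-style count: position 1 earns `size` points, position 2 earns size - 1, ..."""
    scores = defaultdict(int)
    for top_list in family(root):
        for position, item in enumerate(top_list.items[:size]):
            scores[item] += size - position
    # Highest score first; ties broken alphabetically so the result is stable.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked][:size]
```

For example, `harry = TopList("harry", ["Die Hard", "Predator", "Commando"])` followed by `ian = harry.remix("ian")` gives Ian a list he can reorder freely, while `master_list(harry)` still rolls both lists into one ranking.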

Here’s the aggregated view (master list) of everyone who has contributed to the top films in 2010, and of course my top 10.

So interestingly they have catered for both the popular stuff in the actual list but also the long tail in the variety of weird and wonderful lists you can make.

It’s all public, so that’s great. However there are problems… The first one is that you can’t embed the top 10 list. I know it supports Facebook and Twitter (in actual fact you sign up using one of them; where’s the OpenID, guys?) but I just want to embed or even copy the information to somewhere which is mine. May seem slightly old fashioned (can’t believe I just said that) but it’s important. I’d also like to see better use of their blog… When I first saw it I thought it could be like OkCupid’s OKTrends… You know, “we have 100000 users who picked films of 2010, and we aggregated all the lists together to show the top film choice across all lists is Inception…” type stuff. Right now it’s just fluff about the company, which is a yawn to be frank. Heck, even “these are the types of films which make it to the top of most film related lists, and here are the ones which tend to be in the middle ground”… Come on Harry… Data is the new source code (think I just coined a new saying)

So as a whole I think the concept is good, but it really needs someone to think about uses of the top 10s. I’d also like to see a format which makes it easily transferable. Like an OPML file with top10 extensions… but that’s further down the line and I expect most won’t be interested in such a thing. Although when I explain what you can do with such a thing, it gets very interesting… Maybe Harry should pop up one day soon for a chat, but he’s one busy son of a gun…

Anyway good luck and good to hear he’s moved on from all that Apple crap which he had to deal with a few years ago. Oh and congrats on getting engaged to the lovely Tiffany…

Beyond the data: making meaning of data

Ian Forrester at Next 09

The Next conference is looking for speakers for 2011 and they have opened a public voting system to seek out the next generation of speakers. Of course I have put forward a talk based around the Channelography project.

Beyond the data

Data is something we have been working with before it became mainstream a few years ago. We have many projects using data but what are the challenges once you have the data? Using the prototype Channelography (http://channelography.rattlecentral.com/) we will explore the challenges and dangers of collecting data and trying to make meaning of it.

So of course, it would be great if friends could vote for my talk, although there are some great proposals like this one for OKcupid to talk and Brian Suda’s rallying cry to analyse 15 Petabytes of data from the Large Hadron Collider.

The Joy of Data

Via infosthetics,

It was only a matter of time before the mind-changing talk of Hans Rosling would find its way to the television medium. A reincarnation of this talk will be part of "The Joy of Stats", a new television documentary that soon will appear at BBC. This documentary will explore various forms of data gathering and statistical analysis, such as a new application that mashes police department data with the city’s street map to show what crime is being reported street by street, house by house, in near real-time; and Google’s current efforts at the machine translation project

From what I’ve seen of the programme, it should be called the joy of data, not stats.

mydreamscape presented at social media cafe manchester

Chi-chi Ekweozor (@realfreshtv) has written up my scrambled thoughts and presentation about mydreamscape.org on her blog.

I am amazed at how much detail she got down in the session; it’s a perfect account of what was said and by whom. Her hand must have been going into overdrive!

A couple of people asked great questions about privacy and how easy the network will be to spam.

The points raised range from deciding to keep users anonymous to encourage people to share their dreams in detail to wondering how to stop the spammer that ‘keeps dreaming about ‘Coca-cola’ or Justin Bieber(!)

Adrian Slatcher (@adrianslatcher on Twitter) from Manchester Digital Development Agency (MDDA), added some fine observations about dreams being non-linear.

Some people make associations in dreams based on colour, so called ‘colour dreams’. There are also ‘anxiety’ dreams. There is a very strong metaphysical element to dreams.

Adrian went on to add that this ‘crowd-sourced’ emotional categorisation of dreams: ‘anxiety = red’, ‘peace = green’ etc lends itself to making such a social network a very useful psychoanalysis tool.

She also detailed a great conversation we had afterwards with Josh (@technicalfault) about a killer mobile app for mydreamscape

In a conversation with Ian and Social Media Cafe co-organiser Josh (@technicalfault) after his talk, we discussed what I think is the killer application for such a project: a mobile phone app that combines access with the social network with a dream diary linked to the phone’s alarm clock.

As soon as you wake up, you are prompted to record your dream into a ‘What did you dream today?’ interface rather like Twitter’s early ‘What are you doing?’ question.

Different media types could be introduced later on so people would eventually make voice or video recordings of their dreams. That would rock.

I love the idea of asking the question, “what did you dream today?” or even “what did you dream last night?” It’s a very catchy punchline and sums up the project nicely. The Flickr of dreams says one thing but “what did you dream today?” says something very different.

The point keeps coming up, why not make a facebook application? And finally I had a reply

People have suggested Ian should implement the idea as a Facebook app. He’s not particularly keen on this, preferring the Flickr model as “Flickr never exposes private stuff.”

Facebook’s EULA makes mydreamscape unworkable, or at least cuts right into the users’ privacy, which would make trusting the system almost impossible. Flickr have a good model as they never expose users’ private data and never will. Hence “the flickr of dreams” tag line.

I’ve presented at barcampmanchester3 and now at Social Media cafe manchester, and each time I’ve had a positive response, but it has raised many more questions. Some of those questions have been useful, but none have been “no, this is a terrible idea”. In actual fact I’ve picked up a few people who really want to get involved along the way. Each person has offered some advice and some more passion for the general idea. I think my next step is to do a map for the idea (a massive A0 sheet of endless paper with information about the idea and details which I currently have in Tomboy notes). I can then publish the map and make it even easier for people to develop the idea themselves. It’s also handy to have everything on one sheet, so I can put everything in context. This presentation isn’t really explaining the idea very well and does a bit of a disservice to the underlying idea. I really hope to change it for something better soon. But for now it explains the concept enough…