metadata – Cubicgarden.com…

It’s time to move from WhatsApp to Signal

For the longest time people have asked why I don’t use WhatsApp. As someone who sees the flaws and reads the terms and conditions, I was never happy with its use of data. In the last few days it has become much clearer, with the WhatsApp privacy change forcing you to accept the changes or stop using it (no opt-out).

The core issue with the change is…

“As part of the Facebook family of companies, WhatsApp receives information from, and shares information with, this family of companies,” the new privacy policy states. “We may use the information we receive from them, and they may use the information we share with them, to help operate, provide, improve, understand, customize, support, and market our Services and their offerings.”

In some cases, such as when someone uses WhatsApp to interact with third-party businesses, Facebook may also share information with those outside entities.

OK, so they can’t read your (Signal) encrypted messages, but as we learned a long time ago, the metadata sometimes tells you more than the actual message.

Surely it’s time to switch?!

No? Remember this… dumb f**ks.

Public Service Internet monthly newsletter (Sept 2020)

We live in incredible times with such possibilities, that much is clear. Although it’s easily dismissed when seeing 2020’s worst security breaches so far and asking what changed in the EU.

To quote Buckminster Fuller “You never change things by fighting the existing reality. To change something, build a new model that makes the existing model obsolete.”

You are seeing aspects of this happening with challenges to app stores and seeing people take action over their social data.

My browser never ever consents

Ian thinks: One thing I like is browser extensions. This one is part of a movement to never consent to anything sites want to push on us. Try it out and disable it if you don’t like it.

More emerging tech from open hardware world

Ian thinks: NODE features a lot here but look out for the Edge of Tomorrow exoskeleton and the open cardboard robot which turns a smartphone into a low-cost remote telepresence robot.

The Underestimated Threat Vector: Homogeneity

Ian thinks: Right in the middle of the Defcon hacking virtual conference, a well thought-out talk about the threat of homogeneity. Remember not to read the comments. I could have done a whole month of recommendations for Defcon talks. I highly recommend browsing through the Defcon talks by starting at the playlists.

Thank goodness for the Max Schrems of this world

Ian thinks: I have been moving my data onto EU servers since Brexit happened and it’s very difficult. Thank goodness for the Max Schrems of this world, willing to go to court over this.

Build the resilient future using indigenous wisdom

Ian thinks: Julia Watson’s short TED video is deeply impressive and shows the answers are all out there, in places and communities we tend to ignore.

Paying for privacy, not ideal but interesting

Would you really abandon Google for a paid ad-free search engine? If so, Neeva might be for you, although I’m happy with DuckDuckGo’s non-tracking ad system.

Deepfakes solution needs critical mass

Ian thinks: It’s a reasonable idea but relies on people/systems creating and looking for the CAI metadata. I also checked if it was open source and it is. I look forward to the GIMP plugin very soon.

Fawke your face for protection against facial recognition

Ian thinks: Finally a reasonable way to cloak your face from facial recognition systems like Clearview AI. Nice open source software which you can run without visually distorting your beautiful face.

Facebook ads under close scrutiny

Ian thinks: We all know how Facebook adverts and editorial weighting work, but every once in a while an experiment really brings it into sharp view. This Vox video is that, although they also point at the little-known Facebook Ad Library.

Find the archive here

The Mixcloud metadata takedown

I mentioned a while ago how I was slowly migrating away from Mixcloud as their business model is starting to impinge on people listening to my mixes and I’m not so keen on that. I already mentioned trying to get Funkwhale working and using cue files.

While looking through my mixes I noticed I really didn’t do a great job with some of the metadata, having spent most of my time adding the metadata to Mixcloud instead (not ideal).

In Mixcloud (once logged in) there are some human-friendly URLs which I was able to grab images from. The key one is the upload edit page – https://www.mixcloud.com/upload/{username}/{mixname}/edit/, for example https://www.mixcloud.com/upload/cubicgarden/follow-me-into-the-fading-moonlight/edit/

Follow me into the moonlight edit page

My plan was to manually copy the times into my newly written cue files, but while talking to Jon about it, he said to give him 5 minutes and he could knock up a script to pull the values out of the HTML page. I had thought about doing this before using XSLT, but noticed there is a lot of JavaScript rendering, which makes things difficult.

Jon’s quickly written script was just what I needed.

#!/usr/bin/env python3

import csv
import sys
from collections import namedtuple
from typing import List

import bs4
from bs4 import Tag

SongInfo = namedtuple('SongInfo', ['number', 'artist', 'title', 'time'])

def load_html(filename: str):
    # Read the whole saved HTML page into a single string
    with open(filename, 'r', encoding='utf-8') as fo:
        return fo.read()

def extract_song_info(song: Tag):
    try:
        number = song.find(class_='section-number').text
        artist = song.find(class_='section-artist').text
        title = song.find(class_='section-title').text
        time = song.find(class_='section-time')['value']
        result = SongInfo(number, artist, title, time)
        print(f'Extracted {result}')
        return result
    except (AttributeError, TypeError, KeyError):
        # A missing element (find() returned None) or a missing 'value' attribute
        print(f'Error with item {song}')
        return None

def parse_table(input_html: str):
    soup = bs4.BeautifulSoup(input_html, features="html5lib")
    # Each track on the edit page sits in an element with class "section-row"
    songs = soup.find_all(class_="section-row")
    return [x for x in [extract_song_info(song) for song in songs] if x is not None]


def save_to_csv(file_name: str, songs: List[SongInfo]):
    # newline='' lets the csv module control line endings (avoids blank rows on Windows)
    with open(file_name, 'w', encoding='utf-8', newline='') as fo:
        writer = csv.writer(fo)
        for song in songs:
            writer.writerow(song)


if __name__=="__main__":
    if len(sys.argv) != 3:
        print('Usage: extractor.py [input_html_file] [output_csv_file]')
        sys.exit(1)  # stop here rather than failing on a missing argument
    html = load_html(sys.argv[1])
    songs = parse_table(html)
    save_to_csv(sys.argv[2], songs)
    print(f'Saved to {sys.argv[2]} successfully - Done!')

With the script and the HTML pages (which I almost got with Chromedriver, again thanks to Jon, but I couldn’t be bothered to sort out the cookies, etc.), I wrote a quick and dirty bash script and fired up a terminal.

#!/bin/bash
# Quote the arguments so mix names containing spaces still work
./extractor.py "$1.html" "$1.csv"
# Verify
echo "Details for $1"

I thought about modifying Jon’s script to generate the cue files directly, bypassing the CSV file, but decided I should just get them all done because I still need to get Funkwhale going.

I did notice the edit page doesn’t include genre or the year of the mix, but I can live with that, for now… Scraping web pages is certainly a throwback but it’s a better solution than what I was originally thinking.
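For what it’s worth, the CSV to cue step could look something like the sketch below. It is not Jon’s script and I haven’t run it against real Mixcloud data: it assumes the time column holds values like 4:30 or 1:02:03 (the actual format of the value attribute may differ), and the function names, the mix title and the audio file name are all made up for illustration.

```python
import csv
import io


def to_cue_index(time_text: str) -> str:
    """Convert 'SS', 'MM:SS' or 'H:MM:SS' into the cue sheet 'MM:SS:FF' form.

    Assumption: the scraped time is colon-separated; frames are always 00.
    """
    seconds = 0
    for part in time_text.strip().split(':'):
        seconds = seconds * 60 + int(part)
    minutes, secs = divmod(seconds, 60)
    # Cue sheets have no hours field, so minutes may go above 59
    return f'{minutes:02d}:{secs:02d}:00'


def clean(text: str) -> str:
    """Cue fields are double-quoted, so swap any embedded double quotes."""
    return text.strip().replace('"', "'")


def csv_to_cue(csv_text: str, mix_title: str, audio_file: str) -> str:
    """Build a cue sheet from rows of number, artist, title, time."""
    lines = [f'TITLE "{clean(mix_title)}"', f'FILE "{audio_file}" MP3']
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    # Track numbers come from row order rather than the scraped number column
    for position, (_number, artist, title, time) in enumerate(rows, start=1):
        lines.append(f'  TRACK {position:02d} AUDIO')
        lines.append(f'    TITLE "{clean(title)}"')
        lines.append(f'    PERFORMER "{clean(artist)}"')
        lines.append(f'    INDEX 01 {to_cue_index(time)}')
    return '\n'.join(lines) + '\n'


if __name__ == '__main__':
    sample = '1,Artist A,Tune A,0:00\n2,Artist B,Tune B,4:30\n'
    print(csv_to_cue(sample, 'My mix', 'mix.mp3'))
```

Keeping the CSV as an intermediate file still has its uses, since it is easy to eyeball and fix by hand before generating the cue file.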

This will teach me to sort out my own house of data!

Imagine 60dB with object-based media?

When I first heard about 60dB, I thought great, someone’s finally made an object-based podcasting client.

60dB brings you today’s best short audio stories – news, sports, entertainment, business and technology, all personalized for you.

Unfortunately I was wrong.

It’s a bit like Stitcher, which is well loved by some people.

It does seem to pick and play news stories. But the sources are specially crafted (ready for syndication like this) rather than the client processing the audio and picking out the parts most relevant to your listening preferences.

It’s understandable, because to do this you would need well thought-out metadata created by the original author/production. Without it you can’t have objects, and without objects you are reliant on serious processing of the audio to build the metadata which the player can use (that or some serious computational power).

I had heard, and thought it was a logical move, that Google Play’s podcasting support would include some kind of basic automated metadata/transcript, but it never happened. Another missed opportunity to show off the power of Google and make themselves an essential part of the podcasting landscape, like Apple did with iTunes.

Seems like a great opportunity for some enterprising startup, especially since podcasting might save the world. Dare I say it again, perceptive podcasts could be incredible for all the reasons podcasting originally captured people’s attention.

On the beat with the BBC

I had the pleasure of attending the BBC’s On the Beat conference.

The conference was in beta state, as LJ Rich announced at the very start. It is an example of the way the BBC is changing. More people within are able to push the organisation the way it needs to go. More risks taken, more gained for the general public.

One of my crazy ideas is becoming true next week! Are you into Music & tech? DM if you want to attend! #BBConthebeat pic.twitter.com/XQPHfJzOcO

— Sara Gozalo (@sara_sgm) December 1, 2014

Beta or not, it was a good afternoon with speakers from across the music industry. Each panel was backed up with somebody from the BBC.

The sessions were centered around audiences, discovery and metadata. The keynote was given by Mark Mulligan, who I gather is well known for his blog and writing about the future of music.

It was interesting to hear how the music industry has parallels with the film industry.

Artists need to ‘find their popcorn’

In the audiences panel it was interesting to meet and hear from DJ Charlie Sloth and Shazam. There was also an interesting contrast between Charlie’s focus on the BBC Radio 1Xtra audience, who may not own a hi-fi and use their phones for music discovery, and a talk by Jeff Smith from BBC Radio 2, whose audiences still buy CDs.

In the metadata session, which included MusicBrainz, there was a debate about the lack of metadata in music, and they only scratched the surface. It came out that there’s not really a well-used standard for music classification.

To which I tweeted…

How can the record industry expect us to pay if they can't even be bothered to sort out metadata standards!? #BBCBeat

— Ian Forrester (@cubicgarden) December 12, 2014

In “Who’s your music dealer?” with Spotify, PingTune and BBC R1/1Xtra, the question of algorithms for discovery was raised. But even more interesting was the power of the DJ to bring forward music unlike anything else, something the music algorithms fail at.

.@bbcmusic #BBCBeat conference yesterday w @jimpurnell, @GeorgErgatoudis @LJRICH (nice work @sara_sgm @tw0tw3ntytw0!) pic.twitter.com/ZdghdHRwID

— David Jones 大卫琼斯 (@djonessays) December 13, 2014

The event was topped and tailed with musical demos from many companies and our own BBC R&D UX team showing off the scalable documentary. There was also LJ’s impromptu playing on the piano.

That's nothing @iamjakebailey here's me properly showing off on a piano 🙂 http://t.co/6AgKj0b930 #bbcbeat people, thanks for having me!

— @LJRich Music & Tech (@LJRICH) December 12, 2014

…and live music which I thought was odd, however I really enjoyed the quite unique voice of Layla, one of the many artists who signed up with BBC Introducing.

Layla at #BBCbeat pic.twitter.com/rMMi4syJTU

— Beth Anderson (@betandr) December 12, 2014

All excellent stuff and ultimately reminded me that DJ Hackday needs to happen… Love to team up with BBC Music to consider the future of participation, remixing and music discovery from a slightly different standpoint.

Combine Dreamboard and Lucidipedia please

Me, myself and I: a dream tracking tale from Luca Mascaro

I would really like to see Lucidipedia and Dreamboard do a combined application or at least interoperate with each other (can you imagine an XML schema for dreams?)

Dreamboard is beautiful looking and the app is simply a joy to use. Writing into Lucidipedia is frankly a nightmare, especially on a mobile device, although it does have rich metadata.

I want to ask Luca about the last slide, because that looks just like the plans for the never realised Mydreamscape.

World domination…? Nope just an understanding of what people are dreaming about all over the world (flashforward collective memory style).

Dream journal metadata

I said a while ago how I found a site which is much closer to mydreamscape than anything else I’ve found to date.

One of the key parts was the dream journaling and it’s interesting to see the amount of metadata Lucidipedia requires.

Title, date, dream… is fine and they actually have a level of privacy

If you like, you can annote dreamsigns by enclosing words or sentences with [ds] … [/ds] tags.
Parts of your dream description that you rather prefer to be private when you decide to share this dream with others,
use the [private] … [/private]
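Just to show how little it takes to work with those inline tags, here is a minimal sketch. It only assumes the [ds] and [private] syntax quoted above; the function names are mine and it has nothing to do with how the site itself processes entries.

```python
import re

# Non-greedy matches so several tagged spans in one entry stay separate
DS_TAG = re.compile(r'\[ds\](.*?)\[/ds\]', re.DOTALL)
PRIVATE_TAG = re.compile(r'\[private\].*?\[/private\]', re.DOTALL)


def dreamsigns(entry: str) -> list:
    """Return the text of every [ds] ... [/ds] span, in order."""
    return [match.strip() for match in DS_TAG.findall(entry)]


def public_version(entry: str) -> str:
    """Hide [private] spans and drop the [ds] markers for sharing."""
    without_private = PRIVATE_TAG.sub('[...]', entry)
    return DS_TAG.sub(r'\1', without_private)


if __name__ == '__main__':
    entry = 'I was [ds]flying[/ds] over [private]my old school[/private] again.'
    print(dreamsigns(entry))
    print(public_version(entry))
```

Nested or unclosed tags are not handled, which is fine for a sketch but would matter in anything real.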

Cool but, how about,

Sleep time – First state the time when you went into bed and then input the time when you woke up in the morning
Image and Video – Add your own Image or YouTube movie to this dream!
Rating out of 5
Lucidity – Were you not lucid, half lucid or fully lucid anytime during this dream?
Characters – Which people that you know from real-life did you meet in your dream?
Permissions which are…

Share with Lucidipedia.com users only
Only people with a Lucidipedia account are able to read, rate and comment on this dream record by using Dream Journal.

Make this dream readable for everyone
Share also with anyone outside of Lucidipedia.com, Anyone you provide with the URL of this dream record would be able to read it. Great for sharing on Facebook or Twitter with friends!

That’s one heck of a load of metadata about a dream…! By the time you filled all that in I’m sure you would have forgotten most of the facts… I’m fascinated by the idea of sticking a video or image into the dream but I’m sure it would be used in the right way.

DJ mixes rebooted…

Counting from a few posts ago, I’ve been thinking…

A while ago I suggested to Mixcloud the concept of mobile playlists tailored for Mixes, but they didn’t really see the point. But recently I suggested the same thing to Dirty Si and he was a lot more receptive to the concept. Right now when I do a mix, I tend to create a piece of metadata to go with the mix. The NFO file (yep straight out of the darknet) contains the playlist order and any other metadata I feel is required. I would use PLS, M3U or even XSPF but I’ve just done something to scratch my own itch. I might switch to using XSPF with a namespace for my own metadata and add the SMIL namespace. There’s a whole bunch of hacking which needs to be done in this area…
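As a rough idea of what the XSPF route might look like, here is a sketch using only the Python standard library. The XSPF namespace and the extension element come from the XSPF spec; the mix namespace URI, the start element and the function name are hypothetical, made up purely for illustration. SMIL is left out entirely.

```python
import xml.etree.ElementTree as ET

XSPF_NS = 'http://xspf.org/ns/0/'
# Hypothetical namespace for my own mix metadata, not a real vocabulary
MIX_NS = 'http://example.org/ns/mix#'


def build_xspf(mix_title, tracks):
    """Build an XSPF playlist; tracks is a list of (artist, title, start_seconds)."""
    ET.register_namespace('', XSPF_NS)
    ET.register_namespace('mix', MIX_NS)
    playlist = ET.Element(f'{{{XSPF_NS}}}playlist', {'version': '1'})
    ET.SubElement(playlist, f'{{{XSPF_NS}}}title').text = mix_title
    track_list = ET.SubElement(playlist, f'{{{XSPF_NS}}}trackList')
    for artist, title, start_seconds in tracks:
        track = ET.SubElement(track_list, f'{{{XSPF_NS}}}track')
        ET.SubElement(track, f'{{{XSPF_NS}}}title').text = title
        ET.SubElement(track, f'{{{XSPF_NS}}}creator').text = artist
        # XSPF's extension element is the place for application-specific data
        extension = ET.SubElement(
            track, f'{{{XSPF_NS}}}extension', {'application': MIX_NS})
        ET.SubElement(extension, f'{{{MIX_NS}}}start').text = str(start_seconds)
    return ET.tostring(playlist, encoding='unicode')


if __name__ == '__main__':
    print(build_xspf('My mix', [('Artist A', 'Tune A', 0),
                                ('Artist B', 'Tune B', 270)]))
```

The point of using the extension element is that a normal XSPF player should be able to ignore the mix-specific parts and still read the track list.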

Playlists do not equal mixes…

I’ve been thinking about this even more recently since Google and Amazon’s music locker systems.

Everyone’s been thinking about singles or albums. But I’m thinking way beyond that. What about mixes? Imagine if the necessary metadata was in place to create extra special experiences around mixes?

But why even mess around with the metadata when you can mess with the actual mix itself?

The Pacemaker (for example) right now stores the actions of the DJ and then recreates the mix using the tunes on the host machine. I’m wondering if you could grab that data and turn it into something like MIDI, then you could really do some revolutionary things to DJ mixes.

Would it be possible to set up an Amazon EC2/Google app instance which could read the MIDI data and use the raw tunes to create a stream in real time? What effect would this have on listening to mixes?

Once again, I’d really like to hack around with this stuff if I had the time.

What should the character limit be on microblogging?

Evan Prodromou, who I finally met at FOSDEM recently, has been running a poll to find out what’s the best character limit for microblogging.

For our flagship site, Identi.ca, which runs on the status.net cloud service and uses the 0.9.0 beta, we’d like to open up the discussion of what an appropriate character limit should be. Setting a site-wide limit is a community decision we’d like to leave in the community’s hands. In a conversation on Identi.ca we’ve solicited some candidate character limits that we’d like people to decide on.

140 is compatible with Twitter; in many languages a notice with 140 characters fits into a single SMS message*

280 can fit into two Twitter or SMS messages

300 is a fan favorite

420 is Facebook’s status limit and 3 Twitter tweets or SMS messages

500 is a little bigger

1000 is bigger than that

Unlimited

Other

So personally I think 300 is enough. 300 will hold a very long URI with room for query string values. Also having it about the size of two text messages seems about right. If you stick to ANSI only characters you usually get about 306 characters to text with (160+160 with overhead) on most phones. Unicode drops it down to 280 characters which still seems fairly close to 300. I’m also thinking 300 characters keeps things micro readable still.
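The text message arithmetic can be sketched like this. The figures are the standard SMS ones as I understand them: 160 characters for a single 7-bit message, 153 per part once messages are joined (the overhead), and 70 and 67 for Unicode (UCS-2). It ignores the extended 7-bit characters that count double, and the function names are mine.

```python
# Characters per message: a single message, then each part of a joined message
GSM7_SINGLE, GSM7_PART = 160, 153
UCS2_SINGLE, UCS2_PART = 70, 67


def capacity(parts: int, is_unicode: bool = False) -> int:
    """How many characters fit into the given number of SMS parts."""
    single, part = (UCS2_SINGLE, UCS2_PART) if is_unicode else (GSM7_SINGLE, GSM7_PART)
    return single if parts == 1 else parts * part


def parts_needed(length: int, is_unicode: bool = False) -> int:
    """How many SMS parts a notice of the given length would take."""
    single, part = (UCS2_SINGLE, UCS2_PART) if is_unicode else (GSM7_SINGLE, GSM7_PART)
    if length <= single:
        return 1
    return -(-length // part)  # ceiling division


if __name__ == '__main__':
    for limit in (140, 280, 300, 420, 500, 1000):
        print(limit, parts_needed(limit), parts_needed(limit, is_unicode=True))
```

Two joined 7-bit parts give the 306 mentioned above, so a 300 character notice fits in two texts. On these figures two Unicode parts hold 134 characters, so the 280 I quoted may well depend on the phone or encoding.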

The idea of a more structured microblogging with URIs as metadata is interesting but I think metadata should be inline and in plain view. It’s one of the neat things about microblogging, which would be a shame to remove. Also got to say anything more complex than the current microblogging setup would maybe cause too many problems with backwards compatibility. Literate results are good, if you want metadata use blogging instead.
