Cubicgarden.com…

Thoughts and ideas of a dyslexic designer/developer

The mixcloud metadata take down

I mentioned a while ago that I was slowly migrating away from Mixcloud, as their business model is starting to impinge on people listening to my mixes and I’m not so keen on that. I have also mentioned trying to get Funkwhale working and using cue files.

While looking through my mixes I noticed I really hadn’t done a great job with some of the metadata. I had spent most of my time adding the metadata to Mixcloud instead (not ideal).

In Mixcloud (once logged in) there are some human-friendly URLs which I was able to grab images from. The key one is the upload edit page – https://www.mixcloud.com/upload/{username}/{mixname}/edit/ for example https://www.mixcloud.com/upload/cubicgarden/follow-me-into-the-fading-moonlight/edit/

Follow me into the moonlight edit page
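
To make the URL pattern above concrete, here is a minimal sketch that builds the edit page address for a given mix. The function name edit_page_url is my own for illustration, and it assumes the username and mix name are already the slugs Mixcloud uses in its URLs.

```python
from urllib.parse import quote

# URL pattern taken from the post; the placeholders are filled in below.
EDIT_URL = 'https://www.mixcloud.com/upload/{username}/{mixname}/edit/'


def edit_page_url(username: str, mixname: str) -> str:
    """Build the upload edit page URL for one mix (hypothetical helper)."""
    # quote() leaves letters, digits and hyphens untouched and escapes the rest.
    return EDIT_URL.format(
        username=quote(username, safe=''),
        mixname=quote(mixname, safe=''),
    )


print(edit_page_url('cubicgarden', 'follow-me-into-the-fading-moonlight'))
```

The page still needs a logged-in session, so this only saves typing the addresses by hand.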

My plan was to manually copy the times into my newly written cue files, but while talking to Jon about it, he said give him 5 minutes and he could knock up a script to pull the values out of the HTML page. I had thought about doing this before using XSLT, but noticed there is a lot of JavaScript rendering, which makes things difficult.

Jon’s quickly written script was just what I needed.

#!/usr/bin/env python3

import csv
import sys
from collections import namedtuple
from typing import List

import bs4
from bs4 import Tag

SongInfo = namedtuple('SongInfo', ['number', 'artist', 'title', 'time'])

def load_html(filename: str):
    # Read the whole saved page into one string.
    with open(filename, 'r', encoding='utf-8') as fo:
        return fo.read()

def extract_song_info(song: Tag):
    try:
        number = song.find(class_='section-number').text
        artist = song.find(class_='section-artist').text
        title = song.find(class_='section-title').text
        time = song.find(class_='section-time')['value']
        result = SongInfo(number, artist, title, time)
        print(f'Extracted {result}')
        return result
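    # A missing element raises AttributeError (.text) or TypeError (['value']);
    # a missing 'value' attribute raises KeyError.
    except (AttributeError, KeyError, TypeError):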
        print(f'Error with item {song}')
        return None

def parse_table(input_html: str):
    soup = bs4.BeautifulSoup(input_html, features="html5lib")
    songs = [row for row in soup.find_all(class_="section-row")]
    return [x for x in [extract_song_info(song) for song in songs] if x is not None]


def save_to_csv(file_name: str, songs: List[SongInfo]):
    # newline='' stops the csv module writing blank lines between rows on Windows.
    with open(file_name, 'w', encoding='utf-8', newline='') as fo:
        writer = csv.writer(fo)
        for song in songs:
            writer.writerow(song)


if __name__=="__main__":
    if len(sys.argv) != 3:
        print('Usage: extractor.py [input_html_file] [output_csv_file]')
        # Stop here rather than failing on the missing arguments below.
        sys.exit(1)
    html = load_html(sys.argv[1])
    songs = parse_table(html)
    save_to_csv(sys.argv[2], songs)
    print(f'Saved to {sys.argv[2]} successfully - Done!')

With it and the HTML pages in hand (I almost fetched the pages with Chromedriver, again thanks to Jon, but I couldn’t be bothered to sort out the cookies, etc.), I wrote a quick and dirty bash script and fired up a terminal.

#!/bin/bash
# Quote the argument so mix names containing spaces still work.
./extractor.py "$1.html" "$1.csv"
# Verify
echo "Details for $1"
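
The bash wrapper handles one mix at a time. As a rough sketch of running it over a whole folder of saved pages, something like the following could work in Python. The names plan_jobs and run_all are hypothetical, and it assumes extractor.py sits in the current directory and every saved edit page ends in .html.

```python
import subprocess
import sys
from pathlib import Path


def plan_jobs(folder: Path):
    """Pair every saved edit page (*.html) with the CSV file it should produce."""
    return [(page, page.with_suffix('.csv')) for page in sorted(folder.glob('*.html'))]


def run_all(folder: Path, extractor: str = './extractor.py'):
    """Run the extractor once per saved page, like the bash script does for one."""
    for page, out in plan_jobs(folder):
        # check=True stops the loop if the extractor exits with an error.
        subprocess.run([sys.executable, extractor, str(page), str(out)], check=True)
        print(f'Details for {page.stem}')
```

Calling run_all(Path('.')) would then process every page in the current folder in name order.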

I thought about modifying Jon’s script to generate the cue files directly, bypassing the CSV file, but decided I should just get them all done because I still need to get Funkwhale going.
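
For anyone who does want to go from the CSV to a cue file, here is a minimal sketch of what that step might look like. It is not what I ran. The function names are made up, and the time handling is an assumption: I have not confirmed what format the time value takes, so the sketch accepts either whole seconds or colon-separated times.

```python
import csv
import io


def to_cue_time(value: str) -> str:
    """Convert a start time to the cue sheet MM:SS:FF form, frames fixed at 00.

    Assumption: value is whole seconds ("125") or colon-separated
    ("2:05" or "1:02:05"). The real format on the page is not confirmed.
    """
    seconds = 0
    for part in value.strip().split(':'):
        seconds = seconds * 60 + int(part)
    minutes, secs = divmod(seconds, 60)
    # Cue sheets have no hours field, so minutes may run past 59.
    return f'{minutes:02d}:{secs:02d}:00'


def csv_to_cue(csv_text: str, mix_title: str, audio_file: str) -> str:
    """Turn rows of number, artist, title, time into cue sheet text."""
    lines = [f'TITLE "{mix_title}"', f'FILE "{audio_file}" MP3']
    rows = csv.reader(io.StringIO(csv_text))
    # Tracks are renumbered from 1 rather than trusting the number column.
    for index, (_number, artist, title, time) in enumerate(rows, start=1):
        # Swap double quotes for single ones so the quoted fields stay intact.
        artist = artist.strip().replace('"', "'")
        title = title.strip().replace('"', "'")
        lines.append(f'  TRACK {index:02d} AUDIO')
        lines.append(f'    TITLE "{title}"')
        lines.append(f'    PERFORMER "{artist}"')
        lines.append(f'    INDEX 01 {to_cue_time(time)}')
    return '\n'.join(lines) + '\n'
```

The MP3 file type on the FILE line is also an assumption; it would need changing for other audio formats.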

I did notice the edit page doesn’t include the genre or the year of the mix, but I can live with that, for now… Scraping web pages is certainly a throwback, but it’s a better solution than what I was originally thinking.

This will teach me to sort out my own house of data!

Author: Ianforrester

Senior firestarter at BBC R&D, emergent technology expert and serial social geek event organiser.

Posted on March 10, 2020
Categories: data-and-semantic-web
Tags: cue, data, dataportability, dj, extract, funkwhale, metadata, mix, mixcloud, music
