
Cubicgarden.com…

Thoughts and ideas of a dyslexic designer/developer



The mixcloud metadata take down

I mentioned a while ago that I was slowly migrating away from Mixcloud, as their business model is starting to impinge on people listening to my mixes and I’m not keen on that. I’ve already mentioned trying to get Funkwhale working and using cue files.

While looking through my mixes, I noticed I really didn’t do a great job with some of the metadata: most of my effort had gone into adding it on Mixcloud itself (not ideal).

In Mixcloud (once logged in) there are some human-friendly URLs which I was able to grab what I needed from. The key one is the upload edit page, https://www.mixcloud.com/upload/{username}/{mixname}/edit/, for example https://www.mixcloud.com/upload/cubicgarden/follow-me-into-the-fading-moonlight/edit/

Follow me into the moonlight edit page
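
Because the edit page follows a predictable pattern, the URL for any mix can be built from the username and the mix slug. A small sketch; `mixcloud_edit_url` is a hypothetical helper name, and the page still only renders in a logged-in browser session.

```python
from urllib.parse import quote


def mixcloud_edit_url(username: str, mixname: str) -> str:
    """Build the upload-edit URL following the pattern
    https://www.mixcloud.com/upload/{username}/{mixname}/edit/
    (hypothetical helper; the page needs a logged-in session to render)."""
    # Slugs are normally URL-safe already; quoting guards against stray characters.
    return (
        "https://www.mixcloud.com/upload/"
        f"{quote(username, safe='')}/{quote(mixname, safe='')}/edit/"
    )


if __name__ == "__main__":
    print(mixcloud_edit_url("cubicgarden", "follow-me-into-the-fading-moonlight"))
```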

My plan was to manually copy the times into my newly written cue files, but while I was talking to Jon about it, he said to give him 5 minutes and he could knock up a script to pull the values out of the HTML page. I had thought about doing it myself with XSLT, but noticed there is a lot of JavaScript rendering, which makes things difficult.

Jon’s quickly written script was just what I needed.

#!/usr/bin/env python3
"""Extract the tracklist from a saved Mixcloud upload-edit page into a CSV file."""

import csv
import sys
from collections import namedtuple
from typing import List, Optional

import bs4
from bs4 import Tag

SongInfo = namedtuple('SongInfo', ['number', 'artist', 'title', 'time'])


def load_html(filename: str) -> str:
    with open(filename, 'r', encoding='utf-8') as fo:
        return fo.read()


def extract_song_info(song: Tag) -> Optional[SongInfo]:
    """Pull number, artist, title and start time out of one tracklist row."""
    try:
        number = song.find(class_='section-number').text
        artist = song.find(class_='section-artist').text
        title = song.find(class_='section-title').text
        # The start time lives in the element's 'value' attribute, not its text.
        time = song.find(class_='section-time')['value']
        result = SongInfo(number, artist, title, time)
        print(f'Extracted {result}')
        return result
    except (AttributeError, TypeError, KeyError):
        # AttributeError/TypeError: an element is missing (find() returned None);
        # KeyError: the time element has no 'value' attribute.
        print(f'Error with item {song}')
        return None


def parse_table(input_html: str) -> List[SongInfo]:
    # html5lib parses the page the way a browser would; it must be installed separately.
    soup = bs4.BeautifulSoup(input_html, features="html5lib")
    songs = soup.find_all(class_="section-row")
    return [info for info in map(extract_song_info, songs) if info is not None]


def save_to_csv(file_name: str, songs: List[SongInfo]) -> None:
    # newline='' stops the csv module writing blank lines between rows on Windows.
    with open(file_name, 'w', encoding='utf-8', newline='') as fo:
        writer = csv.writer(fo)
        for song in songs:
            writer.writerow(song)


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print('Usage: extractor.py [input_html_file] [output_csv_file]')
        sys.exit(1)
    html = load_html(sys.argv[1])
    songs = parse_table(html)
    save_to_csv(sys.argv[2], songs)
    print(f'Saved to {sys.argv[2]} successfully - Done!')
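
Jon’s script leans on bs4 and html5lib. If you would rather avoid third-party dependencies, the same idea (find each `section-row`, then read the number, artist and title text plus the `value` attribute of `section-time`) can be sketched with Python’s built-in `html.parser`. This is a swapped-in technique rather than Jon’s code; `SectionRowParser`, `parse_rows` and the sample markup are hypothetical and only assume the class names his script targets.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# The three text fields Jon's script reads; the time comes from an attribute.
TEXT_FIELDS = ("section-number", "section-artist", "section-title")
KEYS = ("number", "artist", "title", "time")


class SectionRowParser(HTMLParser):
    """Collect tracklist fields from elements carrying the class names Jon's
    script targets. Assumes each field is its own element inside a
    'section-row' element; text after a nested child tag is not captured."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[dict] = []
        self._row: Optional[dict] = None    # row currently being filled
        self._field: Optional[str] = None   # text field we are inside

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "section-row" in classes:
            self._row = {}
            self.rows.append(self._row)
            return
        if self._row is None:
            return
        if "section-time" in classes:
            # Like Jon's script, read the start time from the 'value' attribute.
            self._row["time"] = attrs.get("value")
        for field in TEXT_FIELDS:
            if field in classes:
                self._field = field.replace("section-", "")

    def handle_endtag(self, tag):
        self._field = None

    def handle_data(self, data):
        if self._row is not None and self._field:
            self._row[self._field] = self._row.get(self._field, "") + data


def parse_rows(html: str) -> List[Tuple[str, ...]]:
    parser = SectionRowParser()
    parser.feed(html)
    parser.close()
    # Drop incomplete rows, as Jon's script does when an element is missing.
    return [tuple(row[k].strip() for k in KEYS)
            for row in parser.rows if all(row.get(k) for k in KEYS)]


# Hypothetical markup using the same class names; the real page will differ.
SAMPLE = """
<div class="section-row">
  <span class="section-number">1</span>
  <span class="section-artist">Artist A</span>
  <span class="section-title">Track A</span>
  <input class="section-time" value="0:00">
</div>
<div class="section-row">
  <span class="section-number">2</span>
  <span class="section-artist">Artist B</span>
  <span class="section-title">Track B</span>
  <input class="section-time" value="4:32">
</div>
"""

if __name__ == "__main__":
    for row in parse_rows(SAMPLE):
        print(row)
```

The trade-off is robustness: html5lib repairs broken markup the way a browser does, while this sketch simply tracks which element it is inside, so it is best suited to pages already saved from a rendered browser view.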

With it and the HTML pages (which I almost got with Chromedriver, again thanks to Jon, but couldn’t be bothered to sort out the cookies, etc.), I wrote a quick and dirty bash script and fired up a terminal.

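#!/bin/bash
# Usage: pass the mix name without an extension; reads NAME.html and writes NAME.csv
set -euo pipefail
./extractor.py "$1.html" "$1.csv"
# Verify
echo "Details for $1"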

I thought about modifying Jon’s script to generate the cue files directly, bypassing the CSV file, but decided I should just get them all done, because I still need to get Funkwhale going.
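
Generating the cue files straight from the scraped rows could look roughly like the sketch below. It assumes the CSV columns are number, artist, title and time as written by Jon’s script, that the time value is either plain seconds or colon-separated (Mixcloud’s actual format is not confirmed here), and that each mix is a single MP3. `rows_to_cue` and the other function names are hypothetical helpers, not part of Jon’s script.

```python
import csv
import io
from typing import Iterable, Optional, Sequence


def parse_time(value: str) -> int:
    """Convert a scraped start time to whole seconds.

    Assumption: the value is plain seconds ("754") or colon-separated
    ("12:34" or "1:02:03"); Mixcloud's actual format is not confirmed here.
    """
    seconds = 0
    for part in value.strip().split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def cue_timestamp(seconds: int) -> str:
    """Format seconds as a CUE INDEX time, MM:SS:FF (75 frames per second).

    Minutes are not rolled over into hours, since CUE times have no hour field.
    """
    minutes, secs = divmod(seconds, 60)
    return f"{minutes:02d}:{secs:02d}:00"


def quote_cue(text: str) -> str:
    # CUE strings are double-quoted, so swap any embedded double quotes.
    return '"' + text.strip().replace('"', "'") + '"'


def rows_to_cue(rows: Iterable[Sequence[str]], audio_file: str, mix_title: str,
                performer: str, genre: Optional[str] = None,
                year: Optional[str] = None) -> str:
    """Build a CUE sheet from (number, artist, title, time) rows."""
    lines = []
    # REM GENRE / REM DATE are common conventions for details the CSV doesn't carry.
    if genre:
        lines.append(f"REM GENRE {quote_cue(genre)}")
    if year:
        lines.append(f"REM DATE {year}")
    lines.append(f"PERFORMER {quote_cue(performer)}")
    lines.append(f"TITLE {quote_cue(mix_title)}")
    lines.append(f"FILE {quote_cue(audio_file)} MP3")  # assumes an MP3 mix
    # Number tracks sequentially rather than trusting the scraped number column.
    for track_no, (_number, artist, title, time) in enumerate(rows, start=1):
        lines.append(f"  TRACK {track_no:02d} AUDIO")
        lines.append(f"    TITLE {quote_cue(title)}")
        lines.append(f"    PERFORMER {quote_cue(artist)}")
        lines.append(f"    INDEX 01 {cue_timestamp(parse_time(time))}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # In practice: open("mixname.csv", newline="", encoding="utf-8") instead.
    sample_csv = "1,Artist A,Track A,0\n2,Artist B,Track B,4:32\n"
    rows = list(csv.reader(io.StringIO(sample_csv)))
    print(rows_to_cue(rows, "mixname.mp3", "Mix title", "DJ name"))
```

Keeping the CSV as an intermediate step, as done here, means the cue generation can be rerun or tweaked without scraping the pages again.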

I did notice the edit page doesn’t include the genre or the year of the mix, but I can live with that, for now… Scraping web pages is certainly a throwback, but it’s a better solution than what I was originally thinking of.

This will teach me to sort out my own house of data!

Author: Ianforrester | Posted on March 10, 2020 | Categories: data-and-semantic-web | Tags: cue, data, dataportability, dj, extract, funkwhale, metadata, mix, mixcloud, music