104.3 Song title/artist fetcher
I normally stream my music from Pandora using pianobar, but this doesn’t work when I don’t have internet or when Pandora just decides to stop streaming music to me. That second problem has been happening fairly frequently lately, so I’ve been looking for a good way to store all this music locally. There’s a fork of pianobar called pianobar-save which does what it sounds like, but the fork is rather old and I don’t want to deal with merge conflicts rebasing it myself. So I’ve been looking for a good source of song names and a means of playing them locally, and just a couple days ago I came across the 104.3 radio station in Pasadena and decided I liked that music.
104.3 advertises the song titles and artists for the most recent (or next?) 10 songs on their playlist. That’s not a lot, and it would be tedious to record a large list by hand, but it’s no problem if I’m writing a script. There’s nothing special about this web scraper, but I figured it’s worth writing a project doc to show how easy it is to put a couple tools together and play music locally.
Then I play the music with VLC (cvlc -ZI curses . to shuffle over all songs in the current directory)! Or if you’re not such a command-line nut like me you can use some other music player. The source is on Github.
I suppose since I said this is easy I should also say how it’s done. I start my scraper scripts with 3 windows open: Chromium with the website I’m scraping (and the Chrome inspector for browsing the source), a terminal with my text editor, and another terminal with ipython. When I write code I first test it live in ipython to make sure it does what I want, then copy it into my text editor.
Surprisingly, this system for writing web scrapers doesn’t even require internet access – I wrote this one while on a flight! I just opened a couple pages in my browser beforehand (the page I was scraping + syntax references for BeautifulSoup and urllib) and that was enough to write the scraper offline.
Import everything (BeautifulSoup and urllib.request). Note that I’m using Python 3 – urllib was reorganized between Python 2 and Python 3, so the import looks slightly different.
Download the page: html = urllib.request.urlopen(url).read(). Or since I don’t have internet access I just write that in my text editor. To test in ipython I copy the cached html from my browser and run html = """source copied from browser""". Error checking on the HTTP request would be good form, but I don’t care that much about making this script robust.
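The setup so far can be sketched as follows. This is a minimal sketch, assuming Python 3 and BeautifulSoup 4; the URL and the embedded markup are placeholders, not the station’s real page.

```python
import urllib.request

from bs4 import BeautifulSoup

URL = "https://example.com/playlist"  # placeholder, not the real playlist URL


def fetch_html(url):
    # Deliberately no error handling, matching the script's throwaway style.
    return urllib.request.urlopen(url).read()


# Offline workflow: instead of calling fetch_html(URL), paste the cached
# page source from the browser into a string literal.
html = """<ol id="playlist-items">
  <li><h3>Some Song</h3><a class="track-artist">Some Artist</a></li>
</ol>"""
soup = BeautifulSoup(html, "html.parser")
```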
Use the Chrome inspector to find the html containers of the content you care about. In this case it’s relatively simple – the playlist is in an ordered list (<ol>) with id playlist-items and each entry has its own list item (<li>). playlist_ols = soup.find_all('ol', id='playlist-items') extracts the playlist node (test in ipython, copy into text editor); note that a bare string as find_all’s second argument filters on class, not id, so the id needs the keyword form. Add some error checking, then playlist_ol = playlist_ols[0].
Now I’ve got a reference to the container of all the data I care about. playlist_lis = playlist_ol.find_all('li') gives me each entry. The title is stored in an <h3> within the <li>, so title = li.find_all('h3')[0].text, and the artist is in a link with class track-artist: artist = li.find_all('a', {'class': 'track-artist'})[0].text. I test this code in ipython and copy it to the text editor.
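Pulling the fields out of one entry can be tested against a hand-written fragment. The markup and song below are stand-ins I made up for the real page:

```python
from bs4 import BeautifulSoup

# Stand-in for the real playlist markup, with one <li> entry.
html = """<ol id="playlist-items">
  <li><h3>Take It Easy</h3><a class="track-artist">Eagles</a></li>
</ol>"""
soup = BeautifulSoup(html, "html.parser")

# Grab the playlist container, then the first entry.
playlist_ol = soup.find_all("ol", id="playlist-items")[0]
li = playlist_ol.find_all("li")[0]

# Title lives in an <h3>, artist in an <a class="track-artist">.
title = li.find_all("h3")[0].text
artist = li.find_all("a", {"class": "track-artist"})[0].text
print(title, "-", artist)
```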
That’s all. Loop over the playlist entries, accumulate the results, and print them out. Scraper complete!
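Put together, the whole scraper is just a loop over those entries. Sketched here against the same kind of stand-in markup (the real script would parse the fetched page instead):

```python
from bs4 import BeautifulSoup

# Stand-in for the fetched page; the songs here are invented examples.
html = """<ol id="playlist-items">
  <li><h3>Hotel California</h3><a class="track-artist">Eagles</a></li>
  <li><h3>Dream On</h3><a class="track-artist">Aerosmith</a></li>
</ol>"""
soup = BeautifulSoup(html, "html.parser")

# Loop over the playlist entries and accumulate (title, artist) pairs.
songs = []
for li in soup.find_all("ol", id="playlist-items")[0].find_all("li"):
    title = li.find_all("h3")[0].text
    artist = li.find_all("a", {"class": "track-artist"})[0].text
    songs.append((title, artist))

# Print them out.
for title, artist in songs:
    print(f"{title} - {artist}")
```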
Bash scripts are less intuitive, but if you practice enough they get pretty easy too. Check out the README file on Github – I started by writing each of those bash snippets and then pieced them all together into the final script.
I’ve since added a script to the repo that displays lyrics for the current song (and updates when the song changes). Check out this project page.