104.3 Song title/artist fetcher
I normally stream my music from Pandora using pianobar, but this doesn’t work when I don’t have internet or when Pandora just decides to stop streaming music to me. That second problem has been happening fairly frequently lately, so I’ve been looking for a good way to store all this music locally. There’s a fork of pianobar called pianobar-save which does what it sounds like, but the fork is rather old and I don’t want to deal with merge conflicts rebasing it myself. So I’ve been looking for a good source of song names and a means of playing them locally, and just a couple days ago I came across the 104.3 radio station in Pasadena and decided I liked that music.
104.3 advertises the song titles and artists for the most recent (or next?) 10 songs on their playlist. That’s not a lot, and it would be tedious to record a large list by hand, but it’s no problem if I’m writing a script. There’s nothing special about this web scraper, but I figured it’s worth writing a project doc to show how easy it is to put a couple tools together and play music locally.
Then I play the music with VLC (cvlc -ZI curses . to shuffle over all songs in the current directory)! Or if you’re not such a command-line nut like me you can use some other music player. The source is on Github.
I suppose since I said this is easy I should also say how it’s done. I start my scraper scripts with 3 windows open: Chromium with the website I’m scraping (and the Chrome inspector for browsing the source), a terminal with my text editor, and another terminal with ipython. When I write code I first test it live in ipython to make sure it does what I want, then copy it into my text editor.
Surprisingly, this system for writing web scrapers doesn’t even require internet access – I wrote this one while on a flight! I just opened a couple pages in my browser beforehand (the page I was scraping + syntax references for BeautifulSoup and urllib) and that was enough to write the scraper offline.
Import everything (BeautifulSoup and urllib.request). Note that I’m using Python 3 – urllib was reorganized between Python 2 and Python 3, so the import looks slightly different.
Download the page: html = urllib.request.urlopen(url).read(). Or since I don’t have internet access I just write that in my text editor. To test in ipython I copy the cached html from my browser and run html = """source copied from browser""". Error checking on the HTTP request would be good form, but I don’t care that much about making this script robust.
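The setup so far can be sketched as follows. This is a minimal sketch, assuming Python 3 and BeautifulSoup 4; the URL and the embedded markup are placeholders, not the station’s real page.

```python
import urllib.request

from bs4 import BeautifulSoup

URL = "https://example.com/playlist"  # placeholder, not the real playlist URL


def fetch_html(url):
    # Deliberately no error handling, matching the script's throwaway style.
    return urllib.request.urlopen(url).read()


# Offline workflow: instead of calling fetch_html(URL), paste the cached
# page source from the browser into a string literal.
html = """<ol id="playlist-items">
  <li><h3>Some Song</h3><a class="track-artist">Some Artist</a></li>
</ol>"""
soup = BeautifulSoup(html, "html.parser")
```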
Use the Chrome inspector to find the html containers of the content you care about. In this case it’s relatively simple – the playlist is in an ordered list (<ol>) with id playlist-items and each entry has its own list item (<li>). playlist_ols = soup.find_all('ol', id='playlist-items') extracts the playlist node (test in ipython, copy into text editor); note that a bare string as find_all’s second argument filters on class, not id, so the id needs the keyword form. Add some error checking, then playlist_ol = playlist_ols[0].
Now I’ve got a reference to the container of all the data I care about. playlist_lis = playlist_ol.find_all('li') gives me each entry. The title is stored in an <h3> within the <li>, so title = li.find_all('h3')[0].text, and the artist is in a link with class track-artist: artist = li.find_all('a', {'class': 'track-artist'})[0].text. I test this code in ipython and copy it to the text editor.
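Pulling the fields out of one entry can be tested against a hand-written fragment. The markup and song below are stand-ins I made up for the real page:

```python
from bs4 import BeautifulSoup

# Stand-in for the real playlist markup, with one <li> entry.
html = """<ol id="playlist-items">
  <li><h3>Take It Easy</h3><a class="track-artist">Eagles</a></li>
</ol>"""
soup = BeautifulSoup(html, "html.parser")

# Grab the playlist container, then the first entry.
playlist_ol = soup.find_all("ol", id="playlist-items")[0]
li = playlist_ol.find_all("li")[0]

# Title lives in an <h3>, artist in an <a class="track-artist">.
title = li.find_all("h3")[0].text
artist = li.find_all("a", {"class": "track-artist"})[0].text
print(title, "-", artist)
```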
That’s all. Loop over the playlist entries, accumulate the results, and print them out. Scraper complete!
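Put together, the whole scraper is just a loop over those entries. Sketched here against the same kind of stand-in markup (the real script would parse the fetched page instead):

```python
from bs4 import BeautifulSoup

# Stand-in for the fetched page; the songs here are invented examples.
html = """<ol id="playlist-items">
  <li><h3>Hotel California</h3><a class="track-artist">Eagles</a></li>
  <li><h3>Dream On</h3><a class="track-artist">Aerosmith</a></li>
</ol>"""
soup = BeautifulSoup(html, "html.parser")

# Loop over the playlist entries and accumulate (title, artist) pairs.
songs = []
for li in soup.find_all("ol", id="playlist-items")[0].find_all("li"):
    title = li.find_all("h3")[0].text
    artist = li.find_all("a", {"class": "track-artist"})[0].text
    songs.append((title, artist))

# Print them out.
for title, artist in songs:
    print(f"{title} - {artist}")
```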
Bash scripts are less intuitive, but if you practice enough they get pretty easy too. Check out the README file on Github – I started by writing each of those bash snippets and then pieced them all together into the final script.
I’ve since added a script to the repo that displays lyrics for the current song (and updates when the song changes). Check out this project page.