Ursinus CS 174: Object Oriented Programming,Spring 2022

Final Project: The Shazam Algorithm for Audio Fingerprinting (100 Points)

Chris Tralie

Overview/Logistics
Starter Code / Utilities
Programming Tasks
Grading Rubric

We have now reached the climax of the course! This assignment is a direct followup to the last assignment on basketball hashing. Now that we have a hash table implementation, we're going to use it for an even more exciting task of quickly identifying a song from a short (and possibly degraded) audio clip of that song. We will be implementing the so-called "Shazam algorithm," which you can read more about in this paper, and which I have broken down with audio examples at this link. In a nutshell, we will be building something called an audio fingerprint, which can be used to identify songs from short clips of audio.

The Design Is Yours!

I am giving you the freedom to design the code any way you want in this assignment, so you should take some time to plan out what you're going to do before you start coding. Think about what classes you're going to write, what entry point programs with mains you'll write to run your code, and how everything will fit together. Your code should be modular, reusable, and well encapsulated.

Summative Learning Objectives

Design and implement code that uses object oriented principles
Use multiple classes together in concert to accomplish a task
Use hash tables to organize and find data quickly
Manipulate 2D arrays in C++
Manage dynamic memory and object references in C++
Implement audio features for content-based music information retrieval

Background

Click here to view background notes on the algorithm you will be implementing.

What to submit

When you are finished, submit a .zip file with your code (not including the database) to canvas, as well as a README file explaining a brief overview of your code files and the design decisions you made.

Suggested Timeline

By Tuesday 4/26, you should be finishing up computing anchors
By Friday 4/29, you should be finishing up computing fingerprints
By Monday 5/2, you should be finished integrating your hash map code from homework 5 into your codebase
By Wednesday 5/3, you should have a data structure setup to accommodate a database where multiple songs have the same fingerprint
By Monday 5/9, you should be at the stage where you are able to compile a list of songs that have fingerprints in common with a query song, and you should be on your way to organizing histograms of timing offsets between your query and the database songs for each song
Final deadline on Friday 5/13!

Starter Code / Utilities

You can obtain the code for this assignment with

git clone --recursive https://github.com/ursinus-cs174-s2022/FinalProject_Shazam.git

You'll have to design all of the code to do the Shazam algorithm yourself, but I've started you off with utilities for loading audio and computing and plotting audio spectrograms. To see how these all work together, have a look at the code in plotSpectrogram.cpp

Audio spectrograms

As an example of an audio spectrogram, let's say we wanted to do an STFT on the example query "despacitoClean.wav" provided with this code, using a window length of 1024, a hop length of 512, and a cutoff of 128 frequency bins. Then we'd run this:

Code has been provided with your assignment to generate spectrograms from audio clips. For example, if we run the following code

It will yield this image

Have a look at plotSpectrogram.cpp for an example of how to load in audio and construct a spectrogram with the provided code.

Debugging by Plotting

I am a big proponent of debugging with visual output. As you're going along and computing anchors and fingerprints, I would highly recommend plotting them on examples to make sure your code is working properly. You can use the SimpleCanvas library that we've been using with all of the other assignments/labs involving images. Documentation for SimpleCanvas can be found at this link. Just note that the spectrograms are actually plotted upside down! So you may have to flip your anchors/fingerprints.

As a snippet to get you started, here's some code that computes a spectrogram on a 1D array samples with N audio samples

Precomputed Fingerprints

The starter code comes with fingerprints from 8 full length songs using the following parameters for the spectrogram

win = 1024
hop = 512
maxBin = 128

and the following parameters for the anchors

freqWin = 8
int timeWin = 6

and the following parameters for the fingerprints

dCenter = 86
width = 50
height = 21

To make sure your queries are compatible with these examples, you should use the same parameters for the spectrograms when computing new fingerprints.

The 8 precomputed songs can be found in the folder Database. They are stored in plain text files with the following conventions:

The first line is the name of the artist
The second line is the name of the song
Every subsequent line contains a fingerprint formatted into a hash code, followed by the window index at which that fingerprint occurs in the song

For example, the first 10 lines of 0.txt look like:

NOTE:

The time index of the queries is taken to be the left anchor of the fingerprint to be consistent with the database I have setup.

Stress Testing

You should start by making sure you can correctly identify the given 10 second clips in example queries among the 8 examples in this database. Once you're confident your code is working well, you can make the problem harder and see if you can identify those same clips among a database of 275 songs, which can be downloaded at this link.

Hash Maps

Code has not been provided for this, but when you're ready to look up queries in the database, you should splice in your hash map code from homework 5 to help you look up fingerprints quickly.

Programming Tasks

Main Program Specification

Your code should culminate in a program called query, which takes two command line arguments:

A path to a query audio file to be identified
A path to a folder containing a database of precomputed fingerprints

Your program should then load in the query audio and fingerprints in the database, and it should output the artist name and song name of the song in the database that is the best match to the query file.

As an example, if you typed in

Your program should output Luis Fonsi ft Daddy Yankee - Despacito

Efficiency

Your program should be very fast on the database of just 8 songs! Even on the database of 275 songs, my code only takes about 10 seconds to load the entire database, and the query is nearly instantaneous. The key to making this work efficiently is to use a hash table. You can use as many buckets as there are unique hash codes, which is roughly 17 million.

Debugging Tools

Of course, you'll probably want to write several other programs to help you test as you go along. Below are a few suggestions:

A program that computes and plots the constellation for a particular audio clip (note that plotting is upside down)
A program that computes and plots the fingerprints for a particular audio clip
A program that parses one of the database example text files and loads its fingerprints into a hash map
A program that prints out information about histograms that you make of the difference between a fingerprint's window index in the database and the fingerprint's window index in the audio.
...other things to help you debug?!

Going Further!

At this point, you have a fully functional basic implementation of audio fingerprinting for content-based music retrieval. This is not required, but you should try it on your own music! See if you can add a song to the database and query it with a clip. Note that the parameters are setup so that the database is expecting single channel audio sampled at 22050hz in the .wav file format. You can use Audacity to do the conversion. Questions to ask yourself as you go include:

How much audio does it seem like you need to do a robust identification?
How noisy or distorted does can the audio be before the identification fails?
Does this work on different versions of the song (e.g. cover songs)? Why do you think that might be?

Beyond that, you can also write code to create clips to fool the Shazam algorithm, like I did in http://www.beepytunes.com/. This is as simple as creating a tune that has short time cosines at every anchor location in the constellation.

Grading Rubric

Below is how I will score your final submissions

To Receive an A

Each item required for a grade of B is included
Classes use information hiding (public/private/protected) and a clear separation between implementation (cpp files) and specification (h files)
The program runs without memory leaks

To Receive a B

Each item required for a grade of C is included
Fingerprints and database song entries are well-encapsulated within class structures
Code has been reused where appropriate, using object oriented paradigms
Code is clean and well-documented

To Receive a C

Each item required for a grade of D is included
The query program returns the correct songs, as per the specification
The query program runs efficiently, as per the specification

To Receive a D

It is possible to build your query program from scratch by typing make in the command line
Your query program successfully computes anchors, fingerprints, and hash codes for the fingerprints
Your query program is able to load all of the fingerprints from a database into a hash map