“This is not a serious academic study. This is an, like, ‘I thought it’d be cool on the Internet [project]…”

– Matt Daniels, in an interview with NPR.

Meet Matthew Daniels – a New York-based designer, data scientist and hip-hop head. He’s also possibly the coolest guy ever, and I’m personally obsessed with his hip-hop-focused research. The sheer detail of his work makes it amazingly entertaining to explore.

He first came to my attention when I was perusing Reddit and stumbled upon his hip-hop vocabulary project. Essentially, he analyzed the first 35,000 lyrics of various American hip-hop artists and ranked them by how many unique words each one used, which sparked much conversation – and debate. There were a few surprises, and a few confirmations (like DMX having the smallest vocabulary of those analyzed). Some artists, such as Biggie Smalls, couldn’t be analyzed, as they didn’t have enough recorded material.

Matthew Daniels: Hip-Hop by the Numbers

I asked Matt how he conceived this insanely interesting concept:

“This was originally going to be a collaboration with another media company. We were throwing around ideas, and this is the one we landed on. I had been learning a Python framework called NLTK, which is for language processing. The first chapter of the manual for NLTK is on measuring the unique number of tokens (words) in a body of text. I just adopted this approach for the hip hop lyrics data set.”
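For the curious, here is a minimal sketch of the idea Matt describes: count the distinct words in the first 35,000 words of each artist’s lyrics, then rank the artists by that count. This is not his actual code. To keep it dependency-free it uses a simple regex tokenizer rather than NLTK, and the function names (`unique_word_count`, `rank_artists`) and the toy lyrics are hypothetical, made up purely for illustration.

```python
import re


def unique_word_count(lyrics: str, limit: int = 35_000) -> int:
    """Count the distinct words among the first `limit` words of `lyrics`."""
    # Assumed tokenizer: lowercase the text, then take runs of letters and
    # apostrophes as words. A real analysis might tokenize differently.
    tokens = re.findall(r"[a-z']+", lyrics.lower())
    # A set keeps one copy of each word, so its size is the vocabulary size.
    return len(set(tokens[:limit]))


def rank_artists(corpus: dict[str, str], limit: int = 35_000) -> list[tuple[str, int]]:
    """Return (artist, unique word count) pairs, largest vocabulary first."""
    counts = {artist: unique_word_count(text, limit) for artist, text in corpus.items()}
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Placeholder text, not real lyrics.
toy_corpus = {
    "artist_a": "the beat goes on and the beat goes on",
    "artist_b": "every single word here is brand new",
}

ranking = rank_artists(toy_corpus)
for artist, count in ranking:
    print(artist, count)
```

Capping every artist at the same number of words is what makes the comparison fair: without the cap, artists with bigger catalogs would rack up more unique words simply by having more material.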

Understandably, this level of hip-hop nerdery broke the internet. As Matt explained, “adding quantitative data to lyricism isn’t common, so it was exciting to have this as a discussion point in hip hop. The biggest surprises were that the data validated a lot of my assumptions: artists who were more experimental with their lyrics ended up using a wider vocabulary (e.g., Outkast).” How do you top a project with this much cool factor?

“Producers understand the minutiae – the exact note at the exact millisecond that creates an interesting beat. I wish that more people understood this process and gift.”

Matt’s latest project is Sample Stitch, an interactive (responsive) web experience that lets users recreate beats by Dilla, Kanye and 9th Wonder using all the original elements. He’s broken the songs into individual samples and assigned each sample to a different keyboard key. Essentially, this lets users play their keyboards like a classic MPC. The result? A newfound appreciation for the difficulty and intricacy of legit hip-hop production. What inspired this project? Matt explains: “there was this project called Patatap that used a computer keyboard to trigger sounds. I thought that this was genius because a computer keyboard is a native instrument for most people – we can play it naturally in the same way a musician approaches a piano.”

Its sheer dopeness hasn’t gone unnoticed.

That’s just the tip of the iceberg. His other work includes an in-depth look at the cultural influence and nuances of Outkast, and a historical analysis of the use of the word “Shorty” – and both are must-reads for any real hip-hop head. When talking to Noisey (re: Sample Stitch), Matt explained that he was, bottom line, trying to “break the Internet again”. With this level of ingenuity, I doubt this is the last time that he’ll do just that!

If you have yet to explore his work, be sure to visit his website.