Spotify Song Recommender

This system clusters more than 90,000 songs and, given a song, recommends a similar one.

Introduction

In this project, I act as a data analyst for a multicultural website, tasked with using machine learning to build a song recommender system. Using unsupervised learning, the system clusters over 90,000 songs based on their audio features. It operates in two ways: 1) if the input song is currently in the top-100 charts, the recommender suggests a similar song by the same artist or in the same genre from within the top-100; 2) if the input song is not in the top-100 charts, the recommender obtains its audio features and suggests a similar song from the same cluster. The ultimate goal is to provide users with personalized song recommendations.
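The two recommendation paths described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the record layout (dicts with "title", "artist", "genre", "cluster", and "features"), the function names, and the choice of the nearest neighbour within the cluster are all assumptions made for the example.

```python
import math
import random


def same_song(a, b):
    """Two records refer to the same song if title and artist match."""
    return a["title"] == b["title"] and a["artist"] == b["artist"]


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recommend(song, top100, catalogue, rng=random):
    """Return one recommended song record, or None if nothing qualifies.

    Hypothetical record layout: a dict with "title", "artist", "genre",
    "cluster" (integer label) and "features" (list of floats).
    """
    if any(same_song(song, s) for s in top100):
        # Path 1: the input is charting, so stay inside the top-100 and
        # pick a different song by the same artist or in the same genre.
        candidates = [
            s for s in top100
            if not same_song(song, s)
            and (s["artist"] == song["artist"] or s["genre"] == song["genre"])
        ]
        return rng.choice(candidates) if candidates else None

    # Path 2: the input is not charting, so look in its cluster.
    # Assumption: "similar" means the closest song by audio features.
    candidates = [
        s for s in catalogue
        if s["cluster"] == song["cluster"] and not same_song(song, s)
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda s: distance(s["features"], song["features"]))
```

Picking the nearest neighbour makes the second path deterministic; sampling at random from the cluster would be an equally plausible reading of the description and gives more varied suggestions.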

You can check the code here.

Materials and Methods

To cover the music landscape broadly, we scraped the TOP-100 song charts of multiple countries and kept the unique songs. We then retrieved each song's audio features from Spotify’s API: danceability, energy, key, loudness, mode, speechiness, acousticness, instrumentalness, liveness, valence, and tempo.
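These features live on different scales: most are scores between 0 and 1, while loudness is measured in decibels, tempo in beats per minute, and key is an integer pitch class. A distance-based clustering would otherwise be dominated by tempo and loudness, so a common preprocessing step is to standardize each feature. The sketch below shows one way to do that; whether this project scaled its features, and how, is an assumption, and the `standardize` helper is hypothetical.

```python
from statistics import mean, pstdev

# The eleven audio features listed above, in a fixed order.
FEATURES = [
    "danceability", "energy", "key", "loudness", "mode", "speechiness",
    "acousticness", "instrumentalness", "liveness", "valence", "tempo",
]


def standardize(rows):
    """Convert feature dicts into z-score vectors (mean 0, std 1 per feature).

    rows: list of dicts, each holding every name in FEATURES.
    Returns a list of lists ordered like FEATURES.
    """
    stats = {}
    for name in FEATURES:
        values = [row[name] for row in rows]
        stats[name] = (mean(values), pstdev(values))

    scaled = []
    for row in rows:
        vector = []
        for name in FEATURES:
            mu, sigma = stats[name]
            # A constant feature carries no information; map it to 0.
            vector.append((row[name] - mu) / sigma if sigma > 0 else 0.0)
        scaled.append(vector)
    return scaled
```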

The dataset was built by scanning very large playlists, including Greatest Hits 2020/2022, Greatest Hits 2010/2019, and Longest Playlist Ever. For every album that appeared at least once in these playlists, we extracted all of its songs, whether or not those songs were in the playlists themselves. This approach gives a more diverse and well-rounded representation of the music industry.

The methods of this study include the following:

  • Web scraping
  • Use of Spotify’s API
  • Data wrangling
  • ML: Unsupervised Learning
  • Hyperparameter optimization
  • Data visualization
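To make the unsupervised learning and hyperparameter optimization steps concrete, here is a small sketch using k-means and the elbow heuristic for choosing the number of clusters. Both choices are assumptions for illustration: this README does not state which clustering algorithm or tuning method the project used, and the function names below are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm. Returns (labels, centroids, inertia)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        converged = new_labels == labels
        labels = new_labels
        if converged:
            break
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia


def inertia_by_k(points, ks, seed=0):
    """Run k-means for each candidate k and collect the inertia."""
    return {k: kmeans(points, k, seed=seed)[2] for k in ks}
```

Plotting the values returned by `inertia_by_k` against k and looking for the point where the curve flattens is one simple way to pick the cluster count. For a catalogue of this size, a library implementation such as scikit-learn's `KMeans` would be the practical choice.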

Results

The results of our song recommender system indicate that the model is functioning as expected in terms of recommending similar songs to the input given. This is evident from the following examples:

Input                            →      Recommendation
Ribs (by Lorde)                  →      Isabel (by The Wombats)
Lisztomania (by Phoenix)         →      Out of Reach (by The Primitives)
After Midnight (by Blink-182)    →      Scribble (by Puppet, Eden Project)
La Persona (by Amaia)            →      It's Love (by Kina Grannis)


However, it is important to note that musical taste is a highly subjective matter and can vary greatly from person to person. Despite this, our model was able to effectively cluster songs based on their audio features and make meaningful recommendations.

References

The TOP-100 song charts were taken from the PopVortex website. The data and audio features for all songs were obtained through the use of Spotify’s API.

License

This is an educational project; therefore, all materials can be used freely.