Hit Song Prediction

Code for practical experiments in Hit Song Prediction, including scripts that:

  • Scrape a complete Billboard Hot 100 song dataset.
  • Match Billboard songs with Spotify tracks.
  • Extract audio features from a list of Spotify tracks.
  • Train learning algorithms (logistic regression, random forest, and a neural network) for Hit Song Prediction using Spotify's audio features.

Results

These results were obtained using the Billboard Hot 100 hit songs from 2000 onward and a random selection of songs from Spotify as non-hits. Both CSV files are in the ./datasets directory. Results may differ with other hit/non-hit datasets.

Classifier             Accuracy  Precision  Recall  ROC-AP  ROC-AUC
Logistic Regression    0.819     0.783      0.880   0.832   0.877
Random Forest          0.813     0.772      0.887   0.839   0.879
Neural network (MLP)   0.818     0.784      0.876   0.834   0.877
CLMR model
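The table columns are standard binary-classification metrics. The dependency-free sketch below shows how accuracy, precision, recall, and ROC-AUC can be computed, assuming label 1 denotes a hit and that hard predictions come from thresholding predicted probabilities at 0.5. The function names are illustrative; the repository may compute these metrics differently (for example with a library).

```python
"""Pure-Python sketch of binary-classification metrics (1 = hit, 0 = non-hit)."""
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (true positives, false positives, true negatives, false negatives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of all songs classified correctly."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Of the songs predicted to be hits, the fraction that really are hits."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if tp + fp else 0.0


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Of the real hits, the fraction the classifier found."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random hit is scored above a random non-hit (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, with y_true = [1, 1, 0, 0, 1, 0] and scores = [0.9, 0.7, 0.6, 0.2, 0.4, 0.1], thresholding at 0.5 gives an accuracy of 4/6, precision and recall of 2/3 each, and a ROC-AUC of 8/9. ROC-AUC uses the raw scores rather than thresholded labels, which is why it can differ noticeably from accuracy.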
[Figure: Feature importances extracted from the Random Forest model]

Data collection

Training requires two datasets: a "hit" and a "non-hit" song dataset. The hit songs are scraped from the Billboard Hot 100 charts. Each Billboard song is then matched to its corresponding Spotify track, and finally the audio features of the matched tracks are retrieved from Spotify. All resulting files are already included in the ./datasets folder, but they can be regenerated with the following commands:

# Downloads all songs from the Billboard Hot 100 chart:
python get_charts.py

# Matches the Billboard songs with Spotify track IDs using fuzzy string matching of the track and artist name:
python spotify_matcher.py

# Retrieves Spotify's audio features for the matched tracks:
python spotify_features.py
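spotify_matcher.py is described as matching on track and artist name with fuzzy string matching. Its exact library and scoring rule are not documented here, so the following is a hypothetical sketch using the standard-library difflib.SequenceMatcher. The candidate tuples stand in for Spotify search results, and the equal weighting of title and artist as well as the 0.8 threshold are assumptions.

```python
"""Hypothetical sketch: fuzzy-match a Billboard entry against Spotify candidates."""
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't lower the score."""
    return " ".join(text.lower().split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(
    title: str,
    artist: str,
    candidates: List[Tuple[str, str, str]],
    threshold: float = 0.8,  # assumed cut-off, not taken from the repository
) -> Optional[str]:
    """Return the track ID of the best (track_id, title, artist) candidate, or None.

    The score is the mean of title similarity and artist similarity; candidates
    scoring below `threshold` are never accepted.
    """
    best_id, best_score = None, threshold
    for track_id, cand_title, cand_artist in candidates:
        score = (similarity(title, cand_title) + similarity(artist, cand_artist)) / 2
        if score >= best_score:
            best_id, best_score = track_id, score
    return best_id


if __name__ == "__main__":
    # Placeholder IDs standing in for Spotify search results.
    results = [
        ("id_live", "Crazy In Love - Live", "Beyonce"),
        ("id_studio", "Crazy in Love", "Beyoncé"),
    ]
    print(best_match("Crazy In Love", "Beyonce", results))
```

In the demo, both candidates clear the threshold, but the studio version wins because its title matches exactly, which outweighs the accent difference in the artist name.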

Together, these steps produce a spotify_billboard_features.csv file containing the Spotify audio features and Billboard data, which serves as input for the learning algorithms.
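Before training, the hit and non-hit files are combined into a single labeled dataset. A minimal sketch of that step, assuming both CSV files contain the Spotify audio-feature columns listed in FEATURES (the names are real Spotify feature names, but which columns the repository's files actually contain is an assumption) and that hits are labeled 1 and non-hits 0:

```python
"""Sketch: build a labeled feature matrix from hit and non-hit CSV data."""
import csv
import io
from typing import List, TextIO, Tuple

# Assumed feature columns; adjust to the columns actually present in the CSV files.
FEATURES = ["danceability", "energy", "loudness", "valence", "tempo"]


def load_rows(handle: TextIO, label: int) -> Tuple[List[List[float]], List[int]]:
    """Read one CSV and attach `label` (1 = hit, 0 = non-hit) to every row."""
    X, y = [], []
    for row in csv.DictReader(handle):
        X.append([float(row[name]) for name in FEATURES])
        y.append(label)
    return X, y


def build_dataset(hits: TextIO, nonhits: TextIO) -> Tuple[List[List[float]], List[int]]:
    """Concatenate hits and non-hits into one feature matrix and label vector."""
    X_hit, y_hit = load_rows(hits, 1)
    X_non, y_non = load_rows(nonhits, 0)
    return X_hit + X_non, y_hit + y_non


if __name__ == "__main__":
    # In-memory stand-ins; real files would be opened with open(path, newline="").
    header = ",".join(FEATURES) + "\n"
    X, y = build_dataset(
        io.StringIO(header + "0.8,0.7,-5.0,0.6,120\n"),
        io.StringIO(header + "0.3,0.2,-15.0,0.1,80\n"),
    )
    print(X, y)
```

Reading through csv.DictReader selects columns by name, so extra columns such as Billboard chart data are simply ignored.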

Usage

To start training on the hit and non-hit song datasets in the ./datasets folder, run:

python main.py
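The logistic_regression option fits a linear classifier to the audio features. The repository most likely relies on a library implementation; purely to illustrate what training involves, here is a from-scratch sketch of logistic regression fitted by full-batch gradient descent. The learning rate, epoch count, and function names are assumptions, and the seed parameter only loosely mirrors --seed.

```python
"""From-scratch logistic regression trained by batch gradient descent (illustrative only)."""
import math
import random
from typing import List, Tuple


def standardize(X: List[List[float]]) -> Tuple[List[List[float]], List[float], List[float]]:
    """Scale each column to zero mean and unit variance; also return the column stats."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0 for c, m in zip(cols, means)]
    scaled = [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]
    return scaled, means, stds


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: List[float], b: float, row: List[float]) -> float:
    """Estimated probability that `row` is a hit."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b)


def train(X: List[List[float]], y: List[int], lr: float = 0.1,
          epochs: int = 500, seed: int = 42) -> Tuple[List[float], float]:
    """Minimize mean log-loss with full-batch gradient descent."""
    rng = random.Random(seed)  # seeds the weight initialization in this sketch
    w = [rng.uniform(-0.01, 0.01) for _ in X[0]]
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for row, target in zip(X, y):
            # Gradient of log-loss with respect to the logit is (prediction - target).
            error = predict_proba(w, b, row) - target
            for j, xj in enumerate(row):
                grad_w[j] += error * xj
            grad_b += error
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b
```

Standardizing first matters because Spotify features sit on very different scales (tempo in BPM versus danceability between 0.0 and 1.0). A new song must be scaled with the training means and standard deviations before calling predict_proba.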

Arguments

usage: main.py [-h] [--seed SEED] [--hits HITS] [--nonhits NONHITS]
               [--classifier {logistic_regression,random_forest,neural_network}]
               [--holdout_year HOLDOUT_YEAR] [--test_song TEST_SONG]

optional arguments:
  -h, --help            show this help message and exit
  --seed SEED           Random seed
  --hits HITS           CSV file containing features of hit songs
  --nonhits NONHITS     CSV file containing features of non-hit songs
  --classifier {logistic_regression,random_forest,neural_network}
                        Classifier to use
  --holdout_year HOLDOUT_YEAR
                        Which year's hit songs to withhold for testing
  --test_song TEST_SONG
                        Spotify URI to test hit song potential
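--holdout_year withholds one year's hit songs for testing, and --test_song takes a Spotify URI. A hypothetical sketch of both, assuming each song record carries "year" and "is_hit" fields (illustrative names, not the repository's schema) and that URIs have the form spotify:track:<id>:

```python
"""Hypothetical sketch of a holdout-year split and of parsing a Spotify track URI."""
from typing import Any, Dict, List, Tuple

Song = Dict[str, Any]  # assumed record shape: needs "year" and "is_hit" keys


def holdout_split(songs: List[Song], holdout_year: int) -> Tuple[List[Song], List[Song]]:
    """Move the hit songs from `holdout_year` into the test set; all other songs train.

    Following the help text, only that year's hits are withheld; how the repository
    treats non-hits in this mode is not documented here.
    """
    train, test = [], []
    for song in songs:
        if song["is_hit"] and song["year"] == holdout_year:
            test.append(song)
        else:
            train.append(song)
    return train, test


def track_id_from_uri(uri: str) -> str:
    """Extract the track ID from a URI of the form 'spotify:track:<id>'."""
    parts = uri.split(":")
    if len(parts) != 3 or parts[0] != "spotify" or parts[1] != "track":
        raise ValueError(f"not a Spotify track URI: {uri!r}")
    return parts[2]
```

Holding out a whole year, rather than a random sample, tests whether a model trained on earlier and later charts generalizes to a year it has never seen.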

Contributors

spijkervet
