Music emotion classifiers based on lyrics using LDA, SVM, and AdaBoost
- Install git-lfs (Follow instructions on https://git-lfs.github.com/)
- Clone repo
- Pull using git-lfs
- Create a virtual environment (optional)
- Run 'pip install -r requirements.txt'
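The setup steps above can be run roughly as follows (the repo URL is a placeholder):

```sh
# Sketch of the setup steps; substitute the actual repo URL.
git lfs install
git clone <repo-url>
cd <repo-dir>
git lfs pull
python -m venv venv && source venv/bin/activate  # optional
pip install -r requirements.txt
```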
The original datasets, found in the 'data/spotify/' and 'data/deezer/' directories, do not include lyrics. Lyrics for each song were scraped from Genius using a tweaked version of lyricsgenius (with a slight modification to the search_song API).
Running 'python shared/gen_dataset.py' initiates lyric scraping for each song in the datasets. Each API result is quickly verified by checking that the song title and artist name found online match the dataset values before being stored. Upon completion, a gen_{dataset_name}_data.csv and a gen_{dataset_name}_error_log.txt are created for each dataset.
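The verification step can be sketched as below. `matches_dataset` and its normalization are hypothetical helpers for illustration; the exact comparison used by shared/gen_dataset.py may differ.

```python
import re

def normalize(text: str) -> str:
    """Lowercase and strip punctuation/whitespace for a fuzzy comparison."""
    return re.sub(r"[^a-z0-9]", "", text.lower())

def matches_dataset(found_song: str, found_artist: str,
                    song: str, artist: str) -> bool:
    """Accept a scraped result only if the title and artist found on
    Genius line up with the dataset values (hypothetical check; the
    real script's comparison may be stricter or looser)."""
    return (normalize(found_song) == normalize(song)
            and normalize(artist) in normalize(found_artist))
```

A result like ("Hey Jude (Live)", "The Beatles") would be rejected for the dataset entry ("Hey Jude", "The Beatles") under this check, which is why the found_song and found_artist columns are kept for inspection.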
Columns
- song - song title (string)
- artist - artist name (string)
- valence - valence rating (numeric)
- arousal - arousal rating (numeric)
- lyrics - lyrics (string)
- found_song - song title found online (string)
- found_artist - artist name found online (string)
Running 'python shared/clean_dataset.py' performs basic cleaning/preprocessing of the datasets generated by shared/gen_dataset.py. Songs are filtered based on lyric length, lyric tags (e.g. [VERSE 1]), language (English only, detected with langdetect), a word count limit, and a unique word count threshold. The emotion class label ('y') is then generated from the quadrants of the valence-arousal space (see fig. below).
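The tag-stripping and word-count filters can be sketched as below. The thresholds are placeholders, and the langdetect language filter is omitted; shared/clean_dataset.py defines the actual values and order of the checks.

```python
import re

TAG_RE = re.compile(r"\[[^\]]*\]")  # lyric tags like [VERSE 1] or [Chorus]

def clean_lyrics(lyrics: str) -> str:
    """Remove section tags and collapse whitespace."""
    return re.sub(r"\s+", " ", TAG_RE.sub(" ", lyrics)).strip()

def passes_filters(lyrics: str, min_words: int = 10,
                   max_words: int = 1000, min_unique: int = 5) -> bool:
    """Hypothetical word-count and unique-word filters; the real script
    also drops non-English lyrics using langdetect."""
    words = clean_lyrics(lyrics).lower().split()
    return min_words <= len(words) <= max_words and len(set(words)) >= min_unique
```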
Columns
- song - song title (string)
- artist - artist name (string)
- valence - valence rating (numeric)
- arousal - arousal rating (numeric)
- lyrics - lyrics (string)
- found_song - song title found online (string)
- found_artist - artist name found online (string)
- y - emotion class label from 1-4 (numeric)
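The quadrant labeling can be sketched as below. The midpoints, the quadrant numbering, and the mood names in the comments are assumptions based on the conventional valence-arousal (Russell) layout; the actual split follows the figure referenced above.

```python
def quadrant(valence: float, arousal: float,
             v_mid: float = 0.5, a_mid: float = 0.5) -> int:
    """Map a (valence, arousal) pair to an emotion class label 1-4.
    Midpoints and numbering are placeholders, not the script's values."""
    if valence >= v_mid and arousal >= a_mid:
        return 1  # e.g. happy/excited: high valence, high arousal
    if valence < v_mid and arousal >= a_mid:
        return 2  # e.g. angry/tense: low valence, high arousal
    if valence < v_mid and arousal < a_mid:
        return 3  # e.g. sad/depressed: low valence, low arousal
    return 4      # e.g. calm/relaxed: high valence, low arousal
```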