Music emotion classifiers based on lyrics using LDA, SVM, and AdaBoost
- Install git-lfs (Follow instructions on https://git-lfs.github.com/)
- Clone repo
- Pull using git-lfs
- Create a virtual environment (optional)
- Run 'pip install -r requirements.txt'
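The setup steps above can be run roughly as follows (the repo URL is a placeholder):

```sh
# Sketch of the setup steps; substitute the actual repo URL.
git lfs install
git clone <repo-url>
cd <repo-dir>
git lfs pull
python -m venv venv && source venv/bin/activate  # optional
pip install -r requirements.txt
```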
The original datasets, found in the 'data/spotify/' and 'data/deezer/' directories, do not include lyrics. Lyrics for each song were scraped from Genius using a tweaked version of lyricsgenius (with a slight modification to the search_song API).
Running 'python shared/gen_dataset.py' initiates lyric scraping for each song in the datasets. Each API result is quickly verified by checking that the song title and artist name found online match the dataset values before being stored. Upon completion, a gen_{dataset_name}_data.csv and a gen_{dataset_name}_error_log.txt are created for each dataset.
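The verification step can be sketched as below. `matches_dataset` and its normalization are hypothetical helpers for illustration; the exact comparison used by shared/gen_dataset.py may differ.

```python
import re

def normalize(text: str) -> str:
    """Lowercase and strip punctuation/whitespace for a fuzzy comparison."""
    return re.sub(r"[^a-z0-9]", "", text.lower())

def matches_dataset(found_song: str, found_artist: str,
                    song: str, artist: str) -> bool:
    """Accept a scraped result only if the title and artist found on
    Genius line up with the dataset values (hypothetical check; the
    real script's comparison may be stricter or looser)."""
    return (normalize(found_song) == normalize(song)
            and normalize(artist) in normalize(found_artist))
```

A result like ("Hey Jude (Live)", "The Beatles") would be rejected for the dataset entry ("Hey Jude", "The Beatles") under this check, which is why the found_song and found_artist columns are kept for inspection.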
Columns
- song - song title (string)
- artist - artist name (string)
- valence - valence rating (numeric)
- arousal - arousal rating (numeric)
- lyrics - lyrics (string)
- found_song - song title found online (string)
- found_artist - artist name found online (string)
Running 'python shared/clean_dataset.py' performs basic cleaning/preprocessing of the datasets generated by shared/gen_dataset.py. Songs are filtered based on lyric length, lyric tags (e.g. [VERSE 1]), language (English only, detected with langdetect), a word count limit, and a unique word count threshold. The emotion class label ('y') is then generated from the quadrants of the valence-arousal space (see fig. below).
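The tag-stripping and word-count filters can be sketched as below. The thresholds are placeholders, and the langdetect language filter is omitted; shared/clean_dataset.py defines the actual values and order of the checks.

```python
import re

TAG_RE = re.compile(r"\[[^\]]*\]")  # lyric tags like [VERSE 1] or [Chorus]

def clean_lyrics(lyrics: str) -> str:
    """Remove section tags and collapse whitespace."""
    return re.sub(r"\s+", " ", TAG_RE.sub(" ", lyrics)).strip()

def passes_filters(lyrics: str, min_words: int = 10,
                   max_words: int = 1000, min_unique: int = 5) -> bool:
    """Hypothetical word-count and unique-word filters; the real script
    also drops non-English lyrics using langdetect."""
    words = clean_lyrics(lyrics).lower().split()
    return min_words <= len(words) <= max_words and len(set(words)) >= min_unique
```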
Columns
- song - song title (string)
- artist - artist name (string)
- valence - valence rating (numeric)
- arousal - arousal rating (numeric)
- lyrics - lyrics (string)
- found_song - song title found online (string)
- found_artist - artist name found online (string)
- y - emotion class label from 1-4 (numeric)
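The quadrant labeling can be sketched as below. The midpoints, the quadrant numbering, and the mood names in the comments are assumptions based on the conventional valence-arousal (Russell) layout; the actual split follows the figure referenced above.

```python
def quadrant(valence: float, arousal: float,
             v_mid: float = 0.5, a_mid: float = 0.5) -> int:
    """Map a (valence, arousal) pair to an emotion class label 1-4.
    Midpoints and numbering are placeholders, not the script's values."""
    if valence >= v_mid and arousal >= a_mid:
        return 1  # e.g. happy/excited: high valence, high arousal
    if valence < v_mid and arousal >= a_mid:
        return 2  # e.g. angry/tense: low valence, high arousal
    if valence < v_mid and arousal < a_mid:
        return 3  # e.g. sad/depressed: low valence, low arousal
    return 4      # e.g. calm/relaxed: high valence, low arousal
```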