


Hi, I'm Thrishul Kumar

I'm a passionate Data Scientist from India






Hit Predict: Predicting Billboard Hits Using Spotify Data

The Billboard Hot 100 chart remains one of the definitive ways to measure the success 🚀 of a popular song. We investigated using machine learning techniques to predict which songs will become Billboard Hot 100 hits 🔥.



Dataset - Spotify Data

Data for ~10,000 songs were collected from Billboard.com and the Million Song Dataset. The songs span 1990-2018.
Songs were labeled either 0 or 1 based on Billboard chart data.
Audio features for each song were extracted from the Spotify Web API.
A K-Nearest Neighbours classifier is used to predict each song's Billboard success.

Features of the Song

<img align="center" src="https://img.icons8.com/nolan/30/musical-notes.png"/>

Danceability : Describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity.
Energy : Represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale.
Loudness : An attribute of auditory sensation in terms of which sounds can be ordered on a scale extending from quiet to loud.
Speechiness : Detects the presence of spoken words in a track. A speechiness above 0.66 suggests a track made up mostly of spoken words, a score between 0.33 and 0.66 suggests a track that may contain both music and speech (e.g. rap music), and a score below 0.33 most likely indicates music with little or no speech.
Acousticness :  A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.
Instrumentalness : Predicts whether a track contains no vocals. “Ooh” and “aah” sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly “vocal.” The closer the instrumentalness value is to 1.0, the greater the likelihood that the track contains no vocal content.
Liveness :  This value denotes the probability that the song is recorded with a live audience. According to the official documentation, “a value above 0.8 provides a strong likelihood that the track is live”.
Valence : A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative.
Tempo : Tempo is how fast or slow a piece of music is performed, while rhythm is the placement of sounds in time, in a regular and repeated pattern.
Genre :  A music genre is a conventional category that identifies some pieces of music as belonging to a shared tradition or set of conventions. It is to be distinguished from musical form and musical style.
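
The speechiness bands described above can be sketched as a small helper. This is only an illustration: the function name `describe_speechiness` and the band labels are made up for this example, and the thresholds are the 0.33 and 0.66 cut-offs quoted in the feature list.

```python
def describe_speechiness(score: float) -> str:
    """Map a speechiness score (0.0 to 1.0) to a rough content band.

    Hypothetical helper; the 0.33 / 0.66 cut-offs come from the
    feature description above.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("speechiness is expected to lie in [0.0, 1.0]")
    if score > 0.66:
        return "mostly spoken words"
    if score >= 0.33:
        return "mix of music and speech"
    return "mostly music"


# Illustrative values only, not taken from the project's dataset.
for value in (0.04, 0.45, 0.90):
    print(value, "->", describe_speechiness(value))
```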



Exploratory Data Analysis - EDA



Box-Plot for Release Year

   Capture1

Heat Map to Study Correlation

  Capture2

  • The heatmap showed a good positive correlation between Loudness and Energy, while Acousticness and Energy are negatively correlated. Loudness and Acousticness are also negatively correlated.
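
Each cell of a correlation heatmap is typically a Pearson correlation coefficient between two columns. The sketch below computes that coefficient with the standard library only. The `pearson` helper and the sample values are assumptions made for illustration, not the project's code or data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up values that mimic the pattern described above.
loudness = [-12.0, -9.5, -7.0, -5.5, -4.0]
energy = [0.30, 0.45, 0.62, 0.75, 0.88]
acousticness = [0.80, 0.60, 0.35, 0.20, 0.05]

print("loudness vs energy:", round(pearson(loudness, energy), 3))
print("acousticness vs energy:", round(pearson(acousticness, energy), 3))
print("loudness vs acousticness:", round(pearson(loudness, acousticness), 3))
```

A coefficient close to +1 indicates a strong positive relationship, and a coefficient close to -1 a strong negative one.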

Scatter plot for Loudness and Acousticness

  Capture3

  • The scatter plot shows that loud, high-energy songs tend to make it onto the Billboard Top100.

Genre Classification on Billboard Top100

  Capture4

  • Most of the songs on Billboard are of the 'Pop' genre, followed by 'Rap'. This may be because more songs are released in these genres. Songs of the genres 'Jazz', 'reggae', 'alternative', 'classical' and 'edm' do not make it onto Billboard.
  • After performing basic EDA, we moved on to modelling. To build the model we used Danceability, Energy, Loudness, Speechiness, Acousticness, Instrumentalness, Liveness, Valence, Tempo and Genre as feature variables, with Top100 as the target variable.

Steps taken to apply KNN algorithm

  1. Balancing the data using SMOTE technique.

  2. Scaling the independent features using StandardScaler().

  3. Splitting the data into train and test sets: 80% for training and 20% for testing.

  4. Fitting the KNN algorithm on the training data.

  5. Model Evaluation using Confusion matrix and ROC-AUC curve.

  6. Hyper-parameter tuning to obtain optimized scores.

  7. Evaluating the optimized model
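
Steps 2 to 4 can be sketched without any third-party library, as below. This is a minimal illustration under stated assumptions, not the project's code, which presumably relied on scikit-learn. The helpers `standardize`, `train_test_split` and `knn_predict` are written here for the example, the two-feature rows are made up, and the SMOTE step is left out because the toy data is already balanced.

```python
import math
import random
from collections import Counter


def standardize(rows):
    """Z-score every column: subtract the mean, divide by the std deviation."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / len(col)) or 1.0
        for col, m in zip(columns, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def train_test_split(X, y, test_fraction=0.2, seed=0):
    """Shuffle the row indices and hold out a fraction of them for testing."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(X) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (
        [X[i] for i in train_idx],
        [X[i] for i in test_idx],
        [y[i] for i in train_idx],
        [y[i] for i in test_idx],
    )


def knn_predict(X_train, y_train, point, k=3):
    """Majority vote among the k training rows closest to `point`."""
    nearest = sorted(
        zip(X_train, y_train), key=lambda pair: math.dist(pair[0], point)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Made-up [energy, loudness] rows; label 1 stands for a hypothetical hit.
X = [
    [0.20, -14.0], [0.25, -13.0], [0.30, -12.0], [0.22, -15.0], [0.28, -12.5],
    [0.80, -5.0], [0.85, -4.0], [0.90, -3.5], [0.78, -5.5], [0.88, -4.2],
]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

X_scaled = standardize(X)                                  # step 2
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y)  # step 3
predictions = [knn_predict(X_train, y_train, p, k=3) for p in X_test]  # step 4
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
print("held-out accuracy:", accuracy)
```

The sketch follows the order of the list above. A common variation computes the scaling statistics on the training split only, so that nothing from the test rows influences the features the model sees.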


Scores without Hyper-Parameter Tuning

   Scores of initial model:

  • Accuracy: 76 %

  • Precision: 77 %

  • Recall: 76 %

  • ROC AUC Score: 76.13 %
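
For reference, accuracy, precision and recall can all be derived from the four counts of a binary confusion matrix. The sketch below uses a hypothetical `classification_scores` helper and made-up labels. It scores the positive class only, whereas the figures reported above may be averages over both classes.

```python
def classification_scores(y_true, y_pred, positive=1):
    """Accuracy, precision and recall for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when a class is never predicted.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Made-up labels: 3 true positives, 1 false negative,
# 3 true negatives, 1 false positive.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
scores = classification_scores(y_true, y_pred)
print(scores)
```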

Classification Report

  knn_cr

AUC-Curve

  auc_knn
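
The area under the ROC curve equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. A minimal sketch of that definition follows. The `roc_auc` helper and the sample scores are illustrative assumptions. For KNN, one plausible score is the fraction of the k nearest neighbours that are positive.

```python
def roc_auc(y_true, scores):
    """ROC AUC as the fraction of positive/negative pairs ranked correctly."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))


# Made-up labels and scores: 3 of the 4 positive/negative pairs are ordered correctly.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print("ROC AUC:", auc)
```

This pairwise form is quadratic in the number of samples, which is fine for a small illustration but slower than rank-based implementations on large datasets.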

Optimized KNN With Hyperparameter Tuning

  
   Scores of optimized model:

  • Accuracy: 80 %

  • Precision: 82 %

  • Recall: 80 %

  • ROC AUC Score: 79.59 %
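
The README does not say which hyper-parameters were tuned or how. As one plausible illustration, the sketch below picks the number of neighbours k by cross-validated accuracy. The helpers `knn_predict`, `cross_val_accuracy` and `best_k`, the candidate values and the one-feature data are all assumptions made for this example.

```python
import math
from collections import Counter


def knn_predict(X_train, y_train, point, k):
    """Majority vote among the k training rows closest to `point`."""
    nearest = sorted(
        zip(X_train, y_train), key=lambda pair: math.dist(pair[0], point)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_val_accuracy(X, y, k, folds=5):
    """Accuracy of k-NN averaged over interleaved cross-validation folds."""
    correct = 0
    for fold in range(folds):
        test_idx = set(range(fold, len(X), folds))
        X_train = [x for i, x in enumerate(X) if i not in test_idx]
        y_train = [t for i, t in enumerate(y) if i not in test_idx]
        for i in test_idx:
            correct += knn_predict(X_train, y_train, X[i], k) == y[i]
    return correct / len(X)


def best_k(X, y, candidates=(1, 3, 5, 7), folds=5):
    """Return the candidate k with the highest cross-validated accuracy."""
    scores = {k: cross_val_accuracy(X, y, k, folds) for k in candidates}
    return max(scores, key=scores.get), scores


# Made-up, slightly noisy one-feature data.
X = [[0.10], [0.20], [0.30], [0.35], [0.40],
     [0.60], [0.65], [0.70], [0.80], [0.90]]
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

chosen_k, cv_scores = best_k(X, y)
print("chosen k:", chosen_k)
print("cross-validated accuracy per k:", cv_scores)
```

Odd candidate values avoid tied votes when there are only two classes.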

Classification report for KNN after Hyperparameter tuning

  Classification Report

AUC curve obtained for optimized KNN model

  AUC curve



Deployment

Finally, the model was deployed on Heroku using Flask.

A screenshot of the app and its link are attached below.

Link: https://hit-prediction.herokuapp.com/
