
Music recommendation system

NON-TECHNICAL EXPLANATION

This project is designed to enhance music discovery by recommending songs based on user preferences. Leveraging the expansive Spotify dataset covering tracks from 1921 to 2020, the system offers personalized song suggestions. Whether you input a favorite song or express your musical taste, the algorithm identifies and suggests the top 10 similar songs. This recommendation system aims to make music exploration enjoyable and effortless, providing users with a curated selection that aligns with their unique musical inclinations.

DATA

This project leverages a publicly available dataset from Spotify, enriched further using Spotipy, a Python client for the Spotify Web API. Spotipy facilitates easy data fetching and querying of Spotify's extensive catalog for songs.

Features Description

  1. track_id: The Spotify ID for the track.

  2. artists: Names of the artists who performed the track. Multiple artists are separated by commas.

  3. name: Name of the track.

  4. popularity: Popularity score ranging from 0 to 100, calculated algorithmically based on total plays and recency. Reflects the track's current popularity.

  5. duration_ms: The track length in milliseconds.

  6. explicit: Indicates whether the track has explicit lyrics (true = 1; false = 0).

  7. danceability: Describes the suitability of a track for dancing based on tempo, rhythm stability, beat strength, and overall regularity. Ranges from 0.0 (least danceable) to 1.0 (most danceable).

  8. energy: Represents perceptual intensity and activity, ranging from 0.0 to 1.0. High energy suggests a fast, loud, and noisy track.

  9. key: The key of the track, mapped to pitches using standard Pitch Class notation. -1 indicates no key was detected.

  10. loudness: Overall loudness of a track in decibels (dB).

  11. mode: Indicates the modality (major or minor) of a track, with 1 for major and 0 for minor.

  12. speechiness: Detects the presence of spoken words in a track. Values above 0.66 suggest speech-like recordings, while below 0.33 likely represent music.

  13. acousticness: Confidence measure (0.0 to 1.0) of whether the track is acoustic. Higher values indicate higher confidence in acoustic nature.

  14. instrumentalness: Predicts whether a track contains no vocals. Values closer to 1.0 suggest instrumental content.

  15. liveness: Detects the presence of an audience in the recording. Higher values indicate a higher probability of a live performance.

  16. valence: Measures musical positiveness on a scale from 0.0 to 1.0. High valence indicates a positive mood.

  17. tempo: Estimated tempo of a track in beats per minute (BPM).

  18. release_date: The track's release year.

Citation

Please note that the Spotify dataset and Spotipy library are publicly available resources, and appropriate credit and adherence to their terms of use are recommended. Refer to the official Spotify API documentation and Spotipy documentation for more details.

MODEL

The model employed in this project is a content-based filtering recommender that measures similarity between songs using cosine distance. Given a seed song, the system ranks all other tracks by how closely their audio-feature vectors resemble the seed's and returns the nearest ones. Because recommendations are driven by the inherent characteristics of the songs themselves, this approach yields relevant, personalized suggestions without requiring user interaction data.
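The core idea can be sketched in a few lines. This is a minimal illustration on toy data, not the project's actual pipeline; the feature matrix and the `recommend` helper are hypothetical stand-ins.

```python
# Minimal sketch of a content-based recommender using cosine similarity.
# The random feature matrix below stands in for scaled Spotify audio features.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)
features = rng.random((100, 5))            # 100 songs x 5 audio features
scaled = StandardScaler().fit_transform(features)

def recommend(song_index, n=10):
    """Return indices of the n songs most similar to the given song."""
    sims = cosine_similarity(scaled[song_index:song_index + 1], scaled)[0]
    order = np.argsort(-sims)              # sort by descending similarity
    return [i for i in order if i != song_index][:n]

top10 = recommend(0)
```

In practice the feature columns would be the scaled numeric attributes listed above (danceability, energy, valence, etc.), and the returned indices would be mapped back to track names.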

HYPERPARAMETER OPTIMISATION

GridSearch is employed to find the optimal hyperparameters for a DecisionTreeRegressor. The hyperparameters considered in the grid search are:

  1. max_depth: Defines the maximum depth of the decision tree.
  2. max_leaf_nodes: Specifies the maximum number of leaf nodes in the tree.
  3. max_features: Determines the maximum number of features considered for splitting a node.
  4. min_samples_split: Sets the minimum number of samples required to split an internal node.
  5. min_samples_leaf: Establishes the minimum number of samples needed to form a leaf node.

The grid search is conducted using a predefined set of values for each hyperparameter. The parameter grid is constructed with various options for each hyperparameter, and the GridSearchCV is employed with 5-fold cross-validation.
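The search described above can be sketched as follows. The grid values and synthetic data here are placeholders, not the project's exact settings.

```python
# Sketch of 5-fold GridSearchCV over a DecisionTreeRegressor on toy data.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.random((200, 5))
y = X @ rng.random(5) + 0.1 * rng.random(200)   # noisy linear target

param_grid = {
    "max_depth": [5, 11, None],
    "max_leaf_nodes": [60, 120],
    "max_features": [None, "sqrt"],
    "min_samples_split": [2, 5],
    "min_samples_leaf": [1, 3],
}
search = GridSearchCV(DecisionTreeRegressor(random_state=0),
                      param_grid, cv=5)
search.fit(X, y)
best = search.best_params_   # best combination found by cross-validation
```

`GridSearchCV` exhaustively fits every combination in the grid, so the runtime grows multiplicatively with the number of options per parameter.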

The model is then evaluated using the best hyperparameters on both the training and test sets, reporting the training r^2 score, test r^2 score, mean absolute error, and the best parameters found by the grid search. This process ensures that the Decision Tree Regressor achieves strong predictive performance by carefully tuning its key hyperparameters.

RESULTS

Hyperparameter Optimization using Grid Search

During the hyperparameter tuning process for the DecisionTreeRegressor, an exhaustive GridSearch was performed to identify the optimal set of hyperparameters. The goal was to enhance the model's predictive performance by systematically exploring various combinations of hyperparameter values.

The search space included parameters such as 'max_depth,' 'max_features,' 'max_leaf_nodes,' 'min_samples_leaf,' and 'min_samples_split.' The grid search yielded the following optimal hyperparameter configuration:

{'max_depth': 11, 'max_features': 'auto', 'max_leaf_nodes': 120, 'min_samples_leaf': 1, 'min_samples_split': 2}

These hyperparameters represent the configuration that resulted in the best model performance according to the chosen evaluation metric.

Performance Metrics with Optimized Hyperparameters

After applying the determined optimal hyperparameters to the DecisionTreeRegressor, the model's performance was assessed using key metrics. The following results were obtained:

  • Mean Absolute Error (MAE): 0.06
  • R2 Score (Training Data): 0.80
  • R2 Score (Test Data): 0.79

These metrics provide insights into the model's accuracy, capturing both the absolute error and the explained variance in the training and test datasets. The achieved values showcase the effectiveness of the hyperparameter optimization process in enhancing the overall performance of the DecisionTreeRegressor.
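For reference, these two metrics are typically computed with scikit-learn as below; the toy arrays here are illustrative, not the project's actual predictions.

```python
# Computing MAE and R^2 with scikit-learn on toy values.
from sklearn.metrics import mean_absolute_error, r2_score

y_true = [0.2, 0.5, 0.8, 0.4]
y_pred = [0.25, 0.45, 0.75, 0.5]

mae = mean_absolute_error(y_true, y_pred)  # mean of |y_true - y_pred|
r2 = r2_score(y_true, y_pred)              # 1 - SS_res / SS_tot
```

MAE measures average absolute prediction error in the target's own units, while R² measures the fraction of variance in the target explained by the model (1.0 is a perfect fit).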

Test data result

Screenshot

Training data result

Screenshot

CONTACT DETAILS

GitHub repo

LinkedIn

Email: [email protected]
