Data Science

This repository contains various data science assignments and projects which I enjoyed solving/working on.

  • In ./car_listings, I coded solutions to two problems, one classification and one regression, based on a real-world car listing dataset:
    • predict the product tier class (i.e., Basic, Premium, Plus) using a Random Forest classifier
    • predict detail views using a Gradient Boosting regressor
  • In ./lstm_seq2seq, there is an example of a character-level recurrent sequence-to-sequence model that translates short English sentences into French. It uses TensorFlow v2.2.0 and Keras v2.3.0 and was run on a GPU machine (GeForce RTX 2080).
  • In ./restaurant_reviews, I coded a simple solution for a sentiment analysis NLP classification task. The goal is to predict whether a customer's review is positive or negative. I use nltk to preprocess the reviews, and the solution offers a selection of classifiers, e.g., GaussianNB, RandomForestClassifier, LogisticRegression.
  • In ./movie_recommendations, I use item-based collaborative filtering to recommend movies to a user based on their list of favourite movies. The data comes from the MovieLens Latest Datasets. The ratings were collected between 1996 and 2018 from 610 users and comprise 100,836 ratings of 9,742 movies.
  • In ./topic_modeling_LDA, I apply latent Dirichlet allocation (LDA) from the Gensim Python library to 20k Wikipedia abstracts. LDA is a three-level hierarchical Bayesian model. In the context of topic modeling, each Wikipedia abstract is modeled as a mixture over a set of topics, and each topic is modeled as a probability distribution over words, so different documents are characterized by different topic proportions.
  • In ./response_or_noResponse.ipynb, there is a binary classification task that predicts whether or not a person will respond to a direct mail advertisement, based on features such as age, income, and lifestyle. I benchmarked several out-of-the-box classifiers from scikit-learn, and Random Forest showed the best performance. I then tuned the model with a randomized search over its hyperparameters.
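
To illustrate the item-based collaborative filtering idea mentioned for ./movie_recommendations, here is a minimal sketch in plain Python. It is not the code from the repository. The toy ratings, the helper names (`item_vectors`, `cosine`, `recommend`), and the choice of cosine similarity summed over the favourite movies are all assumptions made for the example.

```python
from math import sqrt

# Hypothetical toy data: user -> {movie: rating}. Not taken from MovieLens.
RATINGS = {
    "u1": {"A": 5.0, "B": 4.0, "C": 1.0},
    "u2": {"A": 4.0, "B": 5.0, "D": 2.0},
    "u3": {"C": 5.0, "D": 4.0},
    "u4": {"A": 1.0, "C": 4.0, "D": 5.0},
}


def item_vectors(ratings):
    """Invert user -> movie ratings into movie -> {user: rating}."""
    items = {}
    for user, movies in ratings.items():
        for movie, rating in movies.items():
            items.setdefault(movie, {})[user] = rating
    return items


def cosine(a, b):
    """Cosine similarity between two sparse rating vectors stored as dicts."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(favourites, ratings, top_n=3):
    """Rank movies outside `favourites` by summed similarity to the favourites."""
    items = item_vectors(ratings)
    scores = {}
    for candidate, vec in items.items():
        if candidate in favourites:
            continue
        scores[candidate] = sum(
            cosine(vec, items[f]) for f in favourites if f in items
        )
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


if __name__ == "__main__":
    print(recommend(["A"], RATINGS))
```

The key point is that similarity is computed between movies (columns of the rating matrix) rather than between users, so a user's favourite list is enough to produce recommendations without that user having rated anything.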

Issue viewing .ipynb files on GitHub

Sometimes GitHub fails to render Jupyter notebooks (.ipynb files). If you encounter this issue, use nbviewer online to view the .ipynb file; you don't need to install anything.

This workaround was suggested in this post.
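
As a small convenience, an nbviewer link for a notebook hosted on GitHub can be built from the repository coordinates. The sketch below assumes the usual `https://nbviewer.org/github/<user>/<repo>/blob/<branch>/<path>` URL layout. The helper name `nbviewer_url`, the default branch name `master`, and the user and repository names in the usage line are assumptions, so adjust them to the actual repository.

```python
from urllib.parse import quote


def nbviewer_url(user, repo, path, branch="master"):
    """Build an nbviewer link for a notebook stored in a GitHub repository.

    `branch` defaults to "master" as an assumption; pass the real branch name.
    """
    # quote() escapes spaces and similar characters but keeps "/" in the path.
    return "https://nbviewer.org/github/{}/{}/blob/{}/{}".format(
        user, repo, branch, quote(path)
    )


if __name__ == "__main__":
    # Hypothetical coordinates for illustration.
    print(nbviewer_url("israa-alqassem", "data_science",
                       "response_or_noResponse.ipynb"))
```

Opening the printed link in a browser renders the notebook without any local installation.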

Contributors

israa-alqassem
