
once-a-ds-n00b's Introduction

About: Everyone was a n00b once ;-)

My first repo on GitHub: a portfolio of data science projects I undertook as part of the Post Graduate Program in Machine and Deep Learning at IIIT Bangalore. Here's a link to the programme in its current avatar (it's Oct '21 now, and the programme will likely keep evolving, so ping me if the link is broken or you need further information!)

Spotlight

Inside this repo, you'll find evidence of my competence in the following skill set:

  • Programming in Python: NumPy, scikit-learn, TensorFlow (Keras)
  • Data Visualisation and Exploratory Data Analysis
  • Application of Machine and Deep Learning algorithms (regression, classification, dimensionality reduction and 3D-CNNs) to complex datasets
  • Interpretation of Model Statistics

Project (Folder) Descriptions

  1. Exploratory Data Analysis on Lending Club applicants/loans data

    • In this project, I visualise and explore Lending Club loans data to uncover relationships between borrower and loan attributes and the likelihood of a loan going bad (i.e. remaining unpaid to the lender)
    • The files inside include:
      • The data file (loans.csv)
      • A data dictionary file (Data_Dictionary.xlsx)
      • A Jupyter notebook with visualisations and analysis, and
      • A PDF presentation with key insights
  2. Regression - Car Price Prediction

    • In this project, a new car manufacturer wants to enter a market with several established players and needs to understand how best to price its cars. The data provided covers the prices and features of many different cars already on the market
    • The files inside include:
      • The data file (CarPrice_Assignment.csv),
      • A data dictionary file (Data Dictionary - carprices.xlsx), and
      • A Jupyter notebook covering detailed exploratory data analysis and a suite of ordinary least squares (OLS) regression models
  3. Classification - Telecom Churn Prediction and Factor Analysis

    • In this project, a telecom service provider wants to predict which of its customers are likely to churn, and to understand which factors influence churn, so that it can take preventive action. The provider has data covering the behaviour of newly acquired customers over a four-month period
    • A key consideration is scale: the dataset consists of 99,999 observations of 226 variables
    • For classification, I reduce the original dataset to a subset of principal components (PCA), address class imbalance via under-sampling, and run several classification algorithms on the transformed dataset:
      • Logistic Regression,
      • Random Forest,
      • Extreme Gradient Boosting (XGBoost),
      • Linear and Non-Linear Support Vector Machines
    • For factor analysis, I develop a separate logistic regression model using pre-transformed, human-readable data to enable explainability
    • Apart from the Jupyter notebook that covers the above-mentioned tasks, files included in this folder are:
      • The data file (telecom_churn_data.csv), and
      • A data dictionary file (Data+Dictionary-+Telecom+Churn+Case+Study.xlsx)
  4. Multi-Class Classification of Hand Gestures in Videos

    • In this project, a Smart TV manufacturer wants to introduce a new feature whereby a sensor on the TV captures and interprets the user's hand gestures in real time to perform certain remote-control tasks. Given a labelled training dataset of over 650 videos of human hand gestures (each video a sequence of 30 images), my task was to develop a model that predicts the correct label for a new gesture video
    • Using a 3D Convolutional Neural Network (3D-CNN), I trained a model that achieves 67% accuracy on a validation set of 100 videos
    • The files inside this folder include:
      • A sub-folder consisting of both training and validation videos and labels,
      • A Jupyter notebook covering data (image) pre-processing steps, batch generation steps, model (3D-CNN) construction and fitting steps (Keras),
      • A saved copy of the model with the best hyperparameters I found (best_model.h5), and
      • Some notes on architectural decisions (some_notes.pdf)
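For the Lending Club EDA (project 1), the central calculation is how the share of bad loans varies across levels of a borrower or loan attribute. Below is a minimal pandas sketch of that calculation. The column names `loan_status` and `grade` and the value `"Charged Off"` are assumptions about `loans.csv`, and the toy rows are invented purely for illustration.

```python
import pandas as pd


def default_rate_by(df: pd.DataFrame, attribute: str,
                    status_col: str = "loan_status",
                    bad_value: str = "Charged Off") -> pd.Series:
    """Share of loans whose status equals `bad_value`, per level of `attribute`."""
    is_bad = df[status_col].eq(bad_value)           # True where the loan went bad
    return is_bad.groupby(df[attribute]).mean().sort_values(ascending=False)


# Hypothetical toy data; the real loans.csv columns and values may differ.
loans = pd.DataFrame({
    "grade": ["A", "A", "B", "B", "C", "C", "C"],
    "loan_status": ["Fully Paid", "Fully Paid", "Charged Off", "Fully Paid",
                    "Charged Off", "Charged Off", "Fully Paid"],
})
print(default_rate_by(loans, "grade"))
```

The same function can be pointed at any categorical column (or a binned numeric one, e.g. via `pd.cut`) and the result plotted as a bar chart.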
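For the car price regression (project 2), each model in the suite is an ordinary least squares fit. The sketch below shows what OLS computes, using plain NumPy on synthetic data. The notebook may well use a library such as statsmodels instead, and the two predictors here are hypothetical stand-ins for columns in CarPrice_Assignment.csv.

```python
import numpy as np


def fit_ols(X, y):
    """Fit y ≈ b0 + X @ b by ordinary least squares; return (coefficients incl. intercept, R^2)."""
    X1 = np.column_stack([np.ones(len(X)), X])        # prepend an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)      # minimises ||X1 @ beta - y||^2
    residuals = y - X1 @ beta
    r2 = 1.0 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)
    return beta, r2


rng = np.random.default_rng(0)
# Hypothetical scaled predictors (e.g. engine size, curb weight); prices are simulated.
X = rng.normal(size=(200, 2))
y = 15000 + 4000 * X[:, 0] + 2500 * X[:, 1] + rng.normal(scale=500, size=200)

beta, r2 = fit_ols(X, y)
print(np.round(beta), round(r2, 3))
```

With real car data, the interesting work sits around this step: encoding categorical features, checking multicollinearity and dropping predictors until the remaining coefficients are stable and interpretable.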
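For the churn classification (project 3), the steps are: reduce the data to principal components, under-sample the majority class, then fit several classifiers on the transformed data. The scikit-learn sketch below follows that order on synthetic data. The 90% explained-variance threshold, the hand-rolled `undersample` helper and the use of recall on churners as the metric are all illustrative assumptions, not necessarily what the notebook does.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def undersample(X, y, rng):
    """Hypothetical helper: randomly drop rows so every class is as small as the rarest class."""
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate(
        [rng.choice(np.flatnonzero(y == c), size=n_min, replace=False) for c in classes]
    )
    rng.shuffle(keep)
    return X[keep], y[keep]


# Synthetic stand-in for the churn data: roughly 10% positives (churners).
X, y = make_classification(n_samples=5000, n_features=50, n_informative=10,
                           weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

# 1. Standardise, then keep enough principal components to explain 90% of the variance.
reducer = make_pipeline(StandardScaler(), PCA(n_components=0.9)).fit(X_train)
Z_train, Z_test = reducer.transform(X_train), reducer.transform(X_test)

# 2. Under-sample the majority class in the training split only (the test split stays imbalanced).
Z_bal, y_bal = undersample(Z_train, y_train, np.random.default_rng(42))

# 3. Fit several classifiers on the same balanced, reduced data.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
}
scores = {}
for name, clf in models.items():
    clf.fit(Z_bal, y_bal)
    scores[name] = recall_score(y_test, clf.predict(Z_test))  # share of churners caught
    print(f"{name}: recall = {scores[name]:.3f}")
```

Other models from the project slot into the same dictionary, for example `sklearn.svm.SVC(kernel="linear")` or `SVC(kernel="rbf")` for the linear and non-linear SVMs, or `xgboost.XGBClassifier()` if the xgboost package is installed. Fitting the scaler and PCA on the training split only keeps test-set information out of the transformation.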
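For the factor-analysis half of project 3, the point of fitting logistic regression on human-readable features is that each coefficient can be read as an odds ratio. A minimal sketch follows; the three feature names are hypothetical (not the actual columns of telecom_churn_data.csv), and the churn labels are simulated from a known rule so the recovered odds ratios can be checked.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
n = 2000

# Hypothetical human-readable (standardised) features; not the real dataset's columns.
X = pd.DataFrame({
    "recharge_drop_pct": rng.normal(size=n),
    "tenure_months": rng.normal(size=n),
    "roaming_minutes": rng.normal(size=n),
})
# Simulated ground truth: churn rises with recharge drop, falls with tenure, ignores roaming.
true_logit = -1.0 + 1.2 * X["recharge_drop_pct"] - 0.8 * X["tenure_months"]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(int)

model = LogisticRegression().fit(X, y)

# exp(coefficient) = multiplicative change in the odds of churn per one-unit increase in a feature.
odds_ratios = pd.Series(np.exp(model.coef_[0]), index=X.columns).sort_values(ascending=False)
print(odds_ratios.round(2))
```

Odds ratios above 1 flag churn drivers, those below 1 flag protective factors, and values near 1 suggest little effect. Note that scikit-learn applies L2 regularisation by default; a library such as statsmodels would additionally report standard errors and p-values.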
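For the hand-gesture project (project 4), a 3D-CNN consumes batches shaped `(batch, frames, height, width, channels)`. The NumPy sketch below shows one way a batch generator could produce such batches. The `load_video` callable, sampling 15 of the 30 frames and `n_classes=5` are all assumptions for illustration; image resizing and augmentation are left out.

```python
import numpy as np


def batch_generator(video_ids, labels, load_video, batch_size=16,
                    n_frames=15, n_classes=5, seed=0):
    """Yield (x, y) batches indefinitely, reshuffling the videos every epoch.

    x: float32, shape (batch, n_frames, height, width, channels), scaled to [0, 1]
    y: float32 one-hot labels, shape (batch, n_classes)
    `load_video` is a hypothetical callable returning one video as a
    (30, height, width, channels) uint8 array.
    """
    rng = np.random.default_rng(seed)
    video_ids = np.asarray(video_ids)
    labels = np.asarray(labels)
    while True:
        order = rng.permutation(len(video_ids))
        for start in range(0, len(order), batch_size):
            idx = order[start:start + batch_size]
            videos = [load_video(v) for v in video_ids[idx]]
            # Evenly spaced frame indices, e.g. 15 of the 30 available frames.
            frame_idx = np.linspace(0, videos[0].shape[0] - 1, n_frames).round().astype(int)
            x = np.stack([v[frame_idx] for v in videos]).astype(np.float32) / 255.0
            y = np.eye(n_classes, dtype=np.float32)[labels[idx]]
            yield x, y


def fake_loader(video_id):
    """Hypothetical stand-in for reading 30 image files from disk."""
    return np.full((30, 64, 64, 3), video_id % 256, dtype=np.uint8)


gen = batch_generator(range(40), [i % 5 for i in range(40)], fake_loader, batch_size=16)
x, y = next(gen)
print(x.shape, y.shape)  # (16, 15, 64, 64, 3) (16, 5)
```

In TensorFlow 2, a generator like this can be passed straight to `model.fit` together with `steps_per_epoch`, so only one batch of videos needs to sit in memory at a time.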

once-a-ds-n00b's People

Contributors

shashiniyer
