
Hi there 👋

  • 🔭 I’m currently working on building ML web Apps using Streamlit
  • 🌱 I’m currently learning Web Development

Mayibongwe Shongwe's Projects

crowdednessprediction-deeplearningdtu-2020

Crowdedness prediction in public transport under Covid-19 using smartcard demand data (Rejsekort data). To avoid full buses and trains, Movia, the public transport authority of Eastern Denmark, needs to predict how many passengers will be in its vehicles and use this to adjust capacity (add or move vehicles) and target passenger information during the Covid-19 pandemic. Using Rejsekort data (tap-ins / tap-outs), you will predict tomorrow's passenger demand so that social distancing can be maintained in public transport.

customer-churn-analysis

Implementation of Decision Tree Classifier, Ensemble Learning, Association Rule Mining and Clustering models (K-modes and K-prototypes) for customer attrition analysis at a telecommunications company, to identify the causes and conditions of churn.

customer-survival-analysis-and-churn-prediction

In this project, I used survival analysis models to see how the likelihood of customer churn changes over time and to calculate customer lifetime value (LTV). I also implemented a Random Forest model to predict whether a customer is going to churn and deployed the model as a Flask web app.

expresso-customer-churn-prediction

This repository explains how to predict customer churn. It comes from a hackathon organized by Data Science Nigeria (DSN-AI) to help Expresso predict customer churn. My 2nd place solution has a log_loss of 0.246675. I've also added a section in the notebook that reaches a score of 0.246643, which could be the unofficial 1st place solution.

gtfspy

Public transport network analysis using Python 🚊🚇🚃🚌🛳️🚡🚠🚞

job-a-thon_nov-2021-

Approach

Problem Statement: You are working as a data scientist with the HR department of a large insurance company, focused on sales team attrition. Insurance sales teams help insurance companies generate new business by contacting potential customers and selling one or more types of insurance. The department generally sees high attrition, so staffing becomes a crucial aspect. To aid staffing, you are provided with monthly information for a segment of employees for 2016 and 2017 and tasked with predicting whether a current employee will leave the organization in the upcoming two quarters (01 Jan 2018 - 01 July 2018), given:

  1. Demographics of the employee (city, age, gender, etc.)
  2. Tenure information (joining date, last date)
  3. Historical data regarding the performance of the employee (quarterly rating, monthly business acquired, designation, salary)

As the objective was to predict whether an employee will leave the organization in the upcoming two quarters, the target variable was defined as 1 if an employee leaves the organization within 180 days of a review and 0 otherwise. For example, if the last working day is 25-11-2017 and a review was conducted on 01-05-2017 (208 days prior), the target is 0; for the next review, conducted on 01-06-2017 (177 days prior), the target is 1. The training data was taken only up to 01-08-2017, as a full 180 days was required for prediction. Predictions had to be made at review level for each employee; otherwise there would not be sufficient data, and changes in employee performance or behaviour might be difficult to catch if the data were reduced to one row per employee.

Data Pre-Processing/Feature Engineering: The dataset has 13 features: Emp_ID, Reporting Date, Age, Gender, City, Education, Salary, DateofJoining, LastWorkingDate, Joining_Designation, Designation, Total_Business_Value and Quarterly_Rating.
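As a minimal sketch of the review-level target described in the problem statement, the snippet below labels a single review from its date and the employee's last working date. The function name `attrition_target` and the constant `HORIZON_DAYS` are hypothetical, and treating exactly 180 days as "within 180 days" is an assumption; the original notebook may handle the boundary differently.

```python
from datetime import date
from typing import Optional

HORIZON_DAYS = 180  # "within 180 days of review", taken from the description


def attrition_target(review_date: date, last_working_date: Optional[date]) -> int:
    """Return 1 if the employee leaves within HORIZON_DAYS of the review, else 0."""
    if last_working_date is None:  # no exit recorded: the employee is still active
        return 0
    days_until_exit = (last_working_date - review_date).days
    # Assumption: the boundary (exactly 180 days) counts as leaving within the horizon.
    return int(0 <= days_until_exit <= HORIZON_DAYS)


# The two reviews from the worked example in the text.
last_day = date(2017, 11, 25)
print(attrition_target(date(2017, 5, 1), last_day))  # review 208 days before exit -> 0
print(attrition_target(date(2017, 6, 1), last_day))  # review 177 days before exit -> 1
```

Applying a function like this to every (employee, review) row keeps one training example per review rather than one per employee.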
The first step in building a model is to understand the dataset. Doing so showed that there are 2200 duplicate values in the ‘Emp_ID’ column (the primary key). I dropped the duplicates, keeping the row with the last ‘Reporting Date’, which leaves distinct ‘Emp_ID’ values.

The next issue is that the target variable is not explicitly given in the training data. To construct it according to the definition, first look at the ‘LastWorkingDate’ column. Where the column is null, the employee is continuing to work at the organization, at least into the next year. Where a date is recorded, the employee left the organization on that date. So, as per the definition, the target is 0 where ‘LastWorkingDate’ is null and 1 where ‘LastWorkingDate’ has a date.

The remaining features were derived as follows. Age, Education and Salary are the values from the last time they were reported. Gender and City are taken from the dataset as given. Joining Designation is taken as it is from the dataset. Designation is the employee's designation the last time it was reported. Total Business Value is the sum of the Total Business Value acquired by the employee. Quarterly_Rating is the rating the employee was given the last time it was reported.

Model Building: Before building the model, the categorical features ‘Gender’, ‘Education-Level’, ‘City’ and ‘Quarterly Rating’ were one-hot encoded. All the numerical features were scaled using StandardScaler. GridSearchCV was then used to search for the values of parameters such as ‘n-estimators’ and ‘max-depth’ that give the best f1-score.

Model Selection: Before settling on a Decision Tree, a few other classification models such as LogisticRegression, KNN, SVM, XGBoost and GradientBoost were also applied to the dataset. XGBoost overfitted the data. SVM, Gradient Boost and Random Forest performed well on the data. Since the Decision Tree gave a good f1-score of 0.6966, it was selected to predict employee attrition.
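The feature engineering and model building steps above can be sketched roughly as follows, assuming pandas and scikit-learn. The records are made up for illustration (including the education labels and city codes), only a subset of the 13 columns is used, and the grid values are placeholders rather than the ones from the original notebook. Note that `n_estimators` is a parameter of ensemble models such as Random Forest, not of `DecisionTreeClassifier`, so this sketch tunes only `max_depth`.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Made-up monthly records: three reports per employee, even IDs leave in March.
rows = []
for emp_id in range(1, 21):
    left = emp_id % 2 == 0
    for month in (1, 2, 3):
        rows.append({
            "Emp_ID": emp_id,
            "Reporting Date": pd.Timestamp(2017, month, 1),
            "Age": 25 + emp_id,
            "Gender": "Male" if emp_id % 3 else "Female",
            "City": f"C{emp_id % 4}",
            "Education": ["College", "Bachelor", "Master"][emp_id % 3],
            "Quarterly_Rating": 1 if left else 3,
            "Total_Business_Value": 1000 * month * (1 if left else 5),
            "LastWorkingDate": pd.Timestamp(2017, 3, 15) if left and month == 3 else pd.NaT,
        })
records = pd.DataFrame(rows)

# One row per employee: last reported value for most columns, sum of business value.
employees = (
    records.sort_values("Reporting Date")
    .groupby("Emp_ID")
    .agg(
        Age=("Age", "last"),
        Gender=("Gender", "last"),
        City=("City", "last"),
        Education=("Education", "last"),
        Quarterly_Rating=("Quarterly_Rating", "last"),
        Total_Business_Value=("Total_Business_Value", "sum"),
        exit_records=("LastWorkingDate", "count"),  # counts non-null exit dates
    )
)
# Target: 1 where any LastWorkingDate was recorded, 0 where it is always null.
employees["target"] = (employees["exit_records"] > 0).astype(int)

categorical = ["Gender", "Education", "City", "Quarterly_Rating"]
numerical = ["Age", "Total_Business_Value"]

preprocess = ColumnTransformer([
    ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical),
    ("scale", StandardScaler(), numerical),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("tree", DecisionTreeClassifier(random_state=0)),
])

# Placeholder grid; select the depth with the best cross-validated f1-score.
search = GridSearchCV(model, param_grid={"tree__max_depth": [2, 4, 6]}, scoring="f1", cv=3)
search.fit(employees[categorical + numerical], employees["target"])
print(search.best_params_, round(search.best_score_, 3))
```

Wrapping the encoders and the classifier in one `Pipeline` means the scaler and one-hot encoder are refit inside each cross-validation fold, which avoids leaking statistics from the validation folds into training.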

nlp_notebooks

The resources stored in this repository cover data science topics applied to Natural Language Processing.

pbpython

Code, Notebooks and Examples from Practical Business Python

roadaccidentspods

Principles of Data Science coursework project investigating relationships between environmental conditions and traffic accidents in the UK.

wqu

All Python code for the WorldQuant University Master's degree.

wqu-ds-unit-2

This repo contains all the files and material related to WorldQuant University's Data Science Summer 2020 Session Unit 2: Machine Learning and Statistical Analysis.
