mayibongweshongwe,Mayibongwe Shongwe,Mayibongwe_MS,github

applied-data-science-capstone-project

This is the final project in completion of the IBM Data Science Professional Certificate.

azure-utah-accidents-prediction

Azure Machine Learning Service for car accident prediction

crowdednessprediction-deeplearningdtu-2020

Crowdedness prediction in public transport under Covid-19 using smartcard demand data (Rejsekort data). To avoid full buses and trains, Movia, the public transport authority of Eastern Denmark, needs to predict how many passengers will be in its vehicles and use these predictions to adjust capacity (add or move vehicles) and target passenger information during the Covid-19 pandemic. Using Rejsekort data (tap-ins/tap-outs), you will predict tomorrow's passenger demand so that social distancing can be maintained in public transport.
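To show how tap-in/tap-out counts translate into a crowdedness figure, here is a minimal sketch, assuming the per-stop boardings and alightings have already been attributed to a single vehicle trip (raw Rejsekort data would first need that matching step). The function names, input format and capacity threshold are hypothetical; this is not code from the repository.

```python
from itertools import accumulate


def onboard_load(tap_ins, tap_outs):
    """Passengers on board after each stop of one trip.

    Assumes tap_ins[i] and tap_outs[i] are the boardings and alightings
    recorded at stop i for the same vehicle trip (hypothetical input format).
    """
    if len(tap_ins) != len(tap_outs):
        raise ValueError("need one tap-in and one tap-out count per stop")
    # Running total of (boardings - alightings) gives the load between stops.
    return list(accumulate(i - o for i, o in zip(tap_ins, tap_outs)))


def is_crowded(loads, distancing_capacity):
    """True if the load ever exceeds a (hypothetical) social-distancing capacity."""
    return max(loads, default=0) > distancing_capacity


loads = onboard_load([10, 5, 3, 0], [0, 4, 6, 8])
print(loads)                  # [10, 11, 8, 0]
print(is_crowded(loads, 10))  # True
```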

customer-churn-analysis

Implementation of a Decision Tree Classifier, Ensemble Learning, Association Rule Mining and clustering models (K-Modes and K-Prototypes) for customer attrition analysis at a telecommunications company, to identify the causes and conditions of churn.
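K-Modes adapts k-means to categorical data: distance becomes a simple matching dissimilarity (the number of attributes on which two records differ) and cluster means become per-attribute modes. Below is a minimal plain-Python sketch of that loop, assuming records are tuples of category labels and using the first k records as initial modes (a simplification; library implementations offer smarter initialisation such as Huang's or Cao's method). The toy customer attributes are hypothetical, and this is not the repository's code.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(records, k, max_iter=10):
    """Minimal K-Modes: returns (labels, modes) for a list of categorical tuples."""
    if not 0 < k <= len(records):
        raise ValueError("k must be between 1 and the number of records")
    modes = [tuple(r) for r in records[:k]]  # naive initialisation: first k records
    labels = []
    for _ in range(max_iter):
        # Assignment step: each record joins the cluster with the closest mode.
        labels = [
            min(range(k), key=lambda c: matching_dissimilarity(r, modes[c]))
            for r in records
        ]
        # Update step: each mode becomes the most frequent category per attribute.
        new_modes = []
        for c in range(k):
            members = [r for r, label in zip(records, labels) if label == c]
            if not members:
                new_modes.append(modes[c])  # keep an empty cluster's previous mode
                continue
            new_modes.append(
                tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))
            )
        if new_modes == modes:  # converged: modes no longer change
            break
        modes = new_modes
    return labels, modes


# Hypothetical attributes: (churned, contract type, internet service)
customers = [
    ("yes", "monthly", "fiber"),
    ("no", "yearly", "dsl"),
    ("yes", "monthly", "dsl"),
    ("no", "yearly", "none"),
]
labels, modes = k_modes(customers, k=2)
print(labels)  # [0, 1, 0, 1]
```

K-Prototypes extends the same idea to mixed data by adding a Euclidean term for the numerical columns, weighted against the matching term for the categorical ones.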

customer-survival-analysis-and-churn-prediction

In this project, I used survival analysis models to see how the likelihood of customer churn changes over time and to calculate customer lifetime value (LTV). I also implemented a Random Forest model to predict whether a customer is going to churn and deployed the model as a Flask web app.
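A common starting point for this kind of survival analysis is the Kaplan-Meier estimator, which tracks how the probability of a customer still being active falls with tenure. A minimal pure-Python sketch, assuming tenure in months and a churn flag where 0 means the customer is still active (censored); it is not necessarily the model used in the project, and the sample durations are made up.

```python
from collections import Counter


def kaplan_meier(durations, churned):
    """Kaplan-Meier estimate of S(t) = P(customer still active after tenure t).

    durations: observed tenure of each customer (e.g. in months)
    churned:   1 if the customer churned at that tenure, 0 if still active (censored)
    Returns a list of (t, S(t)) at each tenure where at least one churn occurs.
    """
    if len(durations) != len(churned):
        raise ValueError("durations and churned must have the same length")
    churn_counts = Counter(t for t, e in zip(durations, churned) if e)
    survival = 1.0
    curve = []
    for t in sorted(churn_counts):
        # Customers still under observation just before t (tenure >= t).
        at_risk = sum(1 for d in durations if d >= t)
        survival *= 1.0 - churn_counts[t] / at_risk
        curve.append((t, survival))
    return curve


curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
print([(t, round(s, 3)) for t, s in curve])  # [(1, 0.8), (2, 0.6), (3, 0.3)]
```

The customer still active at tenure 2 is censored: they count as at risk at t = 2 but not as a churn, which is what separates this estimate from a plain churn rate.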

expresso-customer-churn-prediction

This repository explains how to predict customer churn. It comes from a hackathon organized by Data Science Nigeria (DSN-AI) to help Expresso predict customer churn. My 2nd-place solution scored a log_loss of 0.246675. I've also added a section to the notebook that reaches a score of 0.246643, which could be the unofficial 1st-place solution.
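The scores above are log loss (binary cross-entropy), where lower is better, which is why 0.246643 would edge out 0.246675. A minimal sketch of the metric, assuming 0/1 churn labels and predicted churn probabilities; the clipping epsilon is an assumption added to avoid log(0). For real evaluation, sklearn.metrics.log_loss computes the same quantity.

```python
import math


def binary_log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; lower is better.

    y_true: 0/1 labels (1 = churned); y_prob: predicted probability of churn.
    Probabilities are clipped to [eps, 1 - eps] so log(0) never occurs.
    """
    if not y_true or len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must be non-empty and the same length")
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


print(binary_log_loss([1, 0, 1], [0.5, 0.5, 0.5]))  # ln 2, about 0.693
print(binary_log_loss([1, 0], [0.9, 0.2]))          # about 0.164
```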

fb_comments_harvester

Scrapes all comments on all posts on a Facebook page.

gtfspy

Public transport network analysis using Python 🚊🚇🚃🚌🛳️🚡🚠🚞

ibm_data_science_capstone

Coursera IBM Data Science Certification Capstone Course

job-a-thon_nov-2021-

Approach

Problem Statement: You are working as a data scientist with the HR department of a large insurance company, focusing on sales team attrition. Insurance sales teams help insurance companies generate new business by contacting potential customers and selling one or more types of insurance. The department generally sees high attrition, so staffing becomes crucial. To aid staffing, you are provided with monthly information for a segment of employees for 2016 and 2017 and tasked with predicting whether a current employee will leave the organization in the upcoming two quarters (01 Jan 2018 to 01 July 2018), given:
1. Demographics of the employee (city, age, gender, etc.)
2. Tenure information (joining date, last date)
3. Historical data on the employee's performance (quarterly rating, monthly business acquired, designation, salary)

As the objective was to predict whether an employee will leave the organization in the upcoming two quarters, the target was defined as 1 if the employee left the organization within 180 days of a review and 0 otherwise. For example, if the last working day is 25-11-2017 and a review was conducted on 01-05-2017 (208 days prior), the target would be 0, while for the next review on 01-06-2017 (177 days prior), the target would be 1. The training data was taken only up to 01-08-2017, as a full 180-day window was required for each prediction. Predictions had to be made at review level for each employee; otherwise there would not be enough data, and changes in employee performance or behaviour would be hard to catch if the data were reduced to one row per employee.

Data Pre-Processing/Feature Engineering: The dataset has 13 features: Emp_ID, Reporting Date, Age, Gender, City, Education, Salary, DateofJoining, LastWorkingDate, Joining_Designation, Designation, Total_Business_Value and Quarterly_Rating.
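The review-level labelling rule from the problem statement can be sketched as a small function. This is an illustration, not the author's code: the function name is hypothetical, and treating "within 180 days" as inclusive is an assumption. The two example reviews reproduce the 208-day and 177-day cases above.

```python
from datetime import date
from typing import Optional

WINDOW_DAYS = 180  # "within 180 days of review"; treated as inclusive here (assumption)


def label_review(review_date: date, last_working_date: Optional[date],
                 window_days: int = WINDOW_DAYS) -> int:
    """Target for one review row: 1 if the employee leaves within the window, else 0."""
    if last_working_date is None:  # no exit recorded: still employed
        return 0
    days_until_exit = (last_working_date - review_date).days
    return int(0 <= days_until_exit <= window_days)


last_day = date(2017, 11, 25)
for review in (date(2017, 5, 1), date(2017, 6, 1)):
    print(review, (last_day - review).days, label_review(review, last_day))
# 2017-05-01 208 0
# 2017-06-01 177 1
```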
The first step in building a model is to understand the dataset. On inspection, there are 2200 duplicate values in the Emp_ID column (the primary key), so I dropped the duplicates, keeping the row with the latest Reporting Date for each employee, which leaves one row per distinct Emp_ID. Next, the target variable is not given explicitly in the training data. To construct it as defined above, look first at the LastWorkingDate column: where it is null, the employee is still working at the organization, at least into the next year; where a date appears, the employee left the organization on that date. So, following the definition, the target is 0 where LastWorkingDate is null and 1 where it contains a date. The remaining features were built as follows: Age, Education, Salary, Designation and Quarterly_Rating are the values from the last time they were reported; Gender, City and Joining_Designation are taken as they appear in the dataset; and Total_Business_Value is the sum of the business value acquired by the employee.

Model Building: Before building the model, the categorical features Gender, Education-Level, City and Quarterly Rating were one-hot encoded, and all numerical features were scaled with StandardScaler. GridSearchCV was then used to search for the parameter values, such as n_estimators and max_depth, that give the best F1 score.

Model Selection: Before settling on the Decision Tree, several other classification models, including Logistic Regression, KNN, SVM, XGBoost and Gradient Boosting, were also applied to the dataset. XGBoost overfit the data, while SVM, Gradient Boosting and Random Forest performed well.
Since the Decision Tree gave a good F1 score of 0.6966, this model was selected to predict employee attrition.
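The pre-processing and model-building steps above can be sketched with pandas and scikit-learn. This is a minimal illustration, not the author's notebook, and it follows the one-row-per-employee pre-processing described above. Column names follow the write-up and may differ from the actual competition CSV; the synthetic toy_monthly data is invented purely for the demo; and the hyperparameter grid is hypothetical (n_estimators applies to ensembles such as Random Forest, not to a single DecisionTreeClassifier, so the grid uses max_depth and min_samples_leaf instead).

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

CATEGORICAL = ["Gender", "Education", "City", "Quarterly_Rating"]
# Designation columns are assumed to be integer grade levels, so they are scaled.
NUMERIC = ["Age", "Salary", "Joining_Designation", "Designation", "Total_Business_Value"]


def build_employee_table(monthly: pd.DataFrame) -> pd.DataFrame:
    """Collapse monthly reports to one row per Emp_ID with a 0/1 Target column."""
    monthly = monthly.sort_values(["Emp_ID", "Reporting Date"])
    # Keep each employee's latest report (last-reported Age, Salary, Designation, ...).
    table = monthly.drop_duplicates(subset="Emp_ID", keep="last").set_index("Emp_ID")
    # Total_Business_Value is summed over all months, not taken from the last row.
    table["Total_Business_Value"] = monthly.groupby("Emp_ID")["Total_Business_Value"].sum()
    # Target = 1 if any LastWorkingDate is recorded for the employee, else 0.
    left = monthly["LastWorkingDate"].notna().groupby(monthly["Emp_ID"]).any()
    table["Target"] = left.astype(int)
    return table.reset_index()


def make_search(cv: int = 3) -> GridSearchCV:
    """One-hot encoding + StandardScaler + DecisionTree, tuned for F1 with GridSearchCV."""
    preprocess = ColumnTransformer([
        ("onehot", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
        ("scale", StandardScaler(), NUMERIC),
    ])
    model = Pipeline([("prep", preprocess),
                      ("tree", DecisionTreeClassifier(random_state=0))])
    grid = {"tree__max_depth": [2, 3, 5, None], "tree__min_samples_leaf": [1, 2]}
    return GridSearchCV(model, grid, scoring="f1", cv=cv)


def toy_monthly(n_emp: int = 12) -> pd.DataFrame:
    """Synthetic two-month history (illustration only, not the competition data)."""
    rows = []
    for emp in range(n_emp):
        leaves = emp % 2 == 0
        for month in (1, 2):
            rows.append({
                "Emp_ID": emp, "Reporting Date": pd.Timestamp(2017, month, 1),
                "Age": 25 + emp, "Gender": "Male" if emp % 3 else "Female",
                "City": f"C{emp % 4}", "Education": "Bachelor" if emp % 2 else "Master",
                "Salary": 40000 + 1000 * emp, "Joining_Designation": 1,
                "Designation": 1 + emp % 3,
                "Total_Business_Value": 10000 if leaves else 100000,
                "Quarterly_Rating": 1 if leaves else 3,
                "LastWorkingDate": (pd.Timestamp(2017, 3, 15)
                                    if leaves and month == 2 else pd.NaT),
            })
    return pd.DataFrame(rows)


if __name__ == "__main__":
    employees = build_employee_table(toy_monthly())
    search = make_search()
    search.fit(employees[CATEGORICAL + NUMERIC], employees["Target"])
    print(search.best_params_, round(search.best_score_, 4))
```

Wrapping the encoder and scaler in the Pipeline means they are refit inside each cross-validation fold, so GridSearchCV's F1 scores are not inflated by statistics leaked from the validation rows.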

loan-default-prediction---ltfs-data-science-finhack

In this FinHack, you will develop a model for our most common but real challenge: 'Loan Default Prediction'.

mayibongweshongwe

Profile Readme

nlp_notebooks

The resources stored in this repository cover data science topics in applied Natural Language Processing.

pbpython

Code, Notebooks and Examples from Practical Business Python

powerapps

predicting_winning_teams

This is the code for "Predicting the Winning Team with Machine Learning" by Siraj Raval on YouTube.

predictingemployeeattritionusingrandomforest

pyspark-examples

PySpark RDD, DataFrame and Dataset examples in Python.

road-accidents-prediction-and-classification

Final year project on road accident prediction using the user's location and weather conditions, applying machine learning concepts.

roadaccidentspods

Principles of Data Science coursework project investigating relationships between environmental conditions and traffic accidents in the UK.

sa_household_survey

scikit-learn-mooc

traffic-simulation

Traffic simulation using Pygame

webscrapingfacebook

Download all your personal images + tagged images from Facebook with Selenium

webscrapinginstagram

Multiple Notebooks for Web Scraping Instagram with Selenium

worldquant-university-applied-data-science

wqu

All Python code for the WorldQuant University master's degree.

wqu-ds-unit-2

This repo contains all the files and material related to WorldQuant University's Data Science Summer 2020 Session, Unit 2: Machine Learning and Statistical Analysis.

wqu_project_s202009

Collaborative project for WQU Unit I students in 2020-09

Hi there 👋

Mayibongwe Shongwe's Projects
