mrc03,Raj Mehrotra,github

housing-prices-eda-and-regression-models

The famous House Prices: Advanced Regression Techniques competition on Kaggle. The dataset consists of training and test sets, each with about 1.46K rows and 81 features describing a house. I first performed an exhaustive EDA to identify the underlying trends in the data and removed outliers to make the regression models more robust. Missing values were treated properly, with imputation wherever needed. Lastly, I trained various regression models such as Lasso and Ridge from scikit-learn and tuned their hyperparameters with GridSearchCV. The final RMSE was a little over 0.12, which is pretty decent.
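A minimal sketch of the tuning step described above, not the notebook's actual code. The synthetic data, the `alpha` grid and the 5-fold cross-validation are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the housing features (assumption: not the Kaggle data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([3.0, -2.0, 0.0, 1.5, 0.0]) + rng.normal(scale=0.1, size=200)

# Hypothetical grid over the regularisation strength.
param_grid = {"alpha": [0.001, 0.01, 0.1, 1.0, 10.0]}

results = {}
for name, model in {"ridge": Ridge(), "lasso": Lasso(max_iter=10000)}.items():
    # scikit-learn maximises scores, so RMSE is exposed as a negated metric.
    search = GridSearchCV(
        model, param_grid, scoring="neg_root_mean_squared_error", cv=5
    )
    search.fit(X, y)
    results[name] = (search.best_params_["alpha"], -search.best_score_)

for name, (alpha, rmse) in results.items():
    print(f"{name}: best alpha={alpha}, cross-validated RMSE={rmse:.3f}")
```

The same loop extends to other estimators by adding them to the dictionary with a matching parameter grid.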

ibm-hr-analytics-employee-attrition-performance

The IBM HR Analytics Employee Attrition & Performance dataset from Kaggle. I first performed Exploratory Data Analysis on the data using libraries such as pandas, seaborn and matplotlib. Then I used feature selection techniques like RFE to select the features. The data is then oversampled using the SMOTE technique to deal with the imbalanced classes, and scaled for better performance. Lastly, I trained many ML models from the scikit-learn library for predictive modelling and compared their performance using Precision, Recall and other metrics.
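To illustrate the idea behind SMOTE, here is a simplified pure-Python sketch, not the library implementation used in the notebook. It assumes a single nearest neighbour (k=1) and a hypothetical helper name `smote_like`; the standard algorithm samples among several nearest neighbours.

```python
import random


def smote_like(minority, n_new, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and its nearest minority neighbour (simplified SMOTE, k=1)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest other minority point by squared Euclidean distance.
        neighbour = min(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Toy minority class with two features per sample (illustrative values).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]
new_points = smote_like(minority, n_new=4)
print(new_points)
```

Because each synthetic point lies on a segment between two real minority samples, the new points stay inside the region the minority class already occupies.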

internship-assignment

interview-prepartion-data-science

kaggle

Code for Kaggle competitions

leetcode_company_wise_questions

A repository containing the list of company-wise questions available on LeetCode Premium

machine-learning-tutorials

Machine learning and deep learning tutorials, articles and other resources

mne-python

MNE : Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python

mnist-digit-recognizer-using-convnet-keras-accuracy-0.9943-

The MNIST Digit Recognizer competition on Kaggle. The training dataset consists of 42000 rows of 784 pixel values each, representing 42000 images of size 28 x 28 showing digits from 0 to 9. I trained Convolutional Neural Networks written in Keras and predicted on the 28000 images of the test dataset, achieving 99.43% accuracy on Kaggle with 20 epochs. I also used ImageDataGenerator to augment the training set and avoid overfitting.
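As a small illustration of how one row of 784 pixel values maps to a 28 x 28 image, the sketch below reshapes a flat list in plain Python. The row-major pixel order and the helper name `row_to_image` are assumptions; the notebook itself most likely reshapes with NumPy.

```python
def row_to_image(pixels, side=28):
    """Split a flat list of side*side pixel values into `side` rows."""
    if len(pixels) != side * side:
        raise ValueError(f"expected {side * side} values, got {len(pixels)}")
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


# Dummy row standing in for one line of the training CSV (not real data).
row = list(range(784))
image = row_to_image(row)
print(len(image), len(image[0]))
```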

movie-reviews-nltk-sentiment-analysis-

The Movie Reviews dataset, imported from the NLTK library. It has 1000 positive and 1000 negative reviews. I first imported the dataset into a pandas data frame, which makes the processing easier. The next step is to analyze the positive and negative reviews. I also preprocessed the dataset using lemmatization and other standard NLP techniques. To extract features from the text I used the TF-IDF vectorizer from scikit-learn. Lastly, I trained various modelling algorithms from scikit-learn on this data.
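A minimal sketch of the TF-IDF plus classifier step, assuming scikit-learn's `TfidfVectorizer` and `LogisticRegression`. The toy reviews below are made up for illustration, and the choice of classifier is an assumption, since the description does not say which models were used.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy corpus standing in for the NLTK movie reviews (1 = positive, 0 = negative).
texts = [
    "a great and moving film",
    "wonderful acting and a great story",
    "great fun, wonderful cast",
    "a dull and boring film",
    "terrible acting and a boring story",
    "boring plot, terrible cast",
]
labels = [1, 1, 1, 0, 0, 0]

# The pipeline turns raw text into TF-IDF features, then fits the classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

predictions = model.predict(["great wonderful", "boring terrible"])
print(predictions)
```

Wrapping both steps in one pipeline keeps the vocabulary learned on the training text and reuses it unchanged at prediction time.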

neurodsp

Digital signal processing for neural time series.

object-recognition-cifar-10-cnn-keras

The famous CIFAR-10 dataset. The dataset consists of images of different objects such as airplanes, horses and ships that need to be classified. The training set consists of 50000 images of 32*32 pixels each, and the validation set contains 10000 images of the same size. I used a self-designed ConvNet to classify the images into 10 classes, each pertaining to one object. I also applied data augmentation using the ImageDataGenerator class provided in the Keras library to further increase the size of the training set and thus reduce the chance of overfitting. Finally, I used the ConvNet to make predictions on the validation set and achieved a decent accuracy of about 86%.

pokemon-data-exploration-visualization

Pokemon with stats. Data analysis and exploration are performed on the dataset. Visualization is done using the seaborn and matplotlib libraries. Bar plots, box plots, swarm plots, scatter plots, violin plots, heat maps and others were used to analyze the data.

project

The project is an Android application that displays the levels of various gases in the atmosphere. The volume of gases in the atmosphere is stored in an Excel file, and the values in the file are updated periodically with data fetched from the sensors. The application reads the contents of the file and displays the results.

really-awesome-gan

A list of papers on Generative Adversarial (Neural) Networks

red-wine-quality-accuracy-0.9175-

The Red Wine Quality dataset from Kaggle. The data describes the chemical composition of each wine. I used pandas to manipulate the data and seaborn to visualize it. Finally, I made predictions on the wine quality using various models from scikit-learn.

sad_project

A blood bank mobile application where the user can register and log in. A blood donor can register with the application and earn points. The receiver can search for donors and either call a donor or locate them on Google Maps. The application uses Java and XML, with the Firebase API as the backend and the Google Maps API to locate the donor.

seizure-detection-tutorials

A series of tutorials teaching the use of Python for epileptic seizure detection on open-source datasets

spooky-author-identification

A notebook on the famous Kaggle competition Spooky Author Identification. The task is to identify the authors from their respective texts. I first cleaned and pre-processed the text using standard NLP techniques such as tokenization, stemming or lemmatization, and stop-word removal. I also tried to create some meta features, or hand-crafted features, based on each author's writing pattern. Then I used the traditional BOW approach with the TF-IDF Vectorizer and the Count Vectorizer and applied ML algorithms such as LogisticRegression and Naive Bayes, which are well suited to text data. For me, TF-IDF on the count vectorizer has given the best results so far. My submission scored a multi-class log loss of 0.46 on the Kaggle private LB, which is quite decent.
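For reference, the multi-class log loss mentioned above can be sketched in plain Python as follows. The clipping constant `eps` and the toy probabilities are assumptions for illustration, not values from the competition.

```python
import math


def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log of the probability assigned to the true class."""
    total = 0.0
    for label, row in zip(y_true, probs):
        # Clip so that a predicted probability of 0 does not give log(0).
        p = min(max(row[label], eps), 1 - eps)
        total -= math.log(p)
    return total / len(y_true)


# Three samples, three candidate authors (made-up probabilities).
y_true = [0, 2, 1]
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
]
loss = multiclass_log_loss(y_true, probs)
print(f"log loss: {loss:.3f}")
```

A model that always predicts equal probability for three authors scores log(3), about 1.10, so lower values indicate predictions that are more confident and more often correct.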

sqlitepractice

the-iris-species-dataset

The famous Iris Species dataset from Kaggle. I normalized the features and examined their distributions. I also applied many algorithms from scikit-learn to make predictions on the dataset.

tictactoe

Tic Tac Toe is a simple tic-tac-toe game developed for the Android platform. The application was developed in just 2 hours for the International Organisation of Software Developers (IOSD) Hackathon.

titanic-survivor-prediction

The Titanic: Machine Learning from Disaster competition. Given data on the various passengers traveling on the ship, I used libraries like numpy and pandas to manipulate, explore and analyze the data, and libraries like matplotlib and seaborn to visualize it. I then used various machine learning models to make predictions on the cleaned and preprocessed data. Finally, I used GridSearchCV to optimize the parameters of the various models.

topic-modelling-using-lda-and-lsa-in-sklearn

I performed topic modelling on the "A Million News Headlines" dataset on Kaggle. I first pre-processed and cleaned the data. Then I used the implementations of LDA and LSA in the sklearn library. The distribution of words in each topic is also shown.


Raj Mehrotra's Projects
