mrc03
Name: Raj Mehrotra
Type: User
Bio: Data Scientist | Kaggle Master | Published Author | Topmate: https://topmate.io/raj_mehrotra
Location: Hyderabad, Telangana
The famous Housing Price Advanced Regression competition on Kaggle. The dataset consists of training and test sets, each with about 1.46K rows and 81 features describing a house. I first performed an exhaustive EDA to identify the underlying trends in the data, and removed outliers to make the regression models more robust. Missing values were treated properly, with imputation wherever needed. Lastly, I fitted various regression models such as Lasso and Ridge from scikit-learn and tuned their parameters with GridSearchCV. The final RMSE was a little more than 0.12.
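To put the RMSE figure in context, here is a minimal sketch of how such a score can be computed. It assumes the score follows the competition's metric, which is the RMSE between the logarithm of the predicted price and the logarithm of the actual price. The function name `rmse_log` and the prices are made up for illustration and are not taken from the repository.

```python
import math


def rmse_log(actual, predicted):
    """RMSE between the natural logs of actual and predicted prices."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [
        (math.log(p) - math.log(a)) ** 2 for a, p in zip(actual, predicted)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up sale prices, only to show the call.
actual = [200000.0, 150000.0, 320000.0]
predicted = [210000.0, 140000.0, 300000.0]
score = rmse_log(actual, predicted)
print(round(score, 4))
```

Because the error is taken on log prices, a miss of a given percentage costs about the same on a cheap house as on an expensive one.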
The IBM HR Analytics Employee Attrition & Performance dataset from Kaggle. I first performed Exploratory Data Analysis on the data using libraries such as pandas, seaborn and matplotlib. I then used feature selection techniques like RFE to select the features. The data is oversampled with the SMOTE technique to deal with the imbalanced classes, and scaled for better performance. Lastly, I trained many ML models from the scikit-learn library for predictive modelling and compared their performance using Precision, Recall and other metrics.
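On imbalanced data, accuracy can look high even when the minority class is rarely found, which is presumably why Precision and Recall are reported here. The sketch below computes both from scratch in plain Python. The function name `precision_recall` and the labels are hypothetical, chosen only for illustration.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Fall back to 0.0 when a denominator is zero (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labels: 1 = employee left, 0 = employee stayed.
y_true = [1, 0, 0, 1, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 0, 0, 1]
precision, recall = precision_recall(y_true, y_pred)
print(precision, recall)
```

Precision answers "of the employees flagged as leaving, how many left?", while recall answers "of the employees who left, how many were flagged?".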
Code for Kaggle competitions
A repository listing the company-wise questions available on LeetCode Premium
machine learning and deep learning tutorials, articles and other resources
MNE : Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python
The MNIST Digit Recognizer competition on Kaggle. The training dataset consists of 42,000 rows of 784 pixel values each, representing 42,000 images of 28 x 28 pixels showing digits from 0 to 9. I trained a Convolutional Neural Network written in Keras and predicted on the 28,000 images of the test dataset, achieving 99.43% accuracy on Kaggle with 20 epochs. I also used ImageDataGenerator to augment the training set and reduce overfitting.
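Each row of 784 values has to be turned back into a 28 x 28 grid before a convolutional network can use it. A minimal sketch of that step in plain Python follows. The helper name `row_to_image` is hypothetical, the row-by-row pixel order is an assumption, and the stand-in pixel values are not real image data.

```python
def row_to_image(pixels, side=28):
    """Split a flat list of side*side pixel values into `side` rows."""
    if len(pixels) != side * side:
        raise ValueError(f"expected {side * side} values, got {len(pixels)}")
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


# Stand-in for one training row: the values 0..783 instead of real pixels.
flat_row = list(range(784))
image = row_to_image(flat_row)
print(len(image), len(image[0]))
```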
The Movie Reviews dataset, imported from the NLTK library. It has 1000 positive and 1000 negative reviews. I first loaded the dataset into a pandas data frame, which makes processing easier. The next step is to analyze the positive and negative reviews. I have also preprocessed the dataset using lemmatization and other standard NLP techniques. To extract features from the text I used the TF-IDF vectorizer from scikit-learn. Lastly, I trained various modelling algorithms from scikit-learn on this data.
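To illustrate what TF-IDF features look like, here is a small from-scratch sketch. It uses the textbook weighting tf * ln(N / df), which is an assumption made for readability. scikit-learn's vectorizer applies a smoothed and normalized variant by default, so its numbers will differ. The function name `tfidf` and the three short reviews are made up.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {term: weight} dict per document using tf * ln(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Made-up reviews, not taken from the NLTK corpus.
reviews = ["a great great movie", "a dull movie", "a great cast"]
vectors = tfidf(reviews)
print(vectors[0])
```

A word that appears in every review, such as "a" here, gets a weight of zero, while words concentrated in few reviews are weighted up.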
Digital signal processing for neural time series.
The famous CIFAR-10 dataset. The dataset consists of images of different objects such as airplanes, horses and ships that need to be classified. The training set contains 50,000 images of 32 x 32 pixels each, and the validation set contains 10,000 images of the same size. I used a ConvNet that I designed myself to classify the images into 10 classes, one per object type. I also used data augmentation with the ImageDataGenerator class provided in the Keras library to increase the size of the training set and thus reduce the chance of overfitting. Finally, I used the ConvNet to make predictions on the validation set and achieved an accuracy of about 86%.
Pokemon with stats. Data analysis and exploration are performed on the dataset. Visualization is done with the seaborn and matplotlib libraries. Bar plots, box plots, swarm plots, scatter plots, violin plots, heat maps and others were used to analyze the data.
The project is an Android application that displays the levels of various gases in the atmosphere. The gas volumes are stored in an Excel file, which is updated periodically with data fetched from the sensors. The application reads the contents of the file and displays the results.
A list of papers on Generative Adversarial (Neural) Networks
The Red Wine Quality dataset from Kaggle. The data describes the chemical composition of each wine. I used pandas to manipulate the data and seaborn to visualize it. Finally, I made predictions of the wine quality using various models from scikit-learn.
A blood bank mobile application where the user can register and log in. A blood donor can register with the application and earn points. The receiver can search for donors and either call a donor or locate them on Google Maps. The application uses Java, XML and the Firebase API as the backend, and the Google Maps API to locate the donor.
A series of tutorials teaching the use of Python for epileptic seizure detection on open-source datasets
The notebook for the famous Kaggle competition Spooky Author Identification. The task is to identify the authors from their respective texts. I first cleaned and pre-processed the text using standard NLP techniques such as tokenization, stemming or lemmatization, and stop-word removal. I also tried to create some meta features (hand-crafted features) based on each author's writing pattern. Then I used the traditional BOW approach with the TFIDF Vectorizer and the Count Vectorizer, and fitted ML algorithms like LogisticRegression and Naive Bayes, which are well suited to text data. So far, TF-IDF on the count vectorizer output has given the best results. My submission scored a multi-class log loss of 0.46 on the Kaggle private LB.
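For readers unfamiliar with the metric, the sketch below computes a multi-class log loss in plain Python: the negative mean of the log probability assigned to the true class. The function name, the class labels "A", "B", "C" and the clipping constant `eps` are illustrative assumptions, not details from the notebook.

```python
import math


def multiclass_log_loss(true_labels, predicted_probs, eps=1e-15):
    """Negative mean log-probability assigned to each sample's true class."""
    if len(true_labels) != len(predicted_probs) or not true_labels:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for label, probs in zip(true_labels, predicted_probs):
        # Clip so that a probability of exactly 0 or 1 cannot produce log(0).
        p = min(max(probs[label], eps), 1.0 - eps)
        total += math.log(p)
    return -total / len(true_labels)


# Hypothetical labels and a model that always guesses uniformly.
labels = ["A", "B", "C"]
uniform = {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}
baseline = multiclass_log_loss(labels, [uniform] * 3)
print(round(baseline, 4))
```

With three classes, a uniform guess scores ln 3 (about 1.10), so lower values indicate that the model puts more probability on the correct author.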
The famous Iris Species dataset from Kaggle. I normalized the features and examined their distributions. I also fitted many algorithms from scikit-learn to predict on the dataset.
Tic Tac Toe is a simple tic-tac-toe game developed for the Android platform. The application was developed in just 2 hours for the International Organisation of Software Developers (IOSD) Hackathon.
The Titanic: Machine Learning from Disaster competition. Given data on the various passengers traveling on the ship, I used libraries like numpy and pandas to manipulate, explore and analyze the data, and libraries like matplotlib and seaborn to visualise it. I then used various machine learning models to make predictions on the cleaned and preprocessed data. Finally, I used GridSearchCV to optimise the parameters of the various models.
I have performed topic modelling on the dataset "A Million News Headlines" from Kaggle. I first pre-processed and cleaned the data. Then I used the implementations of LDA and LSA in the sklearn library. The distribution of words in each topic is also shown.
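The idea behind a grid search can be sketched in a few lines of plain Python: try every combination of candidate values and keep the one with the best score. This is only an illustration of the principle and not GridSearchCV itself, which additionally scores each combination with cross-validation. The function `grid_search`, the parameter names and the toy scoring function are all hypothetical.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate score_fn on every parameter combination; return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Stand-in for a validation score; peaks at alpha=0.1, max_iter=1000."""
    return -((params["alpha"] - 0.1) ** 2) - (params["max_iter"] - 1000) ** 2 / 1e6


grid = {"alpha": [0.01, 0.1, 1.0], "max_iter": [500, 1000]}
best_params, best_score = grid_search(grid, toy_score)
print(best_params)
```

The number of evaluations is the product of the list lengths (here 3 x 2 = 6), which is why grids are usually kept small.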
:rocket: Ultimate Android Reference - Your Road to Become a Better Android Developer
A simple implementation of word embeddings with the Gensim and Keras libraries. I implemented the well-known Word2Vec model using the Gensim library. As an alternative, I also used the Keras embedding layer to generate the word embeddings.
Translating English sentences to Marathi using Neural Machine Translation
word2vec using keras inside gensim