raksh710 Goto Github PK

followers: 12 following: 10 repos: 35 gists: 0

Name: RAKSHIT SINHA

Type: User

Company: Robert H Smith School of Business - University of Maryland, College Park

Bio: Graduate MS Information Systems Student at RH Smith School of Business (University of Maryland, College Park). An aspiring Data Scientist.

Location: College Park, MD, USA

Blog: https://rakshitsinha.net/

👋 Hi, I’m @Raksh710
👀 I’m interested in Data Science, ML & AI
🌱 I’m currently learning and working on various deep-learning architectures.
💞️ I’m looking to collaborate on various Kaggle Data Science Projects and competitions
📫 How to reach me: ping me on Kaggle with a collaboration request; my username is Raksh710.
My Portfolio Website: https://rakshitsinha.net/
My Kaggle Profile: https://www.kaggle.com/raksh710
My LinkedIn Profile: https://www.linkedin.com/in/rakshit-sinha-a65325132/

RAKSHIT SINHA's Projects

ajax-movie-recommendation-system-with-sentiment-analysis

A content-based recommender system that recommends movies similar to the movie the user likes and analyzes the sentiment of the reviews given by the user for that movie.

anime_recommender_system

Recommends anime using content-based filtering (TF-IDF vectorization with a sigmoid kernel) and collaborative filtering (KNN).
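As a rough illustration of the content-based half, the sketch below builds TF-IDF vectors and scores pairs of items with a sigmoid kernel, tanh(gamma * <x, y> + coef0). It is a minimal pure-Python sketch, not the repository's code: the whitespace tokenizer, the smoothed IDF weighting, the `gamma` and `coef0` defaults, and the sample synopses are all assumptions made for this example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one L2-normalized TF-IDF vector (list of floats) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed IDF (an assumption; other weightings are common).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vec = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vectors


def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    """Sigmoid kernel: tanh(gamma * <x, y> + coef0)."""
    return math.tanh(gamma * sum(a * b for a, b in zip(x, y)) + coef0)


def most_similar(index, vectors):
    """Index of the item whose kernel score with item `index` is highest."""
    scores = [
        (sigmoid_kernel(vectors[index], v), j)
        for j, v in enumerate(vectors)
        if j != index
    ]
    return max(scores)[1]


# Hypothetical synopses, invented for illustration only.
synopses = [
    "pirates sail the sea hunting treasure",
    "ninja village trains young ninja",
    "pirates crew hunting treasure across the sea",
]
vectors = tfidf_vectors(synopses)
recommended = most_similar(0, vectors)
```

Here the first and third synopses share most of their terms, so the third is the one recommended for the first.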

bike_sharing_data

Implementing various ML regression models on bike-sharing data published by Capital Bikeshare (Washington, D.C.).

building-efficient-portfolio-using-various-trade-strategies

Building an efficient active portfolio of 8 instruments, using various trade strategies to achieve a high Sharpe ratio.
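For reference, the Sharpe ratio is the mean excess return divided by the standard deviation of excess returns. The sketch below is one common way to compute an annualized value; the zero risk-free rate, the 252 trading periods per year, and the sample returns are assumptions for illustration and are not taken from the project.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe ratio of a series of per-period returns."""
    excess = [r - risk_free_rate for r in returns]
    mean_excess = statistics.mean(excess)
    volatility = statistics.stdev(excess)  # sample standard deviation
    if volatility == 0:
        raise ValueError("Sharpe ratio is undefined for zero volatility")
    return mean_excess / volatility * math.sqrt(periods_per_year)


# Hypothetical daily returns, invented for illustration only.
daily_returns = [0.01, -0.005, 0.007, 0.002, -0.003]
ratio = sharpe_ratio(daily_returns)
```

A strategy that earns the same mean return with lower volatility scores higher, which is why the ratio is a natural target when comparing trade strategies.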

Using a CNN to detect and classify which chest X-ray images show pneumonia and which are normal. The data is taken from Kaggle: https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia

comment-processing-tool

Comment Processing-Tool

covid-19_tweets_sentiment_analysis

Predicts the sentiment of tweets about the Covid-19 pandemic. Tweets were classified as "Positive", "Extremely Positive", "Neutral", "Negative" or "Extremely Negative". TF-IDF vectorization was used to vectorize the tokens in the tweets, and the CatBoost algorithm was used for classification, achieving an accuracy of around 57%.

data_scientist_salaries

Predicting the salary of data science jobs (for example Data Scientist, Data Engineer, Machine Learning Engineer, Data Analyst, BI Engineer) in USD, based on factors such as work year (the year in which you are looking for a job), pay grade, average pay scale in the country where the job is located, experience level, and employment type.

feature-engineering-live-sessions

flower_detection_using_cnn

Flower detection using CNN

healthcare_analytics

The task is to correctly predict how many days a patient will stay in a hospital, out of 10 different categories, given 16 different parameters. EDA, feature engineering, and resampling were performed as data preprocessing. A CatBoost classification model was then implemented, achieving more than 41% accuracy.

heart_attack_analysis_and_prediction

Analyzes a dataset and predicts which patients are more likely to suffer a heart attack. The dataset is available on Kaggle, and so is my notebook: https://www.kaggle.com/raksh710/87-accuracy-85-f1-score-knn-14-lr-svc-rf-cbc

house_price_advanced_regression

ice_breaker

ice_breaker project forked from emarco177 to test Langchain's capabilities with various APIs

king_county_house_price_regression

A comparison between CatBoostRegressor and Keras to find out which model performs best on the King County house price regression dataset from Kaggle. Link to the notebook: https://www.kaggle.com/raksh710/catboost-vs-keras-cb-wins

landscape_classification

Given an input image, classify it into one of the following categories: 'buildings': 0, 'forest': 1, 'glacier': 2, 'mountain': 3, 'sea': 4, 'street': 5. The keys above are the class names and the values are their numeric labels. A CNN model was used with 3 Conv2D, 3 MaxPool2d, 1 Flatten, 1 dropout and 2 Dense layers.

After training on 14034 images belonging to 6 classes, the CNN model was validated on a validation set of 3000 images belonging to the same 6 classes, on which it achieved an accuracy of 84.17%.

Steps:
1) Specify the train, validation and test directories (where the images are stored).
2) Use Image Generator to create more samples out of the given training samples (in order to detect the class more accurately). Images went through various transformations such as zooming in/out, shearing and rotation.
3) Images from train and validation were passed through the Image Generator created in step 2. Note that shuffle was True for training and False for validation, because evaluating accuracy requires the validation images to stay in order.
4) Image samples from the train directory were fed to the CNN model, which was evaluated on the validation directory.
5) Image samples from the test directory were also predicted and evaluated manually.
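To make the label mapping and the ordering requirement concrete, the sketch below decodes a vector of class probabilities back to a class name and computes accuracy by comparing predictions with true labels position by position. The mapping is the one listed above; the function names and the sample probabilities are hypothetical, and a real run would take the probabilities from the trained CNN.

```python
# Class-to-index mapping as listed in the project description.
class_indices = {
    "buildings": 0, "forest": 1, "glacier": 2,
    "mountain": 3, "sea": 4, "street": 5,
}
index_to_class = {idx: name for name, idx in class_indices.items()}


def decode_prediction(probabilities):
    """Return the class name with the highest predicted probability."""
    best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return index_to_class[best_index]


def accuracy(true_indices, predicted_probabilities):
    """Fraction correct; assumes both sequences are in the same order."""
    predicted = [
        class_indices[decode_prediction(p)] for p in predicted_probabilities
    ]
    correct = sum(t == p for t, p in zip(true_indices, predicted))
    return correct / len(true_indices)


# Hypothetical model outputs for two validation images.
sample_outputs = [
    [0.05, 0.10, 0.60, 0.10, 0.10, 0.05],  # highest score at index 2
    [0.70, 0.05, 0.05, 0.05, 0.10, 0.05],  # highest score at index 0
]
sample_labels = [2, 4]  # true classes: glacier, sea
```

Because `accuracy` pairs labels and predictions by position, shuffling the validation images without shuffling the labels in step would make the result meaningless, which is the reason shuffle is False for validation.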

loan_default_prediction

A major chunk of bank revenue is generated by credit cards. Customers who fail to pay their credit card dues on time could potentially cost banks a lot of revenue. Issuing credit cards to customers who are more likely to miss payments involves a higher risk for the bank, so issuing these customers' cards at a higher interest rate would work in the bank's favor. In order to make an informed decision about which customers are high risk and which are low risk, the firm would benefit from a prediction model that accurately predicts whether a customer will default. Prediction can be based on factors like job, education, balance, loans, and house ownership. Finding the factors most common among defaulters will also help the bank be cautious before issuing a credit card to customers who fall into one of those categories.

malicious_website_recognition

Classifying malicious websites from benign ones using a CatBoost classifier. The process involves data exploration, data cleaning, resampling (to handle highly imbalanced data), model implementation, and evaluation.
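The description does not say which resampling method was used. As one possible illustration, the sketch below performs random oversampling, duplicating minority-class rows until every class is as large as the largest one. The function name, the seed, and the sample data are assumptions for this example.

```python
import random


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Draw extra rows with replacement from the same class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical imbalanced data: 8 benign rows and 2 malicious rows.
rows = [[i] for i in range(10)]
labels = ["benign"] * 8 + ["malicious"] * 2
balanced_rows, balanced_labels = random_oversample(rows, labels)
```

Resampling of this kind is normally applied to the training split only, so that the evaluation data keeps its original class balance.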

malware_attack_classification

We are working on UMD's Info Challenge, and our dataset is the ISCXIDS2012 cybersecurity dataset.

medical_personal_cost

The task was to forecast the medical cost associated with each patient given their medical parameters and health history. The CatBoost algorithm was applied to the data after scaling (standardization).
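Standardization rescales each feature to zero mean and unit variance, z = (x - mean) / std. The sketch below shows the idea for a single column in pure Python; the function name and sample values are hypothetical, and the use of the population standard deviation is an assumption about the scaling applied in the project.

```python
import statistics


def standardize(column):
    """Rescale a list of numbers to zero mean and unit variance."""
    mean = statistics.mean(column)
    std = statistics.pstdev(column)  # population standard deviation
    if std == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]


# Hypothetical feature values, invented for illustration only.
scaled = standardize([2.0, 4.0, 6.0])
```

In practice the mean and standard deviation are computed on the training data and reused for validation and test data, so that no information leaks from the held-out sets.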

mnist

The input data consists of grayscale images (color_scale = 1, width = 28, height = 28) of digits from 0 to 9, which the model must identify. I implemented a CNN consisting of convolutional layers as well as MaxPool layers, and achieved 99.6% accuracy on the test set. Link to my notebook: https://www.kaggle.com/raksh710/mnist-using-cnn-99-6-test-accuracy
