Name: Kevin Vinay Kasundra
Type: User
Company: National Renewable Energy Laboratory
Bio: A linear regression model trying to become as worthy as the likes of neural networks, SVMs, and gradient boosting.
Location: Denver, CO
Blog: https://www.linkedin.com/in/kevin-kasundra/
Kevin Vinay Kasundra's Projects
Notes and Python scripts for A/B or Split Testing
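A core piece of A/B or split testing is comparing two conversion rates. Below is a minimal stdlib-only sketch of a two-sided two-proportion z-test; the counts are invented for illustration and `two_proportion_z_test` is a hypothetical helper name, not necessarily how the scripts in the repository are organized:

```python
from math import sqrt, erf

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled proportion under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF (computed via erf)
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Made-up example: variant B converts 13% vs. 10% for control A
z, p = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=260, n_b=2000)
```

With these toy numbers the difference is significant at the usual 5% level; in practice the sample size should be fixed in advance to avoid peeking bias.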
Extreme Rare Event Classification of whether an insurance claim will be filed or not.
The original dataset contains 1000 entries with 20 categorical/symbolic attributes prepared by Prof. Hofmann. Each entry represents a person who takes credit from a bank, and each person is classified as a good or bad credit risk according to the set of attributes. The objective is to develop a model that correctly identifies a bank customer's credit risk.
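One common baseline for this kind of binary credit-risk classification is logistic regression. The sketch below is a pure-Python stochastic-gradient-descent version on made-up toy features; the feature names, values, and the `fit_logistic` helper are all illustrative assumptions, not the actual model or data from Prof. Hofmann's dataset:

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Stochastic gradient descent on the log-loss; returns weights and bias."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = pred - yi  # gradient of log-loss w.r.t. the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

# Toy stand-in features: [scaled loan amount, scaled duration]; label 1 = bad risk
X = [[0.2, 0.1], [0.4, 0.3], [0.8, 0.9], [0.9, 0.7]]
y = [0, 0, 1, 1]
w, b = fit_logistic(X, y)

def predict(xi):
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5)
```

On a real, imbalanced credit dataset one would also look at precision/recall rather than accuracy alone.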
Phase 3 of Ninja Data Scientist Career Track
Repository containing a portfolio of data science projects completed by me for academic, self-learning, and hobby purposes. Presented in the form of IPython Notebooks and R files.
Phase 2 of Ninja Data Scientist Career Track
9 distinct topic clusters were formed from 13,136 text reviews of wine. Parsed (tokenization & POS tagging) and filtered (stop-word removal & stemming) the text reviews to build the term/document matrix, which was weighted using TF-IDF. Applied Latent Dirichlet Allocation (LDA) and SVD-based latent semantic analysis to identify and analyze the topic clusters. Also calculated the region-wise average price and points of wine, and finally each region's contribution to each topic cluster.
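The TF-IDF weighting step described above can be sketched in plain Python. The sample reviews below are invented for illustration, and this uses the simplest TF-IDF variant (raw term frequency times log inverse document frequency), which may differ from the exact weighting scheme used in the project:

```python
from math import log
from collections import Counter

def tfidf_matrix(docs):
    """Build a TF-IDF-weighted term/document matrix from tokenized docs.
    TF = raw count in the doc; IDF = log(N / document frequency)."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))           # each doc counts a term at most once
    vocab = sorted(df)
    matrix = []
    for doc in docs:
        tf = Counter(doc)
        matrix.append([tf[t] * log(n_docs / df[t]) for t in vocab])
    return vocab, matrix

# Hypothetical pre-tokenized wine reviews (already filtered and stemmed)
reviews = [
    ["fruity", "oak", "dry"],
    ["fruity", "sweet", "light"],
    ["oak", "dry", "tannic"],
]
vocab, M = tfidf_matrix(reviews)
```

The resulting matrix is what LDA or an SVD-based LSA would then factorize into topic clusters.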
Phase 1 of Ninja Data Scientist Career Track
There are 8 different text files of ebooks, freely available from http://www.gutenberg.org/. Steps performed: importing the text files into Python; text parsing and transformation operations such as lower-case conversion, removal of special characters, expansion of contraction words, and tokenization; tagging parts of speech for each term; stemming terms to their root word; and stop-word removal. The project also shows how the outcome differs when the POS tagging, stop-word removal, and stemming operations are not performed.
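The preprocessing pipeline above might look roughly like this in stdlib-only Python. The stop-word list, contraction map, and suffix rules here are deliberately tiny stand-ins for the NLTK-style components the project presumably uses (e.g., a real Porter stemmer and full stop-word corpus):

```python
import re

# Simplified stand-ins; a real pipeline would use much larger resources
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in"}
CONTRACTIONS = {"don't": "do not", "it's": "it is", "can't": "can not"}

def crude_stem(token):
    """Very rough suffix stripping (a stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    text = text.lower()                          # lower-case conversion
    for contraction, expansion in CONTRACTIONS.items():
        text = text.replace(contraction, expansion)
    tokens = re.findall(r"[a-z]+", text)         # drops special characters/digits
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [crude_stem(t) for t in tokens]

tokens = preprocess("It's the walking of the dogs in parks.")
```

Skipping any single stage (stemming, stop-word removal, POS tagging) changes the resulting token stream, which is exactly the comparison the project demonstrates.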
Standardization and Principal Component Analysis on the Boston Housing Dataset
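The standardization-then-PCA combination named above can be sketched for two features in plain Python: z-score each column, then take the leading eigenpair of the resulting 2x2 correlation matrix in closed form. The room/price numbers are made up for illustration, not the actual Boston Housing data:

```python
from statistics import mean, stdev
from math import sqrt

def standardize(column):
    """z-score: subtract the mean, divide by the sample standard deviation."""
    mu, sigma = mean(column), stdev(column)
    return [(x - mu) / sigma for x in column]

def pca_2d(x, y):
    """Leading eigenpair of the 2x2 correlation matrix of two
    standardized features (closed-form, no linear-algebra library)."""
    n = len(x)
    r = sum(a * b for a, b in zip(x, y)) / (n - 1)  # Pearson correlation
    # Correlation matrix [[1, r], [r, 1]] has eigenvalues 1 +/- r
    lead_eigenvalue = 1 + abs(r)
    # Its leading eigenvector is (1, 1)/sqrt(2) for r >= 0, else (1, -1)/sqrt(2)
    v = (1 / sqrt(2), 1 / sqrt(2)) if r >= 0 else (1 / sqrt(2), -1 / sqrt(2))
    return lead_eigenvalue, v

# Hypothetical rooms-vs-price columns (strongly correlated by construction)
rooms = [4.0, 5.0, 6.0, 7.0, 8.0]
price = [10.0, 14.0, 19.0, 25.0, 30.0]
lam, pc1 = pca_2d(standardize(rooms), standardize(price))
```

Standardizing first matters because PCA is scale-sensitive: without it, the feature with the largest raw variance would dominate the first component.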
In this project, I profiled and analyzed the Yelp dataset as part of the certificate course "SQL for Data Science" offered by UC Davis.
Tool for producing high-quality forecasts for time-series data that has multiple seasonalities, with linear or non-linear growth.