This repo is focused exclusively on my adventure learning data science while enrolled in the Thinkful.com data science program, and on the tools and techniques necessary to perform data-science-related tasks. This includes, but is not limited to: Python, SQLite, pandas, NumPy, SciPy, Dato's GraphLab Create and SFrame, time series analysis, statistical analysis and plots, regression and classification, random forests, decision trees, and k nearest neighbors.
There are other data sciency things in the root folder of my GitHub, such as my <a href="https://github.com/yorktronic/hots-comp-calc" target="_blank">Heroes of the Storm team comp calculator</a>, a project that uses optimization techniques to plan food quantities for my wedding, and my work for Coursera's machine learning specialization.
My data science blog can be found here.
-
Predicting body position of smartphone users based on accelerometer data. Decision trees, random forest, black box analysis, Dato, GraphLab Create.
-
Predict class of flower based on sepal measurements. K nearest neighbors, GraphLab Create, pandas.
-
Weather analysis of major US cities. API calls, pandas, requests, SQLite, histograms, Q-Q plots.
-
Determined factors correlated with interest rate offerings from Lending Club. Linear regression, pandas, matplotlib.
-
Cross validation of Lending Club linear regression. pandas, statsmodels, scikit-learn, KFold.
-
How New Yorkers bike using the CitiBike public bike program. Time series data, pandas, matplotlib, SQLite.
-
Document retrieval using Wikipedia data. Text analysis, vectors, nearest neighbors.
-
Sentiment analysis of Amazon.com product reviews. Natural language processing, classification.