- Prompt | Report
- Creates a mathematical model to numerically estimate what makes a successful marriage. It covers feature selection, training methods, and possible sources of error in the output.
- Prompt | Report | R Code
- Uses supervised machine learning to beat Zillow.com’s “Zestimates”
- Developed in R; incorporates data manipulation techniques such as data removal, munging, and imputation, and models the data with linear and random forest regressions
Results:
✓ The Random Forest model predicts within $27,000 (the average home price is $315,000).
✗ The model requires more observations to predict accurately when extrapolating; however, it performs well within the included zip codes.
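A result such as "predicts within $27,000" is usually read against the typical price: $27,000 / $315,000 is roughly 8.6% of the average home price. The report does not say here which error metric was used, so the following is only a minimal sketch of one common choice, out-of-sample RMSE, in base R. The data are synthetic and the column names (`sqft`, `beds`, `price`) are hypothetical; an ordinary linear model stands in for the project's random forest so the example needs no extra packages.

```r
# Hypothetical data: synthetic home prices, NOT the project's data set.
set.seed(1)
n <- 500
homes <- data.frame(
  sqft = runif(n, 600, 3500),
  beds = sample(1:5, n, replace = TRUE)
)
homes$price <- 50000 + 80 * homes$sqft + 10000 * homes$beds +
  rnorm(n, sd = 25000)

# Hold out 20% of the rows as an out-of-sample test set.
test_idx <- sample(n, size = n / 5)
train <- homes[-test_idx, ]
test <- homes[test_idx, ]

# Fit an ordinary linear model on the training rows only
# (a stand-in for the random forest used in the project).
fit <- lm(price ~ sqft + beds, data = train)

# Out-of-sample RMSE: the typical size of a prediction error, in dollars.
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$price - pred)^2))

# Express the error relative to the average price in the test set.
relative_error <- rmse / mean(test$price)
```

Because every synthetic test row falls inside the range of the training data, this sketch only measures interpolation error; it says nothing about zip codes or price ranges the model never saw.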
- Syllabus
- Philosophy of modeling and learning using data
- Prediction via the ordinary linear model including orthogonal projections, sum of squares identity, R² and RMSE
- Polynomial and interaction regressions
- Prediction with machine learning including neural nets (the perceptron), support vector machines and the tree methods CART, bagged trees and Random Forests
- Probability estimation using logistic regression, asymmetric cost classifiers and the ROC / DET performance curves
- Underfitting vs. overfitting and the bias-variance decomposition / tradeoff
- Model validation including out of sample techniques such as cross validation and bootstrap validation
- Correlation vs. causation, causal models, lurking variables and interpretations of linear model coefficients
- Extrapolation
- The R language will be taught formally from the ground up, along with visualization using the ggplot library and data manipulation using the dplyr and data.table libraries.
- Basic Probability Theory: axioms, conditional probability, in/dependence
- Modeling with discrete random variables: Bernoulli, Hypergeometric, Binomial, Poisson, Geometric, Negative Binomial, Uniform Discrete and others
- Expectation and variance
- Modeling with continuous random variables: Exponential, Uniform and Normal
- Frequentist confidence intervals and hypothesis testing for one-sample proportions
- Basic visualization of data: plots, histograms, bar charts
- Linear algebra: Vectors, matrices, rank, transpose
- Programming: basic data types, vectors, arrays, control flow (for, while, if, else), functions