6/14/2016 to 8/18/2016
Instructor: Hamed Hasheminia
Tuesdays | Thursdays |
---|---|
6/14: Data Science - Introduction Part I | 6/16: Data Science - Introduction Part II |
6/21: Linear Regression Lines Part I | 6/23: Linear Regression Lines Part II |
6/28: Model Selection | 6/30: Missing Data and Imputation |
7/5: K-Nearest Neighbors | 7/7: Logistic Regression Part I |
7/12: Logistic Regression Part II | 7/14: In Class Project |
7/19: Decision Trees Part I | 7/21: Decision Trees Part II |
7/26: Natural Language Processing | 7/28: Time Series Models |
8/2: Principal Component Analysis | 8/4: Data Visualization |
8/9: Naive Bayes | 8/11: Course Review |
8/16: Final Project Presentations I | 8/18: Final Project Presentations II |
## Lecture 1 Summary (Data Science - Introduction Part I)
- Data Science - meaning
- Continuous, Discrete and Qualitative Data
- Supervised vs Unsupervised Learning
- Classification vs Regression
- Time series vs cross-sectional data
- Numpy
- Pandas
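A minimal sketch of the NumPy and pandas basics covered in this lecture, on a tiny made-up dataset (the column names and values are hypothetical). It shows a vectorized NumPy operation, a DataFrame with one continuous, one discrete, and one qualitative column, a boolean filter, and a `groupby` aggregate.

```python
import numpy as np
import pandas as pd

# NumPy: arithmetic on an array applies element-wise, with no explicit loop
heights_cm = np.array([160.0, 172.5, 181.0, 168.5])
heights_m = heights_cm / 100

# pandas: one column of each data type from this lecture (hypothetical data)
df = pd.DataFrame({
    "height_m": heights_m,               # continuous: any value in a range
    "num_siblings": [0, 2, 1, 3],        # discrete: countable integers
    "city": ["NYC", "SF", "NYC", "LA"],  # qualitative: categories, not numbers
})
print(df.dtypes)

# Boolean filtering: keep only the rows where the condition is True
tall = df[df["height_m"] > 1.7]
print(tall)

# Group-wise aggregate: mean number of siblings per city (keys come back sorted)
siblings_by_city = df.groupby("city")["num_siblings"].mean()
print(siblings_by_city)
```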
Resources
- Lecture 1 - Introduction - Slides
- Intro Numpy - Code
- Intro Numpy - Code - Solutions
- Intro Pandas - Code
- InClass Practice Code - Pandas
- InClass Practice Code - Solutions
Set up GitHub - Self-study guide
- Lecture 0 - GitHub - Slides
- Excellent videos on setting up GitHub. Students who have not used GitHub before must watch these videos.
- A hands-on introduction to Git and GitHub, and how to make them work together! More Git resources for beginners here
Pre-work for the second lecture
- Review all lecture notes, including the lecture slides, the Numpy notebook, and the Pandas notebook
- Finish the self-study GitHub guidelines listed above
- Finish the In-Class Practice Code
- Review the final project requirements. The final project timeline is on slide 11 of the Lecture 1 PowerPoint slides.
Additional Resources
- Official Pandas Tutorials. Wes & Company's selection of tutorials and lectures
- Julia Evans Pandas Cookbook. Great resource with examples from weather, bikes and 311 calls
- Learn Pandas Tutorials. A great series of Pandas tutorials from Dave Rojas
- Research Computing Python Data PYNBs. A super awesome set of Python notebooks from a meetup-based course exclusively devoted to pandas
## Lecture 2 Summary (Data Science - Introduction Part II)
- Measures of central tendency (Mean, Median, Mode, Quartiles, Percentiles)
- Measures of Variability (IQR, Standard Deviation, Variance)
- Skewness Coefficient
- Boxplots, Histograms, Scatterplots
- Central Limit Theorem
- Class/Dummy Variables
- Walkthrough describing and visualizing data in Pandas
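A short sketch of these summaries in pandas, on a made-up sample. It computes the measures of central tendency and variability listed above, the skewness, the 1.5 × IQR rule that boxplots use to flag outliers, and dummy variables via `pd.get_dummies`. Note that pandas' `var()` and `std()` use the sample (n - 1) versions by default.

```python
import pandas as pd

# Made-up sample with one large value, so the distribution is right-skewed
s = pd.Series([2, 3, 3, 4, 5, 6, 20])

# Central tendency
mean = s.mean()              # 43 / 7
median = s.median()          # middle value of the sorted data
mode = s.mode().tolist()     # most frequent value(s)

# Variability (sample versions, ddof=1)
q1, q3 = s.quantile([0.25, 0.75])
iqr = q3 - q1
variance = s.var()
std = s.std()

# Skewness: positive means a longer right tail
skew = s.skew()

# The 1.5 * IQR rule a boxplot uses to flag outliers
outliers = s[(s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)]

# Dummy variables: one 0/1 column per category, dropping the first as the baseline
colors = pd.Series(["red", "green", "blue", "green"], name="color")
dummies = pd.get_dummies(colors, prefix="color", drop_first=True, dtype=int)

print(s.describe())
print(dummies)
```

Plotting is left out so the sketch only needs pandas; with matplotlib installed, `s.plot.box()` and `s.plot.hist()` draw the boxplot and histogram.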
Resources
- Lecture 2 - Slides
- Basic Statistics - Part 2 - Lab Codes
- Basic Statistics - Part 2 - Practice Code
- Basic Statistics - Part 2 - Practice Code - Solutions
HW 1 is Assigned
- Please read and follow the instructions in the README
- This homework is due on June 23rd, 2016 at 6:30PM
Additional Resources
- Here you can find valuable resources for matplotlib
- A good video on the Central Limit Theorem
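The Central Limit Theorem can be checked with a quick simulation. This sketch (the exponential population, sample size, and seed are arbitrary choices for illustration) draws many samples from a strongly skewed distribution. It then shows that the sample means are centered at the population mean with spread close to σ/√n, and that about 95% of them fall within 1.96 σ/√n of it, as a normal distribution would predict.

```python
import numpy as np

rng = np.random.default_rng(0)

# Population: exponential with mean 1 and standard deviation 1 (strongly right-skewed)
n = 50            # size of each sample
trials = 20_000   # number of samples drawn

samples = rng.exponential(scale=1.0, size=(trials, n))
sample_means = samples.mean(axis=1)

# CLT: the sample means are approximately Normal(mean=1, sd=1/sqrt(n))
print(sample_means.mean())   # close to 1.0
print(sample_means.std())    # close to 1/sqrt(50), about 0.141

# Fraction of sample means within 1.96 standard errors of the true mean
within = np.mean(np.abs(sample_means - 1.0) < 1.96 / np.sqrt(n))
print(within)                # close to 0.95
```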
## Lecture 3 Summary (Linear Regression Lines Part I)
- Linear Regression lines
- Single Variable and Multi-Variable Regression Lines
- Capturing non-linearity using linear regression lines
- Interpreting regression coefficients
- Dealing with dummy variables in regression lines
- Introduction to the sklearn and seaborn libraries
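A minimal sketch of a multi-variable regression on simulated data (the variables and true coefficients are made up). Adding an x² column lets a linear regression capture a curved relationship, since the model is still linear in its coefficients, and a 0/1 dummy shifts the intercept for one group. It uses NumPy's `np.linalg.lstsq` for ordinary least squares; sklearn's `LinearRegression` fits the same OLS model.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500

# Hypothetical data: x is continuous, is_weekend is a 0/1 dummy variable
x = rng.uniform(0, 10, size=n)
is_weekend = rng.integers(0, 2, size=n)

# True relationship: quadratic in x, shifted up on weekends, plus noise
y = 3.0 + 2.0 * x - 0.5 * x**2 + 4.0 * is_weekend + rng.normal(0, 0.5, size=n)

# Design matrix: intercept column, x, x^2 (captures the curve), and the dummy
X = np.column_stack([np.ones(n), x, x**2, is_weekend])

# Ordinary least squares: the beta that minimizes ||y - X @ beta||^2
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, b_x, b_x2, b_weekend = beta
print(beta.round(2))   # should be close to [3, 2, -0.5, 4]
```

Reading the coefficients: `b_weekend` is the expected change in y on a weekend compared with a weekday, holding x fixed. Because x also enters as x², the effect of a one-unit increase in x is not a single number; it is roughly `b_x + 2 * b_x2 * x`, so it depends on where x is.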
Resources
- Lecture 3 - Slides
- Linear Regression - Part I - Lab Codes
- Linear Regression - Part I - Practice Code
- Linear Regression - Part I - Practice Solutions
Additional Resources
- My videos on regression lines. Video 1, Video 2
- This is an excellent book. In Lecture 3 and Lecture 4, we are going to cover Chapter 3 of this textbook.
- Seaborn
- Weighted Least Squares (WLS) method
- Good resource for heteroskedasticity
- Here contours are elegantly introduced.
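A minimal sketch of weighted least squares on simulated heteroskedastic data, where the noise standard deviation grows with x (an assumption made for illustration). Each observation is weighted by the inverse of its noise variance. Scaling the rows of X and y by √w turns WLS into an ordinary least-squares problem, and the closed form `(X^T W X)^{-1} X^T W y` gives the same answer.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1000

# Hypothetical heteroskedastic data: the noise standard deviation grows with x
x = rng.uniform(1, 10, size=n)
sigma = 0.5 * x
y = 1.0 + 3.0 * x + rng.normal(0, sigma)

X = np.column_stack([np.ones(n), x])
w = 1.0 / sigma**2   # WLS weights: inverse of each observation's noise variance

# WLS minimizes sum_i w_i * (y_i - X_i @ beta)^2.
# Multiplying each row of X and y by sqrt(w_i) reduces it to plain OLS.
sw = np.sqrt(w)
beta_wls, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)

# Same estimate from the normal equations: (X^T W X) beta = X^T W y
XtW = X.T * w        # broadcasting scales column i of X^T by w_i
beta_closed = np.linalg.solve(XtW @ X, XtW @ y)

# Unweighted OLS for comparison (still unbiased here, but less precise)
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print("OLS:", beta_ols.round(3), "WLS:", beta_wls.round(3))
```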
## Lecture 4 Summary (Linear Regression Lines Part II)
- Hypothesis testing - significance tests on regression coefficients
- p-values
- Capturing non-linearity using linear regression lines
- R-squared
- Interaction Effects
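A sketch of these quantities computed by hand with NumPy on simulated data (the variables and true coefficients are hypothetical): R-squared, standard errors, t-statistics, and p-values for a model with an interaction term `x1 * x2`, which lets the slope of x1 differ between the two groups of the dummy x2. To avoid a SciPy dependency, the p-values use a normal approximation to the t distribution with n - p degrees of freedom; this is close when n is large, and `scipy.stats.t.sf` would give exact values.

```python
import math
import numpy as np

rng = np.random.default_rng(1)
n = 300

# Hypothetical data: the slope of x1 is 2.0 when x2 = 0 and 3.5 when x2 = 1
x1 = rng.normal(size=n)
x2 = rng.integers(0, 2, size=n)   # 0/1 dummy
y = 1.0 + 2.0 * x1 + 0.5 * x2 + 1.5 * x1 * x2 + rng.normal(size=n)

# The product column x1 * x2 models the interaction effect
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# R-squared: share of the variance in y explained by the model
residuals = y - X @ beta
ss_res = residuals @ residuals
ss_tot = ((y - y.mean()) ** 2).sum()
r_squared = 1 - ss_res / ss_tot

# Standard errors: square roots of the diagonal of sigma^2 * (X^T X)^{-1}
p = X.shape[1]                    # number of coefficients, including the intercept
sigma2 = ss_res / (n - p)
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_stats = beta / se

# Two-sided p-values for H0: coefficient = 0 (normal approximation to t)
p_values = np.array([math.erfc(abs(t) / math.sqrt(2)) for t in t_stats])

print("R^2:", round(r_squared, 3))
print("coef:", beta.round(2), "p:", p_values)
```

The `statsmodels.formula.api` module listed in the resources fits the same model from a formula, for example `smf.ols("y ~ x1 * x2", data=df).fit()` on a DataFrame `df` holding these columns, and its `summary()` reports these statistics with exact t-distribution p-values.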
Resources
- Lecture 4 - Slides
- Linear Regression - Part II - Lab Codes
- Linear Regression - Part II - Practice Code
- Linear Regression - Part II - Practice Solution
Additional Resources
- My videos on regression lines. Video 1, Video 2
- This is an excellent book. In Lecture 3 and Lecture 4, we covered Chapter 3 of this textbook.
- statsmodels.formula.api
HW 2 is Assigned