UCLA Extension - Introduction to Data Science
UCLA Extension Introduction to Data Science (COM SCI X450.1) class materials. Use this repo to access supplemental resources, handouts, assignments, slides, and code. Additional content may be added throughout the course.
SPECIAL OFFER! My book publisher is offering the textbook for our class at a special 25% student discount. Just click HERE and use coupon code "Gutierrez2019".
Course Content
Course materials are categorized in the following folders:
- code
- handouts
- homeworks
- quizzes
- slides
Supplemental Resources - General
- JOIN! Slack channel: Down-in-the-Trenches-Data-Science - Stay connected with current and past students after the class ends; open to all data scientists
- Becoming a machine learning company means investing in foundational technologies - Companies successfully adopt machine learning either by building on existing data products and services, or by modernizing existing models and algorithms
- Scientists rise up against statistical significance - Replacing p-values with confidence intervals?
- Op-Ed: The real reason we’re afraid of robots - A cogent op-ed on the "Killer AI" meme
Supplemental Resources - Data Science
- Data Con LA 2020 - 109 video presentations from the Data Con LA virtual conference
- Data Science Jobs Report 2020 - Useful employment research from 365DataScience
- Trends in Data Science 2019/2020 - Important industry trends from ODSC
- Data Scientist Resume: Template, Examples and Complete Guide - What a successful data science resume looks like
- insideBIGDATA "Ask a Data Scientist" Series - My popular educational series sponsored by Intel
- All my opendatascience.com articles - Many articles keeping pace with the field of data science
- How to Get Your Data Science Career Started - Nice Forbes article on how to jump-start a career in data science
- Google Dataset Search - NEW! Resource for data scientists
- The Importance of SQL in Practicing Data Science - Reinforcing my advice in class!
- What is Data Science 'Impostor Syndrome'? - Avoid the fear of what you don't know
- Becoming a Data Scientist - Important pointers by the head of Kaggle Learn, Dan Becker, Ph.D.
- Industrial Research in Applied Statistics - AMS - Nice article about being a data scientist.
- 6 Reasons Why Data Science Projects Fail - A report from down in the trenches.
- The Difference Between Data Scientists and Data Engineers - A guide to becoming a unicorn.
Supplemental Resources - Machine Learning
- 2020 Outlook on AutoML Updates & Latest Recent Advances - Latest authoritative list of AutoML tools and frameworks
- Data Science Meetup (Feb. 26, 2020) Gradient Boosting Machines (GBM): From Zero to Hero - Slides from a great Meetup
- Data Science Meetup (Feb. 26, 2020) Gradient Boosting Machines (GBM): From Zero to Hero - GitHub repos with R and Python code
- 10 Tips for Choosing the Optimal Number of Clusters - Great article that drills down into unsupervised machine learning clustering
- NGBoost: Natural Gradient Boosting for Probabilistic Prediction - HOT new machine learning algorithm using boosting
- Preventing undesirable behavior of intelligent machines - Cool research paper addressing the debate over machine learning bias
- VIDEO presentation from LA West R Meetup group - Better Than Deep Learning Gradient Boosting Machine 2019
- SLIDES from LA West R Meetup group - Better Than Deep Learning Gradient Boosting Machine 2019
- Linear Regression with Healthcare Data for Beginners in R - Nice starter exercise for newbie data scientists
- Book Review: Deep Learning Revolution - Nice deep learning book for a general audience.
- Evaluate your R model with MLmetrics - Using R’s MLmetrics to evaluate machine learning models. MLmetrics provides several functions to calculate common metrics for ML models, including AUC, precision, recall, accuracy, etc.
- Assessment Metrics for Clustering Algorithms - Metrics for clustering and unsupervised machine learning
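MLmetrics is an R package. As a language-neutral illustration of the kinds of classification metrics it computes (accuracy, precision, recall, AUC), here is a minimal Python sketch using only the standard library. It is not the MLmetrics API; the function names and the toy label and score vectors are hypothetical and chosen only for illustration.

```python
# Minimal sketch of common binary classification metrics (not the MLmetrics API).
# Function names and the example data below are hypothetical.

def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, true negatives, false negatives."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred, positive=1):
    """Of the cases predicted positive, the fraction that really are positive."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fp) if tp + fp else 0.0


def recall(y_true, y_pred, positive=1):
    """Of the truly positive cases, the fraction the model predicted positive."""
    tp, _, _, fn = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fn) if tp + fn else 0.0


def auc(y_true, scores, positive=1):
    """ROC AUC as the probability a random positive outscores a random negative
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 0, 1, 1, 0, 0]
    y_pred = [1, 0, 0, 1, 1, 0]
    scores = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1]
    print(round(accuracy(y_true, y_pred), 3))   # 0.667
    print(round(precision(y_true, y_pred), 3))  # 0.667
    print(round(recall(y_true, y_pred), 3))     # 0.667
    print(round(auc(y_true, scores), 3))        # 0.889
```

The pairwise AUC loop is quadratic in the number of observations, which is fine for teaching but slow on large data; rank-based formulas compute the same quantity more efficiently.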
Supplemental Resources - R Coding
- Book R code - R code for my book "Machine Learning and Data Science: An Introduction to Statistical Learning Methods with R"
- R vs Python: Different similarities and similar differences - Nice cross-comparison of the R and Python data science programming languages
- R Tutorials - Good R programming tutorials to read in parallel with weeks 1-4 of class
- Descriptive Statistics in R - Good resource for Week 7 of class on EDA
- Demystifying Regular Expressions in R - Intro to text analytics
- Vignette: data.table - data.table is an R package that extends data.frame
- How Tidyverse Guides R Programmers Through Data Science Workflows - A cohesive way of approaching data science projects
- Type conversion and you (or and R) - More examples on type conversion and coercion in R
- Essential list of useful R packages for data scientists - Great list of important R packages for data scientists
- R plot pch symbols: The different point shapes available in R - An examination of all the popular pch argument values for data visualizations
- R color names
Supplemental Resources - Python Coding
- Python Programming Tutorials - Many tutorial resources for Python coding for data science and machine learning
- Talk Python - A podcast on Python and related technologies
Supplemental Resources - Mathematics
- Fundamentals of Multivariate Calculus for DataScience and Machine Learning - Great tutorial for learning the math behind ML and DL
- Theoretical Foundations of Data Science— Should I Care or Simply Focus on Hands-on Skills? - YES, you should care!
- Linear Algebra via MIT OpenCourseWare - Learn linear algebra from Gil Strang, the best of the best!
- Calculus — Multivariate Calculus And Machine Learning -- A Must Know Concept For Every Professional - Here is the bare minimum Calculus necessary for machine learning.
Supplemental Resources - Statistics
- Do my data follow a normal distribution? - A note on the most widely used distribution and how to test for normality in R.
- Fisher's exact test in R: independence test for a small sample - Focuses on Fisher's exact test. Independence tests are used to determine whether there is a significant relationship between two categorical variables.
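In R, Fisher's exact test is typically run with fisher.test(). To show what that test computes for a 2x2 table, here is a minimal Python sketch, assuming the standard two-sided definition: with row and column totals held fixed, the top-left cell follows a hypergeometric distribution, and the p-value sums the probabilities of all tables no more likely than the observed one. The function name fisher_exact_2x2 is hypothetical, and the example table is the classic "lady tasting tea" layout rather than data from the linked article.

```python
from fractions import Fraction
from math import comb


def fisher_exact_2x2(table):
    """Two-sided Fisher's exact test p-value for a 2x2 table [[a, b], [c, d]].

    Hypothetical helper for illustration: conditions on the margins and sums
    the hypergeometric probabilities of every table that is no more likely
    than the observed one. Fractions keep the comparison exact.
    """
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Exact probability that the top-left cell equals x, given the margins
        return Fraction(comb(row1, x) * comb(row2, col1 - x), denom)

    p_obs = prob(a)
    # Feasible values of the top-left cell under the fixed margins
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_value = sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs)
    return float(p_value)


if __name__ == "__main__":
    # Rows: true cup type; columns: taster's guess
    print(round(fisher_exact_2x2([[3, 1], [1, 3]]), 4))  # 0.4857
```

Exact rational arithmetic sidesteps the floating-point tolerance that numerical implementations need when deciding which tables are "no more likely" than the observed one.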
Supplemental Resources - Books
- Introduction to Statistical Learning - Great book to use following this class.
- Elements of Statistical Learning - The "Machine Learning Bible"
COVID-19 Data Science Grab Bag
- 2020 ASA DataFest at UCLA Winners - Wonderful student COVID-19 data projects with video presentations
- 2020 ASA DataFest at UCLA All Presentations
- R Interface to COVID-19 Data Hub - Great way to get productive quickly with COVID data using R
- Demo of reproducible geographic data analysis: mapping Covid-19 data with R
- COVID-19 Data - covdata is a data package for R. It provides COVID-19 case data from three sources
- The COVID Tracking Project - Raw data for tracking COVID-19
- A COVID Small Multiple
- COVID-19 Resource Gallery
- Merge Covid-19 Data with Governmental Interventions Data
- Tidying the new Johns Hopkins Covid-19 time-series datasets
- How to create a simple Coronavirus dashboard specific to your country in R
- Contagiousness of COVID-19 Part I: Improvements of Mathematical Fitting
- C3.ai COVID-19 Data Lake
Excellent Class Project Examples - Past Students
- Analysis of Reptile & Amphibian Observations in Los Angeles County - Work done by student Timothy Stegman (Fall 2020)
- Analysis of Match Statistics and Team Performances in the Premier League From Season 2015/16 to Season 2019/20 - Work done by student Tara Nguyen (Fall 2020)
- What Makes Us Happy - Using Kaggle "Young People Survey" data set - Work done by student Alexander Fichtl, taking the course from Germany (Spring 2020)
- Data Analysis Evolution of Popular Music - Work done by student William Toth (Spring 2020)
- Airbnb Price Prediction for different areas in NYC - Work done by student Hashneet Kaur (Winter 2020)
- Predicting-Hotel-booking-demand-and-cancellation - Work done by student Elaine Kuang (Winter 2020)
- Analysis-of-Coronavirus-COVID-19-New-Confirmed-Cases - Work done by student Micky Lee (Winter 2020)
- Data Analysis for 2019 Indian General Election - Work done by student Junhui Yang (Winter 2020)
- FIVB Beach Volleyball Historic Top 8 Teams Analysis - Work done by student Tyler Widdison (Fall 2019)
- Data Analysis for PM2.5 in Beijing - Work done by student Xiaozhu Zhang (Spring 2019)
Authors
- Daniel D. Gutierrez - LinkedIn
License
This project is licensed under the MIT License - see the LICENSE.md file for details