
dataquest's Introduction

Data Science Massive Open Online Course

Below is a list of topics covered in meticulous detail.

To transition my career from Chemist & Data Automation Specialist to Data Scientist, I followed the self-guided curriculum laid out by DataQuest.io. I skipped the first course because I was already strong in the basics of Python: when I began, I had 2+ years of experience with Python/SQLite data extraction, cleansing, and analysis. The code for each DataQuest.io course is included in this repo.

Unfortunately, the raw CSV files used in the courses vastly exceed the file size limit imposed by GitHub. I have therefore added *.csv to the .gitignore file, so the data files are not included in this repo.

2. Data Analysis, Visualization, & Cleaning

  • Numpy
  • Pandas
  • Jupyter Notebook
  • matplotlib
  • seaborn
  • basemap
  • regular expressions (re)

3. Linux Command Line

  • Running a Linux VM
  • Navigation
  • Working with files
  • Running Python scripts from the command line
  • Piping & redirecting output
  • csv toolkit
  • git
  • git remotes (github, .gitignore)

4. Working with Data Sources

  • APIs (requests)
  • JSON (JavaScript Object Notation)
  • Authentication (OAuth2)
  • Web Scraping (BeautifulSoup)
  • SQL (joins, WITH, VIEW, UNION, INTERSECT, EXCEPT, etc.)
  • SQLite (sqlite3)
  • Database normalization
  • PostgreSQL (psycopg2, PostgreSQL Command-line, .pgpass)
  • Database indexing
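
As a rough illustration of the SQLite, join, and indexing topics above, here is a minimal sketch using the standard-library sqlite3 module. The authors and books tables, their columns, and the rows are hypothetical, made up for this example; they are not taken from the course data.

```python
import sqlite3

# In-memory database; the schema and rows below are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    -- Index the foreign key so the join can look up books by author.
    CREATE INDEX idx_books_author ON books (author_id);
""")
conn.executemany("INSERT INTO authors VALUES (?, ?)",
                 [(1, "Ada"), (2, "Grace")])
conn.executemany("INSERT INTO books VALUES (?, ?, ?)",
                 [(1, 1, "Notes"), (2, 1, "Sketches"), (3, 2, "Compilers")])

# Inner join plus aggregation: number of books per author.
rows = conn.execute("""
    SELECT a.name, COUNT(b.id) AS n_books
    FROM authors AS a
    INNER JOIN books AS b ON b.author_id = a.id
    GROUP BY a.name
    ORDER BY a.name
""").fetchall()
conn.close()

print(rows)
```

The parameterized `?` placeholders keep values out of the SQL string, which is the usual way to pass data to sqlite3.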

5. Probability & Statistics

  • Standard Deviation & Correlation
  • Linear Regression
  • Distributions & Sampling
  • Probabilities
  • Probability Distributions
  • Chi-Squared Tests
  • Multi Category Chi-Squared Tests
  • Major Python Libraries Learned/Utilized in the Probability & Statistics Course:
    • scipy.stats
      • skew
      • kurtosis
      • norm
      • pearsonr
      • linregress
      • binom
      • chisquare
      • chi2_contingency
      • linspace (note: this one comes from NumPy, not scipy.stats; older SciPy releases re-exported it at the top level of scipy)
    • math
    • functools
    • operator
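
To give a feel for the chi-squared material, here is a minimal sketch that computes the test statistic by hand with math, functools, and operator. It is not the course's code and not how scipy.stats.chisquare is implemented; the observed and expected counts are made-up numbers, and the p-value shortcut shown is only valid for one degree of freedom.

```python
import math
import operator
from functools import reduce


def chi_squared(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    return reduce(operator.add, terms, 0.0)


# Hypothetical counts for two categories (e.g. a 50/50 split expected).
observed = [60, 40]
expected = [50, 50]

stat = chi_squared(observed, expected)

# With one degree of freedom the statistic is a squared standard normal,
# so P(X > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
p_value = math.erfc(math.sqrt(stat / 2))

print(stat, p_value)
```

For more than two categories the statistic is computed the same way, but the p-value needs the chi-squared distribution with the matching degrees of freedom, which is what scipy.stats provides.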

6. Machine Learning

  1. Fundamentals
    1. Introduction to K-Nearest Neighbors
    2. Evaluating Model Performance
    3. Multivariate K-Nearest Neighbors
    4. Hyperparameter Optimization
    5. Cross Validation
  2. Calculus For Machine Learning
    1. Understanding Linear & Nonlinear Functions
    2. Understanding Limits
    3. Finding Extreme Points
  3. Linear Algebra For Machine Learning
    1. Linear Systems
    2. Vectors
    3. Matrix Algebra
    4. Solution Sets
  4. Linear Regression For Machine Learning
    1. The Linear Regression Model
    2. Feature Selection
    3. Gradient Descent
    4. Ordinary Least Squares
    5. Processing & Transforming Features
  5. Machine Learning in Python Intermediate Course
    1. Logistic Regression
    2. Binary Classifiers
    3. Multiclass Classification
    4. Intermediate Linear Regression
    5. Overfitting
    6. Clustering Basics
    7. K-Means Clustering
    8. Gradient Descent
    9. Intro to Neural Networks
  6. Decision Trees
    1. Entropy
    2. Information Gain
    3. ID3 Algorithm
    4. Applying & Tweaking Decision Trees
    5. Random Forests
  7. Machine Learning Final Project
    1. Data Cleaning
    2. Preparing the features
    3. Making Predictions
  8. Major Python Libraries Learned in the Machine Learning Course
    1. scipy.spatial
      • distance
    2. sklearn.neighbors
      • KNeighborsRegressor
    3. sklearn.cluster
      • KMeans
    4. sklearn.linear_model
      • LinearRegression
      • LogisticRegression
    5. sklearn.metrics
      • mean_squared_error
    6. sklearn.metrics.pairwise
      • euclidean_distances
    7. sklearn.model_selection
      • cross_val_score
      • KFold
    8. SymPy
      • symbols
      • limit
    9. NumPy
      • linalg.inv
      • linalg.det
      • dot
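
For readers unfamiliar with K-nearest neighbors, the idea behind the first Machine Learning course can be sketched in plain Python. This is a simplified illustration, not sklearn's KNeighborsRegressor and not the course's code; the function name knn_predict and the toy data are assumptions made for this example (math.dist requires Python 3.8+).

```python
import math


def knn_predict(train_X, train_y, query, k=3):
    """Predict a value as the mean target of the k closest training points."""
    # Pair each training target with its Euclidean distance to the query.
    distances = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )
    nearest = distances[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Hypothetical toy data: two features per row, one numeric target.
train_X = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (10.0, 10.0)]
train_y = [10.0, 20.0, 30.0, 100.0]

prediction = knn_predict(train_X, train_y, (2.0, 2.0), k=3)
print(prediction)
```

Here the far-away point at (10, 10) is ignored because only the three closest rows contribute to the average; choosing k is the hyperparameter optimization step listed above.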
