Stats 418: Tools in Data Science

Stats 418 is a graduate-level statistics course restricted to UCLA Masters in Applied Statistics students. The course presents current tools for data acquisition, transformation, analysis, visualization, and machine learning, as well as tools for reproducible data analysis, collaboration, and model deployment used by data scientists in practice. Topics include advanced R packages and Python libraries, analytical databases, high-performance machine learning libraries, and big data tools.

Course Description

Data Science has been vaguely defined and redefined for the better part of the past decade, but it has a long history in both statistics and computing. What is clear is that the necessary 'tools' for a Data Scientist are changing at an incredibly rapid rate, making a course focused exclusively on these tools a necessity. The prerequisites of this course, Data Management and Statistical Computing, together with the other courses in the MAS program, provide a solid base for a large portion of daily Data Science work, and this course builds on that foundation. The course will present a broad range of topics, all under the overarching goal of building a data product (something akin to taking a data science project from conception to completion through deployment). These topics mirror what is currently used in industry by top Data Scientists, from unicorn startups to Fortune 500 companies. Much of the material will be applied in a learning-by-doing fashion, with resources provided for diving deeper when a topic calls for it.

Key Dates

Problem set due dates will be announced as each problem set is distributed.

Other important deadlines and dates during the term are:

  • 5/7/19 data/proposal slide submissions (push to GitHub)
  • Student data/proposal presentations (Week 6)
  • 6/4/19 Final slide submissions
  • Student final presentations (Week 10)

Resources

Papers will be posted in the corresponding weekly directory. There is no textbook.

Evaluation

Problem Sets

Working on actual problems is central to learning. Four problem sets will be assigned, on alternating weeks. These assignments will consist of analytical problems, computer simulations, and data analysis. Late submissions will not be accepted. Assignments will generally be made available on Tuesdays and are due two Tuesdays later, prior to lecture. All sufficiently attempted homework (i.e., a typed and well-organized write-up with all problems attempted) will be graded on a (+, ✓, -) scale. Students are encouraged to discuss the problems together, but must independently produce and submit solutions. Work should be done as an R Markdown file or a Jupyter notebook and committed to GitHub.

Final Project

A final project will be completed individually. The project will encourage collaboration, test your data acquisition skills, use your predictive modeling, challenge your programming ability, and promote presentation skills. A proposal presentation covering an acquired dataset, exploratory data analysis, and future direction will be given during week 6. In addition, each individual will present their work to the class during the final week of the quarter. Effective verbal communication is a critical skill for data scientists, and it requires practice and feedback to develop. Additional information about the final project, including the grading rubric, will be given as the course progresses.

Course Topics

Week 1 [4/2]: Introduction to course and each other. Overview of Data Science tools. Introduction and installation of Docker.

Week 2 [4/9]: Data Science in the command line. Learning about Unix. Reproducible research/work through Git/GitHub and Docker.

Week 3 [4/16]: More Data Science in the command line. Analytical databases, SQL, NoSQL databases, MongoDB. Accessing databases through R (dbplyr) and Python (SQLAlchemy).

Week 4 [4/23]: Acquiring data through APIs and web scraping. rvest (R) and Beautiful Soup (Python).

Week 5 [4/30]: Tools for data visualization: ggplot2, shiny (interactive web applications with R) / shiny dashboards and plotly.

Week 6 [5/7]: Final Project proposal presentations. Machine Learning libraries in both R and Python. Introduction to deep learning and AutoML.

Week 7 [5/14]: Continuation of Machine Learning libraries with use of cloud services. Introduction to NLP libraries.

Week 8 [5/21]: Building APIs for model deployment. Exploration of Plumber (R) and Flask or Falcon (Python).

Week 9 [5/28]: Continuation of API construction for model deployment. Building a Slackbot.

Week 10 [6/4]: Student Final Presentations.

Contributors

natelangholz, badboyfearness, chenjoyq, kaleberickson, tannerkoscinski, langholzucla, asy02006, rtranguy, awilks-rand, chenx872, guydotan, janellashu, ripsilon
