sebastiangj / learnanalytics-analyzingbigdatawithmrs

This project forked from ketsha/learnanalytics-analyzingbigdatawithmrs

The information in this GitHub repository is presented by Microsoft on Analyzing Big Data with Microsoft R Server. These materials are intended to assist you in presenting or learning the materials, and are current as of 05/03/2017. The repository does not reflect updates to the platforms or products it discusses that were made after the publication date, so you should review the materials to ensure they are still accurate. Microsoft makes no warranty or guarantee of any kind regarding the accuracy of the materials beyond the date posted.

Home Page: https://azure.github.io/LearnAnalytics-AnalyzingBigDataWithMRS/

License: Creative Commons Attribution 4.0 International

Introduction

Analyzing Big Data with Microsoft R is designed to help R users learn to process, query, transform, and summarize large datasets, and to build models on them, using Microsoft R Server's (MRS) RevoScaleR package. The course takes a use-case-based approach, walking through a knowledge discovery and data mining example with MRS.

All of the rendered course content is in the student_resources folder. To run the code yourself, open the Rmd files in the instructor_resources folder in your R IDE.

Prerequisites

This course is intended for intermediate or advanced R users who have a solid grounding in R basics (especially data types, writing functions, and using the apply family of functions) and experience analyzing data in R with third-party packages such as dplyr and ggplot2. It was also written for users with a business analyst background, such as R, SAS, or SPSS analysts, who are familiar with computer science and programming concepts but are not necessarily experts in programming or distributed computing. These users want to learn how to run analyses on big datasets in R and, eventually, deploy their analytics workflow to a production environment such as Hadoop, Spark, or SQL Server. The course also assumes some familiarity with a basic modeling workflow: ingesting data, preparing it for analysis, building and comparing models, choosing a good fit, and scoring new data.

Learning objectives

After completing this course, participants will be able to use R and Microsoft R Server's RevoScaleR package to:

  1. Read and process flat files (CSV) efficiently
  2. Clean and prepare data for analysis
  3. Write complex transformations to add new features to the data
  4. Visualize, explore, and summarize data
  5. Build analytical models on large datasets and compare them
  6. Score new data with a model
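The six steps above form one end-to-end workflow. As a rough illustration only, here is a minimal base-R sketch of that workflow on a tiny, made-up dataset (the columns distance, fare, and payment are hypothetical and not taken from the course). It uses in-memory base-R functions so it runs anywhere; for data that does not fit in memory, RevoScaleR provides chunked counterparts, noted in the comments, such as rxImport, rxDataStep, rxSummary, rxLinMod, and rxPredict.

```r
# Minimal base-R sketch of the six-step workflow on a tiny, hypothetical dataset.
# Comments point to RevoScaleR functions that play a similar role for large data.

# 1. Read a flat file (CSV). Here the "file" is an inline string; with RevoScaleR
#    you would typically import a CSV into an XDF file with rxImport().
csv_text <- "distance,fare,payment
1.2,7.5,card
3.4,14.0,cash
0.0,-2.5,card
5.1,19.5,card
2.2,10.0,cash
4.0,16.5,card"
trips <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# 2. Clean: drop rows with a non-positive distance or fare
#    (cf. the rowSelection argument of rxDataStep()).
trips <- trips[trips$distance > 0 & trips$fare > 0, ]

# 3. Transform: add a derived feature (cf. the transforms argument of rxDataStep()).
trips$fare_per_mile <- trips$fare / trips$distance

# 4. Summarize: overall statistics and mean fare by payment type
#    (cf. rxSummary() and rxCube()).
print(summary(trips))
print(aggregate(fare ~ payment, data = trips, FUN = mean))

# 5. Build two candidate models and compare them (cf. rxLinMod()).
m1 <- lm(fare ~ distance, data = trips)
m2 <- lm(fare ~ distance + payment, data = trips)
print(AIC(m1, m2))  # lower AIC suggests a better balance of fit and complexity

# 6. Score new data with a model (cf. rxPredict()). m2 is used here for
#    illustration; in practice you would score with whichever model you chose.
new_trips <- data.frame(
  distance = c(2.0, 6.0),
  payment  = factor(c("card", "cash"), levels = levels(trips$payment))
)
scores <- predict(m2, newdata = new_trips)
print(scores)
```

Keeping each step as a separate, small operation mirrors how the RevoScaleR versions are usually chained: each step reads from and writes to a data source rather than holding everything in memory.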

Throughout this course, we provide enough RevoScaleR code examples for intermediate to advanced R users to learn how to integrate RevoScaleR into their R workflow and use it to build scalable solutions to problems involving large datasets.

In addition to teaching the specifics of coding in RevoScaleR, this course places heavy emphasis on themes common to data science in general. Here are some of the questions we ask and explore throughout the course:

  • How should a data scientist think about data and metadata?
  • What is the big deal about big data?
  • How do we ask questions about the data and how do we obtain answers?
  • How do we take lots of results and summarize or visualize them in a way that makes larger trends stand out?
  • What is the data science process life cycle and where do we go from here?
  • How do we build basic models and compare or evaluate them?

As we go through the course, we encourage everyone to keep these overarching themes in mind to develop better intuition as a data scientist.

Please let us know how we can improve our content.

Created by a Microsoft Employee.

Contributors

akzaidi, jreynolds01, microsoftopensource, sethmott, sushmavegunta
