

Data-Wrangling

A Foundations of Data Science data wrangling project using the Samsung Galaxy S smartphone dataset for human activity recognition.

### Load dplyr & tidyr
[library(dplyr); library(tidyr)]

### Loading and exploring the structure of the dataset
Exploring the dataset, I came up with 8 different tables/data frames that would need to be merged: x_train, y_train, x_test, y_test, subject_test, subject_train, feature_labels, and activity_labels.
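
As a rough sketch of the loading step, the eight tables could be read with base R's `read.table()`. The helper name `read_har_tables` and the directory layout (`train/X_train.txt`, `features.txt`, and so on) are assumptions about the standard UCI HAR archive, not something stated in this write-up.

```r
# Hypothetical helper; the directory layout is an assumption about the
# standard UCI HAR archive and is not stated in this write-up.
read_har_tables <- function(data_dir) {
  # The files have no header row, so columns come back as V1, V2, ...
  read_tbl <- function(...) read.table(file.path(data_dir, ...), header = FALSE)
  list(
    x_train         = read_tbl("train", "X_train.txt"),
    y_train         = read_tbl("train", "y_train.txt"),
    subject_train   = read_tbl("train", "subject_train.txt"),
    x_test          = read_tbl("test", "X_test.txt"),
    y_test          = read_tbl("test", "y_test.txt"),
    subject_test    = read_tbl("test", "subject_test.txt"),
    feature_labels  = read_tbl("features.txt"),
    activity_labels = read_tbl("activity_labels.txt")
  )
}

# Usage (the path is a placeholder):
# tables <- read_har_tables("UCI HAR Dataset")
# str(tables$x_train)
```

Reading with `header = FALSE` is what makes the feature names available later as `feature_labels$V2`.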

### Merge training and test data (on X, subject & y respectively)
At this point, the task was to merge the training data with the test data. I created 3 different data frames called x_data, subject_total, and y_data, built from x_test & x_train, subject_test & subject_train, and y_test & y_train, respectively. At this point, I ignored the labels for the data.

[x_data <- bind_rows(x_train, x_test); subject_total <- bind_rows(subject_train, subject_test); y_data <- bind_rows(y_train, y_test)]
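
To illustrate what the row-wise merge does, here is a minimal sketch on toy data. The values are made up, and only the data frame names mirror the ones used above.

```r
library(dplyr)

# Toy stand-ins for the real tables (values are illustrative only)
x_train <- data.frame(V1 = c(0.1, 0.2), V2 = c(1.1, 1.2))
x_test  <- data.frame(V1 = 0.3, V2 = 1.3)

# bind_rows() stacks the rows of the first table on top of the second,
# matching columns by name
x_data <- bind_rows(x_train, x_test)
nrow(x_data)
```

Because all three merges put the training rows first and the test rows second, row i of x_data, subject_total, and y_data still refers to the same observation.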

### Apply labels to data
At this point, I am ready to apply the feature_labels & activity_labels to the data. Feature labels were applied as the column names, and activity labels were applied to describe the numbers 1-6 as walking, walking upstairs, walking downstairs, sitting, standing and laying. I had to give the activity_labels data set column names so I could later join it with y_data.
[names(subject_total) <- c('Subject'); names(x_data) <- make.names(feature_labels$V2, unique=TRUE); names(activity_labels) <- c('ActivityLabel', 'ActivityName'); names(y_data) <- c('ActivityLabel')]
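
The effect of `make.names(..., unique = TRUE)` on raw feature names can be sketched as follows. The sample names are illustrative and chosen to show both the character replacement and the handling of a duplicate.

```r
# Illustrative raw feature names, including one duplicate
raw_names <- c("tBodyAcc-mean()-X",
               "fBodyAcc-bandsEnergy()-1,8",
               "fBodyAcc-bandsEnergy()-1,8")

# Characters that are invalid in R names become ".", and duplicates
# receive a numeric suffix so every column name is unique
clean_names <- make.names(raw_names, unique = TRUE)
clean_names
# "tBodyAcc.mean...X" "fBodyAcc.bandsEnergy...1.8" "fBodyAcc.bandsEnergy...1.8.1"
```

Unique, syntactically valid names matter here because dplyr verbs such as `select()` refuse to work with duplicated column names.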

### Apply activity names to y_data to label all observations with the corresponding activity labels and names
This step joins activity_labels to y_data so that each observation carries the activity name, instead of just the activity label (the numeric code for each activity). This makes it easier to understand what is happening with the dataset. I called this data frame y_labeled.
[y_labeled <- y_data %>% left_join(activity_labels, by = "ActivityLabel")]
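
A small sketch of the join on toy data shows why `left_join()` suits this step: it keeps the rows of y_data in their original order. The activity names below are illustrative stand-ins.

```r
library(dplyr)

# Toy lookup table and label column (illustrative values)
activity_labels <- data.frame(
  ActivityLabel = 1:3,
  ActivityName  = c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS")
)
y_data <- data.frame(ActivityLabel = c(3L, 1L, 1L, 2L))

# left_join() keeps every row of y_data, in its original order,
# and adds the matching ActivityName
y_labeled <- left_join(y_data, activity_labels, by = "ActivityLabel")
y_labeled$ActivityName
```

Row order matters because the next step combines the tables by position. Base R's `merge()` sorts the result by the join key by default, which would break the alignment with x_data.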

### Merge x, y, and subject data
In this step, I merged x_data, y_labeled, and subject_total. x_data holds the observations; y_labeled and subject_total act as row identifiers, describing which activity is being performed and which subject is performing it for each observation. I called the resulting data frame one_data_set.
[one_data_set <- bind_cols(subject_total, y_labeled, x_data)]
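
Since `bind_cols()` pairs rows purely by position, a quick row-count check before combining can catch misaligned inputs. The sketch below uses toy data, and the check itself is an addition for illustration rather than part of the original steps.

```r
library(dplyr)

# Toy stand-ins with three observations each (illustrative values)
subject_total <- data.frame(Subject = c(1L, 1L, 2L))
y_labeled <- data.frame(
  ActivityLabel = c(1L, 2L, 1L),
  ActivityName  = c("WALKING", "WALKING_UPSTAIRS", "WALKING")
)
x_data <- data.frame(V1 = c(0.1, 0.2, 0.3))

# bind_cols() matches rows by position only, so confirm the counts agree
stopifnot(nrow(subject_total) == nrow(y_labeled),
          nrow(y_labeled) == nrow(x_data))

one_data_set <- bind_cols(subject_total, y_labeled, x_data)
names(one_data_set)
```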

### Exclude columns containing mean or standard deviation in the name
I selected everything from one_data_set except for those columns which had "mean" or "std" in the name.
[select_data <- select(one_data_set, -matches("mean|std"))]
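
One detail worth knowing about this step is that `matches()` ignores case by default, so a pattern like "mean|std" also catches names containing "Mean". A minimal sketch on a toy data frame with illustrative column names:

```r
library(dplyr)

# Toy data frame; the column names imitate cleaned feature names
toy <- data.frame(1, 0.2, 0.1, 0.9, 0.5)
names(toy) <- c("Subject", "tBodyAcc.mean...X", "tBodyAcc.std...X",
                "tBodyAcc.max...X", "angle.tBodyAccMean.gravity.")

# Default: case-insensitive, so "Mean" in the angle column also matches
dropped_default <- names(select(toy, -matches("mean|std")))

# Case-sensitive variant keeps the angle column
dropped_strict <- names(select(toy, -matches("mean|std", ignore.case = FALSE)))

dropped_default
dropped_strict
```

The leading minus sign is what turns the selection into an exclusion; without it, `select()` would keep only the matching columns.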

### Create a new summary data set from select_data
In this step I created a tidy data set from select_data containing the mean of each column, grouped by subject and activity name. The resulting data frame, called summary_data, has 180 observations (30 subjects × 6 activities) of 478 variables.
[summary_data <- select_data %>% group_by(Subject, ActivityName) %>% summarise(across(everything(), mean), .groups = "drop")]
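
The grouped summary can be sketched on a toy data frame to show where the row count comes from: one output row per subject and activity combination. This sketch assumes dplyr 1.0 or later for `across()`, and the values are made up.

```r
library(dplyr)

# Toy data: 2 subjects x 2 activities, several observations each
toy <- data.frame(
  Subject      = c(1, 1, 1, 2, 2, 2),
  ActivityName = c("WALKING", "WALKING", "SITTING",
                   "WALKING", "SITTING", "SITTING"),
  tBodyAcc.max = c(1, 3, 5, 2, 4, 6)
)

# One row per (Subject, ActivityName) pair, holding the mean of every
# remaining column; the grouping columns are left out of across()
toy_summary <- toy %>%
  group_by(Subject, ActivityName) %>%
  summarise(across(everything(), mean), .groups = "drop")

nrow(toy_summary)
```

With 2 subjects and 2 activities the toy result has 4 rows, in the same way that 30 subjects and 6 activities give 180.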

#### The summary_data can be viewed here: https://drive.google.com/file/d/0B8dPWvrTMSCSMlJyVlBLY1lRaU0/view?usp=sharing

Contributors: ninarab
