getdata_courseproject's Introduction

Getting and Cleaning Data

The raw data files

The main folder UCI HAR Dataset contains:

  1. a train data folder containing:
    • X_train.txt: Train feature data set, consisting of 561 measurements/features from the accelerometer and gyroscope
    • y_train.txt: Train activity data set (identified by activity_id)
    • subject_train.txt: Train subject data set (identified by subject_id)
  2. a test data folder containing:
    • X_test.txt: Test feature data set, consisting of 561 measurements/features from the accelerometer and gyroscope
    • y_test.txt: Test activity data set (identified by activity_id)
    • subject_test.txt: Test subject data set (identified by subject_id)
  3. activity_labels.txt: Data set mapping each activity_id to its activity label (WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING)

  4. features.txt: The full list of features, obtained by combining each of the signals below (1 = available in that domain, 0 = not available)

| Name | Time | Freq. |
|---|---|---|
| Body Linear Acceleration | 1 | 1 |
| Gravity Linear Acceleration | 1 | 0 |
| Body Linear Jerk | 1 | 1 |
| Body Angular Velocity | 1 | 1 |
| Body Angular Acceleration | 1 | 0 |
| Body Linear Acceleration Magnitude | 1 | 1 |
| Gravity Linear Acceleration Magnitude | 1 | 0 |
| Body Linear Jerk Magnitude | 1 | 1 |
| Body Angular Velocity Magnitude | 1 | 1 |
| Body Angular Acceleration Magnitude | 1 | 1 |

with

| Function | Description |
|---|---|
| mean | Mean value |
| std | Standard deviation |
| mad | Median absolute deviation |
| max | Largest value in array |
| min | Smallest value in array |
| sma | Signal magnitude area |
| energy | Average sum of the squares |
| iqr | Interquartile range |
| entropy | Signal entropy |
| arCoeff | Autoregression coefficients |
| correlation | Correlation coefficient |
| maxFreqInd | Largest frequency component |
| meanFreq | Frequency signal weighted average |
| skewness | Frequency signal skewness |
| kurtosis | Frequency signal kurtosis |
| energyBand | Energy of a frequency interval |
| angle | Angle between two vectors |

Function descriptions are taken from: https://www.elen.ucl.ac.be/Proceedings/esann/esannpdf/es2013-84.pdf

  5. features_info.txt: A description of the base feature list

  6. README.txt: A general readme file for the data set
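To make the function table above more concrete, here is a minimal sketch of how a few of the listed statistics could be computed for one window of a single signal in base R. The helper name `window_features` and the sample values are made up for illustration. The exact definitions and normalization used by the dataset authors may differ (for example, `sd()` uses the n - 1 denominator, and `mad()` is called here without R's default scaling constant).

```r
# Hypothetical helper: a few of the per-window statistics from the table above.
# These are plain base-R definitions, assumed for illustration only.
window_features <- function(v) {
  c(
    mean   = mean(v),                 # mean value
    std    = sd(v),                   # standard deviation (n - 1 denominator)
    mad    = mad(v, constant = 1),    # median absolute deviation, unscaled
    max    = max(v),                  # largest value in the window
    min    = min(v),                  # smallest value in the window
    energy = sum(v^2) / length(v),    # average sum of the squares
    iqr    = IQR(v)                   # interquartile range
  )
}

# Made-up accelerometer samples for one axis (not taken from the dataset).
signal_x <- c(0.20, -0.10, 0.40, 0.00, -0.30, 0.10)
print(round(window_features(signal_x), 4))
```

Applying such a function to every signal and axis is what yields names of the form signal-function-axis in features.txt.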

Procedure for tidying up data

  1. Merge the training and the test sets to create one data set.
  2. Extract only the measurements on the mean and standard deviation for each measurement.
  3. Use descriptive activity names to name the activities in the data set.
  4. Appropriately label the data set with descriptive variable names.
  5. From the data set in step 4, create a second, independent tidy data set with the average of each variable for each activity and each subject.
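The five steps above can be sketched in base R as follows. This is not the code in run_analysis.R, only an illustration under stated assumptions: it uses a tiny synthetic data set (three made-up feature columns, three subjects, two activities) in place of the real files, and the helper `make_part` and the naming scheme in step 4 are hypothetical.

```r
set.seed(1)

# Stand-ins for features.txt and activity_labels.txt (assumed, heavily reduced).
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-max()-X")
activity_labels <- data.frame(activity_id = 1:2,
                              activity = c("WALKING", "SITTING"))

# Hypothetical helper that fakes one X/y/subject triple (4 rows per subject).
make_part <- function(subjects) {
  n <- length(subjects) * 4
  list(
    X = as.data.frame(matrix(runif(n * length(features), -1, 1),
                             ncol = length(features))),
    y = data.frame(activity_id = rep(1:2, length.out = n)),
    subject = data.frame(subject_id = rep(subjects, each = 4))
  )
}
train <- make_part(c(1, 2))
test  <- make_part(3)

# 1. Merge the training and test sets into one data set.
merged <- rbind(cbind(train$subject, train$y, train$X),
                cbind(test$subject,  test$y,  test$X))
names(merged) <- c("subject_id", "activity_id", features)

# 2. Keep only the mean() and std() measurements.
keep <- grepl("mean\\(\\)|std\\(\\)", features)
merged <- merged[, c("subject_id", "activity_id", features[keep])]

# 3. Replace activity ids with descriptive activity names.
merged$activity <- factor(merged$activity_id,
                          levels = activity_labels$activity_id,
                          labels = activity_labels$activity)
merged$activity_id <- NULL

# 4. Give the measurement columns descriptive, syntactic names.
is_measure <- !(names(merged) %in% c("subject_id", "activity"))
new_names <- names(merged)[is_measure]
new_names <- gsub("\\(\\)", "", new_names)   # drop "()"
new_names <- gsub("-", "_", new_names)       # "-" is not valid in R names
new_names <- sub("^t", "time", new_names)    # t prefix = time domain
new_names <- sub("^f", "frequency", new_names)
names(merged)[is_measure] <- new_names

# 5. Average every measurement per subject and activity.
tidy <- aggregate(merged[new_names],
                  by = list(subject_id = merged$subject_id,
                            activity = merged$activity),
                  FUN = mean)
print(tidy)
```

With the real data, the result has one row per subject and activity pair, which is the tidy data set described in step 5.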

Files included for Project

  1. README.md: General readme file for project.
  2. CodeBook.md: Code Book describing the variables, the data, and the transformations performed to clean up the data.
  3. run_analysis.R: R script which does the procedure described earlier to tidy up data.
  4. tidy_data.txt: Tidy data set file, output as a txt file.
  5. tidy_data.xls: Tidy data set file, output as an xls file for those who find data easier to read in that format :)
  • The output files (tidy_data.txt and tidy_data.xls) are not uploaded to the repository.

Running the script

  • Download the script to the home directory ("~/")

  • Execute the following commands. The script loads the required libraries and the zipped data file automatically; anything that is missing is installed, or downloaded and extracted.

    • For the script to fetch the zipped data file into the working directory, curl must be set up correctly on the system. Otherwise, download the zipped file yourself and extract it into the working directory ("~/").
source("run_analysis.R")
run_analysis()
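
The download-if-missing behaviour described above might look roughly like the sketch below. It is an assumption about the general approach, not the code in run_analysis.R: the function name `fetch_dataset`, its arguments, and the default file names are hypothetical, and the data URL is left as a parameter because this README does not state it.

```r
# Hypothetical helper: make sure the extracted data folder exists in the
# working directory, downloading and unzipping the archive only if needed.
fetch_dataset <- function(zip_url,
                          data_dir = "UCI HAR Dataset",
                          zip_file = "dataset.zip") {
  if (dir.exists(data_dir)) {
    return(invisible(data_dir))   # already extracted, nothing to do
  }
  if (!file.exists(zip_file)) {
    # method = "curl" relies on the curl binary being installed and on PATH
    download.file(zip_url, destfile = zip_file, method = "curl", mode = "wb")
  }
  unzip(zip_file)                 # extracts into the current working directory
  invisible(data_dir)
}
```

If curl is not available, placing the already extracted folder in the working directory makes the helper return early without attempting a download.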

Viewing the text file in R

  • To view the text file in a readable form, run
tidydata <- read.table("tidy_data.txt", header = TRUE)  # tidy_data.txt must be in the current working directory
View(tidydata)

For further information

Read CodeBook.md for a description of the transformations used as well as the variables and data.
