Getting and Cleaning Data

The raw data files

The main folder UCI HAR Dataset contains:

a train data folder containing:

X_train.txt: Training feature data set, consisting of 561 measurements/features derived from the accelerometer and gyroscope
y_train.txt: Training activity data set (identified by activity_id)
subject_train.txt: Training subject data set (identified by subject_id)

a test data folder containing:

X_test.txt: Test feature data set, consisting of 561 measurements/features derived from the accelerometer and gyroscope
y_test.txt: Test activity data set (identified by activity_id)
subject_test.txt: Test subject data set (identified by subject_id)

activity_labels.txt: Lookup table relating each activity_id to its activity label (WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING)
features.txt: The list of all features obtained, formed by combining each of the signals below (1 = available in that domain, 0 = not available)

Name                                   Time  Freq.
Body Linear Acceleration               1     1
Gravity Linear Acceleration            1     0
Body Linear Jerk                       1     1
Body Angular Velocity                  1     1
Body Angular Acceleration              1     0
Body Linear Acceleration Magnitude     1     1
Gravity Linear Acceleration Magnitude  1     0
Body Linear Jerk Magnitude             1     1
Body Angular Velocity Magnitude        1     1
Body Angular Acceleration Magnitude    1     1

with each of the functions below

Function     Description
mean         Mean value
std          Standard deviation
mad          Median absolute deviation
max          Largest value in array
min          Smallest value in array
sma          Signal magnitude area
energy       Average sum of the squares
iqr          Interquartile range
entropy      Signal entropy
arCoeff      Autoregression coefficients
correlation  Correlation coefficient
maxFreqInd   Largest frequency component
meanFreq     Frequency signal weighted average
skewness     Frequency signal skewness
kurtosis     Frequency signal kurtosis
energyBand   Energy of a frequency interval
angle        Angle between two vectors

Function descriptions are taken from: https://www.elen.ucl.ac.be/Proceedings/esann/esannpdf/es2013-84.pdf

features_info.txt: A description of the base feature list

README.txt: a general readme file
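
As a rough illustration of how the three files in one folder fit together, the sketch below reads a single split and binds the subject, activity and feature columns side by side. It is not the code of run_analysis.R: the helper name read_split and the column names are assumptions made for this example, and it relies on the three files listing their rows in the same order.

```r
# Hypothetical helper (not part of run_analysis.R): read one split
# ("train" or "test") from the dataset root folder.
read_split <- function(root, split) {
  # Build e.g. <root>/train/X_train.txt from the prefix "X".
  path <- function(prefix) {
    file.path(root, split, paste0(prefix, "_", split, ".txt"))
  }
  subject  <- read.table(path("subject"), col.names = "subject_id")
  activity <- read.table(path("y"), col.names = "activity_id")
  features <- read.table(path("X"))  # columns are named V1, V2, ...
  # Assumes the three files share the same row order, so a column bind suffices.
  cbind(subject, activity, features)
}
```

A call such as read_split("UCI HAR Dataset", "train") would then return one data frame with subject_id, activity_id and the feature columns.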

Procedure for tidying up data

1. Merge the training and the test sets to create one data set.

2. Extract only the measurements on the mean and standard deviation for each measurement.

3. Use descriptive activity names to name the activities in the data set.

4. Appropriately label the data set with descriptive variable names.

5. From the data set in step 4, create a second, independent tidy data set with the average of each variable for each activity and each subject.
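
The steps above can be sketched in base R on a toy data set. This is only an illustration under stated assumptions, not the code of run_analysis.R: the values are made up, only three features are used, the cleaned column names are one possible naming choice, and keeping only names that contain mean() or std() is one reading of step 2.

```r
# Toy stand-ins for the raw files (all values are made up for illustration).
feature_names <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-max()-X")
activity_labels <- data.frame(activity_id = 1:2,
                              activity = c("WALKING", "SITTING"))
train <- data.frame(subject_id = c(1, 1), activity_id = c(1, 2),
                    V1 = c(0.2, 0.4), V2 = c(0.1, 0.3), V3 = c(0.9, 0.8))
test  <- data.frame(subject_id = c(1, 2), activity_id = c(1, 2),
                    V1 = c(0.4, 0.6), V2 = c(0.3, 0.5), V3 = c(0.7, 0.6))

# Step 1: merge the training and test sets by stacking their rows.
merged <- rbind(train, test)

# Step 2: keep the id columns plus the mean() and std() features only.
keep <- grepl("-(mean|std)\\(\\)", feature_names)
merged <- merged[, c(TRUE, TRUE, keep)]

# Step 3: replace the activity ids with descriptive labels.
idx <- match(merged$activity_id, activity_labels$activity_id)
merged$activity <- activity_labels$activity[idx]
merged$activity_id <- NULL

# Step 4: give the measurement columns descriptive names
# (here: drop the parentheses and turn hyphens into underscores).
measure_cols <- !(names(merged) %in% c("subject_id", "activity"))
names(merged)[measure_cols] <- gsub("-", "_", gsub("[()]", "", feature_names[keep]))

# Step 5: average every variable for each subject and each activity.
tidy <- aggregate(. ~ subject_id + activity, data = merged, FUN = mean)
print(tidy)
```

The result has one row per subject and activity combination present in the toy data, and one column per retained feature.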

Files included for Project

README.md: General readme file for project.

CodeBook.md: Code Book describing the variables, the data, and the transformations performed to clean up the data.

run_analysis.R: R script that carries out the tidying procedure described above.

tidy_data.txt: Tidy data set file, output as a txt file.

tidy_data.xls: Tidy data set file, output as an xls file for those who find data easier to read in a spreadsheet.

The output files are not uploaded to the repository.

Running the script

Download the script to the home directory ("~/").

Execute the commands below. The required libraries and the zipped data file are used automatically; if they are not present, they are downloaded and installed or extracted.

curl must be properly set up on the system for the script to fetch the zipped data file into the working directory. Otherwise, download the zipped file and extract it manually into the working directory ("~/").

source("run_analysis.R")
run_analysis()
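
The fetch-if-missing behaviour described above could look roughly like the following sketch. It is not taken from run_analysis.R: the function name ensure_dataset, its parameters and the default file names are hypothetical, the download URL is left to the caller, and the sketch assumes the archive unpacks into a folder named after data_dir.

```r
# Hypothetical helper: make sure the dataset folder exists, downloading and
# unpacking the archive only when something is missing.
ensure_dataset <- function(url, data_dir = "UCI HAR Dataset",
                           zip_file = "dataset.zip") {
  if (!dir.exists(data_dir)) {
    if (!file.exists(zip_file)) {
      # method = "curl" relies on a working curl installation.
      download.file(url, destfile = zip_file, method = "curl", mode = "wb")
    }
    # Assumption: the archive contains a top-level folder named like data_dir.
    unzip(zip_file, exdir = dirname(data_dir))
  }
  invisible(data_dir)
}
```

When the folder is already present, the function returns immediately without touching the network, which matches the manual fallback of extracting the zipped file by hand.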

Viewing the text file in R

To view the text file in a readable form, run

tidydata <- read.table("tidy_data.txt", header = TRUE)  # tidy_data.txt must be in the current working directory
View(tidydata)
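
The header = TRUE argument tells read.table to treat the first line as column names. The round trip below illustrates this on a toy table; it assumes the file was written with write.table and row.names = FALSE, which is not confirmed here, and the values and the temporary file location are made up.

```r
# Toy tidy table (values are made up for illustration).
tidy <- data.frame(subject_id = c(1, 2),
                   activity = c("WALKING", "SITTING"),
                   tBodyAcc_mean_X = c(0.3, 0.6))

# Write the table with a header line but without row names (assumed format).
out_file <- file.path(tempdir(), "tidy_data.txt")
write.table(tidy, out_file, row.names = FALSE)

# Read it back; header = TRUE restores the column names.
tidydata <- read.table(out_file, header = TRUE)
str(tidydata)
```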

For further information

Read CodeBook.md for a description of the transformations used as well as the variables and data.

Repository: lmsv-mx123/getdata_courseproject
