Machine Learning project submission by MK

Goal

The purpose of this project is to build a machine learning model that predicts which kind of exercise is being performed, based on data gathered from accelerometers on the belt, forearm, arm, and dumbbell. The data comes from the Human Activity Recognition study: Ugulino, W.; Cardador, D.; Vega, K.; Velloso, E.; Milidiu, R.; Fuks, H. "Wearable Computing: Accelerometers' Data Classification of Body Postures and Movements". More information, along with the source data, can be found here: http://groupware.les.inf.puc-rio.br/har

Model prepared

- The model was built in R, mainly using the caret package.
- The data was divided into a training partition (60%) and a testing partition (40%).

The following steps were taken:

  1. The data was filtered to keep only the variables describing accelerometers, along with the outcome variable.
  2. The summarized data was analyzed.
>summary(training)

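# Absolute pairwise correlations between the predictors (column 17 excluded)
M <- abs(cor(training_sd[, -17]))
# Ignore each variable's correlation with itself
diag(M) <- 0
# List the pairs whose absolute correlation exceeds 0.8
which(M > 0.8, arr.ind = TRUE)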
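
The correlation check above can be illustrated on a small synthetic data frame. This is a minimal sketch, not the project's data: the data frame `toy` and its columns are made up for illustration, and only the 0.8 threshold is taken from the code above.

```r
# Hypothetical toy data: x2 is almost a copy of x1, x3 is independent noise.
set.seed(1)
x1 <- rnorm(200)
toy <- data.frame(
  x1 = x1,
  x2 = x1 + rnorm(200, sd = 0.1),
  x3 = rnorm(200)
)

# Absolute pairwise correlations; zero the diagonal so that a variable
# is not reported as correlated with itself.
M <- abs(cor(toy))
diag(M) <- 0

# Row/column indices of every pair with absolute correlation above 0.8.
high <- which(M > 0.8, arr.ind = TRUE)
print(high)
```

Because the correlation matrix is symmetric, each highly correlated pair is reported twice, once as (row, col) and once as (col, row).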

Three things were noted:

> modFit_treebag
Bagged CART 

11776 samples
   15 predictor
    5 classes: 'A', 'B', 'C', 'D', 'E' 

No pre-processing
Resampling: Bootstrapped (25 reps) 
Summary of sample sizes: 11776, 11776, 11776, 11776, 11776, 11776, ... 
Resampling results

  Accuracy   Kappa      Accuracy SD  Kappa SD   
  0.8762355  0.8433116  0.005550699  0.007022629
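
For readers unfamiliar with bagged CART, the idea behind the model summarized above can be sketched in a few lines. This is an illustration under stated assumptions, not the code used for this project: it uses the built-in `iris` data instead of the accelerometer data, fits the trees directly with `rpart` rather than through caret's `treebag` method, and the helper names `fit_bagged_trees` and `predict_bagged` are hypothetical. The number of trees (25) is an arbitrary choice for the example.

```r
library(rpart)  # CART implementation shipped with standard R installations

set.seed(123)

# 60/40 split into training and testing rows (a plain random split here;
# the split used in this project may have been stratified by class).
n <- nrow(iris)
train_idx <- sample(n, size = round(0.6 * n))
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# Hypothetical helper: fit one classification tree per bootstrap resample.
fit_bagged_trees <- function(data, n_trees = 25) {
  lapply(seq_len(n_trees), function(i) {
    boot <- data[sample(nrow(data), replace = TRUE), ]
    rpart(Species ~ ., data = boot, method = "class")
  })
}

# Hypothetical helper: predict with every tree, then take a majority vote.
predict_bagged <- function(trees, newdata) {
  votes <- sapply(trees, function(tree) {
    as.character(predict(tree, newdata = newdata, type = "class"))
  })
  winners <- apply(votes, 1, function(v) names(which.max(table(v))))
  factor(winners, levels = levels(iris$Species))
}

trees <- fit_bagged_trees(train)
pred  <- predict_bagged(trees, test)

# Share of correct predictions on the held-out rows.
accuracy <- mean(pred == test$Species)
print(accuracy)
```

Averaging the votes of many trees, each grown on a different bootstrap resample, tends to reduce the variance of a single decision tree, which is the motivation for bagging.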

The out-of-sample error was measured as the proportion of incorrect predictions on the testing sample. This value was ~10%.

> print(sum(pred_treebag!=testPC$classe)/length(testPC$classe))
[1] 0.09966862
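
The error computation above is the misclassification rate, i.e. one minus the accuracy on the held-out data. A minimal sketch with made-up vectors follows; `predicted` and `actual` are hypothetical stand-ins for `pred_treebag` and `testPC$classe`.

```r
# Hypothetical predictions and true labels for ten observations.
actual    <- factor(c("A", "B", "C", "D", "E", "A", "B", "C", "D", "E"))
predicted <- factor(c("A", "B", "C", "D", "E", "A", "B", "C", "A", "E"),
                    levels = levels(actual))

# Misclassification rate: number of wrong predictions over the total.
error_rate <- sum(predicted != actual) / length(actual)

# mean() of the logical vector gives the same value more compactly.
stopifnot(isTRUE(all.equal(error_rate, mean(predicted != actual))))

print(error_rate)  # 0.1 (one of the ten predictions is wrong)
```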