royxue / mobileanalysis

R 11.27% Python 85.94% Rebol 2.79%

mobileanalysis's Introduction

MobileAnalysis

Some scripts and data files from my current project.

If you need them, please email me first (xljroy#gmail).

mobileanalysis's Issues

final result table uploaded

@julian-ramos
Hi, Julian
I just uploaded the final result table

Next steps

First, since the data we got is so sparse, the first step is simply to merge categories. For instance, we do not need every single game category; we can probably just merge all the games into one category. So let's do it this way: each of us groups the categories, with the constraint that only sparse categories get merged. A sparse category may join a non-sparse one, but a non-sparse category should not itself be considered for merging.

Second, we want to run a clustering algorithm on this data set. For that we need to find the right number of clusters, so do the following: run k-means for different numbers of clusters, from 2 up to a maximum of, say, 20, and compute the silhouette score for every clustering. This gives us a trajectory of scores, so we can see where we can expect to reach the right number of clusters. We also want to compute the gap statistic. Use the implementation in R, which already does everything for you, meaning you don't have to write the k-means code yourself.
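The silhouette sweep described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code used in this project: `kmeans`, `silhouette_score`, `silhouette_sweep` and `toy_blobs` are hypothetical helpers, the distance is Euclidean, and each k is restarted `n_init` times (keeping the lowest-inertia run) because k-means depends on its random start.

```python
import math
import random


def kmeans(points, k, rng, n_iter=50):
    """Plain Lloyd's k-means on a list of tuples; returns (labels, inertia)."""
    centers = rng.sample(points, k)  # random initial centers drawn from the data
    for _ in range(n_iter):
        # Assignment step: index of the nearest center for every point.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])  # leave an empty cluster's center alone
        if new_centers == centers:
            break
        centers = new_centers
    inertia = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, inertia


def silhouette_score(points, labels):
    """Mean silhouette width over all points (Euclidean distance)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0  # the silhouette is undefined for a single cluster
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # convention: a singleton cluster contributes 0
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def silhouette_sweep(points, k_values, n_init=10, seed=0):
    """Return {k: silhouette score of the best of n_init k-means runs}."""
    rng = random.Random(seed)
    scores = {}
    for k in k_values:
        labels, _ = min((kmeans(points, k, rng) for _ in range(n_init)),
                        key=lambda run: run[1])
        scores[k] = silhouette_score(points, labels)
    return scores


def toy_blobs(seed=42, per_blob=20):
    """Made-up data: three well-separated 2-D blobs."""
    rng = random.Random(seed)
    return [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
            for cx, cy in [(0, 0), (10, 0), (0, 10)] for _ in range(per_blob)]


if __name__ == "__main__":
    for k, score in silhouette_sweep(toy_blobs(), range(2, 21)).items():
        print(k, round(score, 3))
```

Plotting the printed scores against k gives the trajectory mentioned above; the k with the highest score is the usual candidate for the number of clusters.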

Third, check the number of clusters we obtain. If it is reasonable, we can proceed to look at the centroids we obtain for that number of clusters, and maybe even run different clustering algorithms.

Analysis Images

@julian-ramos
Hi Julian
Sorry for the late reply; sorting out some things at home took some time last week.
I uploaded some analysis result images based on category.
The images show:
Category Total Duration
Category Total Times
Category Duration Per Time
Category Duration Percentage Per Time of Day
Category Time Percentage Per Time of Day
Category Duration Per Time Percentage Per Time of Day
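As a rough illustration of how the first five of these per-category figures could be computed, here is a minimal Python sketch. It rests on several assumptions that are not stated in this issue: the input is a list of `(category, start_hour, duration_seconds)` usage records, the day is split into four hypothetical buckets by `time_of_day`, and "percentage per time of day" means the share of a category's own total that falls into each bucket. The sixth figure is left out because its exact definition is not clear from the list.

```python
from collections import defaultdict


def time_of_day(hour):
    """Hypothetical buckets; the issue does not say how the day was split."""
    if hour < 6:
        return "night"
    if hour < 12:
        return "morning"
    if hour < 18:
        return "afternoon"
    return "evening"


def category_stats(records):
    """records: iterable of (category, start_hour, duration_seconds)."""
    duration = defaultdict(float)
    times = defaultdict(int)
    duration_by_slot = defaultdict(lambda: defaultdict(float))
    times_by_slot = defaultdict(lambda: defaultdict(int))
    for category, hour, seconds in records:
        slot = time_of_day(hour)
        duration[category] += seconds
        times[category] += 1
        duration_by_slot[category][slot] += seconds
        times_by_slot[category][slot] += 1

    stats = {}
    for cat in times:
        total = duration[cat]
        stats[cat] = {
            "total_duration": total,
            "total_times": times[cat],
            # Mean duration of a single use of an app in this category.
            "duration_per_time": total / times[cat],
            # Share of the category's duration in each time-of-day bucket.
            "duration_pct_by_slot": {
                slot: (100.0 * d / total if total else 0.0)
                for slot, d in duration_by_slot[cat].items()
            },
            # Share of the category's launches in each time-of-day bucket.
            "times_pct_by_slot": {
                slot: 100.0 * n / times[cat]
                for slot, n in times_by_slot[cat].items()
            },
        }
    return stats


if __name__ == "__main__":
    demo = [("Games", 9, 100), ("Games", 20, 300), ("Social", 9, 50)]
    for cat, values in category_stats(demo).items():
        print(cat, values)
```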

Later I will upload the images based on user.

Categories merging review

Lifestyle shouldn't be with health or medical.

Health and Medical definitely go together; just call it Health (things you have to do to make your life better).
The Lifestyle category should contain Sports (things you do every day to make your life better).
Libraries and demo is a separate category and should not be merged with any other category.
The Finance category is fine, meaning Finance together with Shopping and Business (buying and selling stuff).
The Entertainment category should contain Music and audio, Photography, Media and video, News and magazines, and Comics (self-explanatory).
Personalization goes together with Tools, Live wallpaper, and Widgets (personalization of your device).
Travel goes together with Transportation (travel in general).
The games category is just all of the games together.
Communication alone
Education alone
Books and reference alone
Social alone
Weather alone
Unknown also alone

I think this way we have covered all of the categories. Sorry for all of the back and forth; this is the final categorization. You should get 15 categories in total. Let me know if we left any category out.
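The grouping above can be written down as a small lookup table. The Python sketch below is one possible encoding, not the script used in this project. The spellings of the original category names are taken from this list and may differ from the names in the data files. The rule that any name starting with "game" maps to Games is an assumption, since the individual game categories are not listed here. Any category that is not named in a group (Communication, Education, and so on) is returned unchanged.

```python
# Merged group -> original categories that fold into it (names as written in this issue).
MERGE_GROUPS = {
    "Health": ["Health", "Medical"],
    "Lifestyle": ["Lifestyle", "Sports"],
    "Finance": ["Finance", "Shopping", "Business"],
    "Entertainment": ["Entertainment", "Music and audio", "Photography",
                      "Media and video", "News and magazines", "Comics"],
    "Personalization": ["Personalization", "Tools", "Live wallpaper", "Widgets"],
    "Travel": ["Travel", "Transportation"],
}

# Reverse lookup: lower-cased original name -> merged group.
_LOOKUP = {
    original.lower(): group
    for group, originals in MERGE_GROUPS.items()
    for original in originals
}


def merge_category(name):
    """Map an original category name to its merged group."""
    key = name.strip().lower()
    if key in _LOOKUP:
        return _LOOKUP[key]
    if key.startswith("game"):
        return "Games"  # assumption: every game sub-category starts with "game"
    return name  # categories that stay alone keep their own name


if __name__ == "__main__":
    for name in ["Medical", "Sports", "Widgets", "Game Puzzle", "Weather"]:
        print(name, "->", merge_category(name))
```

Counting the distinct values of `merge_category` over the real category column is a quick way to check the merged total against the expected 15.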

Clustering Result

To make our results more accurate, I tried several different methods.

I used pam to get the silhouette score. pam is described as "Partitioning (clustering) of the data into k clusters 'around medoids', a more robust version of K-means." It gives a good result with 4 clusters.
I also tried the original kmeans to get the result. Even when I make it loop 100 times, the result is random and not clear enough to pick the best answer.
For clusGap, I first used the original R function clusGap to get the result. I ALSO tried another implementation of the gap statistic, from https://github.com/echen/gap-statistic. The results from these two methods are similar, so I think there is no problem with the clusGap result.

So I think there may be some situations in which clusGap is not accurate, but there is nothing about this in the paper (uploaded as gap.pdf).

What's your opinion?
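For reference, the idea behind the gap statistic can be sketched in a few lines of Python. This is not the R clusGap function or the echen implementation mentioned above, only a simplified restatement under assumptions: k-means with random restarts supplies the within-cluster dispersion W_k, the reference sets are drawn uniformly over the bounding box of the data, and the number of clusters is the smallest k with Gap(k) >= Gap(k+1) - s(k+1). All function names are hypothetical.

```python
import math
import random


def kmeans_wk(points, k, rng, n_init=10, n_iter=50):
    """Lowest within-cluster sum of squares found over n_init k-means runs.

    Assumes more distinct points than clusters, so the result is positive.
    """
    best = math.inf
    for _ in range(n_init):
        centers = rng.sample(points, k)
        for _ in range(n_iter):
            groups = [[] for _ in range(k)]
            for p in points:
                nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
                groups[nearest].append(p)
            new = [tuple(sum(x) / len(g) for x in zip(*g)) if g else centers[c]
                   for c, g in enumerate(groups)]
            if new == centers:
                break
            centers = new
        wk = sum(min(math.dist(p, c) for c in centers) ** 2 for p in points)
        best = min(best, wk)
    return best


def gap_statistic(points, k_max, n_refs=5, seed=0):
    """Return {k: (gap, s_k)} using uniform reference sets over the bounding box."""
    rng = random.Random(seed)
    lows = [min(x) for x in zip(*points)]
    highs = [max(x) for x in zip(*points)]
    refs = [[tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs)) for _ in points]
            for _ in range(n_refs)]
    result = {}
    for k in range(1, k_max + 1):
        log_wk = math.log(kmeans_wk(points, k, rng))
        ref_logs = [math.log(kmeans_wk(ref, k, rng)) for ref in refs]
        mean = sum(ref_logs) / n_refs
        sd = math.sqrt(sum((v - mean) ** 2 for v in ref_logs) / n_refs)
        result[k] = (mean - log_wk, sd * math.sqrt(1 + 1 / n_refs))
    return result


def choose_k(gaps):
    """Smallest k with Gap(k) >= Gap(k+1) - s(k+1); falls back to the largest k."""
    ks = sorted(gaps)
    for k, k_next in zip(ks, ks[1:]):
        if gaps[k][0] >= gaps[k_next][0] - gaps[k_next][1]:
            return k
    return ks[-1]


def toy_blobs(seed=42, per_blob=20):
    """Made-up data: three well-separated 2-D blobs."""
    rng = random.Random(seed)
    return [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
            for cx, cy in [(0, 0), (10, 0), (0, 10)] for _ in range(per_blob)]


if __name__ == "__main__":
    gaps = gap_statistic(toy_blobs(), k_max=5)
    for k, (gap, s_k) in gaps.items():
        print(k, round(gap, 3), round(s_k, 3))
    print("chosen k:", choose_k(gaps))
```

Because the reference sets are random, the gap values change a little from seed to seed, which is one reason to compare the outcome against a second method such as the silhouette score.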

Heatmap Results

About Heatmaps:

I didn't find a good way to combine these graphs. The heatmap function doesn't work well with par (which is used for combining plots in R), so I saved the graphs as separate files.

data_plot.png is the heat map of the original data.
kmeans_heatmap.png: we know that the result of the kmeans algorithm is random, so to get a good heatmap I set a boundary (0.36, taken from the kmeans graph results we got before) when drawing it.
pam_heatmap.png: nothing special.
dbscan_heatmap.png: finding proper values of MinPts and E is important for the DBSCAN algorithm. My main strategy is to first set MinPts, then change E to get different results, and pick the combination whose result is closest to those of kmeans and pam. Here I use MinPts=7 and E=0.655.
hclust_heatmap: here I use Euclidean distance to draw the hierarchical clustering heatmap.
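The DBSCAN parameter search described above (fix MinPts, vary E, keep the value whose result is closest to kmeans and pam) can be sketched in Python as follows. This is an illustration only, not the R code behind dbscan_heatmap.png. It assumes that "closest" means the same number of clusters as a reference result, with fewer noise points breaking ties; the actual comparison used may have been different. `dbscan` and `sweep_eps` are hypothetical helpers, and the toy values in the demo are unrelated to MinPts=7 and E=0.655.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN; returns one label per point, NOISE (-1) for outliers."""
    labels = [None] * len(points)

    def neighbors(i):
        # The point itself counts toward min_pts.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point, keep expanding
    return labels


def sweep_eps(points, min_pts, eps_values, target_clusters):
    """Fix min_pts, vary eps, keep the eps whose cluster count is closest to the target."""
    best = None
    for eps in eps_values:
        labels = dbscan(points, eps, min_pts)
        n_clusters = len(set(labels) - {NOISE})
        key = (abs(n_clusters - target_clusters), labels.count(NOISE))
        if best is None or key < best[0]:
            best = (key, eps, labels)
    return best[1], best[2]


if __name__ == "__main__":
    # Made-up data: two 4x4 grids of points, five units apart.
    grid = [(x * 0.1, y * 0.1) for x in range(4) for y in range(4)]
    pts = grid + [(x + 5.0, y) for x, y in grid]
    eps, labels = sweep_eps(pts, min_pts=4, eps_values=[0.05, 0.15, 10.0],
                            target_clusters=2)
    print("chosen eps:", eps, "clusters:", len(set(labels) - {NOISE}))
```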
