
mobileanalysis's Introduction

MobileAnalysis

Some scripts and data files of my current project.

If you need it, please email me first (xljroy#gmail).

mobileanalysis's People

Contributors

royxue

Stargazers

zedoul

Watchers

James Cloos, Julian Ramos, Denzil Ferreira

mobileanalysis's Issues

Next steps

For future reference:

First, since the data we have are so sparse, the first step is simply to merge categories. For instance, we do not need every single game category; we could just merge all the games into one category. So let's do it this way: each of us will group categories, with the constraint that only sparse categories are grouped. A sparse category may join a non-sparse one, but a non-sparse category should not itself be considered for merging.
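A minimal Python sketch of the merge constraint, assuming category usage is available as per-category record counts and that "sparse" means fewer records than some threshold. The `min_count` threshold, the example categories and the numbers are all hypothetical. Each reviewer's proposal maps a category to a target group, but only sparse categories actually move:

```python
from collections import Counter


def merge_sparse(counts, proposal, min_count):
    """Return a mapping category -> merged label.

    counts:    category -> number of records (hypothetical input format)
    proposal:  category -> target group suggested by a reviewer
    min_count: a category with fewer records than this counts as sparse

    Only sparse categories follow the proposal; non-sparse categories always
    keep their own label, so a sparse category may join a non-sparse one but
    a non-sparse category is never merged away.
    """
    mapping = {}
    for category, n in counts.items():
        if n < min_count and category in proposal:
            mapping[category] = proposal[category]
        else:
            mapping[category] = category
    return mapping


def merged_counts(counts, mapping):
    """Sum the record counts of all categories that share a merged label."""
    totals = Counter()
    for category, n in counts.items():
        totals[mapping[category]] += n
    return dict(totals)


# Illustrative numbers only.
counts = {"Arcade": 3, "Puzzle": 5, "Casual": 40, "Tools": 120}
proposal = {"Arcade": "Games", "Puzzle": "Games", "Casual": "Games", "Tools": "Personalization"}
mapping = merge_sparse(counts, proposal, min_count=10)
print(mapping)
print(merged_counts(counts, mapping))
```

Here "Casual" and "Tools" keep their own labels despite the proposal, because they are not sparse. Chained proposals (A to B while B goes to C) are not resolved; each target is taken as a final label.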

Second, we want to run a clustering algorithm on this data set. For that we need to find the right number of clusters, so we need to do the following: run k-means for k from 2 up to, say, a maximum of 20, and get the silhouette score for every clustering. This gives us a trajectory, so we can see where we should expect to reach the right number of clusters. We also want to compute the gap statistic. Use the implementation in R, which already does everything for you, meaning you don't have to write the k-means code yourself.

Third, check the number of clusters we obtain. If it is reasonable, we can proceed to look at the centroids for that number of clusters and maybe even run different clustering algorithms.

Analysis Images

@julian-ramos
Hi Julian
Sorry for the late reply; sorting out some things at home took some time last week.
I uploaded some analysis result images based on category.
The images show:
Category Total Duration
Category Total Times
Category Duration Per Time
Category Duration Percentage Per Time of Day
Category Time Percentage Per Time of Day
Category Duration Per Time Percentage Per Time of Day
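As a rough sketch of how several of these per-category figures can be derived, assume the raw data are usage records of the form `(category, start_hour, duration_seconds)` and that the day is split into four periods. Both the record format and the period boundaries below are assumptions, not the project's actual definitions. "Duration per time" is read here as total duration divided by the number of launches, and the per-period percentages as each period's share of the category's own total.

```python
from collections import defaultdict

# Hypothetical time-of-day periods: (name, first hour, end hour exclusive).
PERIODS = [("night", 0, 6), ("morning", 6, 12), ("afternoon", 12, 18), ("evening", 18, 24)]
PERIOD_NAMES = [name for name, _, _ in PERIODS]


def period_of(hour):
    """Map an hour of day (0-23) to its period name."""
    for name, start, end in PERIODS:
        if start <= hour < end:
            return name
    raise ValueError(f"hour out of range: {hour}")


def share(part, whole):
    """Percentage of part in whole; 0 when whole is 0."""
    return 100.0 * part / whole if whole else 0.0


def category_stats(records):
    """records: iterable of (category, start_hour, duration_seconds)."""
    duration = defaultdict(float)
    times = defaultdict(int)
    duration_by_period = defaultdict(lambda: defaultdict(float))
    times_by_period = defaultdict(lambda: defaultdict(int))
    for category, hour, seconds in records:
        period = period_of(hour)
        duration[category] += seconds
        times[category] += 1
        duration_by_period[category][period] += seconds
        times_by_period[category][period] += 1

    stats = {}
    for category in duration:
        stats[category] = {
            "total_duration": duration[category],
            "total_times": times[category],
            "duration_per_time": duration[category] / times[category],
            "duration_pct_by_period": {
                p: share(duration_by_period[category][p], duration[category]) for p in PERIOD_NAMES},
            "times_pct_by_period": {
                p: share(times_by_period[category][p], times[category]) for p in PERIOD_NAMES},
        }
    return stats


records = [("Games", 20, 600), ("Games", 21, 300),
           ("Social", 8, 120), ("Social", 13, 60), ("Social", 19, 120)]
for category, s in category_stats(records).items():
    print(category, s["total_duration"], s["total_times"], s["duration_per_time"])
```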

Later I will upload the images based on user.

Categories merging review

Lifestyle shouldn't be with health or medical.

Health and Medical definitely go together; just call it Health (things you have to do to make your life better).
Lifestyle category should contain Sports (things you do every day to make your life better).
Libraries and Demo is a separate category and should not be merged with any other category.
Finance category is fine, meaning Finance together with Shopping and Business (selling and buying stuff).
Entertainment category should contain Music and Audio, Photography, Media and Video, News and Magazines, and Comics (self-explanatory).
Personalization together with Tools, Live Wallpaper and Widgets (personalization of your device).
Travel together with Transportation (travel in general).
The games category is just all of the games together.
Communication alone
Education alone
Books and reference alone
Social alone
Weather alone
Unknown also alone

I think this way we have covered all of the categories. Sorry for all the back and forth; this is the final categorization. You should get 15 categories in total. Let me know if we left any category out.
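One way to apply this final grouping consistently is a lookup table, sketched below in Python. Group and member names are written as in the list above and may not match the raw category strings in the data exactly (capitalization, "&" versus "and"). The set of game sub-categories is passed in as a parameter because the individual game categories are not listed here. The hypothetical `unmapped` helper reports any raw category the table does not cover, which is one way to check whether a category was left out. Note that the groups named above come to 14 (13 explicit groups plus Games).

```python
# Final grouping as described above: merged group -> member categories.
# Member spellings are illustrative; the raw data may spell them differently.
GROUPS = {
    "Health": ["Health", "Medical"],
    "Lifestyle": ["Lifestyle", "Sports"],
    "Libraries and demo": ["Libraries and demo"],
    "Finance": ["Finance", "Shopping", "Business"],
    "Entertainment": ["Entertainment", "Music and audio", "Photography",
                      "Media and video", "News and magazines", "Comics"],
    "Personalization": ["Personalization", "Tools", "Live wallpaper", "Widgets"],
    "Travel": ["Travel", "Transportation"],
    "Communication": ["Communication"],
    "Education": ["Education"],
    "Books and reference": ["Books and reference"],
    "Social": ["Social"],
    "Weather": ["Weather"],
    "Unknown": ["Unknown"],
}
GAMES = "Games"  # every game sub-category collapses into this one group


def build_lookup(groups):
    """Invert the table to member -> group, rejecting a member listed twice."""
    lookup = {}
    for group, members in groups.items():
        for member in members:
            key = member.lower()
            if key in lookup:
                raise ValueError(f"{member!r} is in both {lookup[key]!r} and {group!r}")
            lookup[key] = group
    return lookup


LOOKUP = build_lookup(GROUPS)


def merged_category(raw, game_categories):
    """Merged group for a raw category, or None if the table does not cover it."""
    if raw in game_categories:
        return GAMES
    return LOOKUP.get(raw.lower())


def unmapped(raw_categories, game_categories):
    """Raw categories that fall outside every group, in input order."""
    return [c for c in raw_categories if merged_category(c, game_categories) is None]


print(len(GROUPS) + 1)  # number of merged groups including Games: 14 as listed
```

Running `unmapped` over the full list of raw categories in the data set would show exactly which ones still need a home.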

Clustering Result

To make our results more reliable, I tried several different methods.

  1. I used pam to get the silhouette score. pam does "Partitioning (clustering) of the data into k clusters “around medoids”, a more robust version of K-means." It gives a good result with 4 clusters.
  2. I also tried the original kmeans, but even when I loop it 100 times the result is random and not clear enough to pick the best answer.
  3. For clusGap, I first used the original R function clusGap, and I also computed the gap statistic with another implementation, from https://github.com/echen/gap-statistic. The results from these two methods are similar, so I think there is no problem with the clusGap result.
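R's `cluster::pam` runs a BUILD phase (greedy choice of initial medoids) followed by a SWAP phase (exchange a medoid with a non-medoid while that lowers the total distance). The Python sketch below follows those two phases naively, recomputing the full cost for every candidate swap, so it is only meant for small data and is not the `cluster` package's implementation. Because nothing in it is random, repeated runs give identical results, which is one reason pam is easier to interpret than k-means from random starts (point 2 above).

```python
import math


def total_cost(points, medoids):
    """Sum over all points of the distance to the nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)


def pam(points, k):
    """Naive PAM: BUILD then SWAP. Returns (medoid indices, labels)."""
    n = len(points)
    # BUILD: add medoids one at a time, each time taking the point that
    # lowers the total cost the most (ties go to the lowest index).
    medoids = []
    for _ in range(k):
        best = min((i for i in range(n) if i not in medoids),
                   key=lambda i: total_cost(points, medoids + [i]))
        medoids.append(best)
    # SWAP: try every (medoid, non-medoid) exchange and apply the best one,
    # repeating until no exchange strictly lowers the cost.
    cost = total_cost(points, medoids)
    while True:
        best_medoids, best_cost = None, cost
        for pos in range(k):
            for o in range(n):
                if o in medoids:
                    continue
                trial = medoids[:pos] + [o] + medoids[pos + 1:]
                trial_cost = total_cost(points, trial)
                if trial_cost < best_cost:
                    best_medoids, best_cost = trial, trial_cost
        if best_medoids is None:
            break
        medoids, cost = best_medoids, best_cost
    # Assign every point to its nearest medoid.
    labels = [min(range(k), key=lambda c: math.dist(p, points[medoids[c]])) for p in points]
    return medoids, labels


pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
medoids, labels = pam(pts, 2)
print(medoids, labels)
```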

So I suspect there may be some situations in which clusGap is not accurate, but there is nothing about this in the paper (uploaded as gap.pdf).
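For reference, the gap statistic of Tibshirani, Walther and Hastie (2001), which `clusGap` implements, compares the within-cluster dispersion W_k of the data with its expected value over B reference data sets drawn from a null distribution (for example, uniform over the range of each feature). Two implementations can legitimately give slightly different numbers if they use different reference data, a different B, or plain versus squared distances for d_{ii'}.

```latex
D_r = \sum_{i,\,i' \in C_r} d_{ii'}, \qquad
W_k = \sum_{r=1}^{k} \frac{1}{2 n_r} D_r

\mathrm{Gap}_n(k) = E^{*}_{n}\{\log W_k\} - \log W_k
  \approx \frac{1}{B} \sum_{b=1}^{B} \log W^{*}_{kb} - \log W_k

\bar{l} = \frac{1}{B} \sum_{b=1}^{B} \log W^{*}_{kb}, \qquad
\mathrm{sd}_k = \Big[ \frac{1}{B} \sum_{b=1}^{B} \big( \log W^{*}_{kb} - \bar{l} \big)^{2} \Big]^{1/2}, \qquad
s_k = \mathrm{sd}_k \sqrt{1 + 1/B}

\hat{k} = \min \{\, k : \mathrm{Gap}(k) \ge \mathrm{Gap}(k+1) - s_{k+1} \,\}
```

Here C_r is the r-th cluster with n_r points and d_{ii'} is the distance between points i and i' (squared Euclidean in the paper, in which case W_k is the pooled within-cluster sum of squares around the cluster means). `cluster::maxSE()` offers several rules for choosing k from the gap curve; the last line above is the one proposed in the paper (`method = "Tibs2001SEmax"`).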

What's your opinion?

Heatmap Results

About Heatmaps:

I didn't find a good way to combine these graphs: the heatmap function didn't work well with par (which is used to combine plots in R), so I saved each graph separately.

  1. data_plot.png: heatmap of the original data.
  2. kmeans_heatmap.png: since the kmeans result is random, to get a good heatmap I set a boundary (0.36, taken from the kmeans graphs we got before) when drawing the heatmap.
  3. pam_heatmap.png: nothing special.
  4. dbscan_heatmap.png: finding proper values of MinPts and E (the neighborhood radius, usually called eps) is important for the DBSCAN algorithm. My main strategy: first fix MinPts, then vary E to get different results, and pick the combination whose result is closest to kmeans and pam. Here I use MinPts=7 and E=0.655.
  5. hclust_heatmap: here I use Euclidean distance to draw the hierarchical clustering heatmap.
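The eps search in point 4 can be sketched as follows: fix MinPts, run DBSCAN over a grid of eps values, and keep the eps whose labelling agrees best with a reference clustering such as the pam or kmeans labels. The Python below is a plain DBSCAN written for illustration, not the R package used for the heatmaps. The agreement measure (the plain Rand index), treating all noise points as one group, and the eps grid are assumptions; the neighborhood count includes the point itself.

```python
import math
from itertools import combinations

NOISE = -1


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; a point is core if at least min_pts points (itself included)
    lie within distance eps. Returns one label per point, NOISE for noise."""
    n = len(points)
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:  # core point: expand further
                queue.extend(neighbors[j])
    return labels


def rand_index(a, b):
    """Fraction of point pairs on which two labellings agree (same vs different)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def best_eps(points, reference, min_pts, eps_grid):
    """Return (eps, agreement) for the eps whose labels best match reference."""
    scores = {eps: rand_index(dbscan(points, eps, min_pts), reference) for eps in eps_grid}
    eps = max(scores, key=scores.get)  # ties go to the earliest eps in the grid
    return eps, scores[eps]


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
reference = [0, 0, 0, 0, 1, 1, 1, 1]  # e.g. labels from pam or kmeans
print(best_eps(pts, reference, min_pts=3, eps_grid=[0.5, 1.5, 20.0]))
```

With imbalanced clusters or many noise points, an adjusted agreement index is usually a better guide than the plain Rand index used here.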
