Scripts and data files for my current project.
If you need them, please email me first (xljroy#gmail).
@julian-ramos
Hi, Julian
I just uploaded the final result table
For future reference:
First, since the data we got is so sparse, the first step is simply to merge categories. For instance, we do not need every single game category; in fact, we can probably just merge all the games into one category. So let's do it this way: each of us is going to group categories, with the constraint of grouping only sparse categories. That means a sparse category can join a non-sparse one, but a non-sparse category should not itself be considered for merging.
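A minimal sketch of how the sparsity constraint could be checked before proposing a grouping. The threshold `min_count`, the function names, and the reading of the rule as "at most one non-sparse category per group" are all assumptions for illustration; the discussion above does not fix any of them.

```python
def split_sparse(usage_counts, min_count):
    """Split category names into (sparse, non_sparse) sets.

    usage_counts maps category name -> number of usage records.
    min_count is a hypothetical cutoff below which a category counts as sparse.
    """
    sparse = {c for c, n in usage_counts.items() if n < min_count}
    return sparse, set(usage_counts) - sparse


def merge_is_allowed(group, sparse):
    """Assumed rule: a group may hold at most one non-sparse category,
    so sparse categories can join a non-sparse one, but two non-sparse
    categories are never merged with each other."""
    non_sparse_members = [c for c in group if c not in sparse]
    return len(non_sparse_members) <= 1
```

For example, two sparse game categories could be grouped together or attached to one large category, while a group holding two large categories would be rejected under this reading.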
Second, we want to run a clustering algorithm on this data set. For that we need to find the right number of clusters, so we need to do the following: run k-means for different numbers of clusters, from 2 up to a maximum of, say, 20, and compute the silhouette score for every clustering. This gives us a trajectory, so we will be able to see where we can expect to reach the right number of clusters. We also want to compute the gap statistic. Use the implementation in R, which already does everything for you, meaning you don't have to write the k-means code yourself.
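To make the silhouette trajectory concrete, here is a small standard-library Python sketch of the procedure described above. It is not the R implementation mentioned in the thread and it does not compute the gap statistic; the function names, the fixed seed, and the choice to report 0.0 when fewer than two clusters are non-empty are assumptions of this sketch.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct starting points
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(p, centroids[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old centroid if a cluster went empty
                centroids[j] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette; points alone in their cluster contribute 0."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return 0.0  # undefined for one cluster; reported as 0.0 here
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def silhouette_by_k(points, k_values):
    """Silhouette score for each candidate number of clusters."""
    return {k: silhouette(points, kmeans(points, k)) for k in k_values}
```

Calling `silhouette_by_k(points, range(2, 21))` on a list of feature tuples (with at least 20 points) gives the trajectory; the k with the highest score is one candidate for the right number of clusters, to be compared against the gap statistic.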
Third, check the number of clusters we obtain. If it is reasonable, we can proceed to actually look at the centroids we obtain for that number of clusters, and maybe even run different clustering algorithms.
@julian-ramos
Hi Julian
Sorry for the late reply; sorting out some things at home took some time last week.
I uploaded some analysis result images based on category.
The images show:
Category Total Duration
Category Total Times
Category Duration Per Time
Category Duration Percentage Per Time of Day
Category Time Percentage Per Time of Day
Category Duration Per Time Percentage Per Time of Day
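As a rough illustration of how the first few of these per-category figures could be computed, here is a small Python sketch. The record layout `(category, duration_seconds, time_of_day)`, the function name, and the reading of "Duration Percentage Per Time of Day" as the share of a category's total duration that falls in each time-of-day slot are assumptions, not taken from the uploaded scripts.

```python
from collections import defaultdict


def category_metrics(records):
    """records: iterable of (category, duration_seconds, time_of_day)."""
    total_duration = defaultdict(float)
    total_times = defaultdict(int)
    duration_by_slot = defaultdict(lambda: defaultdict(float))

    for category, duration, slot in records:
        total_duration[category] += duration
        total_times[category] += 1
        duration_by_slot[category][slot] += duration

    result = {}
    for category, duration in total_duration.items():
        result[category] = {
            "total_duration": duration,
            "total_times": total_times[category],
            # average duration of one use of the category
            "duration_per_time": duration / total_times[category],
            # share of the category's duration in each time-of-day slot
            "duration_pct_by_time_of_day": {
                slot: (100.0 * d / duration if duration else 0.0)
                for slot, d in duration_by_slot[category].items()
            },
        }
    return result
```

The remaining percentages in the list could be derived the same way by swapping the accumulated quantity (count of uses, or duration per use) for duration.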
Later I will upload the images based on user.
Lifestyle shouldn't be with health or medical.
Health and medical definitely go together; just call it Health (things you have to do to make your life better).
The Lifestyle category should contain Sports (things you do every day to make your life better).
Libraries and demo is a separate category and should not be merged with any other category.
The Finance category is fine, meaning finance together with shopping and business (selling and buying stuff).
The Entertainment category should contain Music and audio, Photography, Media and Video, news and magazines, and comics (self-explanatory).
Personalization goes together with tools, live wallpaper, and widgets (personalization of your device).
Travel goes together with transportation (travel in general).
The games category is just all of the games together.
Communication alone
Education alone
Books and reference alone
Social alone
Weather alone
Unknown also alone
I think this way we have covered all of the categories. Sorry for all of the back and forth; this is the final categorization. You should get 15 categories in total; let me know if we left any category out.
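The grouping above could be written down as a lookup table, which also makes it easy to spot any category that was left out. This is only a sketch: the lowercase spellings of the raw category names and the rule that anything starting with "game" is a game category are assumptions, and the table encodes only the groups named in this message.

```python
MERGED = {
    "Health": ["health", "medical"],
    "Lifestyle": ["lifestyle", "sports"],
    "Libraries and demo": ["libraries and demo"],
    "Finance": ["finance", "shopping", "business"],
    "Entertainment": ["music and audio", "photography", "media and video",
                      "news and magazines", "comics"],
    "Personalization": ["personalization", "tools", "live wallpaper",
                        "widgets"],
    "Travel": ["travel", "transportation"],
    "Communication": ["communication"],
    "Education": ["education"],
    "Books and reference": ["books and reference"],
    "Social": ["social"],
    "Weather": ["weather"],
    "Unknown": ["unknown"],
}

# Invert the table: raw category name -> merged category name.
RAW_TO_MERGED = {raw: merged
                 for merged, raws in MERGED.items()
                 for raw in raws}


def merge_category(raw):
    """Return the merged category for a raw name, or None if not covered."""
    key = raw.strip().lower()
    if key.startswith("game"):  # all games go into one category
        return "Games"
    return RAW_TO_MERGED.get(key)


def unmapped(raw_categories):
    """Sorted list of raw categories that no merged group covers."""
    return sorted({c for c in raw_categories if merge_category(c) is None})
```

Running `unmapped(...)` over the distinct category names in the data set would list anything the final categorization does not cover.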
To make our results more accurate, I tried a different method.
So I wonder whether there are situations in which clusGap is not accurate, but there is nothing about this in the paper (uploaded as gap.pdf).
What's your opinion?
I didn't find a good way to combine these graphs: the heatmap function didn't work well with par (which is used to combine plots in R), so I saved the graphs separately.