ubervu / elasticboard
Dashboard that aggregates relevant metrics for Open Source projects.
Home Page: http://elasticboard.mihneadb.net/landing.html
License: MIT License
A list widget with an input in which you can enter a user's name.
The widget should call API_BASE/gabrielfalcao/lettuce/{{login}}/issues_assigned to get the necessary data and then display those issues.
Unfortunately there aren't many assigned issues in lettuce, so just return a fake array of issue ids when testing -- return [1, 2, 3] here https://github.com/uberVU/elasticboard/blob/master/data_processor/queries.py#L144
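A minimal sketch of the temporary stand-in described above (the function name and signature are assumptions; the real query lives in data_processor/queries.py):

```python
# Temporary stand-in for the real elasticsearch query in
# data_processor/queries.py -- lettuce has too few assigned issues
# to test against, so we return a fake array of issue ids for now.
def issues_assigned(owner, repo, login):
    """Return ids of issues assigned to `login` in owner/repo."""
    # TODO: replace with the real elasticsearch query.
    return [1, 2, 3]
```

The widget can then be wired against this endpoint and swapped to real data later without frontend changes.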
This is more of an exploratory question.
We were wondering if it is also possible to track an organization/individual account.
Our use case right now is: how can we figure out if some of our forks are outdated? Outdated meaning that we've opened them but haven't contributed to them, and we should probably close them or actually do something about them.
Ideas?
What percentage of all issues is each contributor assigned to?
cc @piatra
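A sketch of the per-contributor percentage computation, assuming issue documents shaped like the GitHub issue payload (the function name is hypothetical):

```python
from collections import Counter

def assignment_percentages(issues):
    """Map each assignee login to the percentage of all issues
    assigned to them.

    `issues` is a list of dicts shaped like the GitHub issue payload,
    each with an 'assignee' entry of {'login': ...} or None.
    """
    logins = [i['assignee']['login'] for i in issues if i.get('assignee')]
    total = len(issues)
    counts = Counter(logins)
    # Percentages are over ALL issues, so unassigned ones dilute the totals.
    return {login: 100.0 * n / total for login, n in counts.items()}
```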
As a developer
I want to know what coding style and other methodologies I need to adhere to
So that I can contribute to elasticboard
In order to make it easy for developers to work on this and for early adopters to try it out, we need to make it easy to ship and deploy.
Basically, there should be a way to quickly set up an isolated environment with all the deps taken care of where one can try out elasticboard - either locally, using some sort of a container or on a PaaS service like Heroku.
For development (contributors) purposes, the local aspect is more relevant - tighter feedback loop. I was thinking of going with docker but because they rely on OS-level containers (think LXC) the current version only runs on Linux. The natural solution would be Vagrant. Not sure if there's a better option.
For "checking-it-out" purposes we could set up a Heroku procfile or an install script (that someone would use to set this up on their own EC2 VM, for example).
Either way I think we shouldn't maintain too many options right now because we would end up working more on the setup process than on the 'product' itself.
The point is to have a generic dashboard that works with any kind of data. It should be easy to configure and deploy.
However, at least right now, it's really hard to determine automagically what bits of all the raw data are relevant to the user, how to display them and such. Because of this, I suggest we focus on getting this into a good shape based only on github data (while keeping the code decoupled), and see what patterns emerge. From there, we can move on to the eventual next step, of having something meta and generic.
@bogdans83, @aismail, how does this sound?
We need data about the repos we want to display in elasticboard. Github provides this data via events that are sent as webhooks. In order to get this data, we need to set up a small web service (like [1], but smarter) that listens for these events, maybe maps them to a model and then dumps them into elasticsearch.
Other than that, in order to "bootstrap" ourselves, we can fetch all historic data using GitHub Archive [2] so that our users don't have to wait for things to happen in order to see something on the dashboard. This will also make it much easier for early adopters to check it out, since they can test it right away.
To sum up:
[1] https://github.com/github/github-services/blob/master/lib/services/web.rb
[2] http://www.githubarchive.org/
Issue X was solved/closed, code was merged in module Y. However, there have been N follow up commits on that module after this. This means that the initial fix was not entirely correct/safe.
The page load is fast but the charts & everything else takes a ridiculous amount of time to load (even when working locally). This should have high priority imo.
The github API for repo events is limited at 300 data points. In theory when you reach that you should receive a "rel=last" header and the existing code will stop polling. However, the github API does not do this (already opened a support ticket) so we need to manually limit the download to 300 data points.
Limit data_processor.github.dump_repo_events to 300 items.
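A sketch of the cap, with page fetching injected so the limit logic stays testable without network access (names are assumptions; the real function is data_processor.github.dump_repo_events):

```python
import itertools

# GitHub caps repo events at 300; the rel="last" header is unreliable,
# so we enforce the limit ourselves.
MAX_EVENTS = 300

def dump_repo_events(fetch_pages, limit=MAX_EVENTS):
    """Collect at most `limit` events from an iterable of result pages.

    `fetch_pages` yields lists of event dicts, one list per API page.
    """
    flat = itertools.chain.from_iterable(fetch_pages)
    return list(itertools.islice(flat, limit))
```

Because islice stops consuming pages once the limit is hit, we also avoid requesting pages past the cutoff.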
We should be notified of branches that are no longer compatible with their upstream branch (master or project branch) whenever conflicts appear.
Just as the merge button works, but together on a board, maybe with alerts :)
I for one want to find out asap when my branch is no longer compatible with the one I want to merge it back into, to not waste my time branching out even further.
As an elasticboard user
I want to see a timeline of all the events that happen
So that I can get an overview about the most recent activity
Blazing fast on-boarding.
Use Vagrant and set up a docker instance inside it. When docker supports other OSes, we can just reuse the Dockerfile.
Right now we rely on data[0].
cc @piatra
Understand downsides of using an existing dashboard.
This is fuzzy:
Please set repo description to: "Dashboard that aggregates relevant metrics for Open Source projects."
Thanks!
Issues that haven't been "touched" (no events whatsoever) for at least 2 months.
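A sketch of the staleness check as a pure function over issue and event timestamps, so it can run against either live events or dumped data (names and the events mapping shape are assumptions):

```python
from datetime import datetime, timedelta

def stale_issues(issues, events_by_issue, now=None, months=2):
    """Issue numbers with no events whatsoever for at least ~2 months.

    `events_by_issue` maps issue number -> list of ISO-8601 timestamps;
    an issue with no events falls back to its created_at date.
    """
    now = now or datetime.utcnow()
    cutoff = now - timedelta(days=30 * months)
    fmt = '%Y-%m-%dT%H:%M:%SZ'
    stale = []
    for issue in issues:
        stamps = events_by_issue.get(issue['number'], [])
        last = max((datetime.strptime(s, fmt) for s in stamps),
                   default=datetime.strptime(issue['created_at'], fmt))
        if last < cutoff:
            stale.append(issue['number'])
    return stale
```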
We don't want to interleave JSON data in the raw log files.
Issues that have no milestone and no assignee.
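This one is a straightforward filter over issue documents shaped like the GitHub payload (the function name is hypothetical):

```python
def untriaged(issues):
    """Numbers of issues that have neither a milestone nor an assignee."""
    return [i['number'] for i in issues
            if i.get('milestone') is None and i.get('assignee') is None]
```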
Files that have been changed in the past month in more than one commit.
In order to get this data, we need to access the commit data API. Basically, for a push event, we have to iterate through the commits, access the API URL for every one of them and store the reply as a "commitevent" document (see schema.md).
From there, just have to check the "filename" attribute for every entry inside the "files" array for the respective commitevents.
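The filename check described above can be sketched as a counting pass over commitevent documents; callers would pre-filter the documents to the past month (the function name is an assumption):

```python
from collections import Counter

def churning_files(commitevents):
    """Files touched by more than one commit among the given commitevents.

    Each commitevent is assumed to carry a 'files' array whose entries
    have a 'filename' attribute, as in the GitHub commit payload.
    """
    counts = Counter(f['filename']
                     for c in commitevents
                     for f in c.get('files', []))
    return sorted(name for name, n in counts.items() if n > 1)
```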
Delta between first commit in the branch/pull request and merge time.
Milestones for which the "time to ship" ( #20 ) estimate is later than the set deadline.
The only data we get about commit events right now is from within pushevent data. There's an array of commits that tells us which commits were added with a push. However, we need more data for our metrics - like files changed, lines of code added/removed etc (#13, #14). For this, we need to store all the individual commit data as commitdata documents.
One way would be: for every push event, go through the commits array, then request and store data for each commit. The other way: ask explicitly for the commit events[1] and then process them individually.
The individual commit data has to be stored as "commitdata"-type documents.
[1] https://api.github.com/repos/gabrielfalcao/lettuce/commits
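For the first approach, the extraction step is a small pure function; in the Events API, each entry in payload.commits carries a 'url' field pointing at the full commit resource (the function name is hypothetical, and the actual fetching/storing is left out):

```python
def commit_urls(push_event):
    """Pull the per-commit API URLs out of a push event payload.

    Fetching each URL and storing the reply as a "commitdata"
    document is the follow-up step.
    """
    return [c['url'] for c in push_event['payload'].get('commits', [])]
```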
Right now all events go into the same index, having the same type. We should find a way to separate them according to the action that triggered them (i.e. commit, push, comment, issue).
I'm thinking we could have one index with multiple types (one for every event kind). Not sure how to find out what events our listener receives, since GH's API[1] doesn't seem to have a field for an event's kind (or I haven't seen it).
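One lead worth checking: webhook deliveries do carry the event kind out-of-band, in the X-GitHub-Event HTTP header rather than in the JSON body. A sketch of mapping delivery headers to a document type (listener framework left out; the function name is an assumption):

```python
def event_doc_type(headers):
    """Derive an elasticsearch document type from webhook headers.

    GitHub webhook deliveries include an 'X-GitHub-Event' header
    (e.g. 'push', 'issues', 'issue_comment'). Lookup is done
    case-insensitively since servers vary in header casing.
    """
    lowered = {k.lower(): v for k, v in headers.items()}
    return lowered.get('x-github-event', 'unknown')
```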
In order to compute all the issue-based metrics it would be easier to process specific issue data[1][2] rather than live events.
We need a function like:
download_issues_data(path, owner, repo, open=True, since=None)
that saves all the json objects received in a file, one per line. They will also have to have a "type" property set to "issuedata".
[1] https://api.github.com/repos/gabrielfalcao/lettuce/issues
[2] http://developer.github.com/v3/issues/#list-issues-for-a-repository
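The serialization half of the proposed download_issues_data can be sketched separately from the HTTP fetching against the issues API (the helper name is hypothetical):

```python
import json

def dump_issues(issues, fileobj):
    """Write issue objects one JSON document per line,
    each tagged with a "type" property of "issuedata".
    """
    for issue in issues:
        doc = dict(issue, type='issuedata')
        fileobj.write(json.dumps(doc) + '\n')
```

The one-document-per-line format keeps the output directly usable for elasticsearch bulk indexing.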
Have a basic frontend that displays what data we have.
The github archive data has a type attribute that shows the type of an event. I haven't seen this attribute in the payloads we've been receiving.
Might have to add an extra parameter to the subscribe API request.
Will investigate.
Add filtering to the timeline
All time and monthly, so we can track the evolution.
Need to use events-listener/subscribe.py to add web hooks pointing to http://mihneadb.ubervu.com:5000 in order to collect some data.
Issues where the difference between the first commit and the last commit (or the present time) is much bigger than the estimated points. Might be a long shot, but it could be useful to make us take the estimation points more seriously. Could also work for project issues (aggregated estimation points).
Total count of events (any kind of events) per week/month.
We should find out, for every month, how many additions and deletions were made.
This info has to be gathered from the "commitevent" documents, similar to the way described in issue #13. The "stats" attribute in these documents has all the info we need.
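A sketch of the aggregation over commitevent documents, assuming they mirror the GitHub commit payload (commit.author.date plus a 'stats' dict; the function name is an assumption):

```python
from collections import defaultdict

def monthly_churn(commitevents):
    """Aggregate additions/deletions per 'YYYY-MM' month from
    commitevent documents.
    """
    totals = defaultdict(lambda: {'additions': 0, 'deletions': 0})
    for ev in commitevents:
        month = ev['commit']['author']['date'][:7]  # ISO date -> 'YYYY-MM'
        totals[month]['additions'] += ev['stats']['additions']
        totals[month]['deletions'] += ev['stats']['deletions']
    return dict(totals)
```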
Issues with most events in the last month.
By number of events in which they are involved. Optionally, drill down by type of event (comment, commit etc.).
Issues labelled "critical", "blocker" etc (have a dictionary that contains the usual names for the highest possible severity).
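A sketch of the severity filter; the label set below is a hypothetical starting dictionary, not an agreed list:

```python
# Hypothetical dictionary of common highest-severity label names.
SEVERE_LABELS = {'critical', 'blocker', 'urgent', 'showstopper', 'p0'}

def severe_issues(issues):
    """Numbers of issues carrying any of the usual
    highest-severity labels (matched case-insensitively).
    """
    return [i['number'] for i in issues
            if any(l['name'].lower() in SEVERE_LABELS
                   for l in i.get('labels', []))]
```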
Right now everything is pretty much hardcoded for gabrielfalcao/lettuce. We should have a select input that chooses the current repo from a set of available repos in the datastore.
There's the /available_repos endpoint in the API for this.
Right now the implementation relies on parsing dump files updated via a cron job. It would be nice to have a cleaner way to bring GitHub data into the es database.
Create an es river (http://www.elasticsearch.org/blog/the-river/) which pulls data from github to es directly.
This issue is a stub.
Open issues that belong to a milestone, where milestone.deadline > now().
Milestone has X issues.
They are closed at a rate of Y / day.
=> at the current pace, milestone will be done in Z days.
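The estimate above is just Z = X / Y; a sketch, with the zero-rate case handled explicitly (the function name is an assumption):

```python
def days_to_done(open_issues, closed_per_day):
    """Estimate days until a milestone is done at the current pace:
    Z = X / Y, where X is the count of still-open issues and Y is
    the observed closing rate in issues per day.
    """
    if closed_per_day <= 0:
        return float('inf')  # no progress means never, at this pace
    return open_issues / closed_per_day
```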