
elasticboard's People

Contributors

adaschevici, mihneadb, mishu-, piatra

elasticboard's Issues

Find out if an organization's forks are outdated

This is more of an exploratory question.
We were wondering if it is possible to also track an organization/individual account.
Our use case right now is: how can we figure out if some of our forks are outdated? Outdated meaning that we've opened them but haven't contributed to them, and we should probably close them or actually do something about them.

ideas?

cc @aismail @mihneadb

[Shipping container] Need a "one click" install & deploy solution

In order to make it easy for developers to work on this and for early adopters to try it out, we need to make it easy to ship and deploy.

Basically, there should be a way to quickly set up an isolated environment with all the deps taken care of, where one can try out elasticboard: either locally, using some sort of container, or on a PaaS like Heroku.

For development (contributor) purposes, the local aspect is more relevant: tighter feedback loop. I was thinking of going with Docker, but because it relies on OS-level containers (think LXC), the current version only runs on Linux. The natural alternative would be Vagrant. Not sure if there's a better option.

For "checking-it-out" purposes we could set up a Heroku procfile or an install script (that someone would use to set this up on their own EC2 VM, for example).

Either way I think we shouldn't maintain too many options right now because we would end up working more on the setup process than on the 'product' itself.

towards the future

The point is to have a generic dashboard that works with any kind of data. It should be easy to configure and deploy.

However, at least right now, it's really hard to determine automagically what bits of all the raw data are relevant to the user, how to display them and such. Because of this, I suggest we focus on getting this into a good shape based only on github data (while keeping the code decoupled), and see what patterns emerge. From there, we can move on to the eventual next step, of having something meta and generic.

@bogdans83, @aismail, how does this sound?

Getting data in

We need data about the repos we want to display in elasticboard. GitHub provides this data via events that are delivered as HTTP POST requests to a webhook. In order to get this data, we need to set up a small web service (like [1], but smarter) that listens for these events, maybe maps them to a model and then dumps them into elasticsearch.

Other than that, in order to "bootstrap" ourselves, we can fetch all historic data using GitHub Archive [2] so that our users don't have to wait for things to happen in order to see something on the dashboard. This will also make it much easier for early adopters to check it out, since they can test it right away.

To sum up:

  • we need an always-running web service that listens for events and dumps them to ES [need to investigate if we can afford to have this real-time or have periodical updates]
  • we need a script (or something more advanced, celery-ish task-based magic) that sets up the initial data from gh-archive
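The listener's core could be sketched like this (a minimal sketch; `store` stands in for an elasticsearch client's index call, and the payload shape is assumed to follow GitHub's webhook JSON):

```python
import json

def handle_event(raw_body, store):
    """Parse one webhook delivery and hand the document to the datastore."""
    event = json.loads(raw_body)
    doc = {
        "payload": event,
        # keeping the repo name around helps route documents later
        "repo": event.get("repository", {}).get("name"),
    }
    store(doc)
    return doc
```

Whether this runs real-time or on a periodic schedule only changes what calls `handle_event`, not its shape.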

[1] https://github.com/github/github-services/blob/master/lib/services/web.rb
[2] http://www.githubarchive.org/

Problematic fixes - required follow-ups

Issue X was solved/closed and the fix was merged in module Y. However, there have been N follow-up commits on that module since then. This suggests the initial fix was not entirely correct/safe.
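A sketch of how this could be flagged (all names and data shapes here are illustrative, not elasticboard's):

```python
def suspect_fixes(fixes, commits, threshold=2):
    """Flag fixes followed by `threshold`+ later commits on the same module.

    `fixes` are (issue, module, merged_at) triples; `commits` are
    (module, date) pairs. Any comparable timestamps work.
    """
    flagged = []
    for issue, module, merged_at in fixes:
        later = sum(1 for mod, date in commits
                    if mod == module and date > merged_at)
        if later >= threshold:
            flagged.append((issue, later))
    return flagged
```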

Interface load time

The page load is fast but the charts & everything else takes a ridiculous amount of time to load (even when working locally). This should have high priority imo.

Limit github data download to 300 events (as per API docs)

Need

The GitHub API for repo events is limited to 300 data points. In theory, when you reach that limit you should receive a "rel=last" header and the existing code will stop polling. However, the GitHub API does not do this (I've already opened a support ticket), so we need to manually limit the download to 300 data points.

Solution

Limit data_processor.github.dump_repo_events to 300 items.
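A sketch of the cap, assuming dump_repo_events walks pages of events (the names here are illustrative, not the actual code):

```python
def dump_capped(pages, limit=300):
    """Flatten paginated event lists, stopping once `limit` items are kept.

    `pages` is any iterable of event lists, standing in for the pages the
    real polling loop walks; it would break out the same way.
    """
    items = []
    for page in pages:
        for event in page:
            items.append(event)
            if len(items) >= limit:
                return items
    return items
```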

Broken/incompatible/out of date branches

We should be notified of branches that are no longer compatible with their upstream branch (master or project branch) whenever conflicts appear.

Just like the merge button's conflict check, but aggregated on a board, maybe with alerts :)

I for one want to find out asap when my branch is no longer compatible with the one I want to merge it back into, to not waste my time branching out even further.

Contributor count for a point in time - total, active

This is fuzzy:

  • what's a contributor? I'd say anybody who interacts with the repository in any way for now - comments on an issue, makes a commit, opens an issue.
  • what's an active contributor? How about: anyone with a contribution (see above) no more than one month before the reference "point in time"
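Under those definitions, the counts could be computed like this (a sketch, assuming events reduced to (actor, timestamp) pairs):

```python
from datetime import datetime, timedelta

def contributor_counts(events, at, window=timedelta(days=30)):
    """(total, active) contributor counts at a point in time.

    `events` are (actor, timestamp) pairs; any interaction counts as a
    contribution, matching the loose definition above.
    """
    total = {actor for actor, ts in events if ts <= at}
    active = {actor for actor, ts in events if at - window <= ts <= at}
    return len(total), len(active)
```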

Update repo description

Please set repo description to: "Dashboard that aggregates relevant metrics for Open Source projects."

Thanks!

Zombie issues

Issues that haven't been "touched" (no events whatsoever) for at least 2 months.
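A sketch of the filter, assuming we can get a list of event timestamps per issue:

```python
from datetime import datetime, timedelta

def zombie_issues(issue_events, now, threshold=timedelta(days=60)):
    """Issue numbers whose most recent event is older than `threshold`.

    `issue_events` maps issue number -> list of event timestamps.
    """
    return sorted(
        number
        for number, stamps in issue_events.items()
        if stamps and max(stamps) < now - threshold
    )
```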

Dump data to ES

  • investigate the requirement of a schema
  • use an ES py lib to send our data to ES
  • set up a cron job to add in new data daily

"Hot" files

Files that have been changed in the past month in more than one commit.

In order to get this data, we need to access the commit data API. Basically, for a push event, we have to iterate through the commits, access the API URL for every one of them and store the reply as a "commitevent" document (see schema.md).

From there, just have to check the "filename" attribute for every entry inside the "files" array for the respective commitevents.
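The check itself could look like this (a sketch; the commitevent shape with "date" and "files"/"filename" is assumed from schema.md):

```python
from collections import Counter
from datetime import datetime, timedelta

def hot_files(commitevents, now, window=timedelta(days=30)):
    """Filenames changed by more than one commit in the past `window`."""
    changes = Counter()
    for ce in commitevents:
        if now - window <= ce["date"] <= now:
            changes.update(f["filename"] for f in ce["files"])
    return sorted(name for name, count in changes.items() if count > 1)
```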

Get commit data

The only data we get about commit events right now is from within pushevent data. There's an array of commits that tells us which commits were added with a push. However, we need more data for our metrics - like files changed, lines of code added/removed etc (#13, #14). For this, we need to store all the individual commit data as commitdata documents.

One way would be: for every push event, go through the commits array, then request and store data for each commit. Another way: ask explicitly for the commit events[1] and then process them individually.

The individual commit data has to be stored as "commitdata"-type documents.

[1] https://api.github.com/repos/gabrielfalcao/lettuce/commits
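For the first approach, collecting the commit URLs to fetch could be sketched as follows (push event payloads carry a commits array with a url per commit):

```python
def commit_api_urls(push_events):
    """Collect the per-commit API URLs referenced by push events."""
    urls = []
    for event in push_events:
        for commit in event.get("commits", []):
            urls.append(commit["url"])
    return urls
```

Each URL would then be requested and the reply stored as a "commitdata" document.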

Distinguish between event types

Right now all events go into the same index, having the same type. We should find a way to separate them according to the action that triggered them (e.g. commit, push, comment, issue).

I'm thinking we could have one index with multiple types (one for every event kind). Not sure how to find out what events our listener receives, since GH's API[1] doesn't seem to have a field for an event's kind (or I haven't seen it).

[1] http://developer.github.com/v3/activity/events/types/
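One possibly useful detail: webhook deliveries carry the event name in the X-GitHub-Event HTTP header rather than in the JSON body, so the listener could read the type from there. A sketch, assuming a plain dict of request headers:

```python
def event_doc_type(headers, default="unknown"):
    """Derive the ES document type from a delivery's X-GitHub-Event header."""
    return headers.get("X-GitHub-Event", default).lower()
```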

Set up Kibana

Have a basic frontend that displays what data we have.

Check API request for subscribing to events

The GitHub Archive data has a type attribute that shows the type of each event. I haven't seen this attribute in the payloads we've been receiving.

Might have to add an extra parameter to the subscribe API request.

Will investigate.

Underestimated issues

Issues where the time between the first commit and the last commit (or the present) is way bigger than the estimated points. Might be a long shot, but it could push us to take estimation points more seriously. Could also work for project issues (aggregated estimation points).

Monthly code flow - lines changed (additions, deletions)

Should find out, for every month, how many additions and deletions are made.

This info has to be gathered from the "commitevent" documents, similar to the way described in issue #13. The "stats" attribute in these documents has all the info we need.
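The aggregation could be sketched as follows (assuming commitevent documents with an ISO date string and a stats object, as in the commit API):

```python
from collections import defaultdict

def monthly_flow(commitevents):
    """Sum additions/deletions per "YYYY-MM" from commitevent stats."""
    flow = defaultdict(lambda: {"additions": 0, "deletions": 0})
    for ce in commitevents:
        month = ce["date"][:7]  # "2014-01-15T10:00:00Z" -> "2014-01"
        flow[month]["additions"] += ce["stats"]["additions"]
        flow[month]["deletions"] += ce["stats"]["deletions"]
    return dict(flow)
```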

Hot issues

Issues with most events in the last month.

Most active contributors

By number of events in which they are involved. Optionally, drill down by type of event (comment, commit etc.).
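A sketch of the ranking, assuming events reduced to dicts with actor and type keys:

```python
from collections import Counter

def most_active(events, top=10, event_type=None):
    """Rank actors by the number of events they appear in.

    Pass `event_type` to drill down to one kind of event.
    """
    counts = Counter(
        e["actor"] for e in events
        if event_type is None or e["type"] == event_type
    )
    return counts.most_common(top)
```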

Critical issues

Issues labelled "critical", "blocker" etc (have a dictionary that contains the usual names for the highest possible severity).
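A sketch with a hypothetical severity dictionary (extend it with a project's own label conventions):

```python
# Illustrative set of highest-severity label names, not a fixed list.
SEVERE_LABELS = {"critical", "blocker", "showstopper", "urgent"}

def critical_issues(issues):
    """Issues carrying at least one of the usual highest-severity labels."""
    return [
        issue for issue in issues
        if SEVERE_LABELS & {label.lower() for label in issue["labels"]}
    ]
```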

Don't hardcode elasticboard to just one repo

Right now everything is pretty much hardcoded for gabrielfalcao/lettuce. We should have a select input that chooses the current repo from a set of available repos in the datastore.

There's the /available_repos endpoint in the API for this.

Overdue issues

Open issues that belong to a milestone whose deadline has passed, i.e. milestone.deadline < now().
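The filter could be sketched as follows (the issue/milestone field names here are illustrative):

```python
from datetime import datetime

def overdue_issues(issues, now):
    """Open issues attached to a milestone whose deadline is in the past."""
    return [
        issue for issue in issues
        if issue["state"] == "open"
        and issue.get("milestone")
        and issue["milestone"]["deadline"] < now
    ]
```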
