
elasticboard's People

Contributors

adaschevici, mihneadb, mishu-, piatra

elasticboard's Issues

Find out if an organization's forks are outdated

This is more of an exploratory question.
We were wondering if it is possible to also track an organization/individual account.
Our use case right now is: how can we figure out if some of our forks are outdated? Outdated meaning that we've opened them but haven't contributed to them, and we should probably close them or actually do something about them.

ideas?

cc @aismail @mihneadb

[Shipping container] Need a "one click" install & deploy solution

In order to make it easy for developers to work on this and for early adopters to try it out, we need to make it easy to ship and deploy.

Basically, there should be a way to quickly set up an isolated environment with all the deps taken care of, where one can try out elasticboard: either locally, using some sort of container, or on a PaaS like Heroku.

For development (contributor) purposes, the local aspect is more relevant: tighter feedback loop. I was thinking of going with Docker, but because it relies on OS-level containers (think LXC), the current version only runs on Linux. The natural alternative would be Vagrant. Not sure if there's a better option.

For "checking-it-out" purposes we could set up a Heroku procfile or an install script (that someone would use to set this up on their own EC2 VM, for example).

Either way I think we shouldn't maintain too many options right now because we would end up working more on the setup process than on the 'product' itself.

towards the future

The point is to have a generic dashboard that works with any kind of data. It should be easy to configure and deploy.

However, at least right now, it's really hard to determine automagically what bits of all the raw data are relevant to the user, how to display them and such. Because of this, I suggest we focus on getting this into a good shape based only on github data (while keeping the code decoupled), and see what patterns emerge. From there, we can move on to the eventual next step, of having something meta and generic.

@bogdans83, @aismail, how does this sound?

Getting data in

We need data about the repos we want to display in elasticboard. GitHub provides this data via events that are delivered as HTTP POST requests to a webhook. In order to get this data, we need to set up a small web service (like [1], but smarter) that listens for these events, maybe maps them to a model and then dumps them into elasticsearch.

Other than that, in order to "bootstrap" ourselves, we can fetch all historic data using GitHub Archive [2] so that our users don't have to wait for things to happen in order to see something on the dashboard. This will also make it much easier for early adopters to check it out, since they can test it right away.

To sum up:

  • we need an always-running web service that listens for events and dumps them to ES [need to investigate if we can afford to have this real-time or have periodical updates]
  • we need a script (or something more advanced, celery-ish task-based magic) that sets up the initial data from gh-archive
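The listener's core could be sketched like this (a minimal sketch; `store` stands in for an elasticsearch client's index call, and the payload shape is assumed to follow GitHub's webhook JSON):

```python
import json

def handle_event(raw_body, store):
    """Parse one webhook delivery and hand the document to the datastore."""
    event = json.loads(raw_body)
    doc = {
        "payload": event,
        # keeping the repo name around helps route documents later
        "repo": event.get("repository", {}).get("name"),
    }
    store(doc)
    return doc
```

Whether this runs real-time or on a periodic schedule only changes what calls `handle_event`, not its shape.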

[1] https://github.com/github/github-services/blob/master/lib/services/web.rb
[2] http://www.githubarchive.org/

Problematic fixes - required follow-ups

Issue X was solved/closed and the fix was merged in module Y. However, there have been N follow-up commits on that module since then. This suggests the initial fix was not entirely correct/safe.
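A sketch of how this could be flagged (all names and data shapes here are illustrative, not elasticboard's):

```python
def suspect_fixes(fixes, commits, threshold=2):
    """Flag fixes followed by `threshold`+ later commits on the same module.

    `fixes` are (issue, module, merged_at) triples; `commits` are
    (module, date) pairs. Any comparable timestamps work.
    """
    flagged = []
    for issue, module, merged_at in fixes:
        later = sum(1 for mod, date in commits
                    if mod == module and date > merged_at)
        if later >= threshold:
            flagged.append((issue, later))
    return flagged
```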

Interface load time

The page load is fast but the charts & everything else takes a ridiculous amount of time to load (even when working locally). This should have high priority imo.

Limit github data download to 300 events (as per API docs)

Need

The GitHub API for repo events is limited to 300 data points. In theory, when you reach that limit you should receive a "rel=last" header and the existing code will stop polling. However, the GitHub API does not do this (I've already opened a support ticket), so we need to manually limit the download to 300 data points.

Solution

Limit data_processor.github.dump_repo_events to 300 items.
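A sketch of the cap, assuming dump_repo_events walks pages of events (the names here are illustrative, not the actual code):

```python
def dump_capped(pages, limit=300):
    """Flatten paginated event lists, stopping once `limit` items are kept.

    `pages` is any iterable of event lists, standing in for the pages the
    real polling loop walks; it would break out the same way.
    """
    items = []
    for page in pages:
        for event in page:
            items.append(event)
            if len(items) >= limit:
                return items
    return items
```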

Broken/incompatible/out of date branches

We should be notified of branches that are no longer compatible with their upstream branch (master or project branch) whenever conflicts appear.

Just like the merge button's conflict check, but aggregated on a board, maybe with alerts :)

I for one want to find out asap when my branch is no longer compatible with the one I want to merge it back into, to not waste my time branching out even further.

Contributor count for a point in time - total, active

This is fuzzy:

  • what's a contributor? I'd say anybody who interacts with the repository in any way for now - comments on an issue, makes a commit, opens an issue.
  • what's an active contributor? How about: anyone with a contribution (see above) no more than one month before the reference "point in time"
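Under those definitions, the counts could be computed like this (a sketch, assuming events reduced to (actor, timestamp) pairs):

```python
from datetime import datetime, timedelta

def contributor_counts(events, at, window=timedelta(days=30)):
    """(total, active) contributor counts at a point in time.

    `events` are (actor, timestamp) pairs; any interaction counts as a
    contribution, matching the loose definition above.
    """
    total = {actor for actor, ts in events if ts <= at}
    active = {actor for actor, ts in events if at - window <= ts <= at}
    return len(total), len(active)
```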

Update repo description

Please set repo description to: "Dashboard that aggregates relevant metrics for Open Source projects."

Thanks!

Zombie issues

Issues that haven't been "touched" (no events whatsoever) for at least 2 months.
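A sketch of the filter, assuming we can get a list of event timestamps per issue:

```python
from datetime import datetime, timedelta

def zombie_issues(issue_events, now, threshold=timedelta(days=60)):
    """Issue numbers whose most recent event is older than `threshold`.

    `issue_events` maps issue number -> list of event timestamps.
    """
    return sorted(
        number
        for number, stamps in issue_events.items()
        if stamps and max(stamps) < now - threshold
    )
```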

Dump data to ES

  • investigate the requirement of a schema
  • use an ES py lib to send our data to ES
  • set up a cron job to add in new data daily

"Hot" files

Files that have been changed in the past month in more than one commit.

In order to get this data, we need to access the commit data API. Basically, for a push event, we have to iterate through the commits, access the API URL for every one of them and store the reply as a "commitevent" document (see schema.md).

From there, just have to check the "filename" attribute for every entry inside the "files" array for the respective commitevents.
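The check itself could look like this (a sketch; the commitevent shape with "date" and "files"/"filename" is assumed from schema.md):

```python
from collections import Counter
from datetime import datetime, timedelta

def hot_files(commitevents, now, window=timedelta(days=30)):
    """Filenames changed by more than one commit in the past `window`."""
    changes = Counter()
    for ce in commitevents:
        if now - window <= ce["date"] <= now:
            changes.update(f["filename"] for f in ce["files"])
    return sorted(name for name, count in changes.items() if count > 1)
```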

Get commit data

The only data we get about commit events right now is from within pushevent data. There's an array of commits that tells us which commits were added with a push. However, we need more data for our metrics - like files changed, lines of code added/removed etc (#13, #14). For this, we need to store all the individual commit data as commitdata documents.

One way would be: for every push event, go through the commits array, then request and store data for each commit. Another way: ask explicitly for the commit events[1] and then process them individually.

The individual commit data has to be stored as "commitdata"-type documents.

[1] https://api.github.com/repos/gabrielfalcao/lettuce/commits
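For the first approach, collecting the commit URLs to fetch could be sketched as follows (push event payloads carry a commits array with a url per commit):

```python
def commit_api_urls(push_events):
    """Collect the per-commit API URLs referenced by push events."""
    urls = []
    for event in push_events:
        for commit in event.get("commits", []):
            urls.append(commit["url"])
    return urls
```

Each URL would then be requested and the reply stored as a "commitdata" document.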

Distinguish between event types

Right now all events go into the same index, having the same type. We should find a way to separate them according to the action that triggered them (e.g. commit, push, comment, issue).

I'm thinking we could have one index with multiple types (one for every event kind). Not sure how to find out what events our listener receives, since GH's API[1] doesn't seem to have a field for an event's kind (or I haven't seen it).

[1] http://developer.github.com/v3/activity/events/types/
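One possibly useful detail: webhook deliveries carry the event name in the X-GitHub-Event HTTP header rather than in the JSON body, so the listener could read the type from there. A sketch, assuming a plain dict of request headers:

```python
def event_doc_type(headers, default="unknown"):
    """Derive the ES document type from a delivery's X-GitHub-Event header."""
    return headers.get("X-GitHub-Event", default).lower()
```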

Set up Kibana

Have a basic frontend that displays what data we have.

Check API request for subscribing to events

The GitHub Archive data has a type attribute that shows the type of each event. I haven't seen this attribute in the payloads we've been receiving.

Might have to add an extra parameter to the subscribe API request.

Will investigate.

Underestimated issues

Issues where the time between the first commit and the last commit (or the present) is way bigger than the estimated points. Might be a long shot, but it could push us to take estimation points more seriously. Could also work for project issues (aggregated estimation points).

Monthly code flow - lines changed (additions, deletions)

Should find out, for every month, how many additions and deletions are made.

This info has to be gathered from the "commitevent" documents, similar to the way described in issue #13. The "stats" attribute in these documents has all the info we need.
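The aggregation could be sketched as follows (assuming commitevent documents with an ISO date string and a stats object, as in the commit API):

```python
from collections import defaultdict

def monthly_flow(commitevents):
    """Sum additions/deletions per "YYYY-MM" from commitevent stats."""
    flow = defaultdict(lambda: {"additions": 0, "deletions": 0})
    for ce in commitevents:
        month = ce["date"][:7]  # "2014-01-15T10:00:00Z" -> "2014-01"
        flow[month]["additions"] += ce["stats"]["additions"]
        flow[month]["deletions"] += ce["stats"]["deletions"]
    return dict(flow)
```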

Hot issues

Issues with most events in the last month.

Most active contributors

By number of events in which they are involved. Optionally, drill down by type of event (comment, commit etc.).
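A sketch of the ranking, assuming events reduced to dicts with actor and type keys:

```python
from collections import Counter

def most_active(events, top=10, event_type=None):
    """Rank actors by the number of events they appear in.

    Pass `event_type` to drill down to one kind of event.
    """
    counts = Counter(
        e["actor"] for e in events
        if event_type is None or e["type"] == event_type
    )
    return counts.most_common(top)
```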

Critical issues

Issues labelled "critical", "blocker" etc (have a dictionary that contains the usual names for the highest possible severity).
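A sketch with a hypothetical severity dictionary (extend it with a project's own label conventions):

```python
# Illustrative set of highest-severity label names, not a fixed list.
SEVERE_LABELS = {"critical", "blocker", "showstopper", "urgent"}

def critical_issues(issues):
    """Issues carrying at least one of the usual highest-severity labels."""
    return [
        issue for issue in issues
        if SEVERE_LABELS & {label.lower() for label in issue["labels"]}
    ]
```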

Don't hardcode elasticboard to just one repo

Right now everything is pretty much hardcoded for gabrielfalcao/lettuce. We should have a select input that chooses the current repo from a set of available repos in the datastore.

There's the /available_repos endpoint in the API for this.

Overdue issues

Open issues that belong to a milestone whose deadline has passed, i.e. milestone.deadline < now().
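The filter could be sketched as follows (the issue/milestone field names here are illustrative):

```python
from datetime import datetime

def overdue_issues(issues, now):
    """Open issues attached to a milestone whose deadline is in the past."""
    return [
        issue for issue in issues
        if issue["state"] == "open"
        and issue.get("milestone")
        and issue["milestone"]["deadline"] < now
    ]
```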
