
sardina's Introduction

🐟 S.A.R.D.I.N.A. 🐟

Statistiche Amabili Rendimento Degli Informatici Nell’Anno (roughly: "lovable statistics on the computer scientists' performance over the year")

What

A Python script to quickly compute how much we've worked in terms of:

  • Yearly commits
  • Contributors commits
  • Lines of code (LOC)
  • Language usage statistics

all four, both per repository and in total. It also generates cool graphs!

[image: combined stats graphs]

All non-archived and non-disabled public repos are taken into consideration.

Where

You can try out this repository directly in the browser at this link!

How

Count our commits:

  • some 🐍 magic
  • GitHub APIs
  • the requests library
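
A minimal sketch of how the commit counting could work with requests and GitHub's paginated commits endpoint (the owner name is a placeholder, and the real script may use different endpoints):

import requests

OWNER = "weee-open"  # placeholder: set to the owner configured in config.py
HEADERS = {"Accept": "application/vnd.github.v3+json"}  # add "Authorization": "token <PAT>" for 5000 req/h

def count_commits(repo: str) -> int:
    """Walk the paginated commits endpoint and count the entries."""
    total, page = 0, 1
    while True:
        r = requests.get(f"https://api.github.com/repos/{OWNER}/{repo}/commits",
                         headers=HEADERS, params={"per_page": 100, "page": page})
        r.raise_for_status()
        batch = r.json()
        total += len(batch)
        if len(batch) < 100:  # last page
            return total
        page += 1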

Count our SLOC (Source Lines Of Code):

  • git ls-files to list repo files
  • sed '/^\s*$/d' $file to remove whitespace-only lines
  • wc -l to count lines
    or, optionally
  • cloc - a dedicated utility to count lines of code
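
A rough Python equivalent of the wc variant of that pipeline (the real script shells out to git, sed and wc; error handling here is minimal):

import subprocess

def count_sloc(repo_path: str) -> int:
    """Equivalent of: git ls-files | sed (drop blank lines) | wc -l."""
    files = subprocess.run(["git", "ls-files"], cwd=repo_path, capture_output=True,
                           text=True, check=True).stdout.splitlines()
    sloc = 0
    for name in files:
        try:
            with open(f"{repo_path}/{name}", errors="ignore") as f:
                sloc += sum(1 for line in f if line.strip())  # skip whitespace-only lines
        except (IsADirectoryError, FileNotFoundError):
            continue  # submodules, broken symlinks, etc.
    return sloc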

Why

For our yearly report and recruitment presentation.
Also, we are curious nerds.

I want to run it now!

First of all, generate a Personal Access Token (PAT) from GitHub's developer settings page.
The token only needs to call the APIs, so you can leave every permission box unticked: the resulting token can only read your public information and has no control over your account, yet it still grants the 5000 authenticated API requests per hour.

You can skip this step and use the script without a PAT, but you will then be limited to 60 API requests per hour, which is only enough to fetch complete statistics for an account with at most 30 repos (we currently have 32, so a PAT is highly recommended).

The configuration is done in config.py. There you paste the PAT generated in the previous step and choose the owner whose stats you want (either a user or an organization), where to save the output, and whether to run the script in development mode.
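
A hypothetical config.py might look like this (the field names are illustrative; use the ones in the shipped config.py):

# Illustrative configuration; actual variable names may differ
token = "ghp_xxxxxxxxxxxxxxxx"  # your PAT, or "" to run unauthenticated (60 requests/hour)
owner = "weee-open"             # user or organization whose repos are analyzed
output_dir = "output"           # where the text stats and graphs are saved
dev_mode = False                # cache API responses locally (see Development below)
keep_repos = False              # keep the cloned repos between runs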

git clone https://github.com/weee-open/sardina
cd sardina

# Optional: if you want to use a virtual environment
python3 -m venv venv
source venv/bin/activate

pip install -r requirements.txt
vim config.py
./main.py

# If you opted to use a virtual environment
deactivate

Command line options

./main.py --help                               
usage: main.py [-h] [--cloc | --wc] [--commits | --no-commits] [--sloc | --no-sloc] [--graphs | --no-graphs] [--lang | --no-lang] [-p]

S.A.R.D.I.N.A. - Statistiche Amabili Rendimento Degli Informatici Nell'Anno

optional arguments:
-h, --help    show this help message and exit
-p, --ping    Re-trigger stats generation on GitHub servers. Useful with cron.

Software to use to count lines of code:
--cloc        Use CLOC to count SLOC.
--wc          Use WC to count SLOC.

Count contributions to all repositories:
--commits     Count commits.
--no-commits  Do not count commits.

Count SLOC (source lines of code) of repositories:
--sloc        Count SLOC.
--no-sloc     Do not count SLOC.

Generate graphs for the gathered statistics:
--graphs      Generate graphs.
--no-graphs   Do not generate graphs.

Generate language usage statistics:
--lang        Generate language statistics.
--no-lang     Do not generate language statistics.

Language usage statistics

If SLOC are being counted and cloc is the selected tool, language statistics are always generated by cloc itself, regardless of the --lang or --no-lang option (in this scenario no prompt is shown in interactive mode either). If SLOC are not being counted, or wc is the selected tool, language statistics are generated through GitHub's APIs only when --lang is specified. The reason is that cloc measures language usage much more precisely than GitHub's APIs, and it costs nothing in terms of API requests and network usage.
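
In rough, runnable pseudocode (the flag names are illustrative, not the ones in main.py):

# Illustrative only: these flags mirror the options described above
count_sloc, use_cloc, generate_lang = True, True, False

if count_sloc and use_cloc:
    lang_source = "cloc"        # always generated by cloc, --lang/--no-lang ignored
elif generate_lang:
    lang_source = "github_api"  # only when --lang was given; costs API requests
else:
    lang_source = None

print(lang_source)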

Development

Making all the necessary requests and cloning all the repositories just to test a change to the program is slow, requires a stable Internet connection, and hammers GitHub's servers with unnecessary requests. Therefore config.py includes a couple of options that make a developer's job simpler:

  • dev_mode: enables local caching of all GitHub API responses (list of repos, contributions and other statistics)
  • keep_repos: enables long-term storage of cloned repositories instead of deleting them after each run. Keep in mind your available storage!
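
The caching behind dev_mode can be sketched roughly like this (cache location and file naming are made up):

import json
import os
import requests

CACHE_DIR = ".cache"  # hypothetical location

def cached_get(url: str, headers: dict) -> dict:
    """Return a cached JSON response if present, otherwise fetch it and cache it."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    path = os.path.join(CACHE_DIR, url.replace("/", "_") + ".json")
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    data = requests.get(url, headers=headers).json()
    with open(path, "w") as f:
        json.dump(data, f)
    return data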

sardina's People

Contributors

alecello, e-caste


sardina's Issues

Add command line options

Keep the current user prompts in main, but also allow the user to pass command line arguments, sparing them four prompts every time they run the program.

See peracotta's main for inspiration.
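
The --option / --no-option pairs shown in the help output above map naturally to mutually exclusive argparse groups; a minimal sketch (defaults are assumptions):

import argparse

parser = argparse.ArgumentParser(description="S.A.R.D.I.N.A.")
tool = parser.add_mutually_exclusive_group()
tool.add_argument("--cloc", action="store_true", help="Use cloc to count SLOC.")
tool.add_argument("--wc", action="store_true", help="Use wc to count SLOC.")
commits = parser.add_mutually_exclusive_group()
commits.add_argument("--commits", dest="commits", action="store_true", help="Count commits.")
commits.add_argument("--no-commits", dest="commits", action="store_false", help="Do not count commits.")
parser.set_defaults(commits=None)  # None means "not given": fall back to the interactive prompt
args = parser.parse_args()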

Allow excluding some repos

For example in our case the WEEE-Open repo is useless since its commits are automatic.

Possible implementation:
... (-X|--exclude) WEEE-Open,other-repo,...
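
A possible implementation sketch (the repos list is assumed to come from the GitHub API):

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("-X", "--exclude", default="",
                    help="comma-separated repos to skip, e.g. WEEE-Open,other-repo")
args = parser.parse_args()

excluded = {name for name in args.exclude.split(",") if name}
repos = []  # filled from the GitHub API elsewhere
repos = [repo for repo in repos if repo["name"] not in excluded]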

Add requirements.txt and update README

Since we're now using more than one library, add a requirements.txt with pip freeze > requirements.txt, and replace the line in the README that says to install requests with pip install -r requirements.txt.

Ignore files with cloc

Find an option for cloc to ignore the files contained in ignored_files.py. Also add bootstrap-dark.css to the ignored files.
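
cloc can read a list of paths to skip from a file; a hedged sketch, assuming ignored_files.py exposes a list named ignored (check cloc --help for the exact flag name):

import subprocess

from ignored_files import ignored  # hypothetical name of the list in ignored_files.py

with open("cloc_ignore.txt", "w") as f:
    f.write("\n".join(ignored + ["bootstrap-dark.css"]))

subprocess.run(["cloc", "--exclude-list-file=cloc_ignore.txt", "--json", "."], check=True)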

Add stats about lines by developer

The get_contributors_commits_stats HTTP request also gets data about lines added and removed (diff) by developers. Add these stats to the output and the graphs of #3
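
For reference, GitHub's stats/contributors endpoint returns weekly additions (a) and deletions (d) per author, so the extra data could be aggregated roughly like this (the endpoint may answer 202 with an empty body while GitHub is still computing the stats):

import requests

def lines_by_author(owner: str, repo: str, headers: dict) -> dict:
    """Sum additions and deletions per contributor from the stats/contributors endpoint."""
    r = requests.get(f"https://api.github.com/repos/{owner}/{repo}/stats/contributors",
                     headers=headers)
    r.raise_for_status()
    stats = {}
    for contributor in r.json():
        stats[contributor["author"]["login"]] = {
            "added": sum(week["a"] for week in contributor["weeks"]),
            "deleted": sum(week["d"] for week in contributor["weeks"]),
        }
    return stats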

Ignore PSUtap files

Is it really our second biggest software project?
We should find out what file extensions to exclude from our SLOC count.

Fix KeyError

When using the options --no-commits --sloc --graphs, graph generation fails because it has no data about commits.
Traceback:

Generating graphs...
Traceback (most recent call last):
  File "./main.py", line 624, in <module>
    main()
  File "./main.py", line 618, in main
    print_all_stats(commits_stats, lines_stats, contributors_stats, use_cloc, generate_graphs)
  File "./main.py", line 475, in print_all_stats
    generate_figure([repo_commits[graph], yearly_repo_commits[graph], sloc_by_repo[graph]], os.path.join(graph_dir, f'{graph}.svg'))
KeyError: 'ansible-pxe'

'ansible-pxe' is the key that fails because it is the first repo when sorting alphabetically.
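
One possible fix is to stop indexing blindly and fall back to empty data for repos missing from the commit dictionaries (the names below are the ones from the traceback, so this is only a sketch of the idea, not a tested patch):

# Sketch: use .get() so a repo with no commit stats does not raise KeyError
generate_figure([repo_commits.get(graph, {}),
                 yearly_repo_commits.get(graph, {}),
                 sloc_by_repo.get(graph, 0)],
                os.path.join(graph_dir, f'{graph}.svg'))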

Make graph proportions deterministic

Right now graph dimensions are determined with a constant value for pie charts and a value proportional to the number of bars for bar charts, with a completely arbitrary value added to them in order to account for graph title and axis name.

While this is good enough, it produces graphs whose column width is not fixed, graphs where the title space is too tight or too large, and other undesirable variations in appearance. In some edge cases, it might even create an overlapping mess of colors and text.

Ideally, dimensions should be determined as follows:

  • Accept an aspect ratio from the caller.
  • Accept a DPI value from the caller.
  • Accept title, legend and axis label font sizes.
  • Accept a class width in pixels.
  • Accept a class spacing in pixels.

Then use all the previous values to compute exact dimensions for:

  • Figure
  • Plot area
  • Graph area
  • Title area
  • Axis label area
  • Legend

The figure size should be the least value that can contain all its children.

The plot area (graph + title + legend + axis) should be the least value that can contain all its children, plus some whitespace on either side to have the aspect ratio match (unless the ratio is zero, in which case this last step is skipped).

The graph area should be calculated from the class and spacing values. For pie charts there is a single class; for bar charts the class count equals the number of bars. Spacing goes between classes (when there is more than one), above the first class, and below the last class.

This way the graph's appearance should be completely deterministic no matter the data size or graph type.
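
A sketch of the sizing math for the bar-chart case (every number and default here is illustrative):

import matplotlib.pyplot as plt

def bar_figure_size(n_bars: int, dpi: int = 100, class_width_px: int = 40,
                    spacing_px: int = 20, title_px: int = 60, axis_px: int = 80):
    """Derive an exact figure size in inches from per-class pixel budgets."""
    graph_px = n_bars * class_width_px + (n_bars + 1) * spacing_px  # bars plus spacing around them
    width_px = graph_px + axis_px   # graph area plus axis-label area
    height_px = 400 + title_px      # fixed plot height plus title area
    return width_px / dpi, height_px / dpi

fig, ax = plt.subplots(figsize=bar_figure_size(12), dpi=100)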

Select pie chart colors randomly

Instead of having fixed colors, the pie charts could use some variety.
In particular, we need to keep the same color order (it keeps the chart readable), but we can randomize which color comes first (e.g. "Pink", "Blue", "Red" may become "Blue", "Red", "Pink" or "Red", "Pink", "Blue", but never change order into "Blue", "Pink", "Red").
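
In other words, rotate the fixed palette by a random offset instead of shuffling it:

import random

PALETTE = ["pink", "blue", "red"]  # illustrative fixed order

def rotated_palette() -> list:
    """Keep the relative order of colors but start from a random one."""
    offset = random.randrange(len(PALETTE))
    return PALETTE[offset:] + PALETTE[:offset]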

Fix UnboundLocalError

When running with only the --ping option, after correctly fetching the commit stats, the following error appears:

Traceback (most recent call last):
  File "/root/sardina/main.py", line 754, in <module>
    main()
  File "/root/sardina/main.py", line 745, in main
    language_total, language_repo = get_language_stats(repos, header) if (get_languages and not use_cloc) else (None, None)
UnboundLocalError: local variable 'get_languages' referenced before assignment
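
The usual fix is to give get_languages a default before any code path that may skip its assignment; a rough sketch (the surrounding logic and helper are assumptions):

# Sketch only: make sure get_languages always exists, even when running with --ping alone
get_languages = False
if not args.ping:
    get_languages = ask_language_prompt()  # hypothetical helper for the interactive/CLI choice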

Fix REST API credentials

Using curl -H "Accept: application/vnd.github.v3+json" -H "Authorization: token <PAT>" https://api.github.com/rate_limit returns

{
  "message": "Bad credentials",
  "documentation_url": "https://docs.github.com/rest"
}

We need to fix the way we provide authorization (currently my PAT) to the APIs.
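
For reference, the header the script should build with requests matches the one curl sends above (the config import is an assumption about where the PAT lives):

import requests

from config import token  # assumption: the PAT pasted into config.py

headers = {"Accept": "application/vnd.github.v3+json",
           "Authorization": f"token {token}"}
print(requests.get("https://api.github.com/rate_limit", headers=headers).json())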

Add graphs to present data in interesting way

We can use the matplotlib library to plot some interesting graphs, such as:

  • pie chart with percentage of commits (per repo, total this year, total all time)
  • histogram with lines of code to compare repos
  • other ideas? Feel free to implement them

The graphs should also be saved to a file.
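
A minimal matplotlib sketch of both chart types, including saving to a file (the data is made up):

import matplotlib.pyplot as plt

commits_per_repo = {"sardina": 120, "peracotta": 340, "tarallo": 900}  # made-up numbers

# Pie chart: share of commits per repo
fig, ax = plt.subplots()
ax.pie(list(commits_per_repo.values()), labels=list(commits_per_repo.keys()), autopct="%1.1f%%")
ax.set_title("Commits per repo")
fig.savefig("commits_pie.svg")

# Bar chart: compare repos (here reusing the same dict for brevity)
fig, ax = plt.subplots()
ax.bar(list(commits_per_repo.keys()), list(commits_per_repo.values()))
ax.set_title("SLOC per repo")
fig.savefig("sloc_bar.svg")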

Add development mode

Read dev_mode value from config file.

Development mode saves the content of each JSON API response to its own JSON file and reads those files back at the next execution to save time.
This mode also skips the removal of the downloaded repos, saving time in the SLOC section.
