
orange-opensource / floss-toolbox

A toolbox to help developers and open source referents not waste their time on manual and boring tasks. Provides simple and light tools to investigate source code and look for hot data. Also provides primitives to manage GitHub and GitLab organizations.

License: Apache License 2.0

Languages: Shell 37.37%, Ruby 19.82%, Python 42.12%, PHP 0.69%
Topics: hotwords, ruby, toolbox, shell, commits, dco, signed-off, logs, octokit, bash

floss-toolbox's Introduction

FLOSS Toolbox

Toolbox to help developers and open source referents to have cleaner projects in GitHub organizations, and more.

The toolbox is mainly written in Shell because this language is very efficient for file processing and provides a strong and rich standard API with handy primitives and good performance thanks to system calls. It also makes calling system primitives easy. The toolbox also contains Ruby scripts: rubies are shiny gems, I love them. Python is used too, and a bit of PHP, because it is nice to use several languages we are not used to (stop the routine!). For these needs, scripting is enough.

Environment

You should mainly have the following environments below, but have a look at the README of each folder:

  • Bash version 3.2.5
  • Ruby version 2.7.1
  • Python version 3.7

Project tree

There are 5 folders containing scripts and programs to make your life a bit easier:

  1. toolbox/diver contains scripts to scrape data from Git logs and histories, look for sensitive data in sources, etc.;
  2. toolbox/github contains scripts and programs that make requests to the GitHub API to automate some actions;
  3. toolbox/gitlab contains scripts and programs that make requests to the GitLab API to automate some actions;
  4. toolbox/LicensesInventory contains a program to get the licenses of third-party components from dependency manager files;
  5. toolbox/utils contains scripts to generate texts and similar things.

Feel free to read the README available in each of the subdirectories listed above.

Dry run

To make sure the project is ready to run, you can run the following dry-run command, which checks whether runtimes, third-party tools and files are available.

bash dry-run.sh

About the repository

Renovate

Renovate is used to try to keep the project's dependencies up to date. A renovate.json file with configuration details must be added at the project root, and the organization admins must enable Renovate (through the admin console). Dependabot was enabled by default for this project but has been replaced by Renovate.

Gitleaks

Gitleaks is used to look for secrets and leaks of sensitive data. A gitleaks.toml file, picked from the Gitleaks repository, has been placed at the project root to define the rules. A gitleaks-action.yml file defines the GitHub Action to call and the secrets it uses. The GITLEAKS_LICENSE is defined at the organization level; only the organization admins can make it visible to projects. This key (dedicated to the organization) was requested from the Gitleaks team, who kindly provided it.

DCO

The Developer Certificate of Origin (DCO) is applied here thanks to a Probot bot: on pull requests, all commits must be signed off. This check is performed in an action.

floss-toolbox's People

Contributors

dependabot[bot], pylapp, renovate[bot]

floss-toolbox's Issues

GitHub - Find missing users

As a GitHub administrator who has a mailing list with the email addresses of users
I want to know who is missing from this list or from the GitHub organization
So that I can update my list or my GitHub organization with the missing users.
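A minimal Bash sketch of the comparison, assuming both the mailing list and the organization members' emails have already been exported to plain-text files with one address per line (the file layout and the helper names normalize_emails and find_missing_users are hypothetical):

```shell
#!/usr/bin/env bash
# Sketch: compare a mailing-list export with the organization members' emails.
# Inputs (assumption): plain-text files, one email address per line.

# Lowercase, strip blanks, drop empty lines, sort uniquely (C collation).
normalize_emails() {
    tr '[:upper:]' '[:lower:]' < "$1" | tr -d ' \t\r' | grep -v '^$' | LC_ALL=C sort -u
}

# Print "missing-in-org:<email>" then "missing-in-list:<email>" lines.
find_missing_users() {
    local list_file="$1" org_file="$2" tmp_list tmp_org
    tmp_list="$(mktemp)"
    tmp_org="$(mktemp)"
    normalize_emails "$list_file" > "$tmp_list"
    normalize_emails "$org_file" > "$tmp_org"
    # comm -23: only in the mailing list; comm -13: only in the organization
    LC_ALL=C comm -23 "$tmp_list" "$tmp_org" | sed 's/^/missing-in-org:/'
    LC_ALL=C comm -13 "$tmp_list" "$tmp_org" | sed 's/^/missing-in-list:/'
    rm -f "$tmp_list" "$tmp_org"
}
```

Usage: `find_missing_users list.txt org-members.txt`. Addresses are lowercased first because the same address is often written with different cases.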

GitHub - Find dead projects

As a GitHub administrator,
I want to find projects that are dead or have been inactive for a long time (let's say at least 1 year)
So that I can set them as "archived" (only manually?)
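A possible first filter in Bash, assuming the repository list comes from `gh repo list` with its `name` and `pushedAt` JSON fields. The cutoff date is passed in explicitly because date arithmetic differs between GNU and BSD `date`; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print repositories whose last push is older than a cutoff date.
# stdin: "<name>\t<pushedAt>" lines, for example produced by:
#   gh repo list "$ORG" --limit 1000 --json name,pushedAt \
#       --jq '.[] | [.name, .pushedAt] | @tsv'
# ($ORG and the limit are placeholders.)
# $1: cutoff as an ISO 8601 date (YYYY-MM-DD). ISO 8601 timestamps compare
# correctly as plain strings, so no date parsing is needed.
find_inactive_repos() {
    local cutoff="$1"
    # Rows with an empty pushedAt (e.g. never pushed) are skipped here.
    awk -F '\t' -v cutoff="$cutoff" '$2 != "" && $2 < cutoff { print $1 }'
}
```

A one-year cutoff can be computed with `date -u -d '1 year ago' +%F` on GNU systems or `date -u -v-1y +%F` on macOS; archiving itself stays a manual decision.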

LicensesInventory - New release

Release a new version of LicensesInventory supporting:

  • Cargo.lock
  • build.gradle
  • build.gradle.kts
  • package.json
  • go.mod
  • pubspec.yaml
  • Package.swift

GitHub - Provides automated tools for organization administration

Use case 1: As an open source referent, I want to get all members of my GitHub organization, so that I can check who has joined it.

Use case 2: As an open source referent, I want to get members who don't have 2FA enabled, so that I can ask them to enable it.

Use case 3: As an open source referent, I want to get members of the organization whose "company" field is undefined, so that I can check who has defined it or not.

Use case 4: As an open source referent, I want to get projects which don't have any assigned GitHub team, so that I can add teams for them.

Use case 5: As an open source referent, I want to get users who have an undefined or hidden email, so that I can check who has defined it or not.

Use case 6: As an open source referent, I want to get users who may not have a suitable full name, so that I can check who has defined it or not.

Use case 7: As an open source referent, I want to get repositories with undefined licenses, so that I can contact people to define them.

Use case 8: As an open source referent, I want to get repositories which seem to be non-compliant because mandatory files are missing, so that I can ask owners to update them.

Use case 9: As an open source referent, I want to get repositories which seem to be empty or have too few files, so that I can ask owners to update them or check if they are facing issues.

Use case 10: As an open source referent, I want to define permissions (push / write) for all contributors of all projects (except teams and organization owners), so that I can ensure there are no unexpected admins.

Use case 11: As an open source referent, I want to define permissions (push / write) for all teams of all projects, so that I can ensure the teams have the expected permissions.

GitLab - Find dead projects

As a GitLab administrator,
I want to find projects that are dead or have been inactive for a long time (let's say at least 1 year)
So that I can set them as "archived" (only manually?)

GitHub - Filter users email addresses by regex

As a referent, I want to get all email addresses matching a given regular expression, so that I can easily check whether some contributors come from outside companies or affiliates.

Diver - Look for corporate copyrights in sources

As a GitHub administrator
I want to check if there are projects in my organisation without copyrights in their source code
So that I can contact project maintainers to ask them to add copyrights in the sources.

Pseudocode:

For My GitHub Organisation: Clone All Project Repositories
For Each Cloned Project:
    For Each Source File In [go, java, kotlin, php, javascript, rust, swift, objective-c, c, c++, c#, typescript, ...]
        Get the Copyright Lines (i.e. lines starting with "copyright")
        If There Is More Than One Line: Warning
        If There Is No Line: Warning
        If The Copyright Line Does Not Contain My Company Name: Warning
        If The Copyright Line Contains My Company Name: OK
    End For Each
End For Each
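The inner loop for one clone can be sketched in Bash under a few assumptions: copyright lines are detected with a case-insensitive regex that tolerates common comment prefixes (`//`, `/*`, `*`, `#`, `--`), and the file extensions below are only a partial mapping of the languages listed. The function names are hypothetical. When several copyright lines are found, this sketch reports that warning and does not check the company name.

```shell
#!/usr/bin/env bash
# Sketch: check copyright lines in the source files of a cloned project.

# Print OK or WARNING for one file; $1: file, $2: company name.
check_copyright() {
    local file="$1" company="$2" count
    # Lines whose first word, after an optional comment prefix, is "copyright"
    local regex='^[[:space:]]*([/*#-]+[[:space:]]*)?copyright'
    count="$(grep -ciE "$regex" "$file" || true)"
    if [ "$count" -eq 0 ]; then
        echo "WARNING no copyright line: $file"
    elif [ "$count" -gt 1 ]; then
        echo "WARNING $count copyright lines: $file"
    elif ! grep -iE "$regex" "$file" | grep -qiF -- "$company"; then
        echo "WARNING copyright without '$company': $file"
    else
        echo "OK: $file"
    fi
}

# Walk a clone (skipping .git) and check each source file; $1: dir, $2: company.
check_clone() {
    local clone_dir="$1" company="$2" file
    find "$clone_dir" -path '*/.git' -prune -o -type f \( \
        -name '*.go' -o -name '*.java' -o -name '*.kt' -o -name '*.php' \
        -o -name '*.js' -o -name '*.rs' -o -name '*.swift' -o -name '*.m' \
        -o -name '*.c' -o -name '*.h' -o -name '*.cpp' -o -name '*.cs' \
        -o -name '*.ts' \) -print |
        while IFS= read -r file; do
            check_copyright "$file" "$company"
        done
}
```

Usage: `check_clone path/to/clone "My Company" | grep '^WARNING'` keeps only the files that need attention.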

Diver - Look for large files

As a GitHub administrator,
I want to check if some repositories have "large" files (Docker image, big blobs, etc)
So that I can contact the maintainers to warn them

Input: a way to get the URLs of all repositories, and a size limit

Note: add sample code to the wiki to run this feature for all repositories
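A minimal Bash sketch for one clone, assuming the size limit is given in KiB (the `k` unit of `find -size`). The `.git` directory is skipped, so only files of the checked-out tree are listed; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list files bigger than a size limit in a cloned repository.
# $1: clone directory, $2: size limit in KiB.
find_large_files() {
    local repo_dir="$1" limit_kib="$2"
    # -size +Nk: strictly more than N KiB (sizes are rounded up to whole KiB)
    find "$repo_dir" -path '*/.git' -prune -o -type f -size +"${limit_kib}k" -print
}
```

Large blobs that only exist in the history would not be found this way; they could be listed with `git rev-list --objects --all | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)'`.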

GitLab - Look for leaks

As a GitLab administrator
I want to make runs of gitleaks tool
So that I can check if there are some projects with credentials leaks

Steps to follow:

Get All Repositories From Organisation
For Each Repository: Clone it
For Each Clone:
        Run GitLeaks
        Build Report
End For Each
Build Final Report

Gitleaks command should be:

gitleaks detect --report-format json --report-path "$PROJECT.gitleaks.json" --source "$PROJECT"

(where $PROJECT is the name of the project clone directory)
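The loop and the final report can be sketched in Bash, assuming the clones already sit as direct subdirectories of one workspace folder. The function name and the `gitleaks-summary.txt` file are hypothetical; gitleaks exits with a non-zero code when it finds leaks (or on errors), which is how projects are flagged here.

```shell
#!/usr/bin/env bash
# Sketch: run gitleaks on every clone of a workspace and build a final report.
# Assumption: each direct subdirectory of the workspace is one project clone.
scan_workspace_with_gitleaks() {
    local workspace="$1"
    local summary="$workspace/gitleaks-summary.txt"
    local project_dir project
    : > "$summary"
    for project_dir in "$workspace"/*/; do
        [ -d "$project_dir" ] || continue   # glob matched nothing
        project="$(basename "$project_dir")"
        # Same command as above, with quoted paths; one JSON report per project
        if gitleaks detect --report-format json \
            --report-path "$workspace/$project.gitleaks.json" \
            --source "$project_dir"; then
            echo "OK $project" >> "$summary"
        else
            # Non-zero exit: leaks found (gitleaks' default) or scan error
            echo "CHECK $project" >> "$summary"
        fi
    done
    cat "$summary"
}
```

Usage: `scan_workspace_with_gitleaks path/to/clones`; the per-project JSON reports stay next to the summary for a closer look.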

Project - v3 refactoring

Today a lot of code is written twice.

Even if scripts are quite isolated and were written quickly to meet urgent needs, some portions of code are duplicated, for example:

  • use of gh
  • clone of repositories
  • ...

In addition, some files must be placed in a "common" folder because they are used by both the GitHub and GitLab sides, e.g. Python parsers.

So the v3 version of this project must be cleaner and more abstracted.

GitHub - State changes in repositories

As a GitHub administrator
I want a list of repositories whose state has changed
So that I can see if someone made updates with or without consent

Here, state means: teams, contributors, permissions

Diver - Lines of codes and useful metrics

As an open source referent,
I want to compute the number of lines of code
So that I can send this information to colleagues who need it.

Input: Target project folder
Output: Plain text
Library: Cloc

Add an entry in the wiki to run this feature for all projects of organisation (add code sample).

Bug - Smudge error: Error downloading object (git clone)

Some errors may occur while cloning Git repositories, and they do not seem to be managed properly.

See below some anonymized logs (cleaned data are in [%%]):


Cloning (73 / 189) **[%GIT CLONE SSH URL%.]**..
Cloning into 'server-hardening-collection'...
remote: Enumerating objects: 696, done.
remote: Counting objects: 100% (155/155), done.
remote: Compressing objects: 100% (48/48), done.
remote: Total 696 (delta 124), reused 119 (delta 105), pack-reused 541
Receiving objects: 100% (696/696), 195.25 KiB | 1.22 MiB/s, done.
Resolving deltas: 100% (302/302), done.
Downloading **[%TAR GZ GPG FILE OF 100 MB%.]**
Error downloading object: **[%TAR GZ GPG FILE OF 100 MB%.]** (**[% COMMIT HASH %]**): Smudge error: Error downloading **[%TAR GZ GPG FILE OF 100 MB%.] **(**[% COMMIT HASH FULL%]**): batch response: Post **[%BATCH FILE URL%]**: proxyconnect tcp: dial tcp: lookup **[%PROXY%]:** no such host

Errors logged to **[%LOG FILE %]**
Use `git lfs logs last` to view the log.
error: external filter 'git-lfs filter-process' failed
fatal: **[% THE BIG GPG FILE%]** smudge filter lfs failed
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'

Cloned in folder '**[% PROJECT NAME %]**'

    ○
    │╲
    │ ○
    ○ ░
    ░    gitleaks

7:31PM INF no leaks found
7:31PM INF scan completed in 266.537697ms
**[%PATH%]**
✅ Gitleaks did not find leaks for 'server-hardening-collection'

Source code - Generate SPDX headers for sources

As an open source referent or a software developer, I want to get SPDX-formatted headers for my sources, so that I can easily and automatically identify copyrights, licenses, years or authors for each project, and make such values easy to identify through web portals like APP.
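A minimal Bash sketch of such a generator, assuming C-like `//` comments and using the SPDX `SPDX-FileCopyrightText` and `SPDX-License-Identifier` tags. The copyright text and license identifier are parameters, the function name is hypothetical, and files already containing an `SPDX-License-Identifier` are left untouched.

```shell
#!/usr/bin/env bash
# Sketch: prepend SPDX tags to a source file using "//" comments (assumption:
# C-like languages; other comment syntaxes would need their own prefix).
# $1: file, $2: copyright text (e.g. "2024 Example Corp"), $3: SPDX license ID.
add_spdx_header() {
    local file="$1" copyright="$2" license="$3" tmp
    # Leave files that already declare a license identifier untouched
    if grep -q 'SPDX-License-Identifier:' "$file"; then
        return 0
    fi
    tmp="$(mktemp)"
    {
        printf '// SPDX-FileCopyrightText: %s\n' "$copyright"
        printf '// SPDX-License-Identifier: %s\n' "$license"
        printf '\n'
        cat "$file"
    } > "$tmp"
    # Copy back instead of mv so that the original file permissions are kept
    cat "$tmp" > "$file" && rm -f "$tmp"
}
```

Usage: `add_spdx_header src/main.c "2024 Example Corp" Apache-2.0`. Running it twice on the same file adds the header only once.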

GitHub - Dump and diff of users

As a GitHub administrator,
I want to dump and diff users of all repositories
So that I can check if some unexpected users have been added or not, or have inconvenient permissions.

Dump and diff here means keeping a record of users and permissions for each repository, being able to load such records, and comparing differences between iterations.

Inconvenient permissions here means, for example, having too high privileges.

Unexpected users here means people outside the Group.

Pseudocode:

O = Organisation
OM = Organisation Members For O
OOC = Outside Collaborators For O

For Each Project P in O:
        PM = Get Members For P
        MPerm = Get Permissions Of PM

        For Each Member M in PM:
                If M Is In OOC:                          // Partner, external contributor
                        Display Member And Warning
                Else If M Is Not In OM:                  // Someone who has left the Group but is still in the project
                        Display Member And Warning
                Else:                                    // OK
                        Display Member

Display Member means displaying the alias of the member and its permission for this project.
Warning means using an emoji or something visible to draw attention.

A JSON report can be produced in parallel of standard output, like:

[ /* Projects in array */
        {
                "project": projectName,
                "warning": emoji,
                "members": [ /* Project members in array */
                        {
                                "alias": memberAlias,
                                "company": memberCompany,
                                "mail": memberEmail,
                                "isOutsideOrganisation": true/false,
                                "isOutsideCollaboratorOfOrganisation": true/false
                        },
                        ...
                ]
        },
        ...
]
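The inner loop of the pseudocode can be sketched in Bash, assuming the logins have already been dumped to files (one per line) and the project members are given as tab-separated login and permission pairs. One possible source for these inputs is the GitHub REST API through gh, e.g. `gh api "orgs/$ORG/members" --paginate --jq '.[].login'`, `gh api "orgs/$ORG/outside_collaborators" --paginate --jq '.[].login'` and `gh api "repos/$ORG/$REPO/collaborators" --paginate --jq '.[] | [.login, .role_name] | @tsv'` ($ORG and $REPO are placeholders). The function name is hypothetical and the word WARNING stands in for an emoji.

```shell
#!/usr/bin/env bash
# Sketch: classify the members of one project, following the pseudocode above.
# $1: organization member logins (OM), one per line
# $2: outside collaborator logins (OOC), one per line
# stdin: "<login>\t<permission>" lines for the project members (PM)
classify_project_members() {
    local org_members="$1" outside_collaborators="$2"
    local alias permission
    while IFS=$'\t' read -r alias permission; do
        [ -n "$alias" ] || continue
        if grep -qxF -- "$alias" "$outside_collaborators"; then
            # Partner, external contributor
            echo "WARNING $alias ($permission): outside collaborator"
        elif ! grep -qxF -- "$alias" "$org_members"; then
            # Someone who left the Group but is still in the project
            echo "WARNING $alias ($permission): not an organization member"
        else
            echo "$alias ($permission)"
        fi
    done
}
```

Saving this output for each project gives the "dump" part; comparing two dumps with `diff` would give a first version of the "diff" part.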

Package manager - Extract from files downloaded dependencies

As a developer, I want the list of dependencies and their licenses extracted from package manager files, so that I can check whether there are external libraries or not and which licenses are involved.

Package managers can be CocoaPods, Swift Package Manager, Gradle,...

Project - Smoke tests

As a GitHub administrator,
I want to trigger smoke tests
So that I can be sure my tool works properly.

Smoke tests mean, for example: check if the network is up (with/without proxy), check if APIs are still working, check Bash / Ruby environments...

[Bug] Diver - Path variables not protected

Given version 2.10.0 and a workspace whose path contains spaces because of directory names (e.g. "X - Foo Bar Wizz")
When I run the script extract-emails-from-history.sh
Then the final rm command and some tests fail.
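The usual fix is to double-quote every path expansion, in `rm` calls as well as in `[ ... ]` tests. A minimal Bash sketch with a hypothetical cleanup helper and a hypothetical temporary folder name:

```shell
#!/usr/bin/env bash
# Sketch: cleanup that survives spaces in paths.
# clean_temp_files and ".tmp-emails" are hypothetical names.
clean_temp_files() {
    local workspace="$1"
    local temp_dir="$workspace/.tmp-emails"
    # Unquoted, $temp_dir would be split into "X", "-", "Foo", ... by the shell
    if [ -d "$temp_dir" ]; then
        rm -rf -- "$temp_dir"
    fi
}
```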

Project - Add missing files

Some files are missing or could be improved:

Files to add

  • CONTRIBUTORS.md: For contributors
  • CITATION.cff (see this doc)
  • CODE OF CONDUCT
  • CODE OF CONFLICT
  • CODEOWNERS (see this doc)

Files to improve

  • AUTHORS.MD: use a lower-case file extension and mention the Orange name
  • README.md (badges, tree for sections, ...)

GitHub - Auto backup

As a GitHub administrator,
I want to make automatic backups of all repositories of the GitHub organisation I manage
So that I can recover repositories if some users delete them

Project - Split dry run

As a developer
I want to have several dry-run scripts
So that I will be able to find issues in settings without being polluted by other warnings

GitLab - Auto backup

As a GitLab administrator,
I want to make auto backups of all repositories of the GitLab organisation I manage
So that I can recover repositories if some users delete them, for example

Diver - Extract email addresses

As an open source referent,
I want to extract from the Git history the email addresses which appear in commits and build a CSV result file
So that I will be able to see who contributed to the project

Note: email addresses should be sorted alphabetically by domain name.
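A minimal Bash sketch of the sorting and CSV steps, reading author emails as printed by `git log --format='%ae'` (`%ce` would give committer emails instead). The CSV layout (`email,domain`) and the function name are assumptions; addresses are lowercased and deduplicated, and lines without exactly one `@` are dropped.

```shell
#!/usr/bin/env bash
# Sketch: build a CSV of commit emails sorted by domain, then by local part.
# stdin: one email address per line, e.g. from: git log --all --format='%ae'
emails_to_csv_by_domain() {
    echo "email,domain"
    # Lowercase, keep "local@domain" lines only, put the domain first to sort on it
    # (the tab separator sorts before any printable character in the C locale)
    tr '[:upper:]' '[:lower:]' |
        awk -F '@' 'NF == 2 && $1 != "" && $2 != "" { print $2 "\t" $1 }' |
        LC_ALL=C sort -u |
        awk -F '\t' '{ print $2 "@" $1 "," $1 }'
}
```

Usage: `git -C path/to/project log --all --format='%ae' | emails_to_csv_by_domain > emails.csv`.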

GitHub - Look for leaks

As a GitHub administrator
I want to make runs of gitleaks tool
So that I can check if there are some projects with credentials leaks

Steps to follow:

Get All Repositories From Organisation
For Each Repository: Clone it
For Each Clone:
        Run GitLeaks
        Build Report
End For Each
Build Final Report

Gitleaks command should be:

gitleaks detect --report-format json --report-path "$PROJECT.gitleaks.json" --source "$PROJECT"

(where $PROJECT is the name of the project clone directory)

Diver - Better management of path

Today scripts are quite isolated but use relative paths, so if we point to projects outside the garbage-name-data folder, scripts may fail.
Using only absolute paths may be better.
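A common Bash idiom for this is to resolve paths once with `cd` and `pwd -P` in a subshell. A minimal sketch; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: turn a (possibly relative) directory path into an absolute one.
# Runs in a subshell so the caller's working directory is not changed;
# returns a non-zero status if the directory does not exist.
to_absolute_dir() {
    (cd -- "$1" 2>/dev/null && pwd -P)
}

# Example (assumption: the script is run with bash):
# SCRIPT_DIR="$(to_absolute_dir "$(dirname -- "${BASH_SOURCE[0]}")")"
```

Scripts can then build every other path from SCRIPT_DIR or from an absolute target folder, whatever the current directory is.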

GitHub - Look for leaks and vulnerabilities with exclusion of projects

Use case

As a GitHub administrator
I want to look for Gitleaks and Dependabot alerts while excluding some projects
So that I can keep only alerts for specific projects

Details

Here, excluded projects means archived projects.

API

  • Use gh to get projects
  • In the JSON returned by the gh request, read the boolean value of the archived field
  • If true: do not process the project (neither clone nor assertions)

Definition of Done

  • Deal with Gitleaks feature
  • Deal with Dependabot feature
  • Update README
  • Update if needed dry run
  • Update Wiki
  • Update CHANGELOG
  • Update wizard
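The exclusion step can be sketched in Bash, assuming the repository list is fetched with gh and its `isArchived` JSON field (the name gh uses for the archived flag). Recent gh versions can also drop archived repositories directly with `gh repo list --no-archived`. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: keep only repositories that are not archived.
# stdin: "<name>\t<isArchived>" lines, for example produced by:
#   gh repo list "$ORG" --limit 1000 --json name,isArchived \
#       --jq '.[] | [.name, (.isArchived | tostring)] | @tsv'
# ($ORG and the limit are placeholders.)
keep_active_repos() {
    # Print only names whose archived flag is "false", so archived
    # projects can be skipped before any clone or check
    awk -F '\t' '$2 == "false" { print $1 }'
}
```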

Project - Add GitHub DCO bot

As a repository admin
I want a bot that checks the DCO in commits
So that I can find which commits, committers and authors did not apply it

See this app

Diver - Look for leaked file

As a GitHub administrator
I want to make analysis of source file to find specific files
So that I can check if there some projects with credentials leaks

By "specific files" it means files in a black-lis,t like "keystore.jks", "id.rsa", "id.rsa_pub"

Step to follow:

Get All Repositories From Organisation
For Each Repository: Clone it
For Each Clone:
        For Each File Name Regex In Blacklist
             Look Recursively For File In Directory
        End For Each
End For Each
Build Final Report

Note: add an entry in the wiki to run this script on all repository clones
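The inner loop for one clone can be sketched in Bash. This sketch swaps the regexes of the pseudocode for shell glob patterns passed to `find -name` (`find -regex` would be the regex-based alternative); the blacklist array and the function name are hypothetical examples.

```shell
#!/usr/bin/env bash
# Sketch: look for blacklisted file names in a clone, skipping .git.
# Example blacklist (hypothetical, to be extended); entries are find -name globs.
BLACKLIST_PATTERNS=('keystore.jks' 'id_rsa' 'id_rsa.pub')

find_blacklisted_files() {
    local clone_dir="$1" pattern
    local -a name_args=()
    # Build "-name p1 -o -name p2 -o ..." from the blacklist
    for pattern in "${BLACKLIST_PATTERNS[@]}"; do
        if [ "${#name_args[@]}" -gt 0 ]; then
            name_args+=(-o)
        fi
        name_args+=(-name "$pattern")
    done
    find "$clone_dir" -path '*/.git' -prune -o -type f \( "${name_args[@]}" \) -print
}
```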

Dependencies - Get repository and licenses

As a developer, I want to have a list of dependencies with, for each of them, the hypothetical GitHub repository, the license and a kind of confidence "score" for that guess, so that I can save time searching the web

Project - Industrialize it

Maybe we should rewrite the project with TDD / BDD principles so as to industrialize the tool, which is maintained on a best-effort basis.

It is not certain that these scripts are efficient, secure and reliable enough on all platforms.

Project - Dry run

As a user, I want to be sure my tool is ready to be used (so that I can be sure I can run it).

Each script (Python, Ruby and Shell) must have a dry-run feature.
The global dry-run feature then just has to run the dry-run feature of each script.

List of items to check (for example):

  • Shell, Ruby environments
  • commands (awk, sed, tr, cat, file, curl)
  • API keys
  • organization name
  • admin usernames
  • other items in configuration.rb
  • networks
  • Octokit
  • git
  • gh
  • python3
  • gitleaks
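For the command-related items, a minimal Bash sketch based on `command -v`. The command list is an assumption drawn from the items above; Octokit is a Ruby gem rather than a command, so it would need a separate check such as `ruby -e 'require "octokit"'`, and API keys or network access are not covered.

```shell
#!/usr/bin/env bash
# Sketch: check that required commands are available in PATH.
# The list is an assumption based on the items above.
REQUIRED_COMMANDS=(awk sed tr cat file curl git gh python3 ruby gitleaks)

# Print one status line per command; return non-zero if any is missing.
check_required_commands() {
    local missing=0 cmd
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "OK      $cmd"
        else
            echo "MISSING $cmd"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}
```

Usage: `check_required_commands "${REQUIRED_COMMANDS[@]}" || echo "Some commands are missing"`.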

Project - Command to purge logs

As an open source referent
I want a CLI command to delete all logs
So that I don't waste storage with old and useless log files
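A minimal Bash sketch, assuming the logs are `*.log` files under a single directory (the actual log location of each script may differ). The optional second argument keeps recent logs by deleting only files older than N days; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: delete *.log files under a directory.
# $1: logs directory, $2 (optional): only delete files older than N days.
purge_logs() {
    local logs_dir="$1" older_than_days="${2:-}"
    if [ -n "$older_than_days" ]; then
        find "$logs_dir" -type f -name '*.log' -mtime +"$older_than_days" -exec rm -f -- {} +
    else
        find "$logs_dir" -type f -name '*.log' -exec rm -f -- {} +
    fi
}
```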

Git - Get authors identity

As an open source referent
I want to get for a Git repository the list of authors (first name, last name, email)
So that I can see who has contributed or not

Bug - Failure of git log not managed

Given I trigger the script extract-emails-from-history.sh on a repository without any commits
When the git log command is executed
Then the command and the script fail

The failure message is "fatal: your current branch 'master' does not have any commits yet" and the error code is 128.

Solution: check whether there are commits before going further, and exit cleanly otherwise:

if [ "$( git log --oneline -5 2>/dev/null | wc -l )" -eq 0 ]; then
    echo "Warning: Project '$git_based_project' is a git repository without any commit, that's weird"
    CleanFiles
    NormalExit
fi
