Sin

Keep track of sinful license usage.

Sin (source inventory) collects license information from all input files using ScanCode and saves the results to a local database for incremental updates and further analysis. Sin helps you keep track of the licenses that your dependencies use, to make sure that you're not using anything unacceptable.

Features:

  • Incremental processing - Maintains a local SQLite database so that only new, moved, or modified files are processed, which greatly speeds up subsequent scans. Suitable for CI/CD.
  • Tool for investigation - CLI tool to show suspicious files and mark them as "accepted" in the database, either using global rules or specific exceptions.
  • Simple database - Manages a simple SQLite database that can be easily browsed or consumed by other tools.
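Incremental processing can be pictured as a comparison between what the database recorded during the previous scan and what is on disk now. The sketch below is an illustration, not Sin's code: it assumes every file is fingerprinted by a content hash (how Sin actually detects changes is not described here), and the names Snapshot, Change and planScan are hypothetical.

```typescript
// Sketch only (hypothetical names): Sin's real schema and change detection may differ.
// Assumption: each file is fingerprinted by a content hash.
type Snapshot = Map<string, string>; // file path -> content hash

type Change =
  | { kind: "new"; path: string }
  | { kind: "moved"; from: string; path: string }
  | { kind: "modified"; path: string };

// Compare the previous scan (from the database) with the files on disk now,
// and return only the files that need processing. Unchanged files are skipped.
function planScan(previous: Snapshot, current: Snapshot): Change[] {
  // Paths that existed last time but are gone now, indexed by content hash,
  // so identical content reappearing at a new path is recognized as a move.
  const vanishedByHash = new Map<string, string[]>();
  for (const [path, hash] of previous) {
    if (current.has(path)) continue;
    const paths = vanishedByHash.get(hash) ?? [];
    paths.push(path);
    vanishedByHash.set(hash, paths);
  }

  const changes: Change[] = [];
  for (const [path, hash] of current) {
    const oldHash = previous.get(path);
    if (oldHash === hash) continue; // same path, same content: nothing to do
    if (oldHash !== undefined) {
      changes.push({ kind: "modified", path }); // same path, new content
      continue;
    }
    // Unknown path: a move if the same content vanished from elsewhere, else new.
    const from = vanishedByHash.get(hash)?.shift();
    changes.push(from !== undefined ? { kind: "moved", from, path } : { kind: "new", path });
  }
  return changes;
}
```

In this model only the returned entries need any work, which is why a rescan where few files changed is so much cheaper than the initial scan.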

Sin has been tested on a multi-repo codebase with over 300k files:

  • Initial scan: ~6 hours.
  • Subsequent scans: ~10 minutes (assuming that not many files have changed).

Docker images are available on Docker Hub as khueue/sin.

Usage

Prerequisites

  • Docker.
  • Everything you wish to scan, with all dependencies installed.

Input to Sin

Sin runs in a Docker container and uses the following directories inside it:

  • /data/src. Sin assumes that this dir contains all files that you wish to scan, including installed dependencies, so make sure everything is installed and available here. This dir can be mounted read-only. You could organize your files like this:
    • /data/src/repo1
    • /data/src/repo2
    • etc.
  • /data/db. Sin will maintain a file called db.sqlite in this dir. It will be created if it does not exist. It's a good idea to keep this file backed up, since the point is to use it over time.
  • /data/tmp (optional). Sin creates a timestamped workspace inside this dir every time it's invoked, where all temporary files and reports are stored. Mount this folder if you wish to expose these files to your host (useful for debugging, auditing etc.).

Example 1: Try it out

  1. Clone this repo and cd into it.
  2. Run: make install_local shell.
  3. Inside the container, run commands like:
    • sin.ts scan --verbose - Scan the "bogus" source code under ./examples.
    • sin.ts audit --print - Generate an "audit" file that lists suspicions.
    • sin.ts licenses allow 'apache-2.0' - Accept the Apache 2.0 license.
    • sin.ts audit --print - Audit again, this time ignoring everything under Apache 2.0.

Example 2: More realistic

Make sure the dirs to be mounted exist on the host:

# For the sake of the example, we create this. In a real world scenario, this
# might be the (existing) root of your source code.
mkdir -p ./sin-data/src

# Database will be stored here.
mkdir -p ./sin-data/db

# All temp files will be stored here.
mkdir -p ./sin-data/tmp

Then run a container with Sin:

docker run --interactive --tty --rm --init \
   --mount type="bind",source="$(pwd)/sin-data/db",target="/data/db",consistency="delegated" \
   --mount type="bind",source="$(pwd)/sin-data/tmp",target="/data/tmp",consistency="delegated" \
   --mount type="bind",source="$(pwd)/sin-data/src",target="/data/src",readonly \
   khueue/sin:1.0.0

The command above places you in a bash shell where you can run the tool, sin.ts. All subcommands accept the -h flag:

$ sin.ts
Usage: sin.ts [options] [command]

Collects license information from all input files using ScanCode
and saves the results to a local database for further analysis

Options:
  -h, --help                 display help for command

Commands:
  scan [options] [pattern]   Scan input and update database with license findings
  audit [options]            Generate report of suspicious files
  view <file_path>           View contents of a file
  accepted [options]         Generate report of all manually accepted files
  accept <pattern> <reason>  Mark suspicious files as accepted
  unaccept <pattern>         Un-mark previously accepted files so they appear suspicious again
  licenses                   Manage globally allowed licenses (applied on every audit)
  help [command]             display help for command

Limitations

  • There is currently no ARM support (because ScanCode does not support it).

Tips

  • The bulk of the scan time is spent running ScanCode. Give as many CPUs as you can to Docker, since ScanCode is very good at saturating every available CPU.

Auditing

The sin.ts audit command builds its report as follows:

  • Fetch all files (from the database) that might mention licenses in any way:
    • When a license file is found (e.g. LICENSE), and it mentions only accepted licenses, then that whole folder (including subfolders) is excluded. The idea is: "this project seems to have an okay license, allow it."
    • When a non-license file is found, and it mentions only accepted licenses, exclude it.
  • The remainder is a set of files that needs looking into.
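The audit rules above can be expressed as a small filter. This is a minimal sketch under assumptions, not Sin's implementation: Finding, isLicenseFile, onlyAllowed and audit are hypothetical names, paths are taken to be relative to /data/src with / separators, and a file with no detected license keys is treated as not okay.

```typescript
// Sketch only: hypothetical names, not Sin's actual implementation.
interface Finding {
  path: string; // relative to /data/src, e.g. "repo1/lib/LICENSE"
  isLicenseFile: boolean; // assumption: some way of recognizing LICENSE-style files
  licenses: string[]; // ScanCode license keys detected in the file
}

// True when the file mentions at least one license and all of them are allowed.
// Assumption: a file with no detected keys is not considered okay.
function onlyAllowed(f: Finding, allowed: Set<string>): boolean {
  return f.licenses.length > 0 && f.licenses.every((key) => allowed.has(key));
}

function audit(findings: Finding[], allowed: Set<string>): Finding[] {
  // License files mentioning only allowed licenses clear their whole folder,
  // including subfolders. The trailing "/" keeps "repo1/lib/" from matching
  // "repo1/library/"; a top-level license file yields "" and clears everything.
  const clearedDirs = findings
    .filter((f) => f.isLicenseFile && onlyAllowed(f, allowed))
    .map((f) => f.path.slice(0, f.path.lastIndexOf("/") + 1));

  return findings.filter((f) => {
    if (clearedDirs.some((dir) => f.path.startsWith(dir))) return false;
    // Any other file mentioning only allowed licenses is excluded as well.
    if (onlyAllowed(f, allowed)) return false;
    return true; // the remainder needs looking into
  });
}
```

The folder rule is deliberately coarse: it keeps reports short, at the cost of also hiding any differently licensed file nested inside a folder whose license file mentions only allowed licenses.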

The audit tool accepts the following flags:

  • --verbose - Include the full ScanCode report for each file.
  • --print - Print the audit on screen (in addition to an out file).

Automatic Acceptance

Specific licenses can be allowed globally. Licenses are referenced by their ScanCode license key, such as apache-2.0 or bsd-new.

These allowed licenses are stored in the database, applied on the fly on every sin.ts audit, and managed with sin.ts licenses. This makes it easy to allow and unallow licenses and re-audit as needed.

Examples:

sin.ts licenses list
sin.ts licenses allow 'bsd-new'
sin.ts licenses unallow 'bsd-new'

Manual Acceptance

When rules are not enough, we need to inspect individual projects and files and make decisions from there. For these situations, files can be marked as "accepted" using the sin.ts accept tool.

Marking a file as "accepted" sets a flag for it in the database, which omits it from future audits. Note that if the contents of an accepted file ever change, the flag is removed so that the file shows up in reports again.

Any acceptance can be reverted with sin.ts unaccept.
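The rule that an acceptance only survives while the file is unchanged can be sketched as follows. The FileRecord shape, the contentHash field and the helper functions are hypothetical; the sketch only assumes that Sin can tell whether a file's contents changed between scans.

```typescript
// Sketch only: FileRecord and these helpers are hypothetical.
// Assumption: content changes are detected by comparing hashes.
interface FileRecord {
  path: string;
  contentHash: string;
  acceptedReason: string | null; // null means "not accepted"
}

// Counterpart of sin.ts accept: mark a suspicious file as accepted, with a reason.
function accept(record: FileRecord, reason: string): FileRecord {
  return { ...record, acceptedReason: reason };
}

// Counterpart of sin.ts unaccept: make the file show up in audits again.
function unaccept(record: FileRecord): FileRecord {
  return { ...record, acceptedReason: null };
}

// On every scan: keep the acceptance only if the contents are unchanged.
function updateOnScan(record: FileRecord, newHash: string): FileRecord {
  if (record.contentHash === newHash) return record;
  return { ...record, contentHash: newHash, acceptedReason: null };
}

// An audit would skip records for which this returns true.
function isAccepted(record: FileRecord): boolean {
  return record.acceptedReason !== null;
}
```

Tying the flag to the contents rather than to the path means an accepted file cannot silently pick up a new license later.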

Examples:

sin.ts accepted
sin.ts accept repo1/dir2/dir3/mit-and-gpl.txt 'This file seems fine'
sin.ts unaccept repo1/dir2/dir3/mit-and-gpl.txt
sin.ts accept 'repo1/dir2/dir3/%' 'This whole folder is okay'
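The % in the last example looks like a SQL LIKE wildcard, which would fit Sin's SQLite database. Assuming that is how patterns are interpreted (not confirmed here), the hypothetical likeToRegExp below shows which paths such a pattern would match: % matches any run of characters, including /, and _ matches exactly one character.

```typescript
// Assumption: accept/unaccept patterns follow SQLite LIKE semantics.
// likeToRegExp is a hypothetical helper for illustration only.
function likeToRegExp(pattern: string): RegExp {
  let source = "";
  for (const ch of pattern) {
    if (ch === "%") source += ".*"; // any run of characters, "/" included
    else if (ch === "_") source += "."; // exactly one character
    else source += ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
  }
  // SQLite's LIKE ignores case for ASCII letters by default; "i" approximates that.
  // "s" lets "." match newlines too, as % and _ would.
  return new RegExp(`^${source}$`, "is");
}

const folder = likeToRegExp("repo1/dir2/dir3/%");
console.log(folder.test("repo1/dir2/dir3/deeper/file.txt"));
```

One consequence, if the assumption holds: an _ in a pattern also matches any other single character, so 'my_dir/%' would also match paths under myXdir/.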

Tips and Tricks

To help with your investigation, Sin always saves two additional things when it finds potential license findings:

  • The entire contents of the file. The file can be viewed by running sin.ts view <path> (which you can pipe to less). This is especially useful if the file in question is the result of a decompressed archive inside your dependency tree.
  • The ScanCode report for the file. This is shown when running an audit with the --verbose flag.
