Keep track of sinful license usage.
Sin (source inventory) collects license information from all input files using ScanCode and saves the results to a local database for incremental updates and further analysis. Sin helps you keep track of the licenses that your dependencies use, to make sure that you're not using anything unacceptable.
Features:
- Incremental processing - Maintains a local SQLite database to make sure to only process new/moved/modified files, which greatly speeds up subsequent scans. Suitable for CI/CD.
- Tool for investigation - CLI tool to show suspicious files and "accept" them to the database, either using global rules or specific exceptions.
- Simple database - Manages a simple SQLite database that can be easily browsed or consumed by other tools.
Sin has been tested on a multi-repo codebase with over 300k files:
- Initial scan: ~6 hours.
- Subsequent scans: ~10 minutes (assuming that not many files have changed).
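As a rough illustration of why subsequent scans are so much faster, the sketch below classifies files by comparing content hashes against what the database already knows. It is not Sin's actual implementation: the KnownFile, FoundFile and Plan types and the planIncrementalScan function are hypothetical, and the real schema and change detection may differ. In particular, treating a moved file as a separate category is an assumption.

```typescript
// Hypothetical shapes; Sin's real database schema is not documented here.
type KnownFile = { path: string; contentHash: string };
type FoundFile = { path: string; contentHash: string };

type Plan = {
  toScan: FoundFile[];    // new or modified content
  moved: FoundFile[];     // content already known, but under a different path
  unchanged: FoundFile[]; // same path and same content as in the database
};

function planIncrementalScan(known: KnownFile[], found: FoundFile[]): Plan {
  const hashByPath = new Map<string, string>();
  const knownHashes = new Set<string>();
  for (const k of known) {
    hashByPath.set(k.path, k.contentHash);
    knownHashes.add(k.contentHash);
  }

  const plan: Plan = { toScan: [], moved: [], unchanged: [] };
  for (const f of found) {
    if (hashByPath.get(f.path) === f.contentHash) {
      plan.unchanged.push(f);
    } else if (knownHashes.has(f.contentHash)) {
      plan.moved.push(f);
    } else {
      plan.toScan.push(f);
    }
  }
  return plan;
}

// Example: one unchanged file, one moved file, one new file, one modified file.
const examplePlan = planIncrementalScan(
  [
    { path: "a.txt", contentHash: "h1" },
    { path: "b.txt", contentHash: "h2" },
  ],
  [
    { path: "a.txt", contentHash: "h1" }, // unchanged
    { path: "c.txt", contentHash: "h2" }, // same content as the old b.txt
    { path: "d.txt", contentHash: "h3" }, // new
    { path: "b.txt", contentHash: "h4" }, // modified
  ],
);
```

On a mostly unchanged codebase, nearly everything lands in unchanged, so very little work remains for ScanCode.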
Docker images are available on Docker Hub.
Requirements:
- Docker.
- Everything you wish to scan, with all dependencies installed.
Sin runs in a Docker container, and uses the following directories inside:
- /data/src. Sin assumes that this dir contains all files that you wish to scan, including installed dependencies. Make sure everything is installed and available in this dir. Can be mounted read-only. You could organize your files like this:
  - /data/src/repo1
  - /data/src/repo2
  - etc.
- /data/db. Sin will maintain a file called db.sqlite in this dir. It will be created if it does not exist. It's a good idea to keep this file backed up, since the point is to use it over time.
- /data/tmp (optional). Sin creates a timestamped workspace inside this dir every time it's invoked, where all temporary files and reports are stored. Mount this folder if you wish to expose these files to your host (useful for debugging, auditing etc.).
- Clone this repo and cd into it.
- Run: make install_local shell
- Inside the container, run commands like:
  - sin.ts scan --verbose - Perform scan on "bogus" source code under ./examples.
  - sin.ts audit --print - Generate an "audit" file that lists suspicions.
  - sin.ts licenses allow 'apache-2.0' - Accept the Apache 2.0 license.
  - sin.ts audit --print - Audit again, this time ignoring everything under Apache 2.0.
Make sure the dirs to be mounted exist on the host:
# For the sake of the example, we create this. In a real world scenario, this
# might be the (existing) root of your source code.
mkdir -p ./sin-data/src
# Database will be stored here.
mkdir -p ./sin-data/db
# All temp files will be stored here.
mkdir -p ./sin-data/tmp
Then run a container with Sin:
docker run --interactive --tty --rm --init \
--mount type="bind",source="$(pwd)/sin-data/db",target="/data/db",consistency="delegated" \
--mount type="bind",source="$(pwd)/sin-data/tmp",target="/data/tmp",consistency="delegated" \
--mount type="bind",source="$(pwd)/sin-data/src",target="/data/src",readonly \
khueue/sin:1.0.0
The above command will place you inside a bash shell, allowing you to run the tool, sin.ts (where all subcommands accept the -h flag):
$ sin.ts
Usage: sin.ts [options] [command]

Collects license information from all input files using ScanCode
and saves the results to a local database for further analysis

Options:
  -h, --help                 display help for command

Commands:
  scan [options] [pattern]   Scan input and update database with license findings
  audit [options]            Generate report of suspicious files
  view <file_path>           View contents of a file
  accepted [options]         Generate report of all manually accepted files
  accept <pattern> <reason>  Mark suspicious files as accepted
  unaccept <pattern>         Un-mark previously accepted files so they appear suspicious again
  licenses                   Manage globally allowed licenses (applied on every audit)
  help [command]             display help for command
- There is currently no ARM support (because ScanCode does not support it).
- The bulk of the scan time is spent running ScanCode. Give as many CPUs as you can to Docker, since ScanCode is very good at saturating every available CPU.
The sin.ts audit tool gathers a report according to the following:
- Fetch all files (from the database) that might mention licenses in any way:
  - When a license file is found (e.g. LICENSE), and it mentions only accepted licenses, then that whole folder (including subfolders) is excluded. The idea is: "this project seems to have an okay license, allow it."
  - When a non-license file is found, and it mentions only accepted licenses, exclude it.
- The remainder is a set of files that needs looking into.
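The exclusion steps above can be sketched as a small filter. This is a minimal illustration under stated assumptions, not the tool's actual code: the Finding type, the dirOf helper and the audit function are hypothetical names, and how Sin really decides that something is a "license file" is not shown here.

```typescript
// Hypothetical shape of one database row that mentions licenses.
type Finding = {
  path: string;           // relative to the scan root, e.g. "repo1/LICENSE"
  licenses: string[];     // license keys detected in the file
  isLicenseFile: boolean; // true for files such as LICENSE
};

// Folder part of a path ("" for files at the root).
function dirOf(path: string): string {
  const i = path.lastIndexOf("/");
  return i === -1 ? "" : path.slice(0, i);
}

function audit(findings: Finding[], allowed: Set<string>): Finding[] {
  const onlyAllowed = (f: Finding): boolean =>
    f.licenses.every((key) => allowed.has(key));

  // Folders vouched for by a license file that mentions only allowed licenses.
  const okDirs = findings
    .filter((f) => f.isLicenseFile && onlyAllowed(f))
    .map((f) => dirOf(f.path));

  const insideOkDir = (path: string): boolean =>
    okDirs.some((dir) => dir === "" || path.startsWith(dir + "/"));

  // Whatever is neither inside an okay folder nor okay by itself needs review.
  return findings.filter((f) => !insideOkDir(f.path) && !onlyAllowed(f));
}

// Example with "mit" as the only allowed license key.
const exampleFindings: Finding[] = [
  { path: "repo1/LICENSE", licenses: ["mit"], isLicenseFile: true },
  { path: "repo1/src/a.c", licenses: ["gpl-3.0"], isLicenseFile: false },
  { path: "repo2/LICENSE", licenses: ["gpl-3.0"], isLicenseFile: true },
  { path: "repo2/lib/b.js", licenses: ["mit"], isLicenseFile: false },
  { path: "repo2/lib/c.js", licenses: ["mit", "gpl-3.0"], isLicenseFile: false },
];
const suspicious = audit(exampleFindings, new Set(["mit"]));
```

In this example everything under repo1 is excluded by its LICENSE file, while repo2/LICENSE and repo2/lib/c.js remain for investigation.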
The audit tool accepts the following flags:
- --verbose - Include the full ScanCode report for each file.
- --print - Print the audit on screen (in addition to an out file).
XXX Wrong since 1.0.0:
The engine is configured to allow specific licenses, referenced by "Key" in:
These acceptances are stored in the database, applied on-the-fly on every sin.ts audit, and managed by sin.ts licenses. This means that it's simple to go back and forth with accepting and unaccepting licenses and then re-auditing as needed.
Examples:
sin.ts licenses list
sin.ts licenses allow 'bsd-new'
sin.ts licenses unallow 'bsd-new'
When rules are not enough, we need to inspect individual projects and files, and make decisions from there. For these situations, files can be marked as "accepted" using the sin.ts accept tool.
Marking a file as "accepted" essentially sets a flag in the database for that particular file, omitting it from future audits. Note that if the contents of an accepted file ever change, the flag is removed so that the file starts showing up in reports again.
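The reset-on-change behaviour can be pictured like this. It is a sketch only: the FileRow type, its field names and the applyRescan function are hypothetical, and it assumes changes are detected by comparing content hashes, which the text above does not confirm.

```typescript
// Hypothetical database row for a scanned file.
type FileRow = {
  path: string;
  contentHash: string;
  accepted: boolean;
  acceptReason: string | null;
};

// Returns the row as it would look after a rescan that computed newHash.
function applyRescan(row: FileRow, newHash: string): FileRow {
  if (row.contentHash === newHash) {
    return row; // contents unchanged: any acceptance stays in place
  }
  // Contents changed: drop the acceptance so the file is audited again.
  return { ...row, contentHash: newHash, accepted: false, acceptReason: null };
}

const acceptedRow: FileRow = {
  path: "repo1/dir2/dir3/mit-and-gpl.txt",
  contentHash: "h1",
  accepted: true,
  acceptReason: "This file seems fine",
};
const afterSameContent = applyRescan(acceptedRow, "h1");
const afterNewContent = applyRescan(acceptedRow, "h2");
```

Here afterSameContent is still accepted, while afterNewContent has lost both the flag and the reason.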
It is possible to revert any accepts by running sin.ts unaccept.
Examples:
sin.ts accepted
sin.ts accept repo1/dir2/dir3/mit-and-gpl.txt 'This file seems fine'
sin.ts unaccept repo1/dir2/dir3/mit-and-gpl.txt
sin.ts accept 'repo1/dir2/dir3/%' 'This whole folder is okay'
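The % in the last example looks like a SQL LIKE-style wildcard. Assuming that is how patterns are interpreted (an assumption, not something confirmed here), the matching could be approximated as below. The likeToRegExp and matchesPattern helpers are hypothetical, and the sketch handles only %, ignoring other LIKE features such as _ and case-insensitivity.

```typescript
// Converts a pattern where % means "any sequence of characters" into a RegExp.
function likeToRegExp(pattern: string): RegExp {
  // Escape regex metacharacters first; % is not one, so it survives this step.
  const escaped = pattern.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/%/g, ".*") + "$");
}

function matchesPattern(pattern: string, path: string): boolean {
  return likeToRegExp(pattern).test(path);
}

// A folder pattern covers everything below that folder.
const folderMatch = matchesPattern(
  "repo1/dir2/dir3/%",
  "repo1/dir2/dir3/mit-and-gpl.txt",
);
// A file outside the folder is not covered.
const outsideMatch = matchesPattern("repo1/dir2/dir3/%", "repo1/dir2/other.txt");
// A pattern without % matches only the exact path.
const exactMatch = matchesPattern(
  "repo1/dir2/dir3/mit-and-gpl.txt",
  "repo1/dir2/dir3/mit-and-gpl.txt",
);
```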
To help with your investigation, Sin always saves two additional things when it finds potential license findings:
- The entire contents of the file. The file can be viewed by running sin.ts view <path> (which you can pipe to less). This is especially useful if the file in question is the result of a decompressed archive inside your dependency tree.
- The ScanCode report for the file. This is shown when running an audit with the --verbose flag.