licensechecker (lc) is a command line application that scans directories and identifies which software licenses their files are under, producing reports as SPDX, CSV, JSON, XLSX or CLI tabular output. Dual-licensed under MIT or the UNLICENSE.

License: MIT License

lc's Introduction

licensechecker (lc)

NOTE: lc is under heavy development and master does not currently work. Use a release for a working version.

lc is a command line tool that recursively iterates over a supplied directory or file, attempting to identify which software license each file is under, using the list of licenses supplied by the SPDX (Software Package Data Exchange) project. It picks up appropriately named license files as well as inline license identifiers in source files, such as the one below.

SPDX-License-Identifier: GPL-3.0-only

In a nutshell, this project is a reimplementation of http://www.boyter.org/2017/05/identify-software-licenses-python-vector-space-search-ngram-keywords/ in Go, written while I nut out the nuances of the language.
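
As a rough illustration of the vector space idea named in the linked post's title, the sketch below scores two texts by cosine similarity over word counts. It is not lc's actual matching code: the names `termCounts` and `cosineSimilarity` are hypothetical, and lc's real keyword and n-gram handling is not shown.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// termCounts lowercases the text and counts whitespace-separated words.
// (Hypothetical helper, not part of lc.)
func termCounts(text string) map[string]float64 {
	counts := make(map[string]float64)
	for _, w := range strings.Fields(strings.ToLower(text)) {
		counts[w]++
	}
	return counts
}

// cosineSimilarity returns a score near 1 for texts with the same word
// distribution and 0 for texts that share no words.
func cosineSimilarity(a, b string) float64 {
	va, vb := termCounts(a), termCounts(b)
	var dot, normA, normB float64
	for w, x := range va {
		dot += x * vb[w] // missing words contribute 0
		normA += x * x
	}
	for _, y := range vb {
		normB += y * y
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

func main() {
	known := "permission is hereby granted free of charge"
	candidate := "Permission is hereby granted to any person"
	fmt.Printf("%.2f\n", cosineSimilarity(known, candidate))
}
```

A candidate file would be compared against every known license text and the highest score kept, subject to a minimum confidence such as the one set with --confidence.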

It can produce reports as valid SPDX, CSV, XLSX, JSON and CLI-formatted output. It has been designed to work inside CI systems that capture either stdout or file artifacts.

Dual-licensed under MIT or the UNLICENSE.

Support

Using lc commercially? If you want priority support for lc you can purchase a year's worth at https://boyter.gumroad.com/l/vixqn, which entitles you to priority direct email support from the developer.

Why

In short, taken from http://ben.balter.com/licensee/:

  • You've got an open source project. How do you know what you can and can't do with the software?
  • You've got a bunch of open source projects, how do you know what their licenses are?
  • You've got a project with a license file, but which license is it? Has it been modified?

Why should you care about what licenses your code runs under? See

Installation

The binary name for licensechecker is lc.

For binaries, see the releases at https://github.com/boyter/lc/releases. To build from source you need Go set up with a working GOPATH and your Go binary path exported like so:

export PATH=$PATH:$(go env GOPATH)/bin

then to install

$ go install

Usage

Command line usage of licensechecker is designed to be as simple as possible. Full details can be found in lc --help.

$ lc --help
NAME:
   licensechecker - Check directory for licenses and list what license(s) a file is under

USAGE:
   lc [global options] [DIRECTORY|FILE] [DIRECTORY|FILE]

VERSION:
   1.3.0

COMMANDS:
     help, h  Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --format csv, -f csv                                Set output format, supports progress, tabular, json, spdx, summary, xlsx or csv (default: "tabular")
   --output FILE, -o FILE                              Set output file if not set will print to stdout FILE
   --confidence 0.95, -c 0.95                          Set required confidence level for licence matching between 0 and 1 E.G. 0.95 (default: "0.85")
   --deepguess true, --dg true                         Should attempt to deep guess the licence false or true true (default: "true")
   --filesize 50000, --fs 50000                        How large a file in bytes should be processed 50000 (default: "50000")
   --licensefiles copying,readme, --lf copying,readme  Possible license files to inspect for over-arching license as comma seperated list copying,readme (default: "license,licence,copying,readme")
   --pathblacklist .git,.hg,.svn, --pbl .git,.hg,.svn  Which directories should be ignored as comma seperated list .git,.hg,.svn (default: ".git,.hg,.svn")
   --extblacklist gif,jpg,png, --xbl gif,jpg,png       Which file extensions should be ignored for deep analysis as comma seperated list E.G. gif,jpg,png (default: "woff,eot,cur,dm,xpm,emz,db,scc,idx,mpp,dot,pspimage,stl,dml,wmf,rvm,resources,tlb,docx,doc,xls,xlsx,ppt,pptx,msg,vsd,chm,fm,book,dgn,blines,cab,lib,obj,jar,pdb,dll,bin,out,elf,so,msi,nupkg,pyc,ttf,woff2,jpg,jpeg,png,gif,bmp,psd,tif,tiff,yuv,ico,xls,xlsx,pdb,pdf,apk,com,exe,bz2,7z,tgz,rar,gz,zip,zipx,tar,rpm,bin,dmg,iso,vcd,mp3,flac,wma,wav,mid,m4a,3gp,flv,mov,mp4,mpg,rm,wmv,avi,m4v,sqlite,class,rlib,ncb,suo,opt,o,os,pch,pbm,pnm,ppm,pyd,pyo,raw,uyv,uyvy,xlsm,swf")
   --documentname LicenseChecker, --dn LicenseChecker  SPDX only. Sets DocumentName E.G. LicenseChecker (default: "Unknown")
   --packagename LicenseChecker, --pn LicenseChecker   SPDX only. Sets PackageName E.G. LicenseChecker (default: "Unknown")
   --documentnamespace value, --dns value              SPDX only. Sets DocumentNamespace, if not set will default to http://spdx.org/spdxdocs/[packagename]-[HASH]
   --help, -h                                          show help
   --version, -v                                       print the version

More information about what licensechecker looks at and how it works

Probably the most useful option is -f, which specifies the output format. By default licensechecker prints results in a tabular CLI format. However, as it was designed to run at the end of CI tasks, you may want to change it. This can be done like so:

$ lc -f tabular .
$ lc -f progress .
$ lc -f spdx .
$ lc -f csv .
$ lc -f summary .

The above will process starting in the current directory and print out a formatted list of results to the CLI when finished.

Example output of licensechecker running against itself in tabular format while ignoring the .git, licenses and vendor directories:

$ lc -pbl .git,vendor,licenses -f tabular .
-----------------------------------------------------------------------------------------------------------
Directory            File                    License                                      Confidence  Size
-----------------------------------------------------------------------------------------------------------
.                    .gitignore              (MIT OR Unlicense)                           100.00%     278B
.                    .travis.yml             (MIT OR Unlicense)                           100.00%     192B
.                    CODE_OF_CONDUCT.md      (MIT OR Unlicense)                           100.00%     3.1K
.                    CONTRIBUTING.md         (MIT OR Unlicense)                           100.00%     1.2K
.                    Gopkg.lock              (MIT OR Unlicense)                           100.00%     1.4K
.                    Gopkg.toml              (MIT OR Unlicense)                           100.00%     972B
.                    LICENSE                 Unlicense AND MIT                            94.83%      1.1K
.                    README.md               (MIT OR Unlicense)                           100.00%     10.6K
.                    UNLICENSE               MIT AND Unlicense                            95.16%      1.2K
.                    database_keywords.json  (MIT OR Unlicense)                           100.00%     3.6M
.                    licensechecker.spdx     (MIT OR Unlicense)                           100.00%     9.3K
.                    main.go                 (MIT OR Unlicense)                           100.00%     3.4K
.                    what-we-look-at.md      (MIT OR Unlicense)                           100.00%     3.7K
examples/identifier  LICENSE                 GPL-3.0+ AND MIT                             95.40%      1K
examples/identifier  LICENSE2                MIT AND GPL-3.0+                             99.65%      35K
examples/identifier  has_identifier.py       (MIT OR GPL-3.0+) AND GPL-2.0                100.00%     409B
parsers              constants.go            (MIT OR Unlicense)                           100.00%     4.8M
parsers              formatter.go            (MIT OR Unlicense)                           100.00%     8.5K
parsers              formatter_test.go       (MIT OR Unlicense)                           100.00%     1.3K
parsers              guesser.go              (MIT OR Unlicense)                           100.00%     9.8K
parsers              guesser_test.go         (MIT OR Unlicense) AND GPL-2.0 AND GPL-3.0+  100.00%     4.8K
parsers              helpers.go              (MIT OR Unlicense) AND Apache-2.0            100.00%     2.4K
parsers              helpers_test.go         (MIT OR Unlicense)                           100.00%     2.8K
parsers              structs.go              (MIT OR Unlicense)                           100.00%     679B
scripts              build_database.py       (MIT OR Unlicense)                           100.00%     4.6K
scripts              include.go              (MIT OR Unlicense)                           100.00%     951B
-----------------------------------------------------------------------------------------------------------

To write out the results to a CSV file

$ lc --format csv --output licences.csv --pathblacklist .git,licenses,vendor .

Or to a SPDX 2.1 file

$ lc -f spdx -o licensechecker.spdx --pbl .git,vendor,licenses -dn licensechecker -pn licensechecker .

You can specify multiple directories as additional arguments and all results will be merged into a single output

$ lc -f tabular ./examples/identifier ./scripts

You can also specify files and directories as additional arguments

$ lc -f tabular README.md LICENSE ./examples/identifier
------------------------------------------------------------------------------------------
Directory              File               License                        Confidence  Size
------------------------------------------------------------------------------------------
                       README.md          NOASSERTION                    100.00%     11.3K
                       LICENSE            MIT                            94.83%      1.1K
./examples/identifier  LICENSE            GPL-3.0+ AND MIT               95.40%      1K
./examples/identifier  LICENSE2           MIT AND GPL-3.0+               99.65%      35K
./examples/identifier  has_identifier.py  (MIT OR GPL-3.0+) AND GPL-2.0  100.00%     409B
------------------------------------------------------------------------------------------

SPDX

The SPDX output is a valid SPDX 2.1 document, checked against the tools supplied by the SPDX group. The example below runs master against itself to produce an SPDX file and then validates it using the tools from https://github.com/spdx/tools

$ go run main.go  -f spdx -o spdx_example.spdx --pbl .git,vendor,licenses -dn licensechecker -pn licensechecker . && java -jar ./spdx-tools-2.1.12-SNAPSHOT-jar-with-dependencies.jar Verify ./spdx_example.spdx
ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console. Set system property 'log4j2.debug' to show Log4j2 internal initialization logging.
03:49:29.479 [main] ERROR org.apache.jena.rdf.model.impl.RDFReaderFImpl - Rewired RDFReaderFImpl - configuration changes have no effect on reading
03:49:29.482 [main] ERROR org.apache.jena.rdf.model.impl.RDFReaderFImpl - Rewired RDFReaderFImpl - configuration changes have no effect on reading
This SPDX Document is valid.

Package

Run go build for Windows and Linux, then run the following on Linux. Keep in mind that you need to update the version.

zip -r9 lc-1.0.0-x86_64-pc-windows.zip lc.exe && zip -r9 lc-1.0.0-x86_64-unknown-linux.zip lc

GOOS=darwin GOARCH=amd64 go build && zip -r9 lc-1.0.0-x86_64-apple-darwin.zip lc
GOOS=windows GOARCH=amd64 go build && zip -r9 lc-1.0.0-x86_64-pc-windows.zip lc.exe
GOOS=linux GOARCH=amd64 go build && zip -r9 lc-1.0.0-x86_64-unknown-linux.zip lc

Most Common Software Licences

Source https://www.blackducksoftware.com/top-open-source-licenses

Source https://blog.sourced.tech/post/gld/pga-licenses.csv

Rank  Open Source License                                 %
1.    MIT License                                         38%
2.    GNU General Public License (GPL 2.0)                14%
3.    Apache License 2.0                                  13%
4.    ISC License                                         10%
5.    GNU General Public License (GPL) 3.0                6%
6.    BSD License 2.0 (3-clause, New or Revised) License  5%
7.    Artistic License (Perl)                             3%
8.    GNU Lesser General Public License (LGPL) 2.1        3%
9.    GNU Lesser General Public License (LGPL) 3.0        1%
10.   Eclipse Public License (EPL)                        1%
11.   Microsoft Public License                            1%
12.   Simplified BSD License (BSD)                        1%
13.   Code Project Open License 1.02                      < 1%
14.   Mozilla Public License (MPL) 1.1                    < 1%
15.   GNU Affero General Public License v3 or later       < 1%
16.   Common Development and Distribution License (CDDL)  < 1%
17.   DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE         < 1%
18.   Microsoft Reciprocal License                        < 1%
19.   Sun GPL with Classpath Exception v2.0               < 1%
20.   zlib/libpng License                                 < 1%

TODO

lc's People

Contributors

boyter, calidadesystems, dvrkps, pmundt, sschuberth

lc's Issues

lc thinks dash license is BSD-4-Clause-UC

Directory                         File     License          Confidence  Size
masterdir/builddir/dash-0.5.9.1/  COPYING  BSD-4-Clause-UC  93.43%      2.6K

LICENSE FILE

Copyright (c) 1989-1994
	The Regents of the University of California.  All rights reserved.
Copyright (c) 1997 Christos Zoulas.  All rights reserved.
Copyright (c) 1997-2005
	Herbert Xu <[email protected]>.  All rights reserved.

This code is derived from software contributed to Berkeley by Kenneth Almquist.


Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.
3. Neither the name of the University nor the names of its contributors
   may be used to endorse or promote products derived from this software
   without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.

mksignames.c:

This file is not directly linked with dash.  However, its output is.

Copyright (C) 1992 Free Software Foundation, Inc.

This file is part of GNU Bash, the Bourne Again SHell.

Bash is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free
Software Foundation; either version 2, or (at your option) any later
version.

Bash is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
for more details.

You should have received a copy of the GNU General Public License with
your Debian GNU/Linux system, in /usr/share/common-licenses/GPL, or with the
Debian GNU/Linux hello source package as the file COPYING.  If not,
write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330,
Boston, MA 02111 USA.

lc thinks Attic license is Sleepycat

This is a common occurrence when there are lots of copyright lines.

--------------------------------------------------------------------
Directory                       File     License    Confidence  Size
--------------------------------------------------------------------
masterdir/builddir/Attic-0.16/  LICENSE  Sleepycat  92.88%      1.4K
--------------------------------------------------------------------
Copyright (C) 2010-2014 Jonas Borgström <[email protected]>
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:

 1. Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
 2. Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in
    the documentation and/or other materials provided with the
    distribution.
 3. The name of the author may not be used to endorse or promote
    products derived from this software without specific prior
    written permission.

THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER
IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN
IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Apple M1 - preferred version unclear

README has a warning that master doesn't work, so I downloaded an (x86) release, v1.3.1, to my MacBook Air (M1).

It crashes with a "segmentation fault" right away

> ./lc --help
[1]    75506 segmentation fault  ./lc --help

If you are running this successfully on a M1 Mac I'd love suggestions or help or reports of success/failure.

panic when running the app

lc

no flags, in a source code folder with no license file

panic: runtime error: index out of range

goroutine 1 [running]:
github.com/boyter/lc/parsers.walkDirectory(0x5c4a12, 0x1, 0xc42004f908, 0x0, 0x0, 0x0, 0x564a6d, 0xc4200b8420)
	~/go/src/github.com/boyter/lc/parsers/guesser.go:249 +0xf3e
github.com/boyter/lc/parsers.Process()
	~/go/src/github.com/boyter/lc/parsers/guesser.go:343 +0x5b8
main.main.func1(0xc4200b8420, 0xc4200b8420, 0xc4200479b7)
	~/go/src/github.com/boyter/lc/main.go:89 +0x5e
github.com/boyter/lc/vendor/github.com/urfave/cli.HandleAction(0x58eda0, 0xaa4ae8, 0xc4200b8420, 0xc42005a1e0, 0x0)
	~/go/src/github.com/boyter/lc/vendor/github.com/urfave/cli/app.go:490 +0xd2
github.com/boyter/lc/vendor/github.com/urfave/cli.(*App).Run(0xc420082820, 0xc42000e250, 0x1, 0x1, 0x0, 0x0)
	~/go/src/github.com/boyter/lc/vendor/github.com/urfave/cli/app.go:264 +0x635
main.main()
	~/go/src/github.com/boyter/lc/main.go:93 +0x9c9

Perhaps this is related to not having a license in the root directory?

Go modules

Please migrate this package to use Go modules.

lc thinks celt license is BSD-2-Clause-NetBSD because of copyright

Directory                        File     License              Confidence  Size
masterdir/builddir/celt-0.11.3/  COPYING  BSD-2-Clause-NetBSD  92.39%      1.3K

with copyright lines removed

Directory                        File     License       Confidence  Size
masterdir/builddir/celt-0.11.3/  COPYING  BSD-2-Clause  93.40%      1.2K

LICENSE FILE

Copyright 2001-2009 Jean-Marc Valin, Timothy B. Terriberry,
                    CSIRO, and other contributors

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:

- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.

- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
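
The "with copyright lines removed" experiment above could be automated as a preprocessing step before matching. The following is a minimal sketch under that assumption. `stripCopyrightLines` is a hypothetical helper, not a function from lc, and it only drops lines that begin with the word "copyright", so continuation lines (such as "CSIRO, and other contributors") and "All rights reserved." lines are left in place.

```go
package main

import (
	"fmt"
	"strings"
)

// stripCopyrightLines removes every line whose first word is "copyright"
// (case-insensitive) so that author lists do not skew license matching.
// (Hypothetical helper, not part of lc.)
func stripCopyrightLines(text string) string {
	var kept []string
	for _, line := range strings.Split(text, "\n") {
		trimmed := strings.ToLower(strings.TrimSpace(line))
		if strings.HasPrefix(trimmed, "copyright") {
			continue // drop the copyright notice itself
		}
		kept = append(kept, line)
	}
	return strings.Join(kept, "\n")
}

func main() {
	license := "Copyright (c) 2018 Example Author\n" +
		"Copyright (c) 2019 Another Author\n" +
		"\n" +
		"Permission is hereby granted, free of charge, ..."
	fmt.Println(stripCopyrightLines(license))
}
```

Whether this helps or hurts overall accuracy would need checking against the full license database, since some license texts contain the word "copyright" at the start of a line as part of their terms.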

Use an open license

It seems a little paradoxical to have a tool to help manage licenses itself be under a particularly incompatible/restrictive license. :-)

lc thinks MIT is JSON if there are 2 copyright lines

with 2 copyright lines

Directory                             File     License  Confidence  Size
masterdir/builddir/libmaa-1.4.2/doc/  LICENSE  JSON     92.85%      1.1K

with 1 copyright line

Directory                             File     License  Confidence  Size
masterdir/builddir/libmaa-1.4.2/doc/  LICENSE  MIT      94.29%      1.1K

LICENSE file

Copyright (c) 1995-2002 Rik Faith <[email protected]>
Copyright (c) 2002-2018 Aleksey Cheusov <[email protected]>

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

lc skips some SPDX-Identifiers

Hello,

I recently tried out your project and it works pretty well.
However, when applying the tool to parts of the Linux kernel as a test, I noticed that a lot of .cc and .h files do not get a license assigned. The problem seems to be related to the way comments are marked in C++ as opposed to Go or Python (there are characters after the license name).

How to reproduce

  • Create a file containing only:
    /* SPDX-License-Identifier: GPL-2.0 */
  • Use lc on that file. Result should be NOASSERTION

In contrast:

  • remove everything after GPL-2.0
  • Result should be GPL-2.0

Problem

  • Problem lies in parsers/guesser.go
  • The RegExp chooses GPL-2.0 */ instead of GPL-2.0
  • Comparing the licenses returns false because the strings are not equal

Solution (-Attempt)

  • Adapt the RegExp. Probably tricky? Or I just suck at them (which is a fact).
  • Use another string comparison method. I tested with strings.Contains, which seems to work. I am not 100% sure, however, whether this breaks for very similarly named licenses. I didn't see any, but there might be license names which completely contain other license names. This does, however, fix the minimal sample from above.
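
One possible way to adapt the regular expression is sketched below. It captures only characters that can appear in a single SPDX identifier, so a trailing `*/` is never part of the match and an exact string comparison still works. This is an assumption about a fix, not the pattern used in parsers/guesser.go. The names `spdxTag` and `extractSPDX` are hypothetical, and compound expressions joined with AND or OR are not handled.

```go
package main

import (
	"fmt"
	"regexp"
)

// spdxTag captures the identifier after "SPDX-License-Identifier:".
// The character class allows only letters, digits, '.', '+' and '-',
// so matching stops at the space before a comment terminator.
var spdxTag = regexp.MustCompile(`SPDX-License-Identifier:\s*([A-Za-z0-9.+-]+)`)

// extractSPDX returns the identifier found in line, or "" if there is none.
// (Hypothetical helper, not part of lc.)
func extractSPDX(line string) string {
	m := spdxTag.FindStringSubmatch(line)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	lines := []string{
		"/* SPDX-License-Identifier: GPL-2.0 */",
		"// SPDX-License-Identifier: GPL-3.0-only",
		"# SPDX-License-Identifier: MIT",
	}
	for _, line := range lines {
		fmt.Printf("%q\n", extractSPDX(line))
	}
}
```

Because the captured value is compared for equality rather than with strings.Contains, an identifier that is a prefix of another (for example GPL-2.0 and GPL-2.0-only) is not confused with it.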

lc thinks nnn license is Mup

---------------------------------------------------------------
Directory                    File     License  Confidence  Size
---------------------------------------------------------------
masterdir/builddir/nnn-1.7/  LICENSE  Mup      91.43%      1.4K
---------------------------------------------------------------
Copyright (c) 2014-2016 Lazaros Koromilas <[email protected]>
Copyright (c) 2014-2016 Dimitris Papastamos <[email protected]>
Copyright (c) 2016-2018 Arun Prakash Jana <[email protected]>
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:

1. Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

build fails on 32-bit platforms

$ GOARCH=386 go build
# github.com/boyter/lc/parsers
parsers/helpers.go:94:13: constant 1099511627776 overflows int

$ GOARCH=arm go build
# github.com/boyter/lc/parsers
parsers/helpers.go:94:13: constant 1099511627776 overflows int
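
The constant in the error, 1099511627776, is 2^40, which does not fit in the 32-bit `int` used on 386 and arm. A minimal sketch of one fix is to type the size constants as `int64`. The contents of parsers/helpers.go are not shown here, so the constant names and the `formatSize` function below are hypothetical, and the output format is only an approximation of lc's Size column.

```go
package main

import "fmt"

// Typed as int64 so the values fit regardless of the platform's int width.
const (
	kilobyte int64 = 1 << 10
	megabyte int64 = 1 << 20
	gigabyte int64 = 1 << 30
	terabyte int64 = 1 << 40 // 1099511627776, too large for a 32-bit int
)

// formatSize renders a byte count with a one-letter unit suffix.
// (Hypothetical helper, not part of lc.)
func formatSize(n int64) string {
	switch {
	case n >= terabyte:
		return fmt.Sprintf("%.1fT", float64(n)/float64(terabyte))
	case n >= gigabyte:
		return fmt.Sprintf("%.1fG", float64(n)/float64(gigabyte))
	case n >= megabyte:
		return fmt.Sprintf("%.1fM", float64(n)/float64(megabyte))
	case n >= kilobyte:
		return fmt.Sprintf("%.1fK", float64(n)/float64(kilobyte))
	default:
		return fmt.Sprintf("%dB", n)
	}
}

func main() {
	fmt.Println(formatSize(278))
	fmt.Println(formatSize(1536))
	fmt.Println(formatSize(terabyte))
}
```

File sizes reported by `os.FileInfo.Size()` are already `int64`, so this choice avoids conversions at the call site as well.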

Performance

The performance could be a lot better through the use of fan out. Might be possible to speed up the matching as well by using byte comparisons rather than string. Need to investigate both as the tool can be quite slow at times.
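
The fan-out idea could look roughly like the sketch below: a fixed number of worker goroutines pull file paths from a channel and send their results to a second channel. This is an illustration only. `guessLicense` is a placeholder for the expensive matching step, and `result` and `processAll` are hypothetical names rather than lc's API.

```go
package main

import (
	"fmt"
	"sync"
)

// result pairs a file path with the license guessed for it.
type result struct {
	path    string
	license string
}

// guessLicense is a placeholder for the per-file matching work.
func guessLicense(path string) string {
	return "NOASSERTION"
}

// processAll fans the paths out to a pool of workers and collects the results.
// The order of the returned slice is not deterministic.
func processAll(paths []string, workers int) []result {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	results := make(chan result)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range jobs {
				results <- result{path: p, license: guessLicense(p)}
			}
		}()
	}

	// Feed the workers, then signal that no more jobs are coming.
	go func() {
		for _, p := range paths {
			jobs <- p
		}
		close(jobs)
	}()

	// Close the results channel once every worker has finished.
	go func() {
		wg.Wait()
		close(results)
	}()

	var out []result
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	for _, r := range processAll([]string{"main.go", "LICENSE", "README.md"}, 2) {
		fmt.Println(r.path, r.license)
	}
}
```

Because results arrive in completion order, a formatter that needs stable output would have to sort them (for example by directory and file name) before printing.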

JSON output is invalid

JSON output is something like

[{Apache-2.0 0.9820416112200386} {ECL-2.0 0.9816067839861887} {CC-BY-NC-ND-3.0 0.9133121766992705} {CC-BY-NC-3.0 0.9132930147828987} {CC-BY-3.0 0.9115045535047256} {CC-BY-NC-SA-3.0 0.9111092719068973} {CC-BY-ND-3.0 0.9107656330507543} {CC-BY-SA-3.0 0.9076555881980038} {CC-BY-NC-2.5 0.9068909320860387} {CC-BY-NC-2.0 0.9050273550999453} {CC-BY-NC-SA-2.5 0.9043774260896892} {CC-BY-NC-SA-2.0 0.9029690938226045} {CC-BY-NC-ND-2.5 0.902856917776841} {RPL-1.1 0.9024687553399653} {OSL-2.1 0.9021600539592493} {CC-BY-NC-ND-2.0 0.9013231741145554} {OSL-3.0 0.9012500699047303} {CC-BY-NC-1.0 0.9011283469263592} {OSL-2.0 0.9010935813608286} {CC-BY-2.5 0.9001591349804541} {AFL-3.0 0.9000890602151572} {CC-BY-NC-SA-1.0 0.8997149521547787} {RPSL-1.0 0.8996518246913179} {NPOSL-3.0 0.8994258323278262} {OSL-1.1 0.8991596162122619} {CC-BY-2.0 0.8980891540798946} {CC-BY-SA-2.5 0.8978961638045134} {CC-BY-ND-2.5 0.8964275017374166} {CC-BY-SA-2.0 0.8961860584816612} {CC-BY-SA-1.0 0.8948504529310831} {CC-BY-ND-2.0 0.8941912407814255} {CC-BY-1.0 0.8935086935594703} {Watcom-1.0 0.892030153131385} {APL-1.0 0.8919370580758877} {AFL-2.1 0.8914185431694561} {CC-BY-NC-ND-1.0 0.8895559886703055} {OSL-1.0 0.8893766813693182} {AFL-2.0 0.8891941263749268} {APSL-2.0 0.8883139003604398} {OCLC-2.0 0.8847764289194198} {MPL-2.0 0.8833798926249776} {MPL-2.0-no-copyleft-exception 0.8833798926249776} {CC-BY-ND-1.0 0.8809103866050997} {EPL-2.0 0.8721849134808276} {SCEA 0.8550701488756585} {LPL-1.02 0.8550067529807} {Artistic-2.0 0.8547973447224115} {LPL-1.0 0.8542530658488672} {CPL-1.0 0.8509001155583861} {EPL-1.0 0.8505800953898698}]

This is not valid JSON. Key / value pairs in a map should be quoted and separated by a colon.
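
The output above resembles Go's default formatting of a slice of structs rather than encoded JSON. A minimal sketch of producing valid JSON with the standard `encoding/json` package follows. The `LicenseMatch` type and its field names are hypothetical stand-ins, since lc's internal types are not shown here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// LicenseMatch is a hypothetical stand-in for lc's internal match type.
type LicenseMatch struct {
	LicenseId  string  `json:"licenseId"`
	Confidence float64 `json:"confidence"`
}

// toJSON encodes the matches as a JSON array of objects.
func toJSON(matches []LicenseMatch) (string, error) {
	b, err := json.Marshal(matches)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	matches := []LicenseMatch{
		{LicenseId: "Apache-2.0", Confidence: 0.9820416112200386},
		{LicenseId: "ECL-2.0", Confidence: 0.9816067839861887},
	}

	// Default formatting: braces and spaces, not JSON.
	fmt.Println(matches)

	// Encoded output: quoted keys and values separated by colons.
	out, err := toJSON(matches)
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	fmt.Println(out)
}
```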

--output does not seem to work

Running

./lc --format json --output out.json LICENSE

still writes the JSON to STDOUT. While it claims "Results written to out.json", that file just contains an empty array like "[]".

Add summary formatter

I'd love to get a high level summary of the licenses, something like:

License  Quantity
MIT      20
ISC      5
GPL      1
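
A summary like this amounts to counting how often each license appears and sorting by count. The sketch below shows one way that could be done. `licenseCount` and `summarize` are hypothetical names, and this is not necessarily how lc's own summary format is implemented.

```go
package main

import (
	"fmt"
	"sort"
)

// licenseCount holds one row of the summary.
type licenseCount struct {
	License  string
	Quantity int
}

// summarize counts occurrences of each license and returns the rows sorted
// by quantity (largest first), breaking ties alphabetically.
func summarize(licenses []string) []licenseCount {
	counts := make(map[string]int)
	for _, l := range licenses {
		counts[l]++
	}
	rows := make([]licenseCount, 0, len(counts))
	for l, n := range counts {
		rows = append(rows, licenseCount{License: l, Quantity: n})
	}
	sort.Slice(rows, func(i, j int) bool {
		if rows[i].Quantity != rows[j].Quantity {
			return rows[i].Quantity > rows[j].Quantity
		}
		return rows[i].License < rows[j].License
	})
	return rows
}

func main() {
	rows := summarize([]string{"MIT", "ISC", "MIT", "GPL-2.0", "MIT", "ISC"})
	fmt.Printf("%-10s %s\n", "License", "Quantity")
	for _, r := range rows {
		fmt.Printf("%-10s %d\n", r.License, r.Quantity)
	}
}
```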

Add line separator

Need to modify the library we depend on. Since this is a quick hack, we probably don't want to send a PR back to it yet.

88bc1c3
