spdx / tools-python
A Python library to parse, validate and create SPDX documents.
Home Page: http://spdx.org
License: Apache License 2.0
Steps to reproduce:
Use FileCopyrightText: Copyright 2014 Acme Inc in SPDXSimpleTag.tag.
From the examples directory, run python parse_tv.py '../data/SPDXSimpleTag.tag'
Generating LALR tables
FileCopyrightText must be one of NOASSERTION, NONE or free form text, line: 34
Errors encountered while parsing
Following the pattern of other SPDX repositories, this one should also have a contributing.md file. This should invite even more contributors.
This is originally reported by @yadsharaf in nexB/scancode-toolkit#692
In the README, "How to use" section:
parse_tv.py is an example tag/value parsing usage. Try running python parse_tv.py '../data/SPDXSimpleTag.tag'
The argument '../data/SPDXSimpleTag.tag' should not have single quotes, because they are not valid in a path on Windows.
The README points to the old account of ah450. I don't know if that was intended, but we should also add a link to his new account.
Consider using the spdx-lookup package instead of the committed spdx_licenselist.csv file.
When parsing documents from RDF files, some fields are being stored as rdflib objects and it can be problematic.
All data stored in SPDX models (document.Document, file.File, document.ExtractedLicense, etc) should be stored as python and spdx types so that they are independent of the format to be written or read.
I had problems trying to write YAML files from a document instance created by reading an RDF file because pyyaml was not able to represent some rdflib objects (and it does not have to be able to). As expected, pyyaml is able to deal with python objects.
Of course, almost every spdx type has overwritten the __str__ and __repr__ methods and some others have methods that return the 'str' representation of their single fields, but they sometimes assume (correctly) that fields such as full_name, text or comment (from document.ExtractedLicense) are strings (but they are rdflib.terms.Literal objects when parsed from RDF files), so those functions just return the field (rdflib type).
Again, spdx types should be independent of the format to be written or read.
Steps to reproduce:
Change SPDXVersion: SPDX-1.2 to SPDXVersio: SPDX-1.2.
The tool should report an error like Found unknown tag : SPDXVersio at line: 2 but instead it continues to parse. The tokenizing works fine; the problem is with the parsing method.
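A minimal sketch of the expected behaviour, independent of the library's actual PLY-based parser. The KNOWN_TAGS set and the find_unknown_tags helper are hypothetical names introduced for illustration, and the tag list is deliberately partial. Multi-line text values are not handled here.

```python
# Hypothetical, partial tag list for illustration only.
KNOWN_TAGS = {"SPDXVersion", "DataLicense", "DocumentComment", "Creator", "Created"}


def find_unknown_tags(text):
    """Return one error message per line whose tag is not in KNOWN_TAGS."""
    errors = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if ":" not in line or line.lstrip().startswith("#"):
            continue  # not a "Tag: value" line (blank line, comment, ...)
        tag = line.split(":", 1)[0].strip()
        if tag not in KNOWN_TAGS:
            errors.append("Found unknown tag : {0} at line: {1}".format(tag, line_no))
    return errors


sample = "# header comment\nSPDXVersio: SPDX-1.2\nDataLicense: CC0-1.0\n"
print(find_unknown_tags(sample))
# ['Found unknown tag : SPDXVersio at line: 2']
```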
While running the example code in the "How to use" section of https://github.com/spdx/tools-python, running
python parse_tv.py '../data/SPDXSimpleTag.tag'
gives the following error:
Generating LALR tables
SPDXVersion must be SPDX-1.2 found SPDX-2.1.
Errors encountered while parsing
On the validation side, validation will be optional. Fields will not be mandatory by default. They will be checked only when the user asks to validate, not at the time of creation. This lets the user create and dump partial documents that may not yet be fully valid.
This can be done by introducing a validate flag in both parsers (tagvalue and rdf) and validating only when the flag is set.
@pombredanne Should I create a PR for this one?
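The proposal could look roughly like the sketch below. The Document class, its two fields, and the parse function are stand-ins invented for this example; they are not the library's parser API.

```python
class Document(object):
    """Hypothetical stand-in for the library's document model."""

    def __init__(self, version=None, name=None):
        self.version = version
        self.name = name

    def validate(self, messages):
        """Append one message per missing field; return True when none were added."""
        before = len(messages)
        if self.version is None:
            messages.append("Document has no version.")
        if self.name is None:
            messages.append("Document has no name.")
        return len(messages) == before


def parse(fields, validate=False):
    """Build a Document from a dict; check it only when the caller asks."""
    doc = Document(version=fields.get("SPDXVersion"), name=fields.get("DocumentName"))
    messages = []
    if validate:
        doc.validate(messages)
    return doc, messages


partial = {"SPDXVersion": "SPDX-2.1"}
doc, messages = parse(partial)                  # partial document, no complaints
doc, messages = parse(partial, validate=True)   # same input, now checked
print(messages)
```

With the flag off, a partial document can still be built and written out; the missing name is only reported once validate=True is passed.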
The build fails for this project on Python 3.4 and 3.5. I am working on correcting this. Any feedback would be appreciated.
See the discussion at nexB/scancode-toolkit#436 (comment).
At worst I would prefer a py.test dep, but no dep is better.
Several changes to the JSON and YAML formats were discussed and generally agreed on for the SPDX 2.2 spec.
There is a PR with changes to the example file: spdx/spdx-spec#149. The PR documents the related issues which were resolved (sorry for the extra clicks to find all the documentation).
The Python libraries may need to be updated to match the 2.2 spec.
Package managers or text files may declare a license string that is not in SPDX license format. For example, many projects declare their license as BSD, but it is unclear which BSD.
oss-review-toolkit has an implementation that comes reasonably close.
It is written in Kotlin, but it looks reasonably straightforward to convert to Python 2/3.
This would be useful for tools that are reading user-declared licenses.
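As a rough illustration of the idea (not a port of the oss-review-toolkit code), a lookup table can map loosely declared strings to candidate SPDX identifiers. The CANDIDATES table and the suggest_spdx_ids function are hypothetical, and the table is a tiny hand-written sample.

```python
# Illustrative, hand-written table; a real one would be far larger.
CANDIDATES = {
    "bsd": ["BSD-2-Clause", "BSD-3-Clause"],  # ambiguous: more than one candidate
    "apache 2.0": ["Apache-2.0"],
    "mit license": ["MIT"],
}


def suggest_spdx_ids(declared):
    """Return candidate SPDX identifiers for a user-declared license string."""
    key = " ".join(declared.lower().split())  # normalize case and whitespace
    return CANDIDATES.get(key, [])


print(suggest_spdx_ids("  BSD "))      # ['BSD-2-Clause', 'BSD-3-Clause']
print(suggest_spdx_ids("Apache  2.0"))  # ['Apache-2.0']
```

Returning a list rather than a single identifier keeps the ambiguity visible, so the calling tool can decide how to handle a declaration such as BSD.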
Some code still calls print the Python 2 way, without parentheses, which raises a SyntaxError in Python 3.
Here are some examples:
except InvalidDocumentError:
print 'Document is Invalid'
messages = []
doc.validate(messages)
print '\n'.join(messages)
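For comparison, here is a sketch of the same reporting code written with the function form of print, which is valid on Python 3 (and on Python 2 when the module starts with from __future__ import print_function). InvalidDocumentError is redefined here as a stand-in and format_report is a hypothetical helper, so the snippet runs on its own.

```python
class InvalidDocumentError(Exception):
    """Stand-in for the library's exception of the same name."""


def format_report(messages):
    """Build the text that the examples print for an invalid document."""
    return 'Document is Invalid\n' + '\n'.join(messages)


try:
    raise InvalidDocumentError()
except InvalidDocumentError:
    # print is called as a function, with parentheses.
    print(format_report(['first problem', 'second problem']))
```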
The spdx name is taken by @bbqsrc, which is OK; we can pick spdx-tools instead.
But to avoid namespace issues, we should either find a way to share the namespace (see bbqsrc/spdx-python#1) or refactor the namespace used here.
File: tagvaluebuilders.py
The first sentence i.e. Sets the package's license comment.
should be corrected to Sets the package's copyright text.
https://github.com/nexB/license-expression provides comprehensive support for parsing, comparing, validating, normalizing and resolving license expressions based on SPDX or any other license identifiers or names. It uses a boolean logic engine behind the scenes to handle this correctly.
We should add this to support SPDX license expressions.
We have a FIXME (# FIXME: use isinstance instead??) which suggests using isinstance instead of this statement:
if (type(self.conc_lics) in [utils.NoAssert, utils.SPDXNone]
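The difference between the two checks only shows up with subclasses. The sketch below uses stand-in classes in place of utils.NoAssert and utils.SPDXNone, plus a hypothetical subclass, to show it.

```python
class NoAssert(object):
    """Stand-in for utils.NoAssert."""


class SPDXNone(object):
    """Stand-in for utils.SPDXNone."""


class CustomNoAssert(NoAssert):
    """Hypothetical subclass, used only to show where the checks differ."""


def is_placeholder_by_type(value):
    # Exact type match: subclasses are rejected.
    return type(value) in [NoAssert, SPDXNone]


def is_placeholder(value):
    # isinstance accepts a tuple of classes and also matches subclasses.
    return isinstance(value, (NoAssert, SPDXNone))


print(is_placeholder_by_type(CustomNoAssert()))  # False
print(is_placeholder(CustomNoAssert()))          # True
```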
The file write_tv.py in examples is not able to write to the sample.tag file. The errors shown say that the document is invalid and that spdx_id is undefined.
Fix typo in document.py
Description: I am using this spdx library as a dependency in scancode-toolkit. When I run it, the calc_verif_code API does not work properly on Python 3. It fails with TypeError: Unicode-objects must be encoded before hashing
My source code: https://github.com/nexB/scancode-toolkit/blob/develop/src/formattedcode/output_spdx.py#L319
OS: macOS Mojave (10.14.6)
Python version: 3.6.8
Also, the latest version that supports Python 3 is not available on PyPI.
Please also update setup.py: https://github.com/spdx/tools-python/blob/master/setup.py#L51
Please fix this API (https://github.com/spdx/tools-python/blob/master/spdx/package.py#L246) as soon as possible; I need it to be Python 2/3 compatible. Thanks.
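The TypeError comes from passing text instead of bytes to hashlib on Python 3. The standalone sketch below is not the library's method; verification_code is a hypothetical function. It assumes the verification code is the SHA1 of the sorted, concatenated per-file SHA1 hex digests, and encodes the text before hashing.

```python
import hashlib


def verification_code(file_sha1s):
    """Sort the per-file SHA1 hex digests, join them, and hash the result."""
    joined = ''.join(sorted(file_sha1s))
    # hashlib only accepts bytes on Python 3, so encode the text first.
    return hashlib.sha1(joined.encode('utf-8')).hexdigest()


# Two made-up file digests, used only as sample input.
digests = [
    hashlib.sha1(b'file one').hexdigest(),
    hashlib.sha1(b'file two').hexdigest(),
]
print(verification_code(digests))
```

Encoding with UTF-8 is safe here because hex digests are plain ASCII, and the same call also works on Python 2 str values.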
An authors file is needed to list and thank people who have contributed to the project.
Is it possible to get a license in JSON or XML format from the SPDX database by searching the license names with a string (or a regular expression)? The result could also be a best-effort match if the given search string is not exact.
The files where the word redundent
has been used are :
spdx/parsers/rdf.py
and spdx/parsers/tagvalue.py
When I run example/pp_rdf.py following the README
pp_rdf.py ../data/SPDXRdfExample.rdf pretty.rdf
it raises the error
TypeError: write() argument must be str, not bytes
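This error appears on Python 3 when bytes are written to a file opened in text mode. The sketch below assumes the serialized RDF arrives as bytes (the payload is a placeholder, not real serializer output) and shows two ways to write it that avoid the error.

```python
import os
import tempfile

# Placeholder payload standing in for serialized RDF delivered as bytes.
data = u'<rdf:RDF/>'.encode('utf-8')

path = os.path.join(tempfile.mkdtemp(), 'pretty.rdf')

# Option 1: open the file in binary mode and write the bytes unchanged.
with open(path, 'wb') as out:
    out.write(data)

# Option 2: keep text mode and decode the bytes to str first.
with open(path, 'w') as out:
    out.write(data.decode('utf-8'))
```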
RDF parsers do not handle 'projectURI' field of 'artifactOf' section.
tools-python/spdx/parsers/rdf.py
Lines 588 to 596 in 301d72f
RDF writers do not handle anything in 'artifactOf' section. That section is not being written in RDF files. A way to try it is running python tv_to_rdf.py ../data/SPDXTagExample.tag result.rdf
Output file attached:
result.rdf.txt
has_optional_field(field) verifies whether the value of field is not None. But some attributes are sometimes initialized with a default or empty object, such as an empty list. In that case, has_optional_field(field) will always return True, even if the list is empty (when I think it should return False because there is actually no information).
I suggest: return bool(field)
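A small side-by-side sketch of the two behaviours, written as free functions for brevity (the name has_optional_field_current is made up for the comparison). Note that bool(field) also treats 0 and the empty string as absent, which may or may not be wanted.

```python
def has_optional_field_current(field):
    """Behaviour described above: anything except None counts as present."""
    return field is not None


def has_optional_field(field):
    """Suggested behaviour: empty containers and None count as absent."""
    return bool(field)


print(has_optional_field_current([]))  # True
print(has_optional_field([]))          # False
print(has_optional_field(['MIT']))     # True
```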
See https://github.com/spdx/tools-python/blob/master/spdx/parsers/tagvalue.py#L908-L909
(One statement needs to be removed)
@sschuberth @pombredanne
According to 2ecb365, we do not require the version to be 1.2, but there is a bug: the version is still required to be 1.2.
Steps to reproduce:
Open SPDXSimpleTag.tag and replace the line SPDXVersion: SPDX-1.2 with SPDXVersion: SPDX-1.1.
Run python parse_tv.py '../data/SPDXSimpleTag.tag' from the examples directory.
Output that we receive:
examples git:(master) ✗ python parse_tv.py '../data/SPDXSimpleTag.tag'
Generating LALR tables
SPDXVersion must be SPDX-1.2 found SPDX-1.1.
Errors encountered while parsing
The cause of this behaviour is the following snippet inside tagvaluebuilders.py:
if vers == version.Version(major=1, minor=2):
    doc.version = vers
    return True
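One possible shape for a fix is to compare against a set of accepted versions instead of a single hard-coded one. This is a sketch only: Version and Doc are stand-ins, set_doc_version is a hypothetical function, and the contents of SUPPORTED_VERSIONS are an assumption made for the example, not the project's decision.

```python
from collections import namedtuple

# Stand-in for version.Version; namedtuples compare and hash by value.
Version = namedtuple('Version', ['major', 'minor'])

# Assumed list of accepted versions, for illustration only.
SUPPORTED_VERSIONS = {Version(1, 1), Version(1, 2), Version(2, 0), Version(2, 1)}


class Doc(object):
    """Minimal stand-in for the document being built."""
    version = None


def set_doc_version(doc, vers):
    """Store the version if it is accepted; report success as a bool."""
    if vers in SUPPORTED_VERSIONS:
        doc.version = vers
        return True
    return False


doc = Doc()
print(set_doc_version(doc, Version(1, 1)))  # True
print(set_doc_version(doc, Version(9, 9)))  # False
```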
When parsing the sample.tag file using python parse_tv.py sample.tag, it throws an error TypeError: expected string or buffer. The sample.tag is generated after running python write_tv.py sample.tag. IMHO this happens because when write_tv.py is executed it creates a key value pair PackageCopyrightText: NOASSERTION, where the type of NOASSERTION is <class 'spdx.utils.NoAssert'>, which is not supported.
The field Document.extracted_licenses contains duplicate ExtractedLicense objects when they are parsed from RDF files.
It can be noticed by running parse_rdf.py.
Input: SPDXRdfExample.rdf
Output:
doc comment: This is a sample spreadsheet
Creators:
Person: Gary O'Neall
Tool: SourceAuditor-V1.2
Organization: Source Auditor Inc.
Document review information:
Reviewer: Person: Suzanne Reviewer
Date: 2011-03-13 00:00:00
Comment: Another example reviewer.
Reviewer: Person: Joe Reviewer
Date: 2010-02-10 00:00:00
Comment: This is just an example. Some of the non-standard licenses look like they are actually BSD 3 clause licenses
Creation comment: This is an example of an SPDX spreadsheet format
Package Name: SPDX Translator
Package Version: Version 0.9.2
Package Download Location: http://www.spdx.org/tools
Package Homepage: None
Package Checksum: 2fd4e1c67a2d28fced849ee1bb76e7391b93eb12
Package verification code: 4e3211c67a2d28fced849ee1bb76e7391b93feba
Package excluded from verif: SpdxTranslatorSpdx.txt,SpdxTranslatorSpdx.rdf
Package license concluded: LicenseRef-4 AND LicenseRef-2 AND Apache-1.0 AND LicenseRef-3 AND LicenseRef-1 AND Apache-2.0 AND MPL-1.1
Package license declared: MPL-1.1 AND Apache-2.0 AND LicenseRef-3 AND LicenseRef-2 AND LicenseRef-4 AND LicenseRef-1
Package licenses from files:
LicenseRef-1
LicenseRef-3
Apache-1.0
MPL-1.1
LicenseRef-4
LicenseRef-2
Apache-2.0
Package Copyright text: Copyright 2010, 2011 Source Auditor Inc.
Package summary: SPDX Translator utility
Package description: This utility translates and SPDX RDF XML document to a spreadsheet, translates a spreadsheet to an SPDX RDF XML document and translates an SPDX RDFa document to an SPDX RDF XML document.
Package Files:
File name: Jenna-2.6.3/jena-2.6.3-sources.jar
File type: ARCHIVE
File Checksum: 3ab4e1c67a2d28fced849ee1bb76e7391b93f125
File license concluded: LicenseRef-1
File license info in file: LicenseRef-1
File artifact of project name: Jena
File name: src/org/spdx/parser/DOAPProject.java
File type: SOURCE
File Checksum: 2fd4e1c67a2d28fced849ee1bb76e7391b93eb12
File license concluded: Apache-2.0
File license info in file: Apache-2.0
File artifact of project name:
Document Extracted licenses:
Identifier: LicenseRef-4
Name: None
Identifier: LicenseRef-2
Name: None
Identifier: LicenseRef-3
Name: CyberNeko License
Identifier: LicenseRef-1
Name: None
Identifier: LicenseRef-3
Name: CyberNeko License
Identifier: LicenseRef-2
Name: None
Identifier: LicenseRef-4
Name: None
Identifier: LicenseRef-1
Name: None
Annotations:
Annotator: Person: Jim Reviewer
Annotation Date: 2012-06-13 00:00:00
Annotation Comment: This is just an example. Some of the non-standard licenses look like they are actually BSD 3 clause licenses
Annotation Type: REVIEW
Annotation SPDX Identifier: https://spdx.org/spdxdocs/spdx-example-444504E0-4F89-41D3-9A0C-0305E82C3301#SPDXRef-45
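In the output above, each LicenseRef identifier is listed twice under "Document Extracted licenses". A hedged sketch of one way to drop such duplicates while keeping first-seen order follows; ExtractedLicense here is a namedtuple stand-in for the model class and unique_by_identifier is a hypothetical helper.

```python
from collections import OrderedDict, namedtuple

# Stand-in for document.ExtractedLicense with only the fields needed here.
ExtractedLicense = namedtuple('ExtractedLicense', ['identifier', 'full_name'])


def unique_by_identifier(licenses):
    """Keep the first license seen for each identifier, preserving order."""
    seen = OrderedDict()
    for lic in licenses:
        seen.setdefault(lic.identifier, lic)
    return list(seen.values())


parsed = [
    ExtractedLicense('LicenseRef-4', None),
    ExtractedLicense('LicenseRef-2', None),
    ExtractedLicense('LicenseRef-3', 'CyberNeko License'),
    ExtractedLicense('LicenseRef-1', None),
    ExtractedLicense('LicenseRef-3', 'CyberNeko License'),
    ExtractedLicense('LicenseRef-2', None),
]
print([lic.identifier for lic in unique_by_identifier(parsed)])
# ['LicenseRef-4', 'LicenseRef-2', 'LicenseRef-3', 'LicenseRef-1']
```

This only hides the symptom after parsing; the duplicates would still need to be traced back to where the RDF parser adds them.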
The error that I get is :
bash: line 2: virtualenv: command not found
((if (directory? "/Users/distiller/virtualenvs/venv-system") (echo "Using cached venv") (do ("virtualenv" "/Users/distiller/virtualenvs/venv-system" || exit 1) (source "/Users/distiller/virtualenvs/venv-system/bin/activate" || exit 2) (pip install nose || exit 3))) "true" (echo "source /Users/distiller/virtualenvs/venv-system/bin/activate" >> "~/.circlerc")) returned exit code 1
Action failed: virtualenv
RDF and TAG/VALUE file formats have two different ways to write SPDX IDs throughout an SPDX document (document ID, file ID, annotation ID, package ID, etc).
For RDF files: [DocumentNamespace|DocumentURI]#[SPDX Identifier]
For TAG/VALUE files: [SPDX Identifier]
A problem comes up when a TAG/VALUE file is written from reading an RDF file: SPDX IDs will have the format for RDF files. Moreover, if this behavior is ignored while developing JSON/YAML/XML support, it will spread.
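One way to keep the stored value format-independent is to hold only the bare identifier in the model and add the namespace when writing RDF. The two helpers below are hypothetical and assume the identifier never contains a # character itself.

```python
def to_tag_value_id(spdx_id):
    """Strip an optional 'namespace#' prefix and return the bare identifier."""
    return spdx_id.rsplit('#', 1)[-1]


def to_rdf_id(namespace, spdx_id):
    """Build the RDF form from a namespace and a bare or prefixed identifier."""
    return '{0}#{1}'.format(namespace, to_tag_value_id(spdx_id))


rdf_id = ('https://spdx.org/spdxdocs/'
          'spdx-example-444504E0-4F89-41D3-9A0C-0305E82C3301#SPDXRef-45')
print(to_tag_value_id(rdf_id))        # SPDXRef-45
print(to_tag_value_id('SPDXRef-45'))  # SPDXRef-45
```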
While going through the codebase, I've noticed a few PEP8 violations which can be fixed by a static code analysis tool like coala.
Instead the tests should only depend on the plain unittest so that anyone can use any test runner they like (and I love py.test for this).
Attributes to be added are:
See #71.
Update the SPDX Python libraries to the SPDX 2.1 specification. The SPDX 2.1 specification is a major upgrade from SPDX 1.2 supporting relationships between SPDX documents and SPDX elements.
See https://github.com/nexB/license-expression/ or https://github.com/nexB/scancode-toolkit/ for an example
This would be a merge to combine two SPDX documents without losing any of the license information. An example use case: an SPDX document including human input has already been created, but a new document is automatically generated with a scanner and we wish to merge the two without losing the modifications made by a human.
The .gitignore file does not have entries for virtualenv and IDE files.
The current implementation only supports parsing a single package in the SPDX document. There can be one or more packages as mentioned here. We should add support for parsing more than one package.
I would like to implement this functionality as a small project to understand my targeted GSoC project a little better. I plan to compare two RDF-format documents first, and then expand to the tag/value format as well. Let me know your thoughts on this, @pombredanne, so that I can proceed.
BACKGROUND INFORMATION
The major and minor fields of the Version model are supposed to be integers, and the class method Version.from_str() is responsible for ensuring that.
Lines 36 to 45 in 301d72f
If the license_list_version field is absent when initializing a CreationInfo object, a default value is taken from the config.py file.
tools-python/spdx/creationinfo.py
Lines 131 to 136 in 301d72f
The Version object described above is created by assigning the major and minor fields directly as string values.
Lines 48 to 49 in 301d72f
As a result, the __lt__ method and the parser tests (which expect integer values) may fail.
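The sketch below shows why string fields are a problem for ordering and one way to guard against it. This Version class is a simplified stand-in written for the example, not the library's implementation; coercing to int in the constructor is a suggestion rather than the existing behaviour.

```python
import re
from functools import total_ordering


@total_ordering
class Version(object):
    """Simplified stand-in that always stores major and minor as ints."""

    def __init__(self, major, minor):
        self.major = int(major)  # coerce so '10' and 10 behave the same
        self.minor = int(minor)

    @classmethod
    def from_str(cls, value):
        """Parse 'MAJOR.MINOR'; return None when the text does not match."""
        match = re.match(r'(\d+)\.(\d+)$', value)
        return cls(match.group(1), match.group(2)) if match else None

    def __eq__(self, other):
        return (self.major, self.minor) == (other.major, other.minor)

    def __lt__(self, other):
        return (self.major, self.minor) < (other.major, other.minor)

    def __hash__(self):
        return hash((self.major, self.minor))


# Strings compare character by character, so '10' sorts before '9'.
print('10' < '9')                              # True
print(Version('3', '10') < Version('3', '9'))  # False
```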