Comments (5)
@xavierfigueroav Please attach the full output and the input files to the issue.
from tools-python.
@goneall I've added a link to the input file now.
I had already included the full output produced by parse_rdf.py. What do you mean exactly?
In the input file, there are 4 extracted licenses inside some <licenseConcluded> tags. The same 4 extracted licenses are referenced (<member rdf:nodeID="ID"/>) inside a <licenseDeclared> tag.
Is that something wrong in the RDF file, something that should not happen? If yes, there was never any error in the parsers. If not...
Both the extracted licenses in <licenseConcluded> and the extracted licenses in <licenseDeclared> are being added to Document.extracted_licenses by the following code (specifically, line 265):
tools-python/spdx/parsers/rdf.py
Lines 264 to 267 in db243e3
That code can be split into a separate method, with some changes so that it adds the extracted licenses that are children of <SpdxDocument>. Maybe:
    for lic, _, _ in self.graph.triples((None, None, self.spdx_namespace['ExtractedLicensingInfo'])):
        self.handle_extracted_license(lic)
I am not very familiar with the SPDX RDF representation (e.g., I don't know whether all extracted licenses are always direct children of <SpdxDocument> or not), so this may work for this file and some others, but it may not be a general solution. If you think it is OK, I can fix it that way and submit a PR.
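The difference between adding a license on every reference and adding it once per definition can be sketched without any RDF library. This is only an illustration: the triple list stands in for the real graph, and the names `collect_per_reference` and `collect_once` are hypothetical, not part of tools-python.

```python
# Hypothetical stand-in for an RDF graph: (subject, predicate, object) tuples.
RDF_TYPE = "rdf:type"
EXTRACTED = "spdx:ExtractedLicensingInfo"

triples = [
    # Each extracted license is defined exactly once.
    ("_:lic1", RDF_TYPE, EXTRACTED),
    ("_:lic2", RDF_TYPE, EXTRACTED),
    # The same nodes are then referenced from several places.
    ("pkg", "spdx:licenseConcluded", "_:lic1"),
    ("pkg", "spdx:licenseConcluded", "_:lic2"),
    ("pkg", "spdx:licenseDeclared", "_:lic1"),
    ("pkg", "spdx:licenseDeclared", "_:lic2"),
]


def collect_per_reference(triples):
    """Problematic pattern: record a license each time it is referenced."""
    definitions = {s for s, p, o in triples if p == RDF_TYPE and o == EXTRACTED}
    return [o for s, p, o in triples if o in definitions]


def collect_once(triples):
    """Proposed pattern: walk the definitions themselves, one entry per node."""
    return [s for s, p, o in triples if p == RDF_TYPE and o == EXTRACTED]


duplicated = collect_per_reference(triples)
unique = collect_once(triples)
print(len(duplicated), len(unique))
```

With this sample data the per-reference version yields four entries for two licenses, while the per-definition version yields two.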
I took a look at the RDF file and it looks valid.
The 4 extracted licenses are only defined once in the RDF and the same license is referenced in other parts of the document (the ID is used to reference the same license definition).
@xavierfigueroav It looks like you found the source of the problem above - it should not add the extracted license info on each reference, it should only add it once.
Your proposed solution is very similar to the approach we take in the Java tools - adding the extracted license information as a separate method.
Just a bit more context if interested (probably not needed to fix this specific bug) - in RDF, there are 3 ways of referencing an object: a literal value (like a string or number), a URI that points to a predefined object, and anonymous nodes (called blank nodes in the RDF specification), which are generated references that are local to the RDF graph. For ExtractedLicensingInfo definitions, anonymous nodes are generated and referenced throughout the RDF graph. A good overview can be found at https://www.w3.org/TR/rdf-concepts/
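To make the three kinds of reference concrete, here is a small sketch that uses only the standard library's XML parser, not a real RDF parser. The RDF/XML snippet, the example.com URI, and the node ID "A0" are made up for illustration; the point is that the rdf:nodeID on <licenseDeclared> refers back to the single definition rather than creating a second license.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SPDX_NS = "http://spdx.org/rdf/terms#"

# Hypothetical document: one definition (nodeID A0) and one reference to it.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:spdx="{SPDX_NS}">
  <spdx:Package rdf:about="http://example.com/doc#SPDXRef-Package">
    <spdx:name>example</spdx:name>
    <spdx:licenseConcluded>
      <spdx:ExtractedLicensingInfo rdf:nodeID="A0">
        <spdx:licenseId>LicenseRef-1</spdx:licenseId>
      </spdx:ExtractedLicensingInfo>
    </spdx:licenseConcluded>
    <spdx:licenseDeclared rdf:nodeID="A0"/>
  </spdx:Package>
</rdf:RDF>"""

root = ET.fromstring(SAMPLE)
node_id_attr = f"{{{RDF_NS}}}nodeID"
about_attr = f"{{{RDF_NS}}}about"
extracted_tag = f"{{{SPDX_NS}}}ExtractedLicensingInfo"

package = root.find(f"{{{SPDX_NS}}}Package")
uri = package.get(about_attr)                       # URI reference
literal = package.find(f"{{{SPDX_NS}}}name").text   # literal value

# Anonymous (blank) node: defined once, referenced elsewhere by its local ID.
definitions = [el.get(node_id_attr) for el in root.iter(extracted_tag)]
references = [
    el.get(node_id_attr)
    for el in root.iter()
    if el.get(node_id_attr) is not None and el.tag != extracted_tag
]

print(uri, literal, definitions, references)
```

A parser that builds Document.extracted_licenses from `definitions` sees one license here, however many entries `references` contains.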
@goneall Thank you for the link, it was and will be helpful.