Comments (7)
Should I focus on the tag/value builders and parsers for the upgrade to spec 2.1 and keep the rdf part under the optional section ?
I would prefer to maintain support for the RDF. I know there are several users of RDF for the Java libraries (since they are finding bugs š I assume there will also be interested users for Python as well. Iām OK with the primary focus on the tag/value since that is the larger community. I can help provide support for any RDF related questions.
Also, since our primary focus is on the tagvalue, we might face problems when while conversions in each of the formats (and also while executing validation functions in individual classes)
There are some discussions of supporting other formats like JSON, YAML, and JSON-LD. Having a design which allows for conversion between various formats would be very helpful for forward compatibility.
We have separate files for builders and parsers for the rdf and tv formats. Can I have a brief idea of how do we want the libraries to be refactored for independence from each other ?
Is there a design where we have a core model which is (semi) independent of the format and converts to build and parse different formats? I have to admit I have not dived deeply into the current Python implementation, but the basic idea of having a model independent of the serialization format seems like it should work. Where things got messy in the Java code is the validation (as you point out below) and maintaining the information on where in the input file the validation errors occur.
IIUC, this line -> if (not self.error) and (not self.document.validate(validation_messages)): in both the parsers (tv and rdf) will call the validate(...) function as defined in class Document(...). Now, since our primary focus is to take care of the tag/value files, there might be some fields in tag/value which are mandatory and we haven't included them in our rdf model. This will trigger the validate(..) function to return False when when it is called via the rdf parser.
So, this is where we're facing problems since we have a common validate API tied to both the tag/value and rdf models.
We could maybe pass additional fields like: validate(..., rdf=False, tv=True) and then handle things accordingly ? This would be a not so elegant solution, but will somewhat clutter our codebase with if...else statements.
I'll provide an opinion or two on this with the caveat that I have not spent a lot of time in the Python code, so the replies below may not match the current design/structure.
For the Java libraries, we tried to separate out the validation into 2 separate categories - parsing errors and specification related errors. The tag/value specific requirements could be thought of as parsing errors since they are typically related to positional requirements to allow unambiguous parsing. The specification related errors should be common for both RDF and Tag/Value. For the positional requirements in tag/value we just threw exceptions rather than handle them in the validation since the resultant model could not be reliably produced.
If we want the validation to incorporate format specific errors, could we record those errors during parsing and store them in the model rather than passing a flag to the validation method? We know when we are parsing which format is being used.
One other consideration in the validation is maintaining the location of the error in the input format. This proved to be a challenge for the Java code design. The line number and character positions need to be captured during the parsing and reported back during validation. For the tag/value we only provided this for the parser errors (not the spec related errors).
from tools-python.
From @pombredanne :
We should drop support for 1.2 alright and yes focus on new 2.x things.
As far as I am concerned, I would be quite happy with a complete refactoring of the lib internals such that:
- tag/value is the primary target
- the internal model is NOT based on RDF data structures but for instance models based on
attrs
orcattrs
, e.g. something pythonic and not tied to RDF.- RDF would only be an adapter of sorts feeding in that model and an "after thought" rather than being central to the data model.
e.g. RDF/XML is not something I care too much for. more like a nice to have for me.
If a refactoring could be done to cut the internal model ties from RDF that would be a big win (and a big work too).
from tools-python.
@pombredanne @sschuberth @rtgdk It would be great if I could have your inputs on a few things :
Should I focus on the tag/value builders and parsers for the upgrade to spec 2.1 and keep the rdf part under the optional section ?
Also, since our primary focus is on the tagvalue, we might face problems when while conversions in each of the formats (and also while executing validation functions in individual classes)
I went through the docs of attrs
, which will help us to get rid of the boilerplate code in classes that we have to define all the time.
We have separate files for builders and parsers for the rdf and tv formats. Can I have a brief idea of how do we want the libraries to be refactored for independence from each other ?
IIUC, this line -> if (not self.error) and (not self.document.validate(validation_messages)):
in both the parsers (tv and rdf) will call the validate(...)
function as defined in class Document(...)
. Now, since our primary focus is to take care of the tag/value files, there might be some fields in tag/value which are mandatory and we haven't included them in our rdf model. This will trigger the validate(..)
function to return False
when when it is called via the rdf parser.
So, this is where we're facing problems since we have a common validate API tied to both the tag/value and rdf models.
We could maybe pass additional fields like: validate(..., rdf=False, tv=True)
and then handle things accordingly ? This would be a not so elegant solution, but will somewhat clutter our codebase with if...else
statements.
from tools-python.
Hi @yash-nisar. I'm sorry, but I'm going to pull myself out of this project as I personally have to need for it anymore, and I'm not a real Python guy anyway. As a result, I will probably not respond to issues or pull requests.
from tools-python.
from tools-python.
Hi @yash-nisar! We are currently cleaning up this repo and I came across your issue that is still asking for support for spec version 2.1. Since this issue is already a bit older and many changes have been made in the repo: Are you still missing fields for spec 2.1?
Concerning the discussion above:
Beginning with #244 we are planning to refactor the data model and the builder structure. This will enable us to tackle the refactoring and isolation of the individual formats mentioned here.
In the course of the refactoring, the validation will also have to be checked. I like the approach described by @goneall for the java tools to distinguish between parser-related validation (which is probably less of a validation and more of a basic check if the content is parsable) and spec-related validation. With #212 we have an open issue concerning the validation.
In your view @yash-nisar, are there any outstanding issues from this issue or could it be closed?
from tools-python.
Closing this for now, please ping if it should be reopened
from tools-python.
Related Issues (20)
- Ugly error for nonexisting file HOT 1
- Valid SPDX cannot be converted from JSON to tag:value HOT 1
- Ugly error message when the JSON syntax is not correct HOT 1
- Question of generation SBOM HOT 3
- Be more lenient when parsing "true" and "false" in tag-value
- Adding NOASSERTION/NONE to DocumentRef in relationships as per the spdx v2.3 spec HOT 3
- Request for Handling Custom Licenses in Document and Package Validators HOT 6
- Converting valid JSON SPDX file to tag:value gives invalid SPDX
- would like to package - but the name is unsuitable HOT 1
- Slow for SBOMs with a large number of files + relationships HOT 1
- F
- Would like an option to omit files from graph
- `create_list_without_duplicates` Function Can be Sped Up By Using Set
- Incorrect cpe23Type validation? HOT 2
- Relationship with Package Section HOT 2
- Failed to convert spdx to xml with Annotation HOT 1
- Error while calling SPDX parse_file() API inside thread function
- Remove unused semantic_version module HOT 1
- Why use uritools instead of the standard library urllib? HOT 1
- The LicenseRef prefix should not be required for standard licenses
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
š Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. ššš
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ā¤ļø Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from tools-python.