Update the SPDX Python libraries to the SPDX 2.1 specification. The SPDX 2.1 specifica

From <a class="user-mention notranslate" data-hovercard-type="user" data-

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Hi <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="

No problem :). Thank you for your reviews. <span class="email-hidd

Hi <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="

Update to full SPDX v2.1 about tools-python HOT 7 CLOSED

spdx commented on May 25, 2024

Update to full SPDX v2.1

from tools-python.

Comments (7)

goneall commented on May 25, 2024 1

Should I focus on the tag/value builders and parsers for the upgrade to spec 2.1 and keep the rdf part under the optional section ?

I would prefer to maintain support for the RDF. I know there are several users of RDF for the Java libraries (since they are finding bugs 😉 I assume there will also be interested users for Python as well. I’m OK with the primary focus on the tag/value since that is the larger community. I can help provide support for any RDF related questions.

Also, since our primary focus is on the tagvalue, we might face problems when while conversions in each of the formats (and also while executing validation functions in individual classes)

There are some discussions of supporting other formats like JSON, YAML, and JSON-LD. Having a design which allows for conversion between various formats would be very helpful for forward compatibility.

We have separate files for builders and parsers for the rdf and tv formats. Can I have a brief idea of how do we want the libraries to be refactored for independence from each other ?

Is there a design where we have a core model which is (semi) independent of the format and converts to build and parse different formats? I have to admit I have not dived deeply into the current Python implementation, but the basic idea of having a model independent of the serialization format seems like it should work. Where things got messy in the Java code is the validation (as you point out below) and maintaining the information on where in the input file the validation errors occur.

IIUC, this line -> if (not self.error) and (not self.document.validate(validation_messages)): in both the parsers (tv and rdf) will call the validate(...) function as defined in class Document(...). Now, since our primary focus is to take care of the tag/value files, there might be some fields in tag/value which are mandatory and we haven't included them in our rdf model. This will trigger the validate(..) function to return False when when it is called via the rdf parser.
So, this is where we're facing problems since we have a common validate API tied to both the tag/value and rdf models.
We could maybe pass additional fields like: validate(..., rdf=False, tv=True) and then handle things accordingly ? This would be a not so elegant solution, but will somewhat clutter our codebase with if...else statements.

I'll provide an opinion or two on this with the caveat that I have not spent a lot of time in the Python code, so the replies below may not match the current design/structure.

For the Java libraries, we tried to separate out the validation into 2 separate categories - parsing errors and specification related errors. The tag/value specific requirements could be thought of as parsing errors since they are typically related to positional requirements to allow unambiguous parsing. The specification related errors should be common for both RDF and Tag/Value. For the positional requirements in tag/value we just threw exceptions rather than handle them in the validation since the resultant model could not be reliably produced.

If we want the validation to incorporate format specific errors, could we record those errors during parsing and store them in the model rather than passing a flag to the validation method? We know when we are parsing which format is being used.

One other consideration in the validation is maintaining the location of the error in the input format. This proved to be a challenge for the Java code design. The line number and character positions need to be captured during the parsing and reported back during validation. For the tag/value we only provided this for the parser errors (not the spec related errors).

from tools-python.

yash-nisar commented on May 25, 2024

From @pombredanne :
We should drop support for 1.2 alright and yes focus on new 2.x things.
As far as I am concerned, I would be quite happy with a complete refactoring of the lib internals such that:

tag/value is the primary target

the internal model is NOT based on RDF data structures but for instance models based on attrs or cattrs, e.g. something pythonic and not tied to RDF.

RDF would only be an adapter of sorts feeding in that model and an "after thought" rather than being central to the data model.

e.g. RDF/XML is not something I care too much for. more like a nice to have for me.

If a refactoring could be done to cut the internal model ties from RDF that would be a big win (and a big work too).

from tools-python.

yash-nisar commented on May 25, 2024

@pombredanne @sschuberth @rtgdk It would be great if I could have your inputs on a few things :

Should I focus on the tag/value builders and parsers for the upgrade to spec 2.1 and keep the rdf part under the optional section ?

Also, since our primary focus is on the tagvalue, we might face problems when while conversions in each of the formats (and also while executing validation functions in individual classes)

I went through the docs of attrs, which will help us to get rid of the boilerplate code in classes that we have to define all the time.

We have separate files for builders and parsers for the rdf and tv formats. Can I have a brief idea of how do we want the libraries to be refactored for independence from each other ?

IIUC, this line -> if (not self.error) and (not self.document.validate(validation_messages)): in both the parsers (tv and rdf) will call the validate(...) function as defined in class Document(...). Now, since our primary focus is to take care of the tag/value files, there might be some fields in tag/value which are mandatory and we haven't included them in our rdf model. This will trigger the validate(..) function to return False when when it is called via the rdf parser.
So, this is where we're facing problems since we have a common validate API tied to both the tag/value and rdf models.
We could maybe pass additional fields like: validate(..., rdf=False, tv=True) and then handle things accordingly ? This would be a not so elegant solution, but will somewhat clutter our codebase with if...else statements.

from tools-python.

sschuberth commented on May 25, 2024

Hi @yash-nisar. I'm sorry, but I'm going to pull myself out of this project as I personally have to need for it anymore, and I'm not a real Python guy anyway. As a result, I will probably not respond to issues or pull requests.

from tools-python.

yash-nisar commented on May 25, 2024

No problem :). Thank you for your reviews.

…

On Mon, Mar 19, 2018 at 6:42 PM, Sebastian Schuberth < ***@***.***> wrote: Hi @yash-nisar <https://github.com/yash-nisar>. I'm sorry, but I'm going to pull myself out of this project as I personally have to need for it anymore, and I'm not a real Python guy anyway. As a result, I will probably not respond to issues or pull requests. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#47 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AOLKgBG3_FFvJpCSTqQ6bqIOkjHHIne7ks5tf67XgaJpZM4SqEXV> .

-- <http://www.somaiya.edu> <http://www.somaiya.edu/kjsmc> <http://www.nareshwadi.org> <http://www.somaiya.edu> <http://www.helpachild.in> <http://nareshwadi.org>

from tools-python.

meretp commented on May 25, 2024

Hi @yash-nisar! We are currently cleaning up this repo and I came across your issue that is still asking for support for spec version 2.1. Since this issue is already a bit older and many changes have been made in the repo: Are you still missing fields for spec 2.1?

Concerning the discussion above:
Beginning with #244 we are planning to refactor the data model and the builder structure. This will enable us to tackle the refactoring and isolation of the individual formats mentioned here.
In the course of the refactoring, the validation will also have to be checked. I like the approach described by @goneall for the java tools to distinguish between parser-related validation (which is probably less of a validation and more of a basic check if the content is parsable) and spec-related validation. With #212 we have an open issue concerning the validation.

In your view @yash-nisar, are there any outstanding issues from this issue or could it be closed?

from tools-python.

nicoweidner commented on May 25, 2024

Closing this for now, please ping if it should be reopened

from tools-python.

Update to full SPDX v2.1 about tools-python HOT 7 CLOSED

Comments (7)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent