

goneall commented on May 25, 2024

> Should I focus on the tag/value builders and parsers for the upgrade to spec 2.1 and keep the RDF part under the optional section?

I would prefer to maintain support for RDF. I know there are several users of RDF for the Java libraries (since they are finding bugs 😉), so I assume there will also be interested users for Python. I'm OK with the primary focus being on tag/value since that is the larger community. I can help provide support for any RDF-related questions.

> Also, since our primary focus is on tag/value, we might face problems during conversions between the formats (and also while executing validation functions in individual classes).

There are some discussions of supporting other formats like JSON, YAML, and JSON-LD. Having a design which allows for conversion between various formats would be very helpful for forward compatibility.

> We have separate files for builders and parsers for the RDF and tag/value formats. Could I get a brief idea of how we want the libraries to be refactored so that they are independent of each other?

Is there a design where we have a core model which is (semi) independent of the format, with converters that build and parse the different formats? I have to admit I have not dived deeply into the current Python implementation, but the basic idea of having a model independent of the serialization format seems like it should work. Where things got messy in the Java code is the validation (as you point out below) and maintaining the information on where in the input file the validation errors occur.
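A minimal sketch of such a format-independent core model, assuming plain dataclasses. The class names, fields, and the two writer functions are hypothetical, not the library's current API; the tag names follow the SPDX tag/value format and the JSON keys follow SPDX JSON field names, but only a handful are covered.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Package:
    # Format-independent representation of a package (fields are illustrative).
    name: str
    version: str = ""


@dataclass
class Document:
    # Core model: knows nothing about tag/value, RDF, or JSON.
    name: str
    spdx_version: str = "SPDX-2.1"
    packages: List[Package] = field(default_factory=list)


def write_tag_value(doc: Document) -> str:
    # Serializer for the tag/value format; it reads only the core model.
    lines = [f"SPDXVersion: {doc.spdx_version}", f"DocumentName: {doc.name}"]
    for pkg in doc.packages:
        lines.append(f"PackageName: {pkg.name}")
        if pkg.version:
            lines.append(f"PackageVersion: {pkg.version}")
    return "\n".join(lines)


def write_json_dict(doc: Document) -> Dict:
    # A second serializer over the same model, e.g. as input to json.dumps.
    return {
        "spdxVersion": doc.spdx_version,
        "name": doc.name,
        "packages": [{"name": p.name, "versionInfo": p.version} for p in doc.packages],
    }
```

Each additional format (JSON, YAML, JSON-LD) would then add its own reader and writer around the same Document, so converting between formats reduces to parsing with one and writing with another.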

> IIUC, this line -> if (not self.error) and (not self.document.validate(validation_messages)): in both parsers (tag/value and RDF) calls the validate(...) function defined in class Document(...). Now, since our primary focus is on tag/value files, there might be some fields that are mandatory in tag/value but that we haven't included in our RDF model. This will cause validate(...) to return False when it is called via the RDF parser.
> So, this is where we're facing problems, since we have a common validate API tied to both the tag/value and RDF models.
> We could maybe pass additional arguments like validate(..., rdf=False, tv=True) and handle things accordingly? This would not be an elegant solution, and it would clutter our codebase with if...else statements.

I'll provide an opinion or two on this with the caveat that I have not spent a lot of time in the Python code, so the replies below may not match the current design/structure.

For the Java libraries, we tried to separate out the validation into 2 separate categories: parsing errors and specification-related errors. The tag/value-specific requirements could be thought of as parsing errors since they are typically related to positional requirements that allow unambiguous parsing. The specification-related errors should be common to both RDF and tag/value. For the positional requirements in tag/value, we just threw exceptions rather than handling them in the validation, since the resulting model could not be reliably produced.
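One way to mirror that split in Python, sketched under assumptions: the parser raises an exception for a tag/value positional problem (here, a PackageVersion with no preceding PackageName), while spec-level validation runs only against the model and therefore applies to every format. SPDXParsingError, parse_tag_value, and validate are hypothetical names, and the parser handles just three tags.

```python
from dataclasses import dataclass, field
from typing import List


class SPDXParsingError(Exception):
    """Raised when tag/value input cannot be turned into a model reliably."""


@dataclass
class Package:
    name: str
    version: str = ""


@dataclass
class Document:
    name: str = ""
    packages: List[Package] = field(default_factory=list)


def parse_tag_value(text: str) -> Document:
    doc = Document()
    for line in text.splitlines():
        if not line.strip():
            continue
        tag, _, value = line.partition(":")
        tag, value = tag.strip(), value.strip()
        if tag == "DocumentName":
            doc.name = value
        elif tag == "PackageName":
            doc.packages.append(Package(name=value))
        elif tag == "PackageVersion":
            # Positional rule: a package-level tag needs a preceding PackageName,
            # otherwise we cannot tell which package it belongs to.
            if not doc.packages:
                raise SPDXParsingError("PackageVersion appears before any PackageName")
            doc.packages[-1].version = value
        # Other tags are ignored in this sketch.
    return doc


def validate(doc: Document) -> List[str]:
    # Spec-level checks: shared by every format because they only see the model.
    messages = []
    if not doc.name:
        messages.append("Document name is required")
    for pkg in doc.packages:
        if not pkg.name:
            messages.append("Package name is required")
    return messages
```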

If we want the validation to incorporate format specific errors, could we record those errors during parsing and store them in the model rather than passing a flag to the validation method? We know when we are parsing which format is being used.
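A sketch of recording format-specific errors on the model at parse time instead of passing rdf=/tv= flags to validate(). The parser_messages field and both functions are hypothetical; the point is only that Document.validate() takes no format argument.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Document:
    name: str = ""
    # Hypothetical field: messages recorded by whichever parser built this model.
    parser_messages: List[str] = field(default_factory=list)

    def validate(self) -> List[str]:
        # No rdf=/tv= flags: format-specific findings were already recorded
        # by the parser, so validation only adds the format-neutral checks.
        messages = list(self.parser_messages)
        if not self.name:
            messages.append("Document name is required")
        return messages


def parse_tag_value(text: str) -> Document:
    doc = Document()
    for line in text.splitlines():
        if not line.strip():
            continue
        tag, sep, value = line.partition(":")
        if not sep:
            # Tag/value-specific problem, noted while we still know the format.
            doc.parser_messages.append(f"Malformed tag/value line: {line!r}")
            continue
        if tag.strip() == "DocumentName":
            doc.name = value.strip()
    return doc
```

An RDF parser would append its own format-specific messages to the same field, and both paths share one validate() call.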

One other consideration in the validation is maintaining the location of the error in the input format. This proved to be a challenge for the Java code design. The line number and character positions need to be captured during parsing and reported back during validation. For tag/value, we only provided this for the parser errors (not for the spec-related errors).
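A sketch of capturing line and column positions while reading tag/value input so they can be reported later. SourceLocation, ValidationMessage, and check_tag_value are hypothetical names; spec-related messages can leave location as None, matching the Java approach of reporting positions only for parser errors.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class SourceLocation:
    line: int    # 1-based line number in the input
    column: int  # 1-based column where the offending tag starts


@dataclass
class ValidationMessage:
    text: str
    location: Optional[SourceLocation] = None  # None for spec errors without a position

    def __str__(self) -> str:
        if self.location is None:
            return self.text
        return f"{self.location.line}:{self.location.column}: {self.text}"


def check_tag_value(text: str, known_tags: Set[str]) -> List[ValidationMessage]:
    messages = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        tag, sep, _ = line.partition(":")
        if not sep:
            messages.append(ValidationMessage("Missing ':' separator", SourceLocation(line_no, 1)))
        elif tag.strip() not in known_tags:
            # Column of the first non-blank character, i.e. where the tag begins.
            col = len(line) - len(line.lstrip()) + 1
            messages.append(ValidationMessage(f"Unknown tag {tag.strip()!r}", SourceLocation(line_no, col)))
    return messages
```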

from tools-python.

yash-nisar commented on May 25, 2024

From @pombredanne:
We should drop support for 1.2 alright and yes focus on new 2.x things.
As far as I am concerned, I would be quite happy with a complete refactoring of the lib internals such that:

  1. tag/value is the primary target
  2. the internal model is NOT based on RDF data structures but on, for instance, models based on attrs or cattrs: something Pythonic and not tied to RDF.
  3. RDF would only be an adapter of sorts feeding in that model and an "after thought" rather than being central to the data model.
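A sketch of RDF as "only an adapter": the model is a plain dataclass, and a separate function translates RDF triples into it. To stay self-contained, triples are plain (subject, predicate, object) string tuples rather than objects from an RDF library. The SPDX namespace, spdx:Package, spdx:SpdxDocument, spdx:name, and spdx:versionInfo are SPDX RDF terms, but the adapter itself is a simplified, hypothetical one that reads only two properties.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

SPDX = "http://spdx.org/rdf/terms#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

Triple = Tuple[str, str, str]


@dataclass
class Package:
    # Plain Python model: no RDF types leak in here.
    name: str
    version: str = ""


def packages_from_triples(triples: Iterable[Triple]) -> List[Package]:
    # Adapter: translate (subject, predicate, object) triples into the model.
    triples = list(triples)
    package_nodes = [s for s, p, o in triples if p == RDF_TYPE and o == SPDX + "Package"]
    packages = []
    for node in package_nodes:
        # If a predicate repeats for a node, the last value wins in this sketch.
        props = {p: o for s, p, o in triples if s == node}
        packages.append(Package(
            name=props.get(SPDX + "name", ""),
            version=props.get(SPDX + "versionInfo", ""),
        ))
    return packages
```

Everything downstream (validation, tag/value output) then works on Package objects and never sees RDF.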

e.g. RDF/XML is not something I care too much about; it's more of a nice-to-have for me.

If a refactoring could cut the internal model's ties to RDF, that would be a big win (and a lot of work too).


yash-nisar commented on May 25, 2024

@pombredanne @sschuberth @rtgdk It would be great if I could have your input on a few things:

Should I focus on the tag/value builders and parsers for the upgrade to spec 2.1 and keep the RDF part under the optional section?

Also, since our primary focus is on tag/value, we might face problems during conversions between the formats (and also while executing validation functions in individual classes).

I went through the attrs docs; attrs will help us get rid of the boilerplate code we currently have to write for every class.
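To illustrate the boilerplate in question, here is a hand-written class next to a generated one. The sketch uses the standard-library dataclasses module, which covers a subset of what attrs provides (attrs' @define decorator works similarly and adds features such as validators); the Checksum classes are hypothetical examples, not existing library classes.

```python
from dataclasses import dataclass


class ChecksumHandWritten:
    # The kind of boilerplate each model class needs when written by hand.
    def __init__(self, algorithm, value):
        self.algorithm = algorithm
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, ChecksumHandWritten):
            return NotImplemented
        return (self.algorithm, self.value) == (other.algorithm, other.value)

    def __repr__(self):
        return f"ChecksumHandWritten(algorithm={self.algorithm!r}, value={self.value!r})"


@dataclass(frozen=True)
class Checksum:
    # __init__, __eq__, __repr__ (and __hash__, since frozen=True) are
    # generated from these annotations.
    algorithm: str
    value: str
```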

We have separate files for builders and parsers for the RDF and tag/value formats. Could I get a brief idea of how we want the libraries to be refactored so that they are independent of each other?

IIUC, this line -> if (not self.error) and (not self.document.validate(validation_messages)): in both parsers (tag/value and RDF) calls the validate(...) function defined in class Document(...). Now, since our primary focus is on tag/value files, there might be some fields that are mandatory in tag/value but that we haven't included in our RDF model. This will cause validate(...) to return False when it is called via the RDF parser.
So, this is where we're facing problems, since we have a common validate API tied to both the tag/value and RDF models.
We could maybe pass additional arguments like validate(..., rdf=False, tv=True) and handle things accordingly? This would not be an elegant solution, and it would clutter our codebase with if...else statements.


sschuberth commented on May 25, 2024

Hi @yash-nisar. I'm sorry, but I'm going to pull myself out of this project, as I personally have no need for it anymore, and I'm not a real Python guy anyway. As a result, I will probably not respond to issues or pull requests.


yash-nisar commented on May 25, 2024


meretp commented on May 25, 2024

Hi @yash-nisar! We are currently cleaning up this repo, and I came across your issue, which is still asking for support for spec version 2.1. Since this issue is somewhat old and many changes have been made to the repo since then: are you still missing fields for spec 2.1?

Concerning the discussion above:
Beginning with #244, we are planning to refactor the data model and the builder structure. This will enable us to tackle the refactoring and isolation of the individual formats mentioned here.
In the course of the refactoring, the validation will also have to be checked. I like the approach described by @goneall for the Java tools to distinguish between parser-related validation (which is probably less of a validation and more of a basic check of whether the content is parsable) and spec-related validation. With #212, we have an open issue concerning the validation.

In your view, @yash-nisar, are there any outstanding points from this issue, or can it be closed?


nicoweidner commented on May 25, 2024

Closing this for now; please ping if it should be reopened.

