Comments (11)

sferik commented on June 29, 2024

Why specifically is a Hash an insufficient representation of an XML document? It seems to me to be sufficient for many use cases. Maybe you could offer an example where a Hash is a poor choice.

PS: Whatever happened to the Cherry source code? I'd be interested in taking a look at it.

from multi_xml.

trans commented on June 29, 2024

How do you represent processing instructions? Comments, though perhaps not as important, get lost too. And there is the whole question of how to differentiate body text from attributes. Hashes work well in general, but the conversion is a one-way street, and getting fully consistent results can make the Hash rather verbose. Looking over multi_xml's specs, I'm not sure one could be certain what to expect. For instance:

<user name="tom"/>

and

<user><name>tom</name></user>

These look like they would return the same result. So how would one know the difference?
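
To make the concern concrete, here is a minimal sketch using REXML from Ruby's standard distribution. The `naive_hash` helper is hypothetical and deliberately simplistic; it is not multi_xml's implementation. It only illustrates what happens when attributes and child elements are merged into the same Hash keys.

```ruby
require "rexml/document"

# Hypothetical, deliberately naive converter (not multi_xml's code):
# attributes and child elements are written into the same Hash.
def naive_hash(element)
  result = {}
  element.attributes.each { |name, value| result[name] = value }
  element.elements.each do |child|
    nested = child.has_elements? || child.has_attributes?
    result[child.name] = nested ? naive_hash(child) : child.text
  end
  result
end

attr_doc  = REXML::Document.new('<user name="tom"/>')
child_doc = REXML::Document.new('<user><name>tom</name></user>')

attr_hash  = { attr_doc.root.name  => naive_hash(attr_doc.root) }
child_hash = { child_doc.root.name => naive_hash(child_doc.root) }

# Both documents collapse to the same structure, so the original
# attribute-versus-element distinction cannot be recovered from the Hash.
puts attr_hash == child_hash
```

Under this assumed conversion, both inputs produce `{"user" => {"name" => "tom"}}`, which is why the Hash cannot be turned back into the original document unambiguously.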

sferik commented on June 29, 2024

Technically, comments and processing instructions are not part of the XML document proper, they're meta-data, which I believe is acceptable to drop during parsing. That said, if processing instructions are critical to your application, I would accept a patch to wrap the response in an envelope that includes processing instructions as headers. However, I don't see why this would necessitate returning something other than a Hash. For example:

doc = MultiXml.parse(xml, :symbolize_keys => true)
doc[:headers][:processing_instructions] # e.g. {:xml_stylesheet => "href=\"/style.xsl\" type=\"text/xsl\""}
doc[:body] # the parsed document

While I'm aware that processing instructions can appear anywhere in the document, all of the uses I've seen appear in the prolog, before the root node.
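
As a rough illustration of the envelope idea, the sketch below collects prolog processing instructions with REXML. The `parse_with_headers` helper, the `:headers` and `:body` keys, and the placeholder body are all assumptions made for this example; none of them exist in multi_xml.

```ruby
require "rexml/document"

# Hypothetical helper (not part of multi_xml): gather processing
# instructions that appear before the root node into a :headers entry.
def parse_with_headers(xml)
  doc = REXML::Document.new(xml)
  instructions = {}
  doc.children.each do |node|
    next unless node.is_a?(REXML::Instruction)
    key = node.target.tr("-", "_").to_sym
    instructions[key] = node.content.to_s.strip
  end
  {
    headers: { processing_instructions: instructions },
    # Placeholder body: only the root name and its text, to keep the sketch short.
    body: { doc.root.name.to_sym => doc.root.text }
  }
end

xml = '<?xml-stylesheet href="/style.xsl" type="text/xsl"?><user>tom</user>'
doc = parse_with_headers(xml)

p doc[:headers][:processing_instructions]
p doc[:body]
```

Only top-level instructions are collected here; instructions nested inside the document would need a recursive walk.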

To your second point, I would argue that the distinction between attributes and child nodes is syntactic or stylistic, not semantic. In my mind, both of your examples parse as "a user whose name is tom." Do you have a different interpretation?

The easiest way to persuade me is by pointing to a real-world example of a document or API that uses the same attribute and child node name to describe two distinct properties. For example:

<user name="tom"><name>tom</name></user>

where the value of the attribute "name" means something different from the contents of the node "name". In such a case, it would make sense for the values to be different:

<user name="tom"><name>bob</name></user>

but I would argue that this document, while perfectly valid XML, is nonsensical and would never appear in the wild (but I'd be happy to be proven wrong—there are a lot of crazy documents in the world).

I hope you understand, I'm trying to be pragmatic, solving for the most common use cases first. I'm happy to solve for edge cases when they arise, but, in my experience, edge cases often turn out to be theoretical barriers to progress, as opposed to actual constraints.

paulwalker commented on June 29, 2024

<xml>
  <param name="name_a">foo</param>
  <param name="name_b">bar</param>
</xml>

The name attribute values aren't present in the hash at all.

paulwalker commented on June 29, 2024

Attributes and their values are not optional in XML parsing.

juggy commented on June 29, 2024

I just bumped into the problem @paulwalker mentioned here. When a node has an attribute but no inner node (i.e. text content only), the attribute is not present in the hash.

This is causing problems for me right now. Do you have a workaround?

sferik commented on June 29, 2024

I agree this is a bug and don't have a simple workaround. Would you be able to write a failing spec? That seems like a good start.

juggy commented on June 29, 2024

I've got a fix I created for a project, but it breaks the current way of handling attributes. Instead of adding the attributes to the node itself, I changed it so they are added to the parent node in the following format: node_name@attr_name

It also solves the problem that first prompted this ticket: knowing whether a node is an attribute or not.

It is a one-line fix within the lib2xmlparser. It could be implemented as a different parsing mode, so multi_xml would stay compatible. I'll commit it and send it over.

EDIT: you can have a look here juggy@1d4fd5d#L0L41
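
For readers who want to picture the node_name@attr_name layout, here is a small sketch built on REXML. It is not the linked commit; `add_element` is a hypothetical helper written only for this illustration, and it assumes sibling element names are unique (repeated siblings would overwrite each other).

```ruby
require "rexml/document"

# Hypothetical sketch of the "node_name@attr_name" layout: an element's
# attributes are stored in the Hash of its parent, next to the element itself.
def add_element(parent_hash, element)
  element.attributes.each do |attr_name, value|
    parent_hash["#{element.name}@#{attr_name}"] = value
  end
  if element.has_elements?
    children = {}
    element.elements.each { |child| add_element(children, child) }
    parent_hash[element.name] = children
  else
    # A leaf keeps its text, and its attributes survive on the parent.
    parent_hash[element.name] = element.text
  end
  parent_hash
end

xml = '<user id="7"><email type="work">tom@example.com</email></user>'
result = add_element({}, REXML::Document.new(xml).root)

p result
```

With this layout a text-only node such as `email` keeps both its content and its `type` attribute, and the `@` in the key marks which entries came from attributes.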

ginjo commented on June 29, 2024

I'm running into the same problem here, with real-world XML pulled from a FileMaker database. The generated hash drops the DISPLAY attribute entirely.

<VALUELIST NAME="Employee Unique ID">
  <VALUE DISPLAY="281 Abel">281</VALUE>
  <VALUE DISPLAY="254 Adam">254</VALUE>
  <VALUE DISPLAY="182 Adriane">182</VALUE>
  <VALUE DISPLAY="213 Alma">213</VALUE>
  <VALUE DISPLAY="183 Amanda">183</VALUE>
</VALUELIST>
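
One possible stopgap, sketched here under the assumption that dropping down to REXML directly is acceptable, is to read the attribute and the text from each element yourself. The `:id` and `:display` keys are names chosen for this example only, and the input is a shortened copy of the document above.

```ruby
require "rexml/document"

xml = <<~XML
  <VALUELIST NAME="Employee Unique ID">
    <VALUE DISPLAY="281 Abel">281</VALUE>
    <VALUE DISPLAY="254 Adam">254</VALUE>
  </VALUELIST>
XML

doc = REXML::Document.new(xml)

# Read both the text content and the DISPLAY attribute of every VALUE node.
values = doc.get_elements("//VALUE").map do |element|
  { id: element.text, display: element.attributes["DISPLAY"] }
end

p values
```

This bypasses the Hash conversion entirely, so it does not fix the underlying bug; it only shows that the attribute is available at the parser level.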

sferik commented on June 29, 2024

Could you please write a failing spec for this case?

rubiii commented on June 29, 2024

hey guys,

everyone could write a failing spec for this, but i don't think that's the problem here. as @trans pointed out, the current hash structure can probably represent most xml data, but it fails in more complex situations. you identified two cases in which multi_xml fails to return a proper result, and i know that similar projects (crack, for example) have the same problems.

years ago, i came across a project that seems to solve those problems by introducing a little more structure to the hash. it's called cobra vs mongoose (brilliant name!) and here's an example:

xml = '<alice id="1"><bob id="2">charlie</bob><bob id="3">david</bob></alice>'
CobraVsMongoose.xml_to_hash(xml)
# => { "alice" => { "@id" => "1", "bob" => [ { "@id" => "2", "$" => "charlie" }, { "@id" => "3", "$" => "david" } ] } }

a structure like this is certainly less convenient, but i just wanted to point out that these problems can be solved.
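
for anyone curious how little code that layout needs, here is a rough sketch that reproduces the same "@attr" / "$" structure with REXML. it is an independent re-implementation written for illustration, not the CobraVsMongoose source, and `element_to_hash` is a hypothetical name.

```ruby
require "rexml/document"

# Hypothetical re-implementation of the "@attr" / "$" layout using REXML.
def element_to_hash(element)
  hash = {}
  # Attributes are prefixed with "@" so they never collide with child names.
  element.attributes.each { |name, value| hash["@#{name}"] = value }
  element.elements.each do |child|
    value = element_to_hash(child)
    if hash.key?(child.name)
      # Repeated sibling names are collected into an Array.
      hash[child.name] = [hash[child.name]] unless hash[child.name].is_a?(Array)
      hash[child.name] << value
    else
      hash[child.name] = value
    end
  end
  # Text content lives under the "$" key.
  text = element.texts.map(&:value).join.strip
  hash["$"] = text unless text.empty?
  hash
end

xml = '<alice id="1"><bob id="2">charlie</bob><bob id="3">david</bob></alice>'
root = REXML::Document.new(xml).root
result = { root.name => element_to_hash(root) }

p result
```

the sketch ignores namespaces, mixed content ordering, comments and processing instructions, so it only covers the attribute and text cases discussed in this thread.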

cheers,
daniel
