So the only return result is a Hash? That's very limited. A Hash can't encode all the


Lossy Conversion from XML to Hash about multi_xml HOT 11 CLOSED


Comments (11)

sferik commented on June 29, 2024

Why specifically is a Hash an insufficient representation of an XML document? It seems to me to be sufficient for many use cases. Maybe you could offer an example where a Hash is a poor choice.

PS: Whatever happened to the Cherry source code? I'd be interested to take a look at it.

from multi_xml.

trans commented on June 29, 2024

How do you represent processing instructions? And, though perhaps less important, comments get lost. There is also the whole question of how to differentiate body text from attributes. Hashes work well in general, but it's kind of a one-way street, and getting fully consistent results can be rather verbose. Looking over multi_xml's specs, I'm not sure one could be certain what to expect. For instance:

<user name="tom"/>

and

<user><name>tom</name></user>

These look like they would return the same result. So how would one know the difference?
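
To make the collision concrete, here is a minimal sketch using REXML. The `naive_hash` and `parse_naive` helpers are hypothetical and written only for this illustration; they are not multi_xml's code. They assume a converter that writes attributes and child elements into the same key space, which is the behaviour being questioned above.

```ruby
require "rexml/document"

# Hypothetical converter for illustration only; it is NOT multi_xml's code.
# Attributes and child elements are written into the same key space.
def naive_hash(element)
  result = {}
  element.attributes.each { |name, value| result[name] = value }
  element.elements.each do |child|
    result[child.name] = child.has_elements? ? naive_hash(child) : child.text
  end
  result
end

def parse_naive(xml)
  root = REXML::Document.new(xml).root
  { root.name => naive_hash(root) }
end

attribute_form = parse_naive('<user name="tom"/>')
element_form   = parse_naive('<user><name>tom</name></user>')

# Both documents collapse to the same hash, so the original form
# cannot be recovered from the result.
puts attribute_form == element_form
```

Under that assumption the two hashes compare equal, so a round trip back to XML would have to guess which form the source used.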

from multi_xml.

sferik commented on June 29, 2024

Technically, comments and processing instructions are not part of the XML document proper, they're meta-data, which I believe is acceptable to drop during parsing. That said, if processing instructions are critical to your application, I would accept a patch to wrap the response in an envelope that includes processing instructions as headers. However, I don't see why this would necessitate returning something other than a Hash. For example:

doc = MultiXml.parse(xml, :symbolize_keys => true)
doc[:headers][:processing_instructions] # e.g. {:xml_stylesheet => "href=\"/style.xsl\" type=\"text/xsl\""}
doc[:body] # the parsed document

While I'm aware that processing instructions can appear anywhere in the document, all of the uses I've seen appear in the prolog, before the root node.
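
A rough sketch of how the header part of such an envelope might be collected, using REXML rather than multi_xml. The `prolog_instructions` helper is hypothetical; nothing like it exists in multi_xml. It assumes only prolog-level processing instructions matter, as described above, and it keys the result by the raw target string rather than a symbol.

```ruby
require "rexml/document"

# Hypothetical helper; multi_xml does not provide this.
# Collects processing instructions that are direct children of the
# document (the prolog), keyed by their target name.
def prolog_instructions(xml)
  doc = REXML::Document.new(xml)
  doc.children.grep(REXML::Instruction).each_with_object({}) do |pi, headers|
    headers[pi.target] = pi.content.to_s.strip
  end
end

xml = <<~XML
  <?xml-stylesheet href="/style.xsl" type="text/xsl"?>
  <user name="tom"/>
XML

headers = prolog_instructions(xml)
headers.each { |target, content| puts "#{target}: #{content}" }
```

The body of the envelope would still be whatever the normal parse returns; instructions nested deeper in the document are ignored by this sketch.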

To your second point, I would argue that the distinction between attributes and child nodes is syntactic or stylistic, not semantic. In my mind, both of your examples parse as "a user whose name is tom." Do you have a different interpretation?

The easiest way to persuade me is by pointing to a real-world example of a document or API that uses the same attribute and child node name to describe two distinct properties. For example:

<user name="tom"><name>tom</name></user>

where the value of the attribute "name" means something different from the contents of the node "name". In such a case, it would make sense for the values to be different:

<user name="tom"><name>bob</name></user>

but I would argue that this document, while perfectly valid XML, is nonsensical and would never appear in the wild (but I'd be happy to be proven wrong—there are a lot of crazy documents in the world).

I hope you understand, I'm trying to be pragmatic, solving for the most common use cases first. I'm happy to solve for edge cases when they arise, but, in my experience, edge cases often turn out to be theoretical barriers to progress, as opposed to actual constraints.

from multi_xml.

paulwalker commented on June 29, 2024

<xml>
  <param name="name_a">foo</param>
  <param name="name_b">bar</param>
</xml>

The name attribute values aren't present in the hash at all.

from multi_xml.

paulwalker commented on June 29, 2024

Attributes and their values are not optional in XML parsing.

from multi_xml.

juggy commented on June 29, 2024

I just bumped into the problem @paulwalker mentioned here. When a node has an attribute but no child nodes (i.e. content only), the attribute is not present in the hash.

This is causing some problems right now. Do you have a workaround?

from multi_xml.

sferik commented on June 29, 2024

I agree this is a bug and don't have a simple workaround. Would you be able to write a failing spec? That seems like a good start.

from multi_xml.

juggy commented on June 29, 2024

I've got a fix I created for a project, but it breaks the current way of handling attributes. Instead of adding the attributes to the node itself, I changed it so they are added to the parent node in the following format: node_name@attr_name

It also solves the problem that first created this ticket: knowing whether a node is an attribute or not.

It is a one line fix within the lib2xmlparser. It could be implemented as a different parsing mode, so multi_xml would stay compatible. I'll commit it and send it over.

EDIT: you can have a look here juggy@1d4fd5d#L0L41
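
For readers who want to picture the node_name@attr_name layout, here is a small sketch built on REXML. It is a guess at the resulting shape, not the code from the linked commit, and the `lifted_hash` helper is a made-up name. It assumes no repeated sibling elements, and attributes on the root element are dropped because the root has no parent to carry them.

```ruby
require "rexml/document"

# Illustration of the "node_name@attr_name" layout described above.
# This is a guess at the shape, not the code from the linked commit.
# Repeated sibling elements are not handled here.
def lifted_hash(element)
  result = {}
  element.elements.each do |child|
    result[child.name] = child.has_elements? ? lifted_hash(child) : child.text
    # Attributes of the child are stored on the parent under "child@attr".
    child.attributes.each do |attr_name, value|
      result["#{child.name}@#{attr_name}"] = value
    end
  end
  result
end

root = REXML::Document.new('<user><name lang="en">tom</name></user>').root
converted = { root.name => lifted_hash(root) }

# The text and the attribute now live under separate keys.
puts converted["user"]["name"]
puts converted["user"]["name@lang"]
```

With this layout a content-only node keeps its text as a plain string while its attributes survive next to it.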

from multi_xml.

ginjo commented on June 29, 2024

Running into the same problem here. Real-world XML pulled from a FileMaker database. The generated hash drops the DISPLAY attribute entirely.

<VALUELIST NAME="Employee Unique ID">
  <VALUE DISPLAY="281 Abel">281</VALUE>
  <VALUE DISPLAY="254 Adam">254</VALUE>
  <VALUE DISPLAY="182 Adriane">182</VALUE>
  <VALUE DISPLAY="213 Alma">213</VALUE>
  <VALUE DISPLAY="183 Amanda">183</VALUE>
</VALUELIST>

from multi_xml.

sferik commented on June 29, 2024

Could you please write a failing spec for this case?

from multi_xml.

rubiii commented on June 29, 2024

hey guys,

everyone could write a failing spec for this, but i don't think that's the problem here. as @trans pointed out, the current
hash structure can probably represent most xml data, but it fails in more complex situations. you identified two cases
in which multi_xml fails to return a proper result and i know that similar projects (crack for example) have the same problems.

years ago i came across a project that seems to solve those problems by introducing a little more structure to the hash.
it's called cobra vs mongoose (brilliant name!) and here's an example:

xml = '<alice id="1"><bob id="2">charlie</bob><bob id="3">david</bob></alice>'
CobraVsMongoose.xml_to_hash(xml)
# => { "alice" => { "@id" => "1", "bob" => [ { "@id" => "2", "$" => "charlie" }, { "@id" => "3", "$" => "david" } ] } }

a structure like this is certainly less convenient, but i just wanted to point out that these problems can be solved.
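
As a rough idea of how little code that convention needs, here is a sketch on top of REXML. The `structured_hash` helper is hypothetical and is not CobraVsMongoose's actual implementation; it only follows the "@" prefix for attributes and "$" key for text seen in the example above, and it assumes namespaces and mixed content don't need special handling.

```ruby
require "rexml/document"

# Sketch of the "@attribute" / "$" text convention shown above.
# Hypothetical helper; it is not CobraVsMongoose's actual implementation.
def structured_hash(element)
  node = {}
  element.attributes.each { |name, value| node["@#{name}"] = value }

  element.elements.each do |child|
    value = structured_hash(child)
    if node.key?(child.name)
      # Repeated sibling names are collected into an array.
      existing = node[child.name]
      node[child.name] = existing.is_a?(Array) ? existing << value : [existing, value]
    else
      node[child.name] = value
    end
  end

  # Text content goes under "$" so it never collides with attributes.
  text = element.texts.map(&:value).join.strip
  node["$"] = text unless text.empty?
  node
end

xml = '<alice id="1"><bob id="2">charlie</bob><bob id="3">david</bob></alice>'
root = REXML::Document.new(xml).root
result = { root.name => structured_hash(root) }

puts result["alice"]["bob"].map { |bob| bob["$"] }.join(", ")
```

Because attributes, text and children each get their own key, both of the failing cases reported earlier in this thread keep their attribute values.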

cheers,
daniel

from multi_xml.
