Comments (11)
Why specifically is a Hash an insufficient representation of an XML document? It seems to me to be sufficient for many use cases. Maybe you could offer an example where a Hash is a poor choice.
PS: Whatever happened to the Cherry source code? I'd be interested to take a look at it.
from multi_xml.
How do you represent processing instructions? And, though perhaps not as important, comments get lost. And there is the whole question of how to differentiate body text from attributes. Hashes work well in general, but it's kind of a one-way street, and getting fully consistent results can be rather verbose. Looking over multi_xml's specs, I'm not sure one could be certain what to expect. For instance:
<user name="tom"/>
and
<user><name>tom</name></user>
look like they would return the same result. So how would one know the difference?
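To make the ambiguity concrete, here is a minimal sketch of a naive converter. It is an assumption for illustration only, not multi_xml's actual implementation: `naive_hash` and `naive_parse` are hypothetical names, and REXML simply stands in for whichever parser backend is configured. Merging attributes and child elements into one flat Hash is enough to make both documents come out identical.

```ruby
require "rexml/document"

# Hypothetical naive converter (not multi_xml's code): attributes and child
# elements share one flat Hash, and elements without children become Strings.
def naive_hash(element)
  result = {}
  element.attributes.each { |name, value| result[name] = value }
  element.elements.each do |child|
    result[child.name] =
      if child.has_elements?
        naive_hash(child)
      else
        child.text # any attributes on a text-only child are silently dropped
      end
  end
  result
end

def naive_parse(xml)
  root = REXML::Document.new(xml).root
  { root.name => naive_hash(root) }
end

attribute_form = naive_parse('<user name="tom"/>')
element_form   = naive_parse('<user><name>tom</name></user>')
# Both are {"user"=>{"name"=>"tom"}}, so the original form cannot be recovered.
```

Once the two results are equal, nothing in the Hash says whether `name` was an attribute or a child element.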
from multi_xml.
Technically, comments and processing instructions are not part of the XML document proper; they're metadata, which I believe is acceptable to drop during parsing. That said, if processing instructions are critical to your application, I would accept a patch to wrap the response in an envelope that includes processing instructions as headers. However, I don't see why this would necessitate returning something other than a Hash. For example:
doc = MultiXml.parse(xml, :symbolize_keys => true)
doc[:headers][:processing_instructions] # e.g. {:xml_stylesheet => "href=\"/style.xsl\" type=\"text/xsl\""}
doc[:body] # the parsed document
While I'm aware that processing instructions can appear anywhere in the document, all of the uses I've seen appear in the prolog, before the root node.
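A rough sketch of how such an envelope might collect prolog processing instructions, assuming REXML as the parser. The helper name `prolog_instructions` and the `:headers` / `:processing_instructions` layout are hypothetical; nothing like this exists in multi_xml today.

```ruby
require "rexml/document"

# Hypothetical helper: gather processing instructions that appear in the
# prolog (before the root element) into a Hash keyed by symbolized target.
def prolog_instructions(xml)
  doc = REXML::Document.new(xml)
  doc.children
     .take_while { |node| !node.is_a?(REXML::Element) } # stop at the root node
     .grep(REXML::Instruction)                          # skips the XML declaration
     .each_with_object({}) do |instruction, headers|
       key = instruction.target.tr("-", "_").to_sym
       headers[key] = instruction.content.to_s.strip
     end
end

xml = '<?xml version="1.0"?><?xml-stylesheet href="/style.xsl" type="text/xsl"?><doc/>'
envelope = { :headers => { :processing_instructions => prolog_instructions(xml) } }
```

Instructions that appear after the root element are deliberately ignored in this sketch, matching the prolog-only assumption.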
To your second point, I would argue that the distinction between attributes and child nodes is syntactic or stylistic, not semantic. In my mind, both of your examples parse as "a user whose name is tom." Do you have a different interpretation?
The easiest way to persuade me is by pointing to a real-world example of a document or API that uses the same attribute and child node name to describe two distinct properties. For example:
<user name="tom"><name>tom</name></user>
where the value of the attribute "name" means something different from the contents of the node "name". In such a case, it would make sense for the values to be different:
<user name="tom"><name>bob</name></user>
but I would argue that this document, while perfectly valid XML, is nonsensical and would never appear in the wild (but I'd be happy to be proven wrong—there are a lot of crazy documents in the world).
I hope you understand, I'm trying to be pragmatic, solving for the most common use cases first. I'm happy to solve for edge cases when they arise, but, in my experience, edge cases often turn out to be theoretical barriers to progress, as opposed to actual constraints.
from multi_xml.
<xml>
<param name="name_a">foo</param>
<param name="name_b">bar</param>
</xml>
The name attribute values aren't present in the hash at all.
from multi_xml.
Attributes and their values are not optional in XML parsing.
from multi_xml.
I just bumped into the problem @paulwalker mentioned here. When you have a node with an attribute but without an inner node (i.e. content only), the attribute is not present in the hash.
This is causing some problems right now. Do you have a workaround?
from multi_xml.
I agree this is a bug and don't have a simple workaround. Would you be able to write a failing spec? That seems like a good start.
from multi_xml.
I've got a fix I created for a project, but it breaks the current way of handling attributes. Instead of adding the attributes to the node itself, I changed it so they are added to the parent node in the following format: node_name@attr_name
It also solves the problem that first created this ticket: knowing whether a node is an attribute or not.
It is a one-line fix within the lib2xmlparser. It could be implemented as a different parsing mode, so multi_xml would stay compatible. I'll commit it and send it over.
EDIT: You can have a look here: juggy@1d4fd5d#L0L41
from multi_xml.
Running into the same problem here. Real-world XML pulled from a FileMaker database. The generated hash drops the DISPLAY attribute entirely.
<VALUELIST NAME="Employee Unique ID">
<VALUE DISPLAY="281 Abel">281</VALUE>
<VALUE DISPLAY="254 Adam">254</VALUE>
<VALUE DISPLAY="182 Adriane">182</VALUE>
<VALUE DISPLAY="213 Alma">213</VALUE>
<VALUE DISPLAY="183 Amanda">183</VALUE>
</VALUELIST>
from multi_xml.
Could you please write a failing spec for this case?
from multi_xml.
hey guys,
everyone could write a failing spec for this, but i don't think that's the problem here. as @trans pointed out, the current hash structure can probably represent most xml data, but it fails in more complex situations. you identified two cases in which multi_xml fails to return a proper result, and i know that similar projects (crack, for example) have the same problems.
years ago i came across a project that seems to solve those problems by introducing a little more structure to the hash. it's called cobra vs mongoose (brilliant name!) and here's an example:
xml = '<alice id="1"><bob id="2">charlie</bob><bob id="3">david</bob></alice>'
CobraVsMongoose.xml_to_hash(xml)
# => { "alice" => { "@id" => "1", "bob" => [ { "@id" => "2", "$" => "charlie" }, { "@id" => "3", "$" => "david" } ] } }
a structure like this is certainly less convenient, but i just wanted to point out that these problems can be solved.
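for anyone who wants to experiment, here is a minimal sketch of that convention built on REXML. it is not the CobraVsMongoose implementation, and `element_to_hash` and `xml_to_hash` are names made up for this illustration. the assumptions are: attributes get an "@" prefix, text goes under "$", and a repeated element name collects into an Array.

```ruby
require "rexml/document"

# Illustrative only: "@name" for attributes, "$" for text content,
# and an Array when the same child element name occurs more than once.
def element_to_hash(element)
  hash = {}
  element.attributes.each { |name, value| hash["@#{name}"] = value }

  text = element.texts.map(&:value).join.strip
  hash["$"] = text unless text.empty?

  element.elements.each do |child|
    converted = element_to_hash(child)
    if hash.key?(child.name)
      existing = hash[child.name]
      hash[child.name] = existing.is_a?(Array) ? existing << converted : [existing, converted]
    else
      hash[child.name] = converted
    end
  end
  hash
end

def xml_to_hash(xml)
  root = REXML::Document.new(xml).root
  { root.name => element_to_hash(root) }
end

by_attribute = xml_to_hash('<user name="tom"/>')
# {"user"=>{"@name"=>"tom"}}
by_element = xml_to_hash('<user><name>tom</name></user>')
# {"user"=>{"name"=>{"$"=>"tom"}}}
```

with this shape the attribute form and the element form stay distinguishable, and attributes on text-only nodes (like DISPLAY in the VALUELIST example) are kept.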
cheers,
daniel
from multi_xml.