Comments (22)
Okay, I think what I'd like to go with is creating an --xml-whitespace-sensitivity
option that has ignore
, strict
, and smart
(or something like that. The first two are obviously far easier to implement (the last one requires a lot of knowledge about the schema and elements, I'm very interested in implementing it, but not as a first pass).
from plugin-xml.
@thorn0 I don't think there is anything wrong with the HTML parser. To quickly show what I meant see here.
It is XML with a given default namespace and a Title
tag inside that namespace. The HTML parser throws an error because it assumes this is the Title
of HTML (that's my guess at least). From my point of view that is okay because my content is not HTML but XML.
I liked your comment because it showed the effect that a whitespace option could have for our XML.
from plugin-xml.
Okay so a couple of things are going on here. Technically, the plugin is functioning as designed. Though I realize it's not the best user experience at the moment.
Unfortunately in XML whitespace is very significant unless you're inside of <>
. So what's happening is when prettier is formatting these groups, it's maintaining the exact same amount of whitespace you had inside the start and end tags to maintain parity. Otherwise for a lot of XML use cases prettier would actually be changing the content unwittingly.
With lines 21-23, 25, 29, and 30, you're actually seeing where the indentation would be if whitespace were insignificant.
I'm really not entirely sure of the way around this. I've been thinking maybe an xmlIgnoreWhitespace
option for those that don't care about the exact whitespace count, but it would have to be default off.
from plugin-xml.
Thanks for the explanation! To dumb it down for me (I am not the XML expert as you probably already noticed...):
Row 8 in the result is indented by two spaces because row 7 in the original is also indented by two spaces? Meaning tabWidth
is not applied? In lines such as 21 - 23 it is applied, just not with the parent tag as basis but with their real "level" within the tree?
In that case I agree with your idea of having an option for that, so that tabWidth
is applied to rows with tags as well. Because with this kind of files my ideal result would indent row 8 by 4 spaces, row 12 by 8 spaces etc.
PS: preserving the indentation inside multi-row strings is of course correct, I am not talking about those!
from plugin-xml.
So basically, when you have something like:
<foo>
<bar />
</foo>
in XML that means that the content inside of the foo
tag is actually: ["\n ", bar, "\n"]
. The whitespace in this case is significant. As much as it looks like it's not because we've kind of come to agree that the spacing should look like a tree. But imagine a different example like:
<foo>
bar <baz /> qux
</foo>
Now the content inside of the foo
tag is actually: ["\n bar ", baz, " qux\n"]
.
from plugin-xml.
So yeah, basically I'm thinking of trying to provide a rule that says if you don't have mixed content inside the tags, then we could probably just consider the whitespace as insignificant and indent as we like, but right now it would definitely change things otherwise and potentially break XML integrations.
from plugin-xml.
Specification wise XML is indeed whitespace sensitive.
However what % of real XML files are really whitespace sensitive?
Perhaps the default of ignoring whitespace in contents would be more functionally useful even if not 100% correct while those users who have meaningful whitespace would need to explicitly turn the WS ignoring off?
If we do not ignore whitespace it is very difficult to get a real "pretty XML"...
But I do not know what is the % of users who have meaningful whitespace?
This issue is what stumped me when I was playing around with a prettier plugin for
XML a few months ago... π’
from plugin-xml.
I think it pretty much has to be an option. Otherwise this is really only prettier-ifying open/close tags and prolog content.
from plugin-xml.
Now to figure out what to name it. xmlIgnoreWhitespace
? xmlWhitespaceInsignificant
? xmlWhitespaceSignificantOnlyOnMixedContent
?
from plugin-xml.
For HTML, the analogous option of Prettier is called --html-whitespace-sensitivity
.
Also schema languages (DTD, XML Schema, RELAX NG, Schematron) are used to define, among other things, whitespace rules for elements. If the document has an associated schema, you should be able to get the needed info about whitespace significance from there.
Finally, you can probably somehow use the standard xml:space
attribute, but I don't know much about it.
from plugin-xml.
My 2 cents:
I would go with the option name --html-whitespace-sensitivity
linked above.
but only keep two of the three options:
- "strict" - Whitespaces are considered sensitive.
- "ignore" - Whitespaces are considered insensitive.
And use "ignore" by default and see if any users complain. π
I'd probably also emphasize this option and its default behavior in the documentation.
and only in the longer term evaluate more granular control e.g via xml:space
attribute.
from plugin-xml.
Sounds good for starters.
The "css" value would make sense too as XML is one of the two syntaxes of HTML5.
And it might be possible to come up with a simple, but good enough heuristic for another value, "guess". E.g. "if an element doesn't have any non-whitespace text nodes among its children, then its whitespace-only child text nodes can be ignored". Something like that.
from plugin-xml.
I would go with the option name
--html-whitespace-sensitivity
linked above.
but only keep two of the three options:
- "strict" - Whitespaces are considered sensitive.
- "ignore" - Whitespaces are considered insensitive.
And use "ignore" by default and see if any users complain.
That sounds very reasonable. In our specific case, all whitespaces are insignificant, except for those inside multi-row strings. So if a value of ignore
would also ignore those, then a third value like stringsOnly
would be good, which would only keep whitespace inside of multi-row strings.
from plugin-xml.
@winterkind BTW, did you try formatting your files with Prettier's HTML formatter and --html-whitespace-sensitivity ignore
? Should work fine in your case: playground.
from plugin-xml.
@thorn0 You are right - that looks very good, also with our larger files. It just needs some convincing at this point. The "babel" parser complains about HTML comments and the explicit HTML parser has issues with our default namespace not being HTML.
But as I said: with the "babel" parser and after removing comments, the output is excellent.
from plugin-xml.
@winterkind I meant the html
parser. What issues exactly did you have with it? Could you demonstrate them using a link to the playground? (Better in a new issue in https://github.com/prettier/prettier to not hijack this issue with things not related directly to plugin-xml
.)
from plugin-xml.
I am unsure if HTML is always a valid (syntactically) XML.
from plugin-xml.
@bd82 It's usually not, but that doesn't mean Prettier's HTML parser (Angular's actually) can't be smart enough to support XML. It's something I'd like to investigate when I get some time.
from plugin-xml.
Re. #40 (comment):
Also schema languages (DTD, XML Schema, RELAX NG, Schematron) are used to define, among other things, whitespace rules for elements. If the document has an associated schema, you should be able to get the needed info about whitespace significance from there.
Yep, proper context-sensitive whitespace handling is basic table stakes for an XML processing toolchain, so a simple on/off, strict/ignore approach wouldn't be sufficient, the plug-in would need to toggle the mode based on the current element.
Finally, you can probably somehow use the standard
xml:space
attribute, but I don't know much about it.
The schema languages typically handle that for each XML dialect, so the schema will typically use @xml:space
to say something like βpreserve all whitespace in elements named <codeblock>
β.
For example, from the DITA spec:
@xml:space
This attribute is provided on<pre>
,<lines>
, and on elements specialized from those.
It ensures that parsers in editors and transforms respect the white space, including line-end
characters, that is part of the data in those elements. It is intended to be part of the default
properties of these elements, and not for authors to change or delete. When defined, it has a
fixed value of "preserve".
from plugin-xml.
@winterkind would you be able to test the whitespace-ignore
branch on this (and set the xmlWhitespaceSensitivity
to ignore
)? Want to make sure it works for your use case.
from plugin-xml.
Actually I've just gone ahead and released a v0.6.0
with the changes. I'm going to close this and if we find a need for the smart
mode in the future we can build it. For now @winterkind try setting xmlWhitespaceSensitivity
to "ignore"
and it should be better for you.
from plugin-xml.
Hi, sorry for not responding earlier; I was sick for a few days. I really appreciate your help with this.
I tried 0.6.0 with the new option and it looks very good. I'll test with some more of our files and if I find anything else, I'll create new issues.
from plugin-xml.
Related Issues (20)
- Unstable attributes sorting HOT 2
- Support for inline XML DTD HOT 1
- Formatting not working on Salesforce xml files HOT 3
- Error on valid syntax HOT 6
- Preserve whitespace in empty `<xsl:text>` nodes HOT 3
- Plugin is overwriting tag content HOT 1
- Exception when formatting with prettier standalone in browser HOT 2
- Extension `Prettier - code formatter` is configured as formatter but it cannot format 'edmx'-files HOT 1
- Prettier doesn't detect at all HOT 1
- Processing instructions disrupt formatting HOT 3
- Text content including special characters may be interpretted as XML
- Text content is split around entities HOT 5
- XML attributes are uncommonly indented HOT 1
- Whitespace formatting isn't valid and idempotent with `ignore` sensitivity
- [Bug] Issue with Formatting XML Files in VSCode HOT 1
- not indenting HOT 3
- Chokes on doctype with custom entities HOT 3
- move comment before HOT 4
- npm package for 3.4.0 is missing files HOT 2
- Plugin doesn't work with latest version of prettier HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
π Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. πππ
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google β€οΈ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from plugin-xml.