-
Notifications
You must be signed in to change notification settings - Fork 11
Address the remaining issues in #45 and fix #46 #47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hi Josh, Here is a revised version of
It relies on a function ( These two examples illustrate:
Not sure how to add this as a PR. I can wait until you decide whether to merge #47 and then make a new PR with this if you do merge. Alternatively I could just update my github fork now and it will become part of #47. Reluctant to do the latter since I've been advised correct etiquette is not to bundle up multiple changes in a single PR. I'll wait to hear from you... Thanks, Tim |
oof, this performance regression is quite significant. Thank you for catching this. |
Sorry for the delay! My gut instinct is that this and the previous PR add(ed) too much complexity for handling preserved spaces, but I also recognize that preserved spaces are a really annoying thing to get right. I'll merge this and create a new release, but I'd really like a simpler implementation that's easier to navigate and contribute to. I do have a draft of a redesign going, but its nowhere near ready. |
Thanks, Josh. I really appreciate it - especially after my previous attempt! Am I OK now to make a PR for the changes to |
Yes, please do make a PR for the write method! |
This PR is motivated by two observations on PR #45:
I believe I've resolved both of these issues and I've added a slew of extra tests to cover the second of these.
Some points to note:
xml:space
attributes correctly is slower than not handling them at all (ie v0.3.5). The difference isn't huge, and certainly isn't the cause of the regression reported in Performance Regression in v0.3.6 with LazyNode Usage #46. However, to mitigate this, and because use ofxml:space
isn't particularly common, I've splitnext
andprev
into two separate pathways. When aRaw
entity is first created, I test for the presence of"xml:space"
anywhere in the data and create a flag for this (raw.has_xml_space
). Whennext
orprev
are invoked, they check this flag and only take the path that handles the attribute correctly if this flag is true. If it is false, the path taken is identical to the function from v0.3.5.xml:space
inprev
is challenging because it is necessary to know the status of the attribute which may be inherited from anywhere "above" the current text node in the xml structure but, because we are moving backwards, the xml tree hasn't yet been processed. With a little help from ChatGPT v5, I was able to find a way to usenext
to determine the correct attribute inheritance reliably. This approach has the advantage of keepingnext
andprev
reliably consistent, too.XML.write
and, as a result, it does not properly respectxml:space="preserve"
. Instead, it continues to add indentation and line feeds for pretty printing and this means a node containingxml:space="preserve"
cannot do a roundtrip through write -> parse.I've also made a one other decision in this implementation which is (perhaps) arbitrary but is essentially trivial to reverse:
xml:space = "preserve"
is specified,RawText
nodes are created between sibling nodes. I've chosen to keep these (three commented lines would readily suppress these).To illustrate this last point, consider this example:
How many children does the
<root>
node have? Because space is preserved, the first child is a Text node of<root>
itself -LazyNode (depth=2) Text "\n "
. This is the line feed and indentation which thexml:space
attribute requires to be preserved before<child>
is reached. There are also similar text nodes between</child>
and<child2>
and between</child2>
and</root>
. Thus<root>
has 5 children and not 2:I've compared the above behaviour with EzXML.jl, which does the following:
So EzXML.jl retains the orphaned Text nodes and finds the same 5 elements as in this PR.
I would expect this particular combination of features to be rare and my approach seems OK to me in the context of XML.jl.