
Commit 8a3e0c9

Add serialization via SLAXML:xml()
* Also fixes #10
* Also fixes #11
1 parent 8bfc922 commit 8a3e0c9


9 files changed, +544 -239 lines changed

LICENSE.txt

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-Copyright (c) 2013 Gavin Kistner
+Copyright (c) 2013-2018 Gavin Kistner
 
 Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
 

README.md

Lines changed: 111 additions & 4 deletions
@@ -35,7 +35,7 @@ parser = SLAXML:parser{
 	startElement = function(name,nsURI,nsPrefix) end, -- When "<foo" or <x:foo is seen
 	attribute = function(name,value,nsURI,nsPrefix) end, -- attribute found on current element
 	closeElement = function(name,nsURI) end, -- When "</foo>" or </x:foo> or "/>" is seen
-	text = function(text) end, -- text and CDATA nodes
+	text = function(text,cdata) end, -- text and CDATA nodes (cdata is true for cdata nodes)
 	comment = function(content) end, -- comments
 	pi = function(target,content) end, -- processing instructions e.g. "<?yes mon?>"
 }
@@ -76,9 +76,10 @@ The returned table is a 'document' composed of tables for elements, attributes,
   * <strong>`someEl.type`</strong> : the string `"element"`
   * <strong>`someEl.name`</strong> : the string name of the element (without any namespace prefix)
   * <strong>`someEl.nsURI`</strong> : the namespace URI for this element; `nil` if no namespace is applied
+  * <strong>`someEl.nsPrefix`</strong> : the namespace prefix string; `nil` if no prefix is applied
   * <strong>`someEl.attr`</strong> : a table of attributes, indexed by name and index
     * `local value = someEl.attr['attribute-name']` : any namespace prefix of the attribute is not part of the name
-    * `local someAttr = someEl.attr[1]` : an single attribute table (see below); useful for iterating all attributes of an element, or for disambiguating attributes with the same name in different namespaces
+    * `local someAttr = someEl.attr[1]` : a single attribute table (see below); useful for iterating all attributes of an element, or for disambiguating attributes with the same name in different namespaces
   * <strong>`someEl.kids`</strong> : an array table of child elements, text nodes, comment nodes, and processing instructions
   * <strong>`someEl.el`</strong> : an array table of child elements only
   * <strong>`someEl.parent`</strong> : reference to the parent element or document table
@@ -87,10 +88,12 @@ The returned table is a 'document' composed of tables for elements, attributes,
   * <strong>`someAttr.name`</strong> : the name of the attribute (without any namespace prefix)
   * <strong>`someAttr.value`</strong> : the string value of the attribute (with XML and numeric entities unescaped)
   * <strong>`someAttr.nsURI`</strong> : the namespace URI for the attribute; `nil` if no namespace is applied
+  * <strong>`someAttr.nsPrefix`</strong> : the namespace prefix string; `nil` if no prefix is applied
   * <strong>`someAttr.parent`</strong> : reference to the owning element table
 * **Text** - for both CDATA and normal text nodes
   * <strong>`someText.type`</strong> : the string `"text"`
   * <strong>`someText.name`</strong> : the string `"#text"`
+  * <strong>`someText.cdata`</strong> : `true` if the text was from a CDATA block
   * <strong>`someText.value`</strong> : the string content of the text node (with XML and numeric entities unescaped for non-CDATA elements)
   * <strong>`someText.parent`</strong> : reference to the parent element table
 * **Comment**
@@ -126,13 +129,109 @@ print(elementText(para)) --> "Hello you crazy World!"
 
 ### A Simpler DOM
 
-If you want the DOM tables to be simpler-to-serialize you can supply the `simple` option via:
+If you want the DOM tables to be easier to inspect you can supply the `simple` option via:
 
 ```lua
 local dom = SLAXML:dom(myXML,{ simple=true })
 ```
 
-In this case no table will have a `parent` attribute, elements will not have the `el` collection, and the `attr` collection will be a simple array (without values accessible directly via attribute name). In short, the output will be a strict hierarchy with no internal references to other tables, and all data represented in exactly one spot.
+In this case the document will have no `root` property, no table will have a `parent` property, elements will not have the `el` collection, and the `attr` collection will be a simple array (without values accessible directly via attribute name). In short, the output will be a strict hierarchy with no internal references to other tables, and all data represented in exactly one spot.
+
+
+### Serializing the DOM
+
+You can serialize any DOM table to an XML string by passing it to the `SLAXML:xml()` method:
+
+```lua
+local SLAXML = require 'slaxdom'
+local doc = SLAXML:dom(myxml)
+-- ...modify the document...
+local xml = SLAXML:xml(doc)
+```
+
+The `xml()` method takes an optional table of options as its second argument:
+
+```lua
+local xml = SLAXML:xml(doc,{
+  indent = 2,    -- each pi/comment/element/text node on its own line, indented by this many spaces
+  indent = '\t', -- ...or, supply a custom string to use for indentation
+  sort   = true, -- sort attributes by name, with no-namespace attributes coming first
+  omit   = {...} -- an array of namespace URIs; removes elements and attributes in these namespaces
+})
+```
+
+When using the `indent` option, you likely want to ensure that you parsed your DOM using the `stripWhitespace` option. This will prevent you from having whitespace text nodes between elements that are then placed on their own indented line.
+
+Some examples showing the serialization options:
+
+```lua
+local xml = [[
+<!-- a simple document showing sorting and namespace culling -->
+<r c="1" z="3" b="2" xmlns="uri1" xmlns:x="uri2" xmlns:a="uri3">
+  <e a:foo="f" x:alpha="a" a:bar="b" alpha="y" beta="beta" />
+  <a:wrap><f/></a:wrap>
+</r>
+]]
+
+local dom = SLAXML:dom(xml, {stripWhitespace=true})
+
+print(SLAXML:xml(dom))
+--> <!-- a simple document showing sorting and namespace culling --><r c="1" z="3" b="2" xmlns="uri1" xmlns:x="uri2" xmlns:a="uri3"><e a:foo="f" x:alpha="a" a:bar="b" alpha="y" beta="beta"/><a:wrap><f/></a:wrap></r>
+
+print(SLAXML:xml(dom, {indent=2}))
+--> <!-- a simple document showing sorting and namespace culling -->
+--> <r c="1" z="3" b="2" xmlns="uri1" xmlns:x="uri2" xmlns:a="uri3">
+-->   <e a:foo="f" x:alpha="a" a:bar="b" alpha="y" beta="beta"/>
+-->   <a:wrap>
+-->     <f/>
+-->   </a:wrap>
+--> </r>
+
+print(SLAXML:xml(dom.root.kids[2]))
+--> <a:wrap><f/></a:wrap>
+-- NOTE: you can serialize any DOM table node, not just documents
+
+print(SLAXML:xml(dom.root.kids[1], {indent=2, sort=true}))
+--> <e alpha="y" beta="beta" a:bar="b" a:foo="f" x:alpha="a"/>
+-- NOTE: attributes with no namespace come first
+
+print(SLAXML:xml(dom, {indent=2, omit={'uri3'}}))
+--> <!-- a simple document showing sorting and namespace culling -->
+--> <r c="1" z="3" b="2" xmlns="uri1" xmlns:x="uri2">
+-->   <e x:alpha="a" alpha="y" beta="beta"/>
+--> </r>
+-- NOTE: Omitting a namespace omits:
+-- * namespace declaration(s) for that space
+-- * attributes prefixed for that namespace
+-- * elements in that namespace, INCLUDING DESCENDANTS
+
+print(SLAXML:xml(dom, {indent=2, omit={'uri3', 'uri2'}}))
+--> <!-- a simple document showing sorting and namespace culling -->
+--> <r c="1" z="3" b="2" xmlns="uri1">
+-->   <e alpha="y" beta="beta"/>
+--> </r>
+
+print(SLAXML:xml(dom, {indent=2, omit={'uri1'}}))
+--> <!-- a simple document showing sorting and namespace culling -->
+-- NOTE: Omitting namespace for the root element removes everything
+```
+
+Serialization of elements and attributes ignores the `nsURI` property in favor of the `nsPrefix` property. As such, you can construct DOMs that serialize to invalid XML:
+
+```lua
+local el = {
+  type="element",
+  nsPrefix="oops", name="root",
+  attr={
+    {type="attribute", name="xmlns:nope", value="myuri"},
+    {type="attribute", nsPrefix="x", name="wow", value="myuri"}
+  }
+}
+print( SLAXML:xml(el) )
+--> <oops:root xmlns:nope="myuri" x:wow="myuri"/>
+```
+
+So, if you want to use a `foo` prefix on an element or attribute, be sure to add an appropriate `xmlns:foo` attribute defining that namespace on that element or one of its ancestors.
 
 
 ## Known Limitations / TODO
@@ -157,6 +256,14 @@ In this case no table will have a `parent` attribute, elements will not have the
 
 ## History
 
+### v0.8 2018-Oct-23
++ Adds `SLAXML:xml()` to serialize the DOM back to XML.
++ Adds `nsPrefix` properties to the DOM tables for elements and attributes (needed for round-trip serialization)
++ Fixes test suite to work on Lua 5.2, 5.3.
++ Fixes Issue #10, allowing DOM parser to handle comments/PIs after the root element.
++ Fixes Issue #11, causing DOM parser to preserve whitespace text nodes on the document.
++ **Backwards-incompatible change**: Removes `doc.root` key from DOM when `simple=true` is specified.
+
 ### v0.7 2014-Sep-26
 + Decodes entities above 127 as UTF8 (decimal and hexadecimal).
   - The encoding specified by the document is (still) ignored.

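The README's closing advice (declare `xmlns:foo` before using a `foo` prefix) can be checked before handing a DOM to `SLAXML:xml()`. Below is a minimal sketch that is not part of SLAXML: `undeclaredPrefixes` is a hypothetical name, and it assumes only the DOM table shape documented above (`type`, `name`, `nsPrefix`, an `attr` array, `kids`) plus declarations stored as attributes named `xmlns:prefix`, as in the README's `el` example. Each undeclared use is reported once per occurrence.

```lua
-- Hypothetical helper, not part of SLAXML: list the namespace prefixes used by
-- elements or attributes in a DOM table that have no matching `xmlns:prefix`
-- declaration on the element itself or on one of its ancestors.
local function undeclaredPrefixes(node, inScope, found)
  -- 'xml' is bound by the XML spec; 'xmlns' is reserved for declarations
  inScope = inScope or { xml = true, xmlns = true }
  found = found or {}
  if node.type == "element" then
    -- inherit the ancestors' declarations, then add this element's own
    local scope = setmetatable({}, { __index = inScope })
    for _, a in ipairs(node.attr or {}) do
      local declared = a.name:match("^xmlns:(.+)$")
      if declared then scope[declared] = true end
    end
    if node.nsPrefix and not scope[node.nsPrefix] then
      found[#found + 1] = node.nsPrefix
    end
    for _, a in ipairs(node.attr or {}) do
      if a.nsPrefix and not scope[a.nsPrefix] then
        found[#found + 1] = a.nsPrefix
      end
    end
    inScope = scope
  end
  -- documents and elements have kids; text/comment/pi nodes do not
  for _, kid in ipairs(node.kids or {}) do
    undeclaredPrefixes(kid, inScope, found)
  end
  return found
end

-- The README's invalid-XML example: both prefixes are reported.
local el = {
  type = "element", nsPrefix = "oops", name = "root",
  attr = {
    { type = "attribute", name = "xmlns:nope", value = "myuri" },
    { type = "attribute", nsPrefix = "x", name = "wow", value = "myuri" },
  },
}
print(table.concat(undeclaredPrefixes(el), ", "))  --> oops, x
```

Because `ipairs` walks only the array part of `attr`, the sketch also works on "rich" DOMs, whose `attr` tables additionally map attribute names to values.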
slaxdom.lua

Lines changed: 94 additions & 13 deletions
@@ -4,24 +4,23 @@ function SLAXML:dom(xml,opts)
 	if not opts then opts={} end
 	local rich = not opts.simple
 	local push, pop = table.insert, table.remove
-	local stack = {}
-	local doc = { type="document", name="#doc", kids={} }
-	local current = doc
+	local doc = {type="document", name="#doc", kids={}}
+	local current,stack = doc, {doc}
 	local builder = SLAXML:parser{
-		startElement = function(name,nsURI)
-			local el = { type="element", name=name, kids={}, el=rich and {} or nil, attr={}, nsURI=nsURI, parent=rich and current or nil }
+		startElement = function(name,nsURI,nsPrefix)
+			local el = { type="element", name=name, kids={}, el=rich and {} or nil, attr={}, nsURI=nsURI, nsPrefix=nsPrefix, parent=rich and current or nil }
 			if current==doc then
 				if doc.root then error(("Encountered element '%s' when the document already has a root '%s' element"):format(name,doc.root.name)) end
-				doc.root = el
+				doc.root = rich and el or nil
 			end
 			push(current.kids,el)
 			if current.el then push(current.el,el) end
 			current = el
 			push(stack,el)
 		end,
-		attribute = function(name,value,nsURI)
+		attribute = function(name,value,nsURI,nsPrefix)
 			if not current or current.type~="element" then error(("Encountered an attribute %s=%s but I wasn't inside an element"):format(name,value)) end
-			local attr = {type='attribute',name=name,nsURI=nsURI,value=value,parent=rich and current or nil}
+			local attr = {type='attribute',name=name,nsURI=nsURI,nsPrefix=nsPrefix,value=value,parent=rich and current or nil}
 			if rich then current.attr[name] = value end
 			push(current.attr,attr)
 		end,
@@ -30,11 +29,10 @@ function SLAXML:dom(xml,opts)
 			pop(stack)
 			current = stack[#stack]
 		end,
-		text = function(value)
-			if current.type~='document' then
-				if current.type~="element" then error(("Received a text notification '%s' but was inside a %s"):format(value,current.type)) end
-				push(current.kids,{type='text',name='#text',value=value,parent=rich and current or nil})
-			end
+		text = function(value,cdata)
+			-- documents may only have text node children that are whitespace: https://www.w3.org/TR/xml/#NT-Misc
+			if current.type=='document' and not value:find('^%s+$') then error(("Document has non-whitespace text at root: '%s'"):format(value:gsub('[\r\n\t]',{['\r']='\\r', ['\n']='\\n', ['\t']='\\t'}))) end
+			push(current.kids,{type='text',name='#text',cdata=cdata and true or nil,value=value,parent=rich and current or nil})
 		end,
 		comment = function(value)
 			push(current.kids,{type='comment',name='#comment',value=value,parent=rich and current or nil})
@@ -46,4 +44,87 @@ function SLAXML:dom(xml,opts)
 	builder:parse(xml,opts)
 	return doc
 end
+
+local escmap = {["<"]="&lt;", [">"]="&gt;", ["&"]="&amp;", ['"']="&quot;", ["'"]="&apos;"}
+local function esc(s) return s:gsub('[<>&"]', escmap) end
+
+-- opts.indent: number of spaces, or string
+function SLAXML:xml(n,opts)
+	opts = opts or {}
+	local out = {}
+	local tab = opts.indent and (type(opts.indent)=="number" and string.rep(" ",opts.indent) or opts.indent) or ""
+	local ser = {}
+	local omit = {}
+	if opts.omit then for _,s in ipairs(opts.omit) do omit[s]=true end end
+
+	function ser.document(n)
+		for _,kid in ipairs(n.kids) do
+			if ser[kid.type] then ser[kid.type](kid,0) end
+		end
+	end
+
+	function ser.pi(n,depth)
+		depth = depth or 0
+		table.insert(out, tab:rep(depth)..'<?'..n.name..' '..n.value..'?>')
+	end
+
+	function ser.element(n,depth)
+		if n.nsURI and omit[n.nsURI] then return end
+		depth = depth or 0
+		local indent = tab:rep(depth)
+		local name = n.nsPrefix and n.nsPrefix..':'..n.name or n.name
+		local result = indent..'<'..name
+		if n.attr and n.attr[1] then
+			local sorted = n.attr
+			if opts.sort then
+				sorted = {}
+				for i,a in ipairs(n.attr) do sorted[i]=a end
+				table.sort(sorted,function(a,b)
+					if a.nsPrefix and b.nsPrefix then
+						return a.nsPrefix==b.nsPrefix and a.name<b.name or a.nsPrefix<b.nsPrefix
+					elseif not (a.nsPrefix or b.nsPrefix) then
+						return a.name<b.name
+					elseif b.nsPrefix then
+						return true
+					else
+						return false
+					end
+				end)
+			end
+
+			local attrs = {}
+			for _,a in ipairs(sorted) do
+				if (not a.nsURI or not omit[a.nsURI]) and not (omit[a.value] and a.name:find('^xmlns:')) then
+					attrs[#attrs+1] = ' '..(a.nsPrefix and (a.nsPrefix..':') or '')..a.name..'="'..esc(a.value)..'"'
+				end
+			end
+			result = result..table.concat(attrs,'')
+		end
+		result = result .. (n.kids and n.kids[1] and '>' or '/>')
+		table.insert(out, result)
+		if n.kids and n.kids[1] then
+			for _,kid in ipairs(n.kids) do
+				if ser[kid.type] then ser[kid.type](kid,depth+1) end
+			end
+			table.insert(out, indent..'</'..name..'>')
+		end
+	end
+
+	function ser.text(n,depth)
+		if n.cdata then
+			table.insert(out, tab:rep(depth)..'<![CDATA['..n.value..']]>')
+		else
+			table.insert(out, tab:rep(depth)..esc(n.value))
+		end
+	end
+
+	function ser.comment(n,depth)
+		table.insert(out, tab:rep(depth)..'<!--'..n.value..'-->')
+	end
+
+	ser[n.type](n,0)
+
+	return table.concat(out, opts.indent and '\n' or '')
+end
+
 return SLAXML

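Three details of `SLAXML:xml()` are easy to get subtly wrong in a hand-written serializer, so here they are restated as a standalone sketch. The names `escape`, `cdataSection`, and `attrLess` are illustrative, not SLAXML functions. `string.gsub` returns the new string and a match count, so the parentheses in `escape` keep only the string. A CDATA section opens with `<![CDATA[` and closes with `]]>`; content that itself contains `]]>` has to be split across two sections (an extra step this sketch adds, not something the diff does). `attrLess` follows the ordering documented for the `sort` option: unprefixed attributes first, then by prefix, then by name.

```lua
-- Standalone sketch of three serializer details; names are illustrative only.
local escmap = { ["<"] = "&lt;", [">"] = "&gt;", ["&"] = "&amp;", ['"'] = "&quot;" }

-- Escape markup characters. The outer parentheses discard gsub's second
-- return value (the match count), so callers always receive one string.
local function escape(s)
  return (s:gsub('[<>&"]', escmap))
end

-- Wrap raw text in a CDATA section. A literal "]]>" inside the text would end
-- the section early, so it is split: "]]" closes one section, ">" opens the next.
local function cdataSection(s)
  return "<![CDATA[" .. (s:gsub("%]%]>", "]]]]><![CDATA[>")) .. "]]>"
end

-- Attribute order for sorting: unprefixed attributes first (by name),
-- then prefixed ones grouped by prefix, then by name within a prefix.
local function attrLess(a, b)
  if a.nsPrefix and b.nsPrefix then
    if a.nsPrefix == b.nsPrefix then return a.name < b.name end
    return a.nsPrefix < b.nsPrefix
  elseif not (a.nsPrefix or b.nsPrefix) then
    return a.name < b.name
  end
  -- exactly one side is prefixed; the unprefixed side sorts first
  return b.nsPrefix ~= nil
end

-- Attributes of <e> from the README example, in document order.
local attrs = {
  { nsPrefix = "a", name = "foo" }, { nsPrefix = "x", name = "alpha" },
  { nsPrefix = "a", name = "bar" }, { name = "alpha" }, { name = "beta" },
}
table.sort(attrs, attrLess)
local names = {}
for i, a in ipairs(attrs) do
  names[i] = (a.nsPrefix and a.nsPrefix .. ":" or "") .. a.name
end

print(escape('a < b & "c"'))     --> a &lt; b &amp; &quot;c&quot;
print(cdataSection("x]]>y"))     --> <![CDATA[x]]]]><![CDATA[>y]]>
print(table.concat(names, " "))  --> alpha beta a:bar a:foo x:alpha
```

The sorted order matches the README's `sort=true` example output for the same element.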
slaxml-0.7-0.rockspec

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 package = "SLAXML"
-version = "0.7-0"
+version = "0.8-0"
 source = {
 	url = "https://github.com/Phrogz/SLAXML.git"
 }

slaxml.lua

Lines changed: 11 additions & 9 deletions
@@ -1,9 +1,9 @@
 --[=====================================================================[
-v0.7 Copyright © 2013-2014 Gavin Kistner <[email protected]>; MIT Licensed
+v0.8 Copyright © 2013-2018 Gavin Kistner <[email protected]>; MIT Licensed
 See http://github.com/Phrogz/SLAXML for details.
 --]=====================================================================]
 local SLAXML = {
-	VERSION = "0.7",
+	VERSION = "0.8",
 	_call = {
 		pi = function(target,content)
 			print(string.format("<?%s %s?>",target,content))
@@ -25,11 +25,13 @@ local SLAXML = {
 			if nsURI then io.write(" (ns='",nsURI,"')") end
 			io.write("\n")
 		end,
-		text = function(text)
-			print(string.format(" text: %q",text))
+		text = function(text,cdata)
+			print(string.format(" %s: %q",cdata and 'cdata' or 'text',text))
 		end,
 		closeElement = function(name,nsURI,nsPrefix)
-			print(string.format("</%s>",name))
+			io.write("</")
+			if nsPrefix then io.write(nsPrefix,":") end
+			print(name..">")
 		end,
 	}
 }
@@ -71,7 +73,7 @@ function SLAXML:parse(xml,options)
 		end
 	end
 	local entityMap = { ["lt"]="<", ["gt"]=">", ["amp"]="&", ["quot"]='"', ["apos"]="'" }
-	local entitySwap = function(orig,n,s) return entityMap[s] or n=="#" and utf8(tonumber('0'..s)) or orig end
+	local entitySwap = function(orig,n,s) return entityMap[s] or n=="#" and utf8(tonumber('0'..s)) or orig end
 	local function unescape(str) return gsub( str, '(&(#?)([%d%a]+);)', entitySwap ) end
 
 	local function finishText()
@@ -82,7 +84,7 @@ function SLAXML:parse(xml,options)
 				text = gsub(text,'%s+$','')
 				if #text==0 then text=nil end
 			end
-			if text then self._call.text(unescape(text)) end
+			if text then self._call.text(unescape(text),false) end
 		end
 	end
 
@@ -180,7 +182,7 @@ function SLAXML:parse(xml,options)
 		first, last, match1 = find( xml, '^<!%[CDATA%[(.-)%]%]>', pos )
 		if first then
 			finishText()
-			if self._call.text then self._call.text(match1) end
+			if self._call.text then self._call.text(match1,true) end
 			pos = last+1
 			textStart = pos
 			return true
@@ -233,7 +235,7 @@ function SLAXML:parse(xml,options)
 
 	while pos<#xml do
 		if state=="text" then
-			if not (findPI() or findComment() or findCDATA() or findElementClose()) then
+			if not (findPI() or findComment() or findCDATA() or findElementClose()) then
 				if startElement() then
 					state = "attributes"
 				else

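The `entitySwap` line re-touched in `slaxml.lua` packs numeric-entity decoding into one expression. For `&#65;` the pattern `(&(#?)([%d%a]+);)` captures `n="#"` and `s="65"`; for `&#x41;` it captures `s="x41"`. Prefixing `'0'` gives `"065"` or `"0x41"`, and `tonumber` reads both (the second as hexadecimal). The `utf8` function it calls is presumably a local helper defined elsewhere in `slaxml.lua` and is not part of this diff, so the sketch below substitutes a hand-written `toUtf8` encoder; that helper and the range check are assumptions of the sketch (on Lua 5.3+ `utf8.char` would serve the same purpose).

```lua
-- Hypothetical stand-in for slaxml.lua's `utf8` helper: encode a code point as UTF-8.
local function toUtf8(cp)
  if cp < 0x80 then
    return string.char(cp)
  elseif cp < 0x800 then
    return string.char(0xC0 + math.floor(cp / 0x40), 0x80 + cp % 0x40)
  elseif cp < 0x10000 then
    return string.char(0xE0 + math.floor(cp / 0x1000),
                       0x80 + math.floor(cp / 0x40) % 0x40,
                       0x80 + cp % 0x40)
  end
  return string.char(0xF0 + math.floor(cp / 0x40000),
                     0x80 + math.floor(cp / 0x1000) % 0x40,
                     0x80 + math.floor(cp / 0x40) % 0x40,
                     0x80 + cp % 0x40)
end

local entityMap = { lt = "<", gt = ">", amp = "&", quot = '"', apos = "'" }

-- gsub callback: orig is the whole match, n is "#" or "", s is the entity body.
local function entitySwap(orig, n, s)
  if entityMap[s] then return entityMap[s] end
  -- "0".."65" -> "065" (decimal); "0".."x41" -> "0x41" (hexadecimal)
  local cp = n == "#" and tonumber("0" .. s)
  if cp and cp <= 0x10FFFF then return toUtf8(cp) end
  return orig  -- unknown or out-of-range entity: leave the text as written
end

local function unescape(str)
  return (str:gsub("(&(#?)([%d%a]+);)", entitySwap))
end

print(unescape("&#x41;&#66;&amp;"))  --> AB&
```

Unknown named entities such as `&nbsp;` fall through unchanged, matching the `or orig` branch of the one-line original.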
test/files/commentwrapper.xml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+<!-- before -->
+
+<r/>
+
+<!-- after -->

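This fixture places comments before and after `<r/>`, separated by blank lines, so when it is parsed without `stripWhitespace` the DOM builder can receive text whose parent is the document node itself. The revised `text` callback in `slaxdom.lua` accepts such text only if it is entirely whitespace (per the XML `Misc` production it links to), and it builds its error message with `gsub` and a replacement table so that `\r`, `\n`, and `\t` stay visible. The sketch isolates that check; `checkDocumentText` is a hypothetical name, and `error(msg, 0)` is used here only to keep the message free of position information.

```lua
-- Hypothetical standalone version of the document-level text check.
local visible = { ["\r"] = "\\r", ["\n"] = "\\n", ["\t"] = "\\t" }

local function checkDocumentText(value)
  -- only whitespace may appear outside the root element
  if not value:find("^%s+$") then
    -- gsub with a table replaces each control char by its escaped spelling;
    -- the parentheses drop gsub's match count
    local shown = (value:gsub("[\r\n\t]", visible))
    error(("Document has non-whitespace text at root: '%s'"):format(shown), 0)
  end
  return true
end

print(checkDocumentText("\n\n"))  --> true
local ok, err = pcall(checkDocumentText, "oops\n")
print(ok)   --> false
print(err)  --> Document has non-whitespace text at root: 'oops\n'
```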
test/files/state.scxml

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<scxml xmlns="http://www.w3.org/2005/07/scxml" xmlns:nv="http://nvidia.com/drive/ar/scxml" xmlns:dumb="nope" version="1">
+	<state id="AwaitingChoice" nv:loc="0 0 400 300">
+		<state id="UpToDate" nv:rgba="0 0.5 1 0.2" nv:loc="10 10 100 40">
+			<transition event="ota.available" nv:anchor="e1" target="UpdateAvailable" dumb:status="very" type="internal"/>
+		</state>
+	</state>
+	<dumb:wrapper>
+		<state />
+	</dumb:wrapper>
+</scxml>

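This fixture mixes three namespaces: the SCXML default, `nv:` bound to `http://nvidia.com/drive/ar/scxml`, and `dumb:` bound to `nope`, which makes it a natural input for the `omit` option. The sketch below restates, outside SLAXML, the per-attribute test that `SLAXML:xml()` applies when omitting namespaces. `keepAttribute` and `serializeAttrs` are hypothetical names; the attribute tables are written by hand, assuming (as that test does) that declarations appear as attributes named `xmlns:prefix` and that prefixed attributes carry their resolved `nsURI`. Values are not escaped here because the sample values contain no markup characters.

```lua
-- Hypothetical restatement of the attribute filter behind the `omit` option.
local function keepAttribute(a, omitSet)
  -- drop attributes that belong to an omitted namespace
  if a.nsURI and omitSet[a.nsURI] then return false end
  -- drop xmlns:prefix declarations that bind an omitted namespace URI
  if a.name:find("^xmlns:") and omitSet[a.value] then return false end
  return true
end

-- Render the surviving attributes as name="value" pairs (values not escaped).
local function serializeAttrs(attrs, omitList)
  local omitSet = {}
  for _, uri in ipairs(omitList) do omitSet[uri] = true end
  local parts = {}
  for _, a in ipairs(attrs) do
    if keepAttribute(a, omitSet) then
      local qname = a.nsPrefix and (a.nsPrefix .. ":" .. a.name) or a.name
      parts[#parts + 1] = qname .. '="' .. a.value .. '"'
    end
  end
  return table.concat(parts, " ")
end

local NV = "http://nvidia.com/drive/ar/scxml"
-- Attributes of <scxml> and <transition> from state.scxml, written by hand.
local scxmlAttrs = {
  { name = "xmlns", value = "http://www.w3.org/2005/07/scxml" },
  { name = "xmlns:nv", value = NV },
  { name = "xmlns:dumb", value = "nope" },
  { name = "version", value = "1" },
}
local transitionAttrs = {
  { name = "event", value = "ota.available" },
  { name = "anchor", nsPrefix = "nv", nsURI = NV, value = "e1" },
  { name = "target", value = "UpdateAvailable" },
  { name = "status", nsPrefix = "dumb", nsURI = "nope", value = "very" },
  { name = "type", value = "internal" },
}

print(serializeAttrs(scxmlAttrs, { "nope" }))
--> xmlns="http://www.w3.org/2005/07/scxml" xmlns:nv="http://nvidia.com/drive/ar/scxml" version="1"
print(serializeAttrs(transitionAttrs, { "nope", NV }))
--> event="ota.available" target="UpdateAvailable" type="internal"
```

Under the same `omit={'nope'}` option, the README states that elements in an omitted namespace are removed together with their descendants, so `<dumb:wrapper>` and the `<state />` inside it would also disappear from the output.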