LICENSE.txt: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-Copyright (c) 2013 Gavin Kistner
+Copyright (c) 2013-2018 Gavin Kistner
 
 Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

README.md: 111 additions & 4 deletions
@@ -35,7 +35,7 @@ parser = SLAXML:parser{
   startElement = function(name,nsURI,nsPrefix) end, -- When "<foo" or <x:foo is seen
   attribute = function(name,value,nsURI,nsPrefix) end, -- attribute found on current element
   closeElement = function(name,nsURI) end, -- When "</foo>" or </x:foo> or "/>" is seen
-  text = function(text) end, -- text and CDATA nodes
+  text = function(text,cdata) end, -- text and CDATA nodes (cdata is true for cdata nodes)
   comment = function(content) end, -- comments
   pi = function(target,content) end, -- processing instructions e.g. "<?yes mon?>"
 }
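This change adds a second `cdata` argument to the `text` callback. The sketch below is a minimal illustration, not code from the commit, of how a handler might use that argument. The callbacks are called by hand instead of by `SLAXML:parser`, so the snippet runs without the library. The `chunks` buffer is a hypothetical name, and the sketch assumes only that `cdata` is `true` for CDATA nodes and not `true` for ordinary text.

```lua
-- Buffer of text chunks, each tagged as CDATA or ordinary text.
local chunks = {}

-- Handler table in the shape passed to SLAXML:parser{...} (only `text` shown).
local handlers = {
  text = function(text, cdata)
    -- Normalize the flag: true for CDATA, false otherwise (nil or false).
    chunks[#chunks + 1] = { value = text, cdata = cdata == true }
  end,
}

-- Simulated callbacks; a real parser would make these calls while reading XML.
handlers.text('a < b')        -- ordinary text node
handlers.text('<raw>', true)  -- CDATA node, flagged by the second argument

for _, c in ipairs(chunks) do
  print(c.cdata and 'CDATA' or 'text', c.value)
end
```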
@@ -76,9 +76,10 @@ The returned table is a 'document' composed of tables for elements, attributes,
   * <strong>`someEl.type`</strong> : the string `"element"`
   * <strong>`someEl.name`</strong> : the string name of the element (without any namespace prefix)
   * <strong>`someEl.nsURI`</strong> : the namespace URI for this element; `nil` if no namespace is applied
+  * <strong>`someAttr.nsPrefix`</strong> : the namespace prefix string; `nil` if no prefix is applied
   * <strong>`someEl.attr`</strong> : a table of attributes, indexed by name and index
     * `local value = someEl.attr['attribute-name']` : any namespace prefix of the attribute is not part of the name
-    * `local someAttr = someEl.attr[1]` : an single attribute table (see below); useful for iterating all attributes of an element, or for disambiguating attributes with the same name in different namespaces
+    * `local someAttr = someEl.attr[1]` : a single attribute table (see below); useful for iterating all attributes of an element, or for disambiguating attributes with the same name in different namespaces
   * <strong>`someEl.kids`</strong> : an array table of child elements, text nodes, comment nodes, and processing instructions
   * <strong>`someEl.el`</strong> : an array table of child elements only
   * <strong>`someEl.parent`</strong> : reference to the parent element or document table
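You can try the element fields above on a hand-built table. This sketch assumes a table shaped as the list describes. In real use the table would come from `SLAXML:dom()`. It also assumes that elements carry `nsPrefix`, which the v0.8 notes state for both elements and attributes. `qname` is a hypothetical helper that rebuilds the prefixed name. Iteration uses `ipairs`, so it visits only the positional attribute tables and skips the name-keyed values.

```lua
-- Hand-built stand-in for a SLAXML DOM element; in real code this table
-- would come from SLAXML:dom(). Field names follow the README list above.
local langAttr = { type='attribute', name='lang', value='en',
                   nsURI='http://www.w3.org/XML/1998/namespace', nsPrefix='xml' }
local idAttr   = { type='attribute', name='id', value='p1' }
local el = {
  type='element', name='para', nsURI='urn:example', nsPrefix='x',
  -- attr is indexed both by position (attribute tables) and by name (values)
  attr = { langAttr, idAttr, lang='en', id='p1' },
  kids = {}, el = {},
}

-- Hypothetical helper: rebuild "prefix:name" from the nsPrefix and name fields.
local function qname(node)
  if node.nsPrefix then return node.nsPrefix .. ':' .. node.name end
  return node.name
end

-- ipairs visits only the positional entries, i.e. the attribute tables.
local parts = {}
for _, a in ipairs(el.attr) do
  parts[#parts + 1] = qname(a) .. '="' .. a.value .. '"'
end

print(qname(el), el.attr.id, table.concat(parts, ' '))
```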
@@ -87,10 +88,12 @@ The returned table is a 'document' composed of tables for elements, attributes,
   * <strong>`someAttr.name`</strong> : the name of the attribute (without any namespace prefix)
   * <strong>`someAttr.value`</strong> : the string value of the attribute (with XML and numeric entities unescaped)
   * <strong>`someAttr.nsURI`</strong> : the namespace URI for the attribute; `nil` if no namespace is applied
+  * <strong>`someAttr.nsPrefix`</strong> : the namespace prefix string; `nil` if no prefix is applied
   * <strong>`someAttr.parent`</strong> : reference to the owning element table
 * **Text** - for both CDATA and normal text nodes
   * <strong>`someText.type`</strong> : the string `"text"`
   * <strong>`someText.name`</strong> : the string `"#text"`
+  * <strong>`someText.cdata`</strong> : `true` if the text was from a CDATA block
   * <strong>`someText.value`</strong> : the string content of the text node (with XML and numeric entities unescaped for non-CDATA elements)
   * <strong>`someText.parent`</strong> : reference to the parent element table
 * **Comment**
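The new `cdata` flag lets code that walks the DOM separate CDATA text from ordinary text. The sketch below runs on hand-built tables shaped like the text and element entries above, not on real `SLAXML:dom()` output. `collectText` is a hypothetical helper. It concatenates all text under an element and counts how many pieces came from CDATA.

```lua
-- Hand-built stand-ins for DOM nodes, using the documented field names.
local para = { type='element', name='p', kids={
  { type='text', name='#text', value='Hello ' },
  { type='element', name='b', kids={
    { type='text', name='#text', value='<you>', cdata=true },
  }},
  { type='text', name='#text', value=' World!' },
}}

-- Hypothetical helper: depth-first concatenation of all text below `node`,
-- counting how many pieces came from CDATA blocks (someText.cdata == true).
local function collectText(node, out, stats)
  for _, kid in ipairs(node.kids or {}) do
    if kid.type == 'text' then
      out[#out + 1] = kid.value
      if kid.cdata then stats.cdata = stats.cdata + 1 end
    elseif kid.type == 'element' then
      collectText(kid, out, stats)
    end
  end
  return out, stats
end

local pieces, stats = collectText(para, {}, { cdata = 0 })
local text = table.concat(pieces)
print(text, stats.cdata)
```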
@@ -126,13 +129,109 @@ print(elementText(para)) --> "Hello you crazy World!"
 
 ### A Simpler DOM
 
-If you want the DOM tables to be simpler-to-serialize you can supply the `simple` option via:
+If you want the DOM tables to be easier to inspect you can supply the `simple` option via:
 
 ```lua
 local dom = SLAXML:dom(myXML,{ simple=true })
 ```
 
-In this case no table will have a `parent` attribute, elements will not have the `el` collection, and the `attr` collection will be a simple array (without values accessible directly via attribute name). In short, the output will be a strict hierarchy with no internal references to other tables, and all data represented in exactly one spot.
+In this case the document will have no `root` property, no table will have a `parent` property, elements will not have the `el` collection, and the `attr` collection will be a simple array (without values accessible directly via attribute name). In short, the output will be a strict hierarchy with no internal references to other tables, and all data represented in exactly one spot.
+
+
+### Serializing the DOM
+
+You can serialize any DOM table to an XML string by passing it to the `SLAXML:xml()` method:
+
+```lua
+local SLAXML = require 'slaxdom'
+local doc = SLAXML:dom(myxml)
+-- ...modify the document...
+local xml = SLAXML:xml(doc)
+```
+
+The `xml()` method takes an optional table of options as its second argument:
+
+```lua
+local xml = SLAXML:xml(doc,{
+  indent = 2,    -- each pi/comment/element/text node on its own line, indented by this many spaces
+  indent = '\t', -- ...or, supply a custom string to use for indentation
+  sort = true,   -- sort attributes by name, with no-namespace attributes coming first
+  omit = {...}   -- an array of namespace URIs; removes elements and attributes in these namespaces
+})
+```
+
+When using the `indent` option, you likely want to ensure that you parsed your DOM using the `stripWhitespace` option. This will prevent you from having whitespace text nodes between elements that are then placed on their own indented line.
+
+Some examples showing the serialization options:
+
+```lua
+local xml = [[
+<!-- a simple document showing sorting and namespace culling -->
+--> <!-- a simple document showing sorting and namespace culling -->
+--> <r c="1" z="3" b="2" xmlns="uri1">
+--> <e alpha="y" beta="beta"/>
+--> </r>
+
+print(SLAXML:xml(dom, {indent=2, omit={'uri1'}}))
+--> <!-- a simple document showing sorting and namespace culling -->
+-- NOTE: Omitting namespace for the root element removes everything
+```
+
+Serialization of elements and attributes ignores the `nsURI` property in favor of the `nsPrefix` attribute. As such, you can construct DOM's that serialize to invalid XML:
+So, if you want to use a `foo` prefix on an element or attribute, be sure to add an appropriate `xmlns:foo` attribute defining that namespace on an ancestor element.
 
 
 ## Known Limitations / TODO
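The `sort` and `omit` options above are described only in prose. The sketch below is a rough illustration of that description, not SLAXML's own serializer code, and it applies both rules to one element's attribute array. Attributes whose `nsURI` is listed in `omit` are dropped. With `sort`, the remaining attributes are ordered with no-namespace ones first, then by name. `prepareAttrs` is a hypothetical helper. Breaking name ties by `nsURI` is an assumption. The real `omit` option also removes elements, which this sketch does not cover.

```lua
-- Sketch only, not SLAXML's serializer: apply the documented `omit` and
-- `sort` rules to one element's array of attribute tables.
local function prepareAttrs(attrs, opts)
  opts = opts or {}

  -- Set of namespace URIs whose attributes should be dropped.
  local omit = {}
  for _, uri in ipairs(opts.omit or {}) do omit[uri] = true end

  local kept = {}
  for _, a in ipairs(attrs) do
    if not (a.nsURI and omit[a.nsURI]) then kept[#kept + 1] = a end
  end

  if opts.sort then
    table.sort(kept, function(a, b)
      local aNs, bNs = a.nsURI ~= nil, b.nsURI ~= nil
      if aNs ~= bNs then return bNs end  -- no-namespace attributes first
      if a.name ~= b.name then return a.name < b.name end
      return (a.nsURI or '') < (b.nsURI or '')  -- assumed tie-breaker
    end)
  end
  return kept
end

-- Usage with made-up attributes (as they might appear in someEl.attr).
local attrs = {
  { name='z', value='3' },
  { name='alpha', value='y', nsURI='uri2' },
  { name='c', value='1' },
  { name='beta', value='x', nsURI='uri1' },
  { name='b', value='2' },
}
local out = prepareAttrs(attrs, { sort=true, omit={'uri1'} })
for _, a in ipairs(out) do io.write(a.name, '="', a.value, '" ') end
print()
```

Filtering before sorting keeps the comparator simple. Without `sort`, the sketch returns the attributes in their original order.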
@@ -157,6 +256,14 @@ In this case no table will have a `parent` attribute, elements will not have the
 
 ## History
 
+### v0.8 2018-Oct-23
++ Adds `SLAXML:xml()` to serialize the DOM back to XML.
++ Adds `nsPrefix` properties to the DOM tables for elements and attributes (needed for round-trip serialization)
++ Fixes test suite to work on Lua 5.2, 5.3.
++ Fixes Issue #10, allowing DOM parser to handle comments/PIs after the root element.
++ Fixes Issue #11, causing DOM parser to preserve whitespace text nodes on the document.
++ **Backwards-incompatible change**: Removes `doc.root` key from DOM when `simple=true` is specified.
+
 ### v0.7 2014-Sep-26
 + Decodes entities above 127 as UTF8 (decimal and hexadecimal).
   - The encoding specified by the document is (still) ignored.

 -- documents may only have text node children that are whitespace: https://www.w3.org/TR/xml/#NT-Misc
+if current.type=='document' and not value:find('^%s+$') then error(("Document has non-whitespace text at root: '%s'"):format(value:gsub('[\r\n\t]',{['\r']='\\r', ['\n']='\\n', ['\t']='\\t'}))) end
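The added line rejects non-whitespace text directly under the document node. In the error message it shows CR, LF and tab as escapes. Below is a standalone sketch of the same check, wrapped in a hypothetical `checkRootText(current, value)` function so it can run outside the DOM builder. In the real code, `current` and `value` come from surrounding code that this excerpt does not show.

```lua
-- Standalone version of the added check. `current` is the node receiving the
-- text and `value` is the text; both come from code outside this excerpt.
local function checkRootText(current, value)
  if current.type == 'document' and not value:find('^%s+$') then
    -- Show CR, LF and TAB as escapes so the message stays on one line.
    local shown = value:gsub('[\r\n\t]', { ['\r']='\\r', ['\n']='\\n', ['\t']='\\t' })
    error(("Document has non-whitespace text at root: '%s'"):format(shown))
  end
end

local doc = { type = 'document' }
print(pcall(checkRootText, doc, '\n  \t'))  -- whitespace only: accepted
print(pcall(checkRootText, doc, 'oops\n'))  -- raises the error
```

Assigning the `gsub` result to a local keeps only the first return value, the rewritten string. The one-liner in the diff passes both return values to `format`, which ignores the extra count.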