XML to CUE Encoding Proposal #3776
-
Thank you for writing up this proposal! I'll write some minor high-level thoughts we've had about it in this comment, and split up some other thoughts into separate comments so they can spawn their own sub-threads without being mixed together.
-
Following my point 3 above, what would happen for repeated XML elements with the same tag name but different namespaces? For example:
Interestingly enough, badgerfish seems to not be clear on this edge case.
-
It's worth pointing out that your example
and a variant with the same namespace URIs but with different prefixes, like
are effectively representing the same data, but would result in different decoded CUE. This affects badgerfish as well, so it's presumably not a significant problem.
Update: light edit by @myitcv to name the blobs of XML
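To make the point concrete, here is a minimal sketch of two hypothetical documents that bind the same namespace URI to different prefixes. The element name, the prefixes, the `docA`/`docB` field names, and the `$xmlns:` key spelling (treating a namespace declaration like any other attribute) are assumptions made for illustration, not output captured from a decoder:

```cue
// Document A (hypothetical): <h:table xmlns:h="http://www.w3.org/TR/html4/"/>
docA: "h:table": "$xmlns:h": "http://www.w3.org/TR/html4/"

// Document B (hypothetical): <x:table xmlns:x="http://www.w3.org/TR/html4/"/>
docB: "x:table": "$xmlns:x": "http://www.w3.org/TR/html4/"
```

Both documents describe the same qualified element, yet the decoded keys differ, so a schema written against `"h:table"` would not match document B.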
-
I would like to take a second to acknowledge that this discussion is about emulating xml semantics in cue, and that that is insane and impressive at the same time.
-
As of 096a114, the Please note that this new encoding is experimental for now, given that it's a bespoke design that might need to be tweaked as we gain experience with it.
There is a Go package available here: https://pkg.go.dev/cuelang.org/go/encoding/xml/koala
And the CLI supports it as follows:
-
XML to CUE Encoding Proposal
Problem
Many users would benefit from using CUE with their XML files; however, CUE does not currently have an encoding that supports XML.
Purpose of this document
This document puts forward a proposal for an XML to CUE mapping called `koala` that can be used to add an XML encoding to CUE.
Given XML has constructs like attributes and namespaces that don't have identical analogues in CUE, there are many approaches for mapping from XML to CUE, with other future XML encodings possible. Examples of future options could include encodings that use a schema-guided approach (similar to the textproto encoding), and raw low-level AST style encodings that model each XML construct as a node that associates to abstractions like attributes, tags, and other content.
Objectives
The mapping in this proposal aims to:
Non-Objective
Proposed Mapping
The proposed mapping follows a convention that is inspired by the Badgerfish convention, with deviations for compatibility with CUE and increased readability.
This new mapping will be called `koala` and follows the rules below:
`$`, and `$$`, where that property belongs to the struct that is mapped from the XML content's parent element.
(`\r`) found in a string are discarded from that string.
Sample CUE constraints for XML using `koala`
Using the rules above, one would be able to write CUE constraints for XML as shown below:
Given an XML file with a `note` element and a `book` element, we could write a CUE schema to define types as shown below:
XML
CUE constraints
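As a hedged sketch of what such constraints could look like, the block below shows a hypothetical input as comments, followed by a schema for it. The wrapping `library` element, the `isbn` attribute, the `title` child, and the definition names `#Note` and `#Book` are all assumptions introduced for illustration. The sketch also assumes that whitespace-only text between child elements does not produce a `$$` field.

```cue
// Hypothetical input (names are assumptions for this sketch):
//
//	<library>
//		<note alpha="abcd">hello</note>
//		<book isbn="12345"><title>Example Title</title></book>
//	</library>

// A note carries one attribute and text content.
#Note: {
	$alpha: string
	$$:     string
}

// A book carries a digits-only attribute and a nested title element.
// Attribute values decode to strings, so a regular expression is used
// rather than a numeric type.
#Book: {
	$isbn: =~"^[0-9]+$"
	title: $$: string
}

library: {
	note: #Note
	book: #Book
}
```

A schema like this would be passed to `cue vet` together with the XML file, using the opt-in encoding described in the Deployment Plan.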
Mapping examples
The examples below illustrate each of the mapping rules defined above:
1. Elements
The XML `note` element below maps to the `note` struct in CUE.
XML
CUE
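A minimal sketch of this rule, with the input XML shown as a comment. Using an empty element, and decoding it to an empty struct, are assumptions made for illustration:

```cue
// Input XML (illustrative):
//
//	<note/>

// The element becomes a struct keyed by the element name.
note: {}
```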
2. Nested Elements
Nesting an XML `to` element in the `note` element from the first example results in a nested CUE `to` struct inside the `note` struct.
XML
CUE
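The nesting can be sketched as follows, again with the input as a comment. The sketch assumes that whitespace between the tags contributes no text content:

```cue
// Input XML (illustrative):
//
//	<note>
//		<to/>
//	</note>

// The child element becomes a struct nested inside its parent's struct.
note: {
	to: {}
}
```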
3. Attributes
The `alpha` attribute of the `note` element in XML below maps to the `$`-prefixed `$alpha` property of the `note` struct in CUE.
XML
CUE
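For instance, with an assumed attribute value of `abcd`, the mapping would look like this:

```cue
// Input XML (illustrative; the attribute value is an assumption):
//
//	<note alpha="abcd"/>

// The attribute becomes a `$`-prefixed field of the element's struct.
// `$alpha` is a valid CUE identifier, so no quotes are needed.
note: {
	$alpha: "abcd"
}
```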
4. Content
The content of the `note` XML element below maps to the value of the `$$` property of the `note` struct in CUE.
XML
CUE
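A short sketch of the content rule, using an assumed text value and inline content to sidestep any question of whitespace handling:

```cue
// Input XML (illustrative; the text value is an assumption):
//
//	<note>hello</note>

// The text content is stored under `$$` in the struct of the
// element that contains it.
note: {
	$$: "hello"
}
```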
5. XML Lists
The multiple XML `note` elements at the same level map to a list of `note` structs in CUE.
XML
CUE
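As an illustration, two sibling `note` elements under a hypothetical `notes` parent (the parent name and all values are assumptions) would be expected to decode as a single `note` field holding a list:

```cue
// Input XML (illustrative):
//
//	<notes>
//		<note alpha="abcd">hello</note>
//		<note alpha="abcdef">goodbye</note>
//	</notes>

// Repeated sibling elements with the same name collapse into one
// field whose value is a list, in document order.
notes: {
	note: [{
		$alpha: "abcd"
		$$:     "hello"
	}, {
		$alpha: "abcdef"
		$$:     "goodbye"
	}]
}
```

One consequence worth keeping in mind when writing constraints: under this rule a single `note` would be a struct while two or more would be a list, so a schema has to account for the shape it expects.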
6 and 7. Namespace Definitions and References
The `h` and `r` XML namespace definitions declared in the `table` XML element are declared as properties of the `h:table` struct in CUE.
Note how the namespace prefixed XML element names like `h:table`, `h:tr`, `h:td` and `r:blah` carry across to the key names of their corresponding CUE structs.
XML
CUE
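The sketch below follows the description above. The namespace URIs, the text values, and the `$xmlns:` key spelling (namespace declarations treated as `$`-prefixed attributes) are assumptions made for illustration:

```cue
// Input XML (illustrative):
//
//	<h:table xmlns:h="http://www.w3.org/TR/html4/" xmlns:r="http://example.com/r">
//		<h:tr>
//			<h:td>Apples</h:td>
//			<h:td>Bananas</h:td>
//			<r:blah>e3r</r:blah>
//		</h:tr>
//	</h:table>

"h:table": {
	// Namespace definitions appear only where they are declared.
	"$xmlns:h": "http://www.w3.org/TR/html4/"
	"$xmlns:r": "http://example.com/r"
	"h:tr": {
		// Prefixed names carry across as quoted keys, because a
		// colon is not allowed in an unquoted CUE identifier.
		"h:td": [{
			$$: "Apples"
		}, {
			$$: "Bananas"
		}]
		"r:blah": {
			$$: "e3r"
		}
	}
}
```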
8. Element and attribute values
XML element and attribute values map to strings, as shown in the example below.
XML
CUE
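A brief sketch of the string rule, with an assumed `count` attribute and numeric-looking content. Because both values arrive as strings, a schema that wants to restrict them to digits would use a regular expression rather than `int`. The definition name `#Note` is hypothetical:

```cue
// Input XML (illustrative):
//
//	<note count="3">42</note>

// Both values are strings, even though they look numeric.
note: {
	$count: "3"
	$$:     "42"
}

// A constraint that accepts only digit strings for both fields.
#Note: {
	$count: =~"^[0-9]+$"
	$$:     =~"^[0-9]+$"
}
note: #Note
```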
Alternative Conventions Considered
Although no known conventions exist to map from XML to CUE, there are a number of known mappings that take XML to JSON, which we can take inspiration from.
Parker and Spark Conventions
The Parker and Spark conventions use a very simplistic model where XML elements are mapped to object properties, and attributes are ignored.
We wish to maintain attribute information so we cannot use these mapping conventions as a whole.
Badgerfish
The Badgerfish convention maps elements, attributes, and content from XML to JSON. We follow many of the rules in the Badgerfish convention, described here. Notable differences are listed below to allow for mapping to CUE and for increased readability:
XML attributes map to CUE properties starting with a `$` prefix instead of an `@` prefix, given `@` is already reserved in CUE for CUE attributes. Although we could still use the `@` prefix using quotes in CUE, we do not want to overload the usage of `@` for two concepts (i.e. for XML attribute prefixes and for CUE attributes). Using the `$` prefix will also provide a less verbose notation given quotes do not need to be used with this prefix.
Given a single `$` is not a valid identifier in CUE, we use `$$` as the property to model element text content instead of `$`. Although we could use a quoted `"$"` as the key, we avoid this to prevent ambiguity with usage of `$` in other contexts (such as "root element" in JSONPath), and to minimize usage of quotes.
For namespaces, we do not recursively define namespaces in nested objects as this would unnecessarily increase verbosity in the mapped CUE. Instead we align more closely to how namespaces are defined in the XML, and only define namespaces in the CUE at the same level as they are declared in the XML. To illustrate how `koala` simplifies the mapping, we provide the example below (Badgerfish mapping taken from here):
XML
Badgerfish
koala
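The difference can be sketched side by side. The input document and the Badgerfish-style output are recalled from the Badgerfish convention's namespace example and should be treated as an approximation. The top-level `badgerfish` and `koala` field names exist only to keep both shapes in one CUE file, and the `$xmlns` key spelling is an assumption:

```cue
// Input XML (illustrative):
//
//	<alice xmlns="http://some-namespace" xmlns:charlie="http://some-other-namespace">
//		<bob>david</bob>
//		<charlie:edgar>frank</charlie:edgar>
//	</alice>

// Badgerfish-style: the namespace declarations are repeated inside
// every nested object, and keys need quotes in CUE.
badgerfish: alice: {
	"@xmlns": {"$": "http://some-namespace", charlie: "http://some-other-namespace"}
	bob: {
		"$":      "david"
		"@xmlns": {"$": "http://some-namespace", charlie: "http://some-other-namespace"}
	}
	"charlie:edgar": {
		"$":      "frank"
		"@xmlns": {"$": "http://some-namespace", charlie: "http://some-other-namespace"}
	}
}

// koala-style: namespaces appear once, at the level where the XML
// declares them.
koala: alice: {
	$xmlns:           "http://some-namespace"
	"$xmlns:charlie": "http://some-other-namespace"
	bob: $$:             "david"
	"charlie:edgar": $$: "frank"
}
```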
GData
The GData convention is similar to Badgerfish; however, it makes no distinction between identifiers used for elements and those used for attributes.
Unlike the Badgerfish convention, if one were to use this convention to map from XML to CUE, it would mean that it becomes ambiguous whether you are referring to an attribute or to an element when writing a CUE constraint. Further, it is not clear from the rules specified here what happens when there is a collision between an element name and an attribute name.
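To illustrate the collision case, consider a hypothetical element that has both a `to` attribute and a `to` child element (all names and values here are assumptions). With a prefix for attributes, as in `koala`, the two stay distinct:

```cue
// Input XML (illustrative):
//
//	<note to="office"><to>Alice</to></note>

note: {
	// The attribute keeps its `$` prefix.
	$to: "office"
	// The child element uses the bare name.
	to: $$: "Alice"
}
```

Without a prefix, both values would compete for the same `to` key, which is the ambiguity described above.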
Abdera
This convention is similar to the GData convention; however, it uses separate `children` and `attributes` abstractions when both nested XML elements and attributes are mentioned. Having to mention `children` and/or `attributes` in CUE constraints, as well as integer indexes for `children` arrays, increases verbosity and complexity, which goes against the readability objective of this proposal. To illustrate this with an example for Abdera:
XML
would map to:
CUE
JsonML
Short for JSON Markup Language, this convention makes heavy use of arrays to ensure an order-preserving mapping, where each element maps to an array entry, and each attribute also maps to an array entry. An example mapping is shown here.
Having to work out (count) integer indexes when writing a CUE constraint rather than just simply using the element and attribute identifiers found in the XML makes this mapping too unwieldy to use for the purposes of our mapping.
Testing Plan
The XML to CUE mapping scenarios required are covered by the examples described here. We will consider the solution complete once it can both decode and encode the examples shown there, along with any other test cases requested by the CUE maintainer team.
Deployment Plan
The new `koala` encoding will not be the default XML encoding, but rather an opt-in encoding. Users will be able to use this from the command line using a command similar to:
cue vet schema.cue xml+koala: data.xml
Given this is not the default encoding, the command below would not work:
cue vet schema.cue data.xml
This will initially be an experimental encoding, which will be specified in the documentation; however, given that it requires opt-in when the XML encoding to be used is specified, it does not need to be toggled using the `CUE_EXPERIMENT` variable as other experimental features are.
We also note that embedded XML within CUE will not be supported on day 1.