
Commit a735809

committed
WIP.
1 parent 98fe8a0 commit a735809

File tree

1 file changed

+11
-9
lines changed

draft-mcnally-deterministic-cbor.md

Lines changed: 11 additions & 9 deletions
@@ -165,7 +165,7 @@ dCBOR decoders:
165165

166166
## Numeric Reduction
167167

168-
The purpose of determinism is to ensure that semantically equivalent data items are encoded into identical byte streams. Numeric reduction ensures that semantically equal numeric values (e.g. `2` and `2.0`) are encoded into identical byte streams (e.g. `0x02`) by encoding "Integral floating point values" (floating point values with a zero fractional part) as integers when possible.
168+
The purpose of determinism is to ensure that semantically equivalent data items are encoded into identical byte streams. Numeric Reduction ensures that semantically equal numeric values (e.g. `2` and `2.0`) are encoded into identical byte streams (e.g. `0x02`) by encoding "Integral floating point values" (floating point values with a zero fractional part) as integers when possible.
169169

170170
dCBOR implementations that support floating point numbers:
171171

@@ -174,7 +174,7 @@ dCBOR implementations that support floating point numbers:
174174
This also means that the three representations of a zero number in CBOR (`0`, `0.0`, `-0.0` in diagnostic notation) are all reduced to the basic integer `0` (with preferred encoding `0x00`).
175175

176176
{:aside}
177-
> Note that numeric reduction means that some maps that are valid CBOR cannot be reduced to valid dCBOR maps, as numeric reduction can result in multiple entries with the same keys ("duplicate keys"). For example, the following is a valid CBOR map:
177+
> Note that Numeric Reduction means that some maps that are valid CBOR cannot be reduced to valid dCBOR maps, as Numeric Reduction can result in multiple entries with the same keys ("duplicate keys"). For example, the following is a valid CBOR map:
178178
>
179179
> ~~~ cbor-diag
180180
> {
@@ -184,15 +184,15 @@ dCBOR implementations that support floating point numbers:
184184
> ~~~
185185
> {: title="Valid CBOR data item with numeric map keys"}
186186
>
187-
> Applying numeric reduction to this map would yield the invalid map:
187+
> Applying Numeric Reduction to this map would yield the invalid map:
188188
>
189189
> ~~~ cbor-diag
190190
> { / invalid: multiple entries with the same key /
191191
> 10: "ten",
192192
> 10: "floating ten"
193193
> }
194194
> ~~~
195-
> {: title="Numeric reduction turns valid CBOR invalid"}
195+
> {: title="Numeric Reduction turns valid CBOR invalid"}
196196
>
197197
> In general, dCBOR applications need to avoid maps that have entries with keys that are semantically equivalent in dCBOR's numeric model.
198198

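As a rough illustration of the Numeric Reduction rule and the duplicate-key hazard described in the hunks above, the following Python sketch may help. It is not taken from the draft or from any dCBOR implementation: the names `reduce_number` and `reduce_map_keys` are hypothetical, and the integer range bounds are an assumption about what "when possible" means.

```python
import math

# Assumed integer range for reduction (an assumption, not stated in this diff).
MIN_INT = -(2**63)
MAX_INT = 2**64 - 1


def reduce_number(value):
    """Return an int for a float with a zero fractional part, if it fits the range.

    Any other value (non-integral float, NaN, infinity, out-of-range float,
    or a non-float) is returned unchanged.
    """
    if isinstance(value, float) and math.isfinite(value) and value.is_integer():
        as_int = int(value)  # -0.0 and 0.0 both become 0
        if MIN_INT <= as_int <= MAX_INT:
            return as_int
    return value


def reduce_map_keys(entries):
    """Reduce the keys of (key, value) pairs and reject keys that collide.

    The input is a list of pairs rather than a dict, because a Python dict
    would already merge the keys 10 and 10.0.
    """
    reduced = {}
    for key, val in entries:
        new_key = reduce_number(key)
        if new_key in reduced:
            raise ValueError(f"duplicate key after numeric reduction: {new_key!r}")
        reduced[new_key] = val
    return reduced
```

Calling `reduce_map_keys([(10, "ten"), (10.0, "floating ten")])` raises `ValueError`, which mirrors the invalid map shown in the aside: both keys reduce to the integer `10`.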
@@ -236,6 +236,8 @@ CDDL {{-CDDL}} is a widely used language for specifying CBOR data models. This s
236236

237237
The control operators `.dcbor` and `.dcborseq` are exactly like `.cbor` and `.cborseq` as defined in {{-CDDL}} except that they also require the encoded data item(s) to conform to dCBOR.
238238

239+
The CDDL Standard Prelude as defined in {{-CDDL}} Appendix D defines `number = int / float`. This type encompasses the full space of CBOR numeric values representable by CBOR major types 0, 1, and 7. Therefore, dCBOR applications can use `number` to specify fields with numeric values, and dCBOR's Numeric Reduction ensures that these values are encoded deterministically.
240+
239241
Tag 201 ({{tag201}}) is defined in this specification as a way to declare its tag content to conform to dCBOR at the data model level and the encoded data item level. (In conjunction with these semantics, tag 201 may also be employed as a boundary marker leading from an overall structure to specific application data items; see {{Section 3 of GordianEnvelope}} for an example for this usage.)
240242

241243
# Implementation Status
@@ -415,17 +417,17 @@ The numeric model of {{-CBOR}} provides three kinds of basic numeric types: unsi
415417
>
416418
> The tag content MUST be an unsigned or negative integer (major types 0 and 1) or a floating-point number (major type 7 with additional information 25, 26, or 27). Other contained types are invalid.
417419

418-
An inhabitant of Tag 1, as long as it represents an integral number of seconds since the epoch, could therefore be encoded as an integer *or* a floating point number. dCBOR's numeric reduction rule ensures that such values are always encoded as integers, eliminating variability in the encoding of such values.
420+
An inhabitant of Tag 1, as long as it represents an integral number of seconds since the epoch, could therefore be encoded as an integer *or* the equivalent floating point number. dCBOR's Numeric Reduction rule ensures that such values are always encoded as integers, eliminating variability in the encoding of such values.
419421

420-
But this raises a larger policy question for determinism: If two numeric values are semantically equal, should they be encoded identically? dCBOR answers "yes" to this question, and numeric reduction is the mechanism by which this is achieved. This choice answers the determinism question in a way that is simple to understand and implement, and that works well for the vast majority of applications. The serialization is still typed, but the burden of choosing types is reduced for protocol designers, who can simply specify numeric fields without worrying about the details of how those numbers will be encoded.
422+
But this raises a larger policy question for determinism: If two numeric values are semantically equal, should they be encoded identically? dCBOR answers "yes" to this question, and Numeric Reduction is the mechanism by which this is achieved. This choice answers the determinism question in a way that is simple to understand and implement, and that works well for the vast majority of applications. The serialization is still typed, but the burden of choosing types is reduced for protocol designers, who can simply specify numeric fields without worrying about the details of how those numbers will be encoded.
421423

422424
## Why Not `undefined`?
423425

424-
How to represent an absent value is a perennial question in data modeling. In general it is useful to have a value that represents a placeholder for a position where a value *could* be present but is not. This could be used in a map to indicate that a key is bound but has no value, or in an array to indicate that a value at a particular index is absent. There are other sorts of absence as well, such as the absence of a key in a map. dCBOR cannot address all of these different notions of absence, but can and does address the lack of semantic clarity around the choice between `null` and `undefined` by choosing `null` as the sole representation of a placeholder for an absent value. `null` is widely used in data modeling, and has a clear and unambiguous meaning. In contrast, `undefined` is less commonly used, and its meaning can be ambiguous. By choosing `null`, dCBOR provides a single clear way to represent absent values, reducing variability.
426+
How to represent an absent value is a perennial question in data modeling. In general it is useful to have a value that represents a placeholder for a position where a value *could* be present but is not. This could be used in a map to indicate that a key is bound but has no value, or in an array to indicate that a value at a particular index is absent. There are other sorts of absence as well, such as the absence of a key in a map, or a function that returns no value (`void`). dCBOR cannot, merely by narrowing CBOR, address all of these different notions of absence, but it can and does address the lack of semantic clarity around the choice between `null` and `undefined` by choosing `null` as the sole representation of a placeholder for an absent value. `null` is widely used in data modeling, and has a clear and unambiguous meaning. In contrast, `undefined` is less commonly used, and its meaning can be ambiguous. By choosing `null`, dCBOR provides a single clear way to represent absent values, reducing variability.
425427

426428
## Why only a single `NaN`?
427429

428-
How to represent the result of a computation like `0.0 / 0.0` is another perennial question in data modeling. The {{IEEE754}} floating point standard answers this question with the concept of "Not a Number" (`NaN`): a special value that represents an unrepresentable or undefined numerical result. However, the standard also specifies several bit fields within the `NaN` representation that can vary, including the sign bit, whether the `NaN` is "quiet" or "signaling", and a payload field. These variations are useful in certain computational contexts, but have no general meaning in data modeling.
430+
How to represent the result of a computation like `0.0 / 0.0` is another perennial question in data modeling. The {{IEEE754}} floating point standard answers this question with the concept of "Not a Number" (`NaN`): a special value that represents an unrepresentable or undefined numerical result. However, the standard also specifies several bit fields within the `NaN` representation that can vary, including the sign bit, whether the `NaN` is "quiet" or "signaling", and a payload field. These variations are useful in certain computational contexts, but have no generally accepted meaning in data modeling.
429431

430432
The problem of `NaN` is complicated by the fact that IEEE 754 specifies that all `NaN` values compare as "not equal" to all other numeric values, including themselves. This means that comparing any two `NaN` values, including identical ones, will always yield "not equal". The deeper problem this raises is that if you want to know what data a `NaN` might carry in its payload, you have to go to extraordinary lengths to extract that information, since you cannot simply compare two `NaN` values to determine whether they are the same.
431433

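To make the `NaN` comparison problem above concrete, here is a small Python sketch. It is an illustration only: the helper names are hypothetical, and the choice of `0x7FF8000000000000` as the single surviving bit pattern is an assumption made for the example, not something this diff specifies.

```python
import math
import struct

# Assumed "single NaN" bit pattern for this illustration (quiet NaN, empty payload).
CANONICAL_NAN_BITS = 0x7FF8000000000000


def float_bits(value):
    """Return the 64-bit IEEE 754 pattern of a Python float as an integer."""
    return struct.unpack(">Q", struct.pack(">d", value))[0]


def bits_to_float(bits):
    """Build a Python float from a 64-bit IEEE 754 pattern."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


def canonicalize_nan(value):
    """Map every NaN to one bit pattern; return any other float unchanged."""
    if math.isnan(value):
        return bits_to_float(CANONICAL_NAN_BITS)
    return value


quiet_nan = bits_to_float(CANONICAL_NAN_BITS)
payload_nan = bits_to_float(0x7FF8000000000001)  # quiet NaN with a non-zero payload

# Ordinary comparison cannot tell NaNs apart: even a NaN is unequal to itself.
same_by_equality = quiet_nan == quiet_nan

# Only bit-level inspection reveals that the two values differ.
differ_by_bits = float_bits(quiet_nan) != float_bits(payload_nan)
```

Here `same_by_equality` is `False` even though both operands are the same value, while the payload difference only shows up through `float_bits`. Collapsing all `NaN` values to one pattern, as `canonicalize_nan` does, removes the need for this kind of inspection.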
@@ -447,7 +449,7 @@ Tags provide a useful "escape hatch" for applications that need to use data item
447449

448450
## Why not define an API?
449451

450-
Because dCBOR mandates strictness in both encoding and decoding, and because of mechanisms it introduces such as numeric reduction, the question arises as to whether this document should specify an API, or at least a set of best practices, for dCBOR codec APIs. The authors acknowledge that such guidance might be useful, but since the purpose of dCBOR is to provide a deterministic encoding format, and because APIs can vary widely between programming languages and environments, the authors have chosen to not widen the scope of this document. We direct the reader to the several existing dCBOR implementations for guidance on API design.
452+
Because dCBOR mandates strictness in both encoding and decoding, and because of mechanisms it introduces such as Numeric Reduction, the question arises as to whether this document should specify an API, or at least a set of best practices, for dCBOR codec APIs. The authors acknowledge that such guidance might be useful, but since the purpose of dCBOR is to provide a deterministic encoding format, and because APIs can vary widely between programming languages and environments, the authors have chosen to not widen the scope of this document. We direct the reader to the several existing dCBOR implementations for guidance on API design.
451453

452454
--- back
453455
