draft-mcnally-deterministic-cbor.md: 11 additions & 9 deletions
@@ -165,7 +165,7 @@ dCBOR decoders:
165
165
166
166
## Numeric Reduction
167
167
168
-
The purpose of determinism is to ensure that semantically equivalent data items are encoded into identical byte streams. Numeric reduction ensures that semantically equal numeric values (e.g. `2` and `2.0`) are encoded into identical byte streams (e.g. `0x02`) by encoding "Integral floating point values" (floating point values with a zero fractional part) as integers when possible.
168
+
The purpose of determinism is to ensure that semantically equivalent data items are encoded into identical byte streams. Numeric Reduction ensures that semantically equal numeric values (e.g. `2` and `2.0`) are encoded into identical byte streams (e.g. `0x02`) by encoding "Integral floating point values" (floating point values with a zero fractional part) as integers when possible.
169
169
170
170
dCBOR implementations that support floating point numbers:
171
171
@@ -174,7 +174,7 @@ dCBOR implementations that support floating point numbers:
174
174
This also means that the three representations of a zero number in CBOR (`0`, `0.0`, `-0.0` in diagnostic notation) are all reduced to the basic integer `0` (with preferred encoding `0x00`).
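
The reduction described above can be sketched in a few lines of Python. This is a minimal illustration, not the draft's normative algorithm: the function name `reduce_numeric` is hypothetical, and the integer range used here (`-2^63` up to but excluding `2^64`) is an assumption, since the exact rules are not part of this excerpt.

```python
import math


def reduce_numeric(value):
    """Return an int when `value` is an integral float within the assumed
    range; otherwise return `value` unchanged (hypothetical helper)."""
    if not isinstance(value, float):
        return value  # integers and non-numeric items pass through
    if math.isnan(value) or math.isinf(value):
        return value  # NaN and infinities have no integer equivalent
    # Assumed range: the integers representable by CBOR major types 0 and 1.
    if value.is_integer() and -(2**63) <= value < 2**64:
        return int(value)  # int(-0.0) is 0, so all zeros collapse to 0
    return value


zeros = [reduce_numeric(z) for z in (0, 0.0, -0.0)]
```

In this sketch `0`, `0.0`, and `-0.0` all come out as the integer `0`, while a value such as `2.5` stays a float.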
175
175
176
176
{:aside}
177
-
> Note that numeric reduction means that some maps that are valid CBOR cannot be reduced to valid dCBOR maps, as numeric reduction can result in multiple entries with the same keys ("duplicate keys"). For example, the following is a valid CBOR map:
177
+
> Note that Numeric Reduction means that some maps that are valid CBOR cannot be reduced to valid dCBOR maps, as Numeric Reduction can result in multiple entries with the same keys ("duplicate keys"). For example, the following is a valid CBOR map:
178
178
>
179
179
> ~~~ cbor-diag
180
180
> {
@@ -184,15 +184,15 @@ dCBOR implementations that support floating point numbers:
184
184
> ~~~
185
185
> {: title="Valid CBOR data item with numeric map keys"}
186
186
>
187
-
> Applying numeric reduction to this map would yield the invalid map:
187
+
> Applying Numeric Reduction to this map would yield the invalid map:
188
188
>
189
189
> ~~~ cbor-diag
190
190
> { / invalid: multiple entries with the same key /
> In general, dCBOR applications need to avoid maps that have entries with keys that are semantically equivalent in dCBOR's numeric model.
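
An application could check for such collisions before encoding. The following Python sketch assumes map entries are available as a list of `(key, value)` pairs (a Python `dict` would already merge `10` and `10.0`); the function name and the sample keys are hypothetical and are not taken from the draft's example.

```python
def find_colliding_keys(entries):
    """Return the reduced numeric keys that occur more than once.

    `entries` is a list of (key, value) pairs, as a decoder might yield
    them before building a map (an assumption of this sketch).
    """
    seen = set()
    collisions = []
    for key, _value in entries:
        reduced = key
        # Integral floats reduce to integers, mirroring Numeric Reduction.
        if isinstance(key, float) and key.is_integer():
            reduced = int(key)
        # Tag with the type name so that, e.g., True and 1 stay distinct.
        marker = (type(reduced).__name__, reduced)
        if marker in seen and reduced not in collisions:
            collisions.append(reduced)
        seen.add(marker)
    return collisions


colliding = find_colliding_keys([(10, "a"), (10.0, "b"), (10.5, "c")])
```

Here the keys `10` and `10.0` collide after reduction, while `10.5` remains a distinct key.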
198
198
@@ -236,6 +236,8 @@ CDDL {{-CDDL}} is a widely used language for specifying CBOR data models. This s
236
236
237
237
The control operators `.dcbor` and `.dcborseq` are exactly like `.cbor` and `.cborseq` as defined in {{-CDDL}} except that they also require the encoded data item(s) to conform to dCBOR.
238
238
239
+
The CDDL Standard Prelude, as defined in {{-CDDL}} Appendix D, defines `number = int / float`. This type encompasses the full space of CBOR numeric values representable by CBOR major types 0, 1, and 7. dCBOR applications can therefore use `number` to specify fields with numeric values, and dCBOR's Numeric Reduction ensures that these values are encoded deterministically.
240
+
239
241
Tag 201 ({{tag201}}) is defined in this specification as a way to declare its tag content to conform to dCBOR at the data model level and the encoded data item level. (In conjunction with these semantics, tag 201 may also be employed as a boundary marker leading from an overall structure to specific application data items; see {{Section 3 of GordianEnvelope}} for an example for this usage.)
240
242
241
243
# Implementation Status
@@ -415,17 +417,17 @@ The numeric model of {{-CBOR}} provides three kinds of basic numeric types: unsi
415
417
>
416
418
> The tag content MUST be an unsigned or negative integer (major types 0 and 1) or a floating-point number (major type 7 with additional information 25, 26, or 27). Other contained types are invalid.
417
419
418
-
An inhabitant of Tag 1, as long as it represents an integral number of seconds since the epoch, could therefore be encoded as an integer *or* a floating point number. dCBOR's numeric reduction rule ensures that such values are always encoded as integers, eliminating variability in the encoding of such values.
420
+
An inhabitant of Tag 1, as long as it represents an integral number of seconds since the epoch, could therefore be encoded as an integer *or* the equivalent floating point number. dCBOR's Numeric Reduction rule ensures that such values are always encoded as integers, eliminating variability in the encoding of such values.
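
To make the variability concrete, the Python sketch below hand-encodes the same instant under Tag 1 in two ways. The helper names are hypothetical, the timestamp is an arbitrary illustrative value, and the floating point form uses a 64-bit float purely for illustration (it is not necessarily the preferred float encoding).

```python
import struct


def encode_uint(n):
    """Preferred CBOR encoding of an unsigned integer (major type 0)."""
    if n < 24:
        return bytes([n])
    if n < 2**8:
        return bytes([0x18, n])
    if n < 2**16:
        return b"\x19" + struct.pack(">H", n)
    if n < 2**32:
        return b"\x1a" + struct.pack(">I", n)
    return b"\x1b" + struct.pack(">Q", n)


def encode_float64(x):
    """CBOR major type 7, additional information 27 (64-bit float)."""
    return b"\xfb" + struct.pack(">d", x)


TAG_1 = b"\xc1"  # major type 6, tag number 1 (epoch-based date/time)

seconds = 1_700_000_000  # illustrative value only
as_int = TAG_1 + encode_uint(seconds)
as_float = TAG_1 + encode_float64(float(seconds))
```

Both byte strings denote the same instant but differ byte for byte; under Numeric Reduction only the integer form would be emitted.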
419
421
420
-
But this raises a larger policy question for determinism: If two numeric values are semantically equal, should they be encoded identically? dCBOR answers "yes" to this question, and numeric reduction is the mechanism by which this is achieved. This choice answers the determinism question in a way that is simple to understand and implement, and that works well for the vast majority of applications. The serialization is still typed, but the burden of choosing types is reduced for protocol designers, who can simply specify numeric fields without worrying about the details of how those numbers will be encoded.
422
+
But this raises a larger policy question for determinism: If two numeric values are semantically equal, should they be encoded identically? dCBOR answers "yes" to this question, and Numeric Reduction is the mechanism by which this is achieved. This choice answers the determinism question in a way that is simple to understand and implement, and that works well for the vast majority of applications. The serialization is still typed, but the burden of choosing types is reduced for protocol designers, who can simply specify numeric fields without worrying about the details of how those numbers will be encoded.
421
423
422
424
## Why Not `undefined`?
423
425
424
-
How to represent an absent value is a perennial question in data modeling. In general it is useful to have a value that represents a placeholder for a position where a value *could* be present but is not. This could be used in a map to indicate that a key is bound but has no value, or in an array to indicate that a value at a particular index is absent. There are other sorts of absence as well, such as the absence of a key in a map. dCBOR cannot address all of these different notions of absence, but can and does address the lack of semantic clarity around the choice between `null` and `undefined` by choosing `null` as the sole representation of a placeholder for an absent value. `null` is widely used in data modeling, and has a clear and unambiguous meaning. In contrast, `undefined` is less commonly used, and its meaning can be ambiguous. By choosing `null`, dCBOR provides a single clear way to represent absent values, reducing variability.
426
+
How to represent an absent value is a perennial question in data modeling. In general it is useful to have a value that represents a placeholder for a position where a value *could* be present but is not. This could be used in a map to indicate that a key is bound but has no value, or in an array to indicate that a value at a particular index is absent. There are other sorts of absence as well, such as the absence of a key in a map, or a function that returns no value (`void`). Merely by narrowing CBOR, dCBOR cannot address all of these different notions of absence, but it can and does address the lack of semantic clarity around the choice between `null` and `undefined` by choosing `null` as the sole representation of a placeholder for an absent value. `null` is widely used in data modeling, and has a clear and unambiguous meaning. In contrast, `undefined` is less commonly used, and its meaning can be ambiguous. By choosing `null`, dCBOR provides a single clear way to represent absent values, reducing variability.
425
427
426
428
## Why only a single `NaN`?
427
429
428
-
How to represent the result of a computation like `1.0 / 0.0` is another perennial question in data modeling. The {{IEEE754}} floating point standard answers this question with the concept of "Not a Number" (`NaN`): a special value that represents an unrepresentable or undefined numerical result. However, the standard also specifies several bit fields within the `NaN` representation that can vary, including the sign bit, whether the `NaN` is "quiet" or "signaling", and a payload field. These formations are useful in certain computational contexts, but have no general meaning in data modeling.
430
+
How to represent the result of a computation like `0.0 / 0.0` is another perennial question in data modeling. The {{IEEE754}} floating point standard answers this question with the concept of "Not a Number" (`NaN`): a special value that represents an unrepresentable or undefined numerical result. However, the standard also specifies several bit fields within the `NaN` representation that can vary, including the sign bit, whether the `NaN` is "quiet" or "signaling", and a payload field. These variations are useful in certain computational contexts, but have no generally accepted meaning in data modeling.
429
431
430
432
The problem of `NaN` is complicated by the fact that IEEE 754 specifies that all `NaN` values compare as "not equal" to all other numeric values, including themselves. This means that comparing any two `NaN` values, including identical ones, will always yield "not equal". The deeper problem this raises is that if you want to know what data a `NaN` might carry in its payload, you have to go to extraordinary lengths to extract that information, since you cannot simply compare two `NaN` values to determine whether they are the same.
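
The comparison behavior is easy to observe. The Python sketch below builds two quiet `NaN` values that differ only in their payload and shows that ordinary comparison cannot tell them apart, or even recognize one as equal to itself; only inspecting the raw bits does. The helper names are hypothetical, and the sketch assumes a platform that preserves quiet `NaN` payloads when converting between bytes and floats, as common platforms do.

```python
import math
import struct


def float64_bits(x):
    """Return the raw 64-bit pattern of a Python float as an integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def float64_from_bits(bits):
    """Build a Python float from a raw 64-bit pattern."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


quiet_nan = float64_from_bits(0x7FF8000000000000)    # quiet NaN, zero payload
payload_nan = float64_from_bits(0x7FF8000000000001)  # quiet NaN, payload of 1

# Value comparison is useless here: a NaN is not even equal to itself.
self_equal = quiet_nan == quiet_nan

# Telling the two apart requires dropping down to the bit patterns.
bits_differ = float64_bits(quiet_nan) != float64_bits(payload_nan)
```

Both values report `True` from `math.isnan`, so any payload they carry is invisible to code that treats them as ordinary numbers.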
431
433
@@ -447,7 +449,7 @@ Tags provide a useful "escape hatch" for applications that need to use data item
447
449
448
450
## Why not define an API?
449
451
450
-
Because dCBOR mandates strictness in both encoding and decoding, and because of mechanisms it introduces such as numeric reduction, the question arises as to whether this document should specify an API, or at least a set of best practices, for dCBOR codec APIs. The authors acknowledge that such guidance might be useful, but since the purpose of dCBOR is to provide a deterministic encoding format, and because APIs can vary widely between programming languages and environments, the authors have chosen to not widen the scope of this document. We direct the reader to the several existing dCBOR implementations for guidance on API design.
452
+
Because dCBOR mandates strictness in both encoding and decoding, and because of mechanisms it introduces such as Numeric Reduction, the question arises as to whether this document should specify an API, or at least a set of best practices, for dCBOR codec APIs. The authors acknowledge that such guidance might be useful, but since the purpose of dCBOR is to provide a deterministic encoding format, and because APIs can vary widely between programming languages and environments, the authors have chosen to not widen the scope of this document. We direct the reader to the several existing dCBOR implementations for guidance on API design.