Description
Added text 2024-11-24
I think we need to be careful about our usage of the terms 'well-formed' and 'valid'. The following is not fully fleshed out; it is more of a discussion of the issue and some ideas for the future.
We often reference other sources for identifiers, and want them to be interpreted according to that source. Sources that change over time should (and typically do) distinguish between well-formed and valid. For example, 'ge:manic' is not a well-formed locale identifier, and 'de-Flub' is not a valid locale identifier. However, 'de-Flub' could (conceivably) become valid in the future, if a script is given the code 'Flub'. Good sources also never remove identifiers, or make material changes in the meaning, but may deprecate them: those are still treated as valid.
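To make the distinction concrete, here is a minimal sketch. It assumes a deliberately simplified shape for locale identifiers (a language subtag, optionally followed by a script subtag) and a tiny hand-picked snapshot of known subtags. Neither the pattern nor the sets are taken from BCP 47 or CLDR data, and the names `is_well_formed` and `is_valid` are hypothetical.

```python
import re

# Simplified shape: 2-3 letter language, optional 4-letter script.
# Real locale identifiers allow many more subtags; this is an assumption
# made only to keep the example short.
WELL_FORMED = re.compile(r"[A-Za-z]{2,3}(-[A-Za-z]{4})?")

# Tiny illustrative snapshot standing in for one version of the source.
KNOWN_LANGUAGES = {"de", "en", "def"}
KNOWN_SCRIPTS = {"Latn", "Cyrl"}


def is_well_formed(tag: str) -> bool:
    """Syntax check only; the answer never changes as the source evolves."""
    return WELL_FORMED.fullmatch(tag) is not None


def is_valid(tag: str) -> bool:
    """Well-formed, and every subtag is known to this snapshot."""
    if not is_well_formed(tag):
        return False
    parts = tag.split("-")
    if parts[0].lower() not in KNOWN_LANGUAGES:
        return False
    return len(parts) == 1 or parts[1].title() in KNOWN_SCRIPTS


print(is_well_formed("ge:manic"))  # False: ':' cannot appear in this shape
print(is_well_formed("de-Flub"))   # True: the shape is fine
print(is_valid("de-Flub"))         # False: 'Flub' is not in the snapshot
```

If a later snapshot added 'Flub' to `KNOWN_SCRIPTS`, `is_valid("de-Flub")` would become true without any change to the well-formedness check.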
When we reference such sources in message format, such as with option values, we have a few goals.
- Ideally, implementations would accept only well-formed and valid identifiers, and would interpret them only according to the source semantics. For example, interpret 'de' as German and not as Dezfuli.
- However, we don't want to force implementations to break if they don't support all the identifiers, nor if they don't support the latest version, or if they support an identifier that has become deprecated.
This is also true for our own enums. We have in registry.md:
Implementations MAY accept additional option values for options defined here. However, such values might become defined with a different meaning in the future, including with a different, incompatible name or using an incompatible value space. Supporting implementation-specific option values for standard or optional functions is NOT RECOMMENDED.
We also have BNF:
option = identifier o "=" o (literal / variable)
The implication is that a conformant implementation can interpret any of:
{$x :currency compactDisplay=short}
{$x :currency compactDisplay=medium}
{$x :currency compactDisplay=μικρός}
{$x :currency compactDisplay=|🐭|}
{$x :currency compactDisplay=$myDisplay}
It can also interpret:
{$x :currency currency=CAD}
{$x :currency currency=MyCurrency}
{$x :currency currency=δολάριοΚαναδά}
{$x :currency currency=|¥|}
{$x :currency currency=|🐭|}
{$x :currency currency=$myCurrency}
It could also interpret compactDisplay=short by formatting a long form, and compactDisplay=long by formatting a short form. Or a value of CAD as being GBP, etc.
This level of freedom seems counterproductive for interoperability.
So I propose a general rule, something like the following, for cases where option values are defined by reference to an external source:
- An implementation MUST ignore any option with a literal option value that is ill-formed according to its external source, and signal that error. This allows linters and message builders to catch ill-formed values early.
- [It must ignore the option locale=|ge:manic|.]
- An implementation MUST ignore any option with an option value that isn't valid according to any version of the external source.
- [At the time of this writing, must ignore locale=|dab|.]
- An implementation SHOULD (but need not) ignore an option with an option value that is valid according to some version of the external source.
- [An implementation might not support Dezfuli, and thus ignore locale=|def|; it may also ignore all deprecated language identifiers, and thus ignore locale=|daf|.]
- If an implementation doesn't ignore an option, then it MUST interpret its option value in accordance with some version of the source.
- [It must not interpret 'de' as Dezfuli, or 'def' as German.]
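The rules above can be read as a decision procedure. A minimal sketch follows, assuming a hypothetical `Source` record with three predicates (well-formedness, validity in any version, and local support). The names `Source`, `Decision`, and `resolve_option`, and the toy data, are illustrative only and are not part of any spec or library.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Decision(Enum):
    IGNORE_ILL_FORMED = "ignore the option and signal an error"
    IGNORE_INVALID = "ignore: not valid in any version of the source"
    IGNORE_UNSUPPORTED = "ignore: valid, but not supported here"
    USE = "use, with the meaning the source gives it"


@dataclass(frozen=True)
class Source:
    is_well_formed: Callable[[str], bool]
    is_valid_in_some_version: Callable[[str], bool]
    is_supported: Callable[[str], bool]


def resolve_option(value: str, source: Source) -> Decision:
    """Apply the proposed rules in order to a literal option value."""
    if not source.is_well_formed(value):
        return Decision.IGNORE_ILL_FORMED
    if not source.is_valid_in_some_version(value):
        return Decision.IGNORE_INVALID
    if not source.is_supported(value):
        return Decision.IGNORE_UNSUPPORTED
    # From here on the value must be interpreted as the source defines it.
    return Decision.USE


# Toy data taken from the examples in the text, not from a real registry.
VALID_IN_SOME_VERSION = {"de", "def", "daf"}
SUPPORTED_HERE = {"de"}

toy_locale_source = Source(
    is_well_formed=lambda v: v.isascii() and v.isalpha() and 2 <= len(v) <= 3,
    is_valid_in_some_version=lambda v: v in VALID_IN_SOME_VERSION,
    is_supported=lambda v: v in SUPPORTED_HERE,
)

for value in ("ge:manic", "dab", "def", "de"):
    print(value, "->", resolve_option(value, toy_locale_source).name)
```

Keeping the three predicates separate means a linter can run only the first check, while a runtime can run all three against whatever version of the source it ships with.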
Ignore means that the expression is interpreted as if the option were not there. (I won't talk here about what signals to the caller are associated with that.)
I think we could apply that to our standard enum option values, such as the following in https://github.com/unicode-org/message-format-wg/blob/main/spec/registry.md#options-1, so that |@!$| could be recognized as ill-formed.
- useGrouping
  - auto (default)
  - always
  - never
  - min2
That is, perhaps we can have a rule in the registry for our functions, something like: the default well-formedness criteria for standard function option values match the constraints on function option identifiers in README.md. Thus |$abc| would be ill-formed for useGrouping. Any function option that had different criteria for well-formedness of its values would simply have an explicit well-formedness statement.
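As a rough illustration of such a default rule, the sketch below separates well-formedness from validity for useGrouping. The pattern is an ASCII-only approximation of an identifier-like shape and is an assumption made for this example; it is not copied from the spec grammar, which is broader. The names `DEFAULT_WELL_FORMED` and `check_use_grouping` are hypothetical.

```python
import re

# ASCII-only approximation: a letter or "_" followed by letters, digits,
# "_", "-", or ".". An assumption for illustration, not the spec grammar.
DEFAULT_WELL_FORMED = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")

# The values listed for useGrouping in the text above.
USE_GROUPING_VALUES = {"auto", "always", "never", "min2"}


def check_use_grouping(value: str) -> str:
    """Classify a literal value for the useGrouping option."""
    if DEFAULT_WELL_FORMED.fullmatch(value) is None:
        return "ill-formed"
    if value not in USE_GROUPING_VALUES:
        return "well-formed but not valid"
    return "valid"


for value in ("$abc", "@!$", "sometimes", "min2"):
    print(repr(value), "->", check_use_grouping(value))
```

Under this sketch, a value such as `sometimes` stays well-formed, so a future version of the registry could define it without changing the well-formedness rule.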