
Conversation

@lukaspie
Contributor

@lukaspie lukaspie commented Jan 16, 2025

NXdata contains two renameable fields, AXISNAME and DATA, that are both of type NX_NUMBER. This presents a problem: every concrete field in an NXdata group must be assigned to exactly one of these two concepts.

On the data level, this is not a problem since this can be handled by properly assigning the @signal and @axes attributes and thus it is immediately clear which data field belongs to which concept (see here).
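To illustrate the data-level resolution described here, a minimal Python sketch follows. It assumes the group's attributes have already been read into a plain dictionary; the function name `classify_fields` and the dictionary layout are hypothetical and not part of any NeXus tool.

```python
def classify_fields(group_attrs, field_names):
    """Assign each field to 'DATA' or 'AXISNAME' using the group's attributes."""
    signal = group_attrs.get("signal")
    axes = group_attrs.get("axes", [])
    if isinstance(axes, str):
        # A single axis may be stored as a plain string rather than a list.
        axes = [axes]
    roles = {}
    for name in field_names:
        if name == signal:
            roles[name] = "DATA"
        elif name in axes:
            roles[name] = "AXISNAME"
        else:
            roles[name] = None  # not referenced by @signal or @axes
    return roles


# Hypothetical attribute values for the IV-curve example.
attrs = {"signal": "current", "axes": ["temperature", "voltage"]}
roles = classify_fields(attrs, ["temperature", "voltage", "current"])
```

This only works once a file with these attributes exists, which is exactly the limitation discussed in the rest of this comment.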

The problem is however not solved on the conceptual, data modelling level, i.e., when writing base classes and application definitions. Consider the following situation (borrowed directly from contributed_definitions/NXiv_temp):

<definition xmlns="http://definition.nexusformat.org/nxdl/3.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" category="application" name="NXiv_temp" extends="NXsensor_scan" type="group" xsi:schemaLocation="http://definition.nexusformat.org/nxdl/3.1 ../nxdl.xsd">
    <doc>
         Application definition for temperature-dependent IV curve measurements.
    </doc>
    <group type="NXentry">
        <group type="NXdata">
            <doc>
                 This NXdata contains multiple fields. Temperature and voltage are supposed to specialize `AXISNAME`, whereas 
                 `current` is specializing `DATA`.
            </doc>
            <field name="temperature" type="NX_NUMBER"/>
            <field name="voltage" type="NX_NUMBER"/>
            <field name="current" type="NX_NUMBER"/>
        </group>
    </group>
</definition>

Here, we as humans know which field in /NXiv_temp/NXentry/NXdata is associated with which field in NXdata. However, no interpretation tool can work this out automatically. Currently, each conceptual field is assigned to AXISNAME or DATA by namefitting, i.e., by checking whether the name of the field more closely resembles "axisname" or "data". For this particular example, this leads to an assignment (on the conceptual level) of /NXiv_temp/NXentry/NXdata/current to AXISNAME, i.e., current would be specializing NXdata/AXISNAME. See the pop-up link at the bottom of this screenshot, where we visualize which concept current is inheriting from.

image

During file writing with actual data, we will assign current to DATA by adding it to the signal. However, any proper data management system will complain here because what we define on the conceptual level will not be the same as what the data provider will ingest.

So what I think is needed is a way to already define on the data modelling level which field is an axis and which one is a data field.

As a fix, we add an extra attribute extends to fieldType where one can explicitly say which field in the inheritance shall be extended. The language is specifically chosen to indicate that this is only possible for the use case outlined above and not for any arbitrary case (to prevent people from just extending any arbitrary field defined somewhere else).

EDIT: added a rendering in the docs
image
This should probably be changed to using the proper link instead.

@lukaspie lukaspie force-pushed the extends-for-fields branch 4 times, most recently from f1f7480 to 3849cbe Compare January 16, 2025 15:37
@lukaspie lukaspie force-pushed the extends-for-fields branch 2 times, most recently from 0c9f76d to 52d06be Compare February 12, 2025 12:19
@lukaspie lukaspie marked this pull request as ready for review February 12, 2025 15:31
@lukaspie lukaspie added the "NIAC vote needed" label (PR needs an approving vote from NIAC before merge) Feb 12, 2025
@rayosborn
Contributor

rayosborn commented Feb 12, 2025

In the example you cite, wouldn't it be much simpler to explicitly add the signal and axes attributes to the group, rather than adding extra attributes?

<definition xmlns="http://definition.nexusformat.org/nxdl/3.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" category="application" name="NXiv_temp" extends="NXsensor_scan" type="group" xsi:schemaLocation="http://definition.nexusformat.org/nxdl/3.1 ../nxdl.xsd">
    <doc>
         Application definition for temperature-dependent IV curve measurements.
    </doc>
    <group type="NXentry">
        <group type="NXdata">
            <doc>
                 This NXdata contains multiple fields. Temperature and voltage are supposed to specialize `AXISNAME`, whereas 
                 `current` is specializing `DATA`.
            </doc>
            <attribute name="signal" type="NX_CHAR">
                <enumeration>
                    <item value="current"/>
                </enumeration>
            </attribute>
            <attribute name="axes" type="NX_CHAR">
                <enumeration>
                    <item value="[temperature, current]"/>
                </enumeration>
            </attribute>
            <field name="temperature" type="NX_NUMBER"/>
            <field name="voltage" type="NX_NUMBER"/>
            <field name="current" type="NX_NUMBER"/>
        </group>
    </group>
</definition>

I confess I'm not sure exactly how to specify a list of axes in this instance, but I hope someone with more experience of NXDL syntax can fix my example.

This would have the added advantage of specifying which dimensions the axes apply to.

@lukaspie
Contributor Author

lukaspie commented Feb 12, 2025

> In the example you cite, wouldn't it be much simpler to explicitly add the signal and axes attributes to the group, rather than adding extra attributes?

For this particular example, this would work, yes. The problem is that there is no mechanism that enforces anybody to write the axes and signal in the application definition. For example, in the NXstxm application definition, there is an energy field, but no axes and signal defined.

However, an application definition can define many possible axes and you may not want to enforce a priori what the axes attribute shall contain. For example, for photoemission spectroscopy, we define, among others, the following axes (some of which are recommended/optional): energy, kx, ky, kz, angular0, angular1, spatial0, spatial1. Depending on the experiment that is actually run, the axes will change:

  • For simple XPS, you will only need the energy axis.
  • For angle-resolved measurements, you will also have either all the k* or angular* axes.

So you cannot define the axes in the application definition because this will be decided on the data level, not on the conceptual level. Still, you want to say on the conceptual level that each of these fields extends NXdata/AXISNAME and not NXdata/DATA.

Just for explanation: the problem arises for us in the context of the research data management system we are building (NOMAD). The idea there is that the schema comes first. That is, you read in the NXDL definitions and already assign each of the NeXus groups/fields/attributes to a unique concept, which also includes all the inheritance chain and all sub-elements. There is absolutely no way to say if the energy field in NXstxm is inheriting from DATA or AXISNAME. Once data is read in, we want to assign the actual data instance to the pre-defined concepts, but since we don't know what energy is, we cannot do this assignment properly and have to resort to namefitting.

I believe this is however not only relevant for us. Any ontological tool would need to understand what each concept is and the relationships between different concepts just from the data modeling, not from instance data. So this will come up with any tool trying to understand the NXDL definitions.

@rayosborn
Contributor

> For this particular example, this would work, yes. The problem is that there is no mechanism that enforces anybody to write the axes and signal in the application definition. For example, in the NXstxm application definition, there is an energy field, but no axes and signal defined.

There is presumably no mechanism for enforcing the use of the new extends attribute either.

> However, an application definition can define many possible axes and you may not want to enforce a priori what the axes attribute shall contain. For example, for photoemission spectroscopy, we define, among others, the following axes (some of which are recommended/optional): energy, kx, ky, kz, angular0, angular1, spatial0, spatial1. Depending on the experiment that is actually run, the axes will change:

In principle, this could also be handled by the axes attribute because the additional names could be added to the enumeration. However, in my own example, I wasn't sure how to handle a list of attributes, and it might be impossible when there are multiple permutations. Even so, it might still be worth recommending the use of the axes attribute to define the default, or preferred, axes.

> Just for explanation: the problem arises for us in the context of the research data management system we are building (NOMAD).
>
> I believe this is however not only relevant for us. Any ontological tool would need to understand what each concept is and the relationships between different concepts just from the data modeling, not from instance data. So this will come up with any tool trying to understand the NXDL definitions.

I am not objecting in principle to this PR, but I want to make sure it's really necessary. From my own point of view, it will add non-trivial complexity to parsing application definitions in the NXValidate package, in ways that I will have to think about.

Finally, I presume this is only meant for application definitions, since that is the only place where placeholder names are made explicit.

@sanbrock
Contributor

The extension and clarification of nameType removed inconsistencies and ambiguities from the standard, but could not solve the inherent problem in NXdata that does not follow the formal rules of NXDL.
It defines 2 fields, DATA and AXISNAME, both with nameType=any which immediately results in ambiguity. There is no way to tell from the name of a field if it belongs to any of these Fields defined in NXdata. Even more, NXdata also speaks about another Field, FIELDNAME that is not even defined explicitly in the NXdata.nxdl.xml file as a Field, but only indirectly by the actually defined Field FIELDNAME_errors.

We could have declared a Field FIELDNAME (which automatically implies the corresponding Field FIELDNAME_errors). Similarly, DATA and AXISNAME could have been declared as a specialisation of FIELDNAME, but ambiguity should have been avoided, so they should have been named differently, e.g. dataFIELD, and axisFIELD.

Instead, a special convention has (actually a series of conventions have) been introduced. These conventions connected to @axes, @signal, etc. solve the problem for data files and resolve the ambiguity when a data file is interpreted. The provided attributes make it clear what is a DATA and what is an AXISNAME.

The problem is that this only works for interpreting data files, where the attributes are provided. It does not work when we write new definitions and want to declare new Fields, because there is no way to declare which Field each one is specialising. It is important to allow this to be declared clearly.

We are proposing a new attribute for this purpose, but @rayosborn you are right, since we now introduced open enumerations, an alternative could be to use open enumerations for @axes and @signal (and auxiliary_signals?), and to list the names of inheriting Fields under these attributes.
(Note that closed enumerations would not necessarily work as they would limit the possibilities for the field names in the data file.)

Hence, I can imagine an alternative proposal where we introduce a convention of using @axes, and @signal (and not @auxiliary_signals which would complicate the situation even further) for listing the respective inheriting Field names.
The main difference in the 2 alternatives:
@extends: references backwards
@signal: references forward
Note that a declared @signal with its potentially declared open enumeration list is inherited to subclasses. Hence, a declaration of @signal enumerations in a subclass, must be also considered even if the current class does not declare enumerations for @signal. This suggests that the resolution of a forward reference can be more difficult.
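To make the forward-reference point concrete, here is a rough Python sketch of how a tool might resolve a field when the @signal and @axes enumerations can be declared anywhere along the inheritance chain. The `Definition` class, its attributes, and the names `BaseDef` and `AppDef` are hypothetical simplifications, not actual NXDL structures.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Definition:
    """Hypothetical, simplified stand-in for an NXDL class."""
    name: str
    parent: Optional["Definition"] = None
    signal_enum: List[str] = field(default_factory=list)  # names listed under @signal
    axes_enum: List[str] = field(default_factory=list)    # names listed under @axes


def resolve_forward(defn, field_name):
    """Walk up the inheritance chain until an enumeration mentions the field."""
    current = defn
    while current is not None:
        if field_name in current.signal_enum:
            return "DATA"
        if field_name in current.axes_enum:
            return "AXISNAME"
        current = current.parent
    return None  # no class in the chain lists this field


# The enumerations live on the parent; the child declares nothing itself.
base = Definition("BaseDef", signal_enum=["current"], axes_enum=["temperature", "voltage"])
app = Definition("AppDef", parent=base)
role = resolve_forward(app, "current")
```

With a backward reference such as @extends, the answer would be stored on the field itself and no walk through the chain would be needed.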

@lukaspie
Contributor Author

Just to be clear, I am not set on having this new extends attribute, but I would like to have some resolution for the problem described above.

An open enumeration could potentially also work, but this is different to how open enumerations are used so far. So far, if you were to define a list like ["energy", "kx", "ky", "kz", "angular0", "angular1"] to the open enumeration of @axes and then, in the NeXus file, you are using @axes=["energy", "kx", "ky", "kz"], you are not making use of the items in the original list. Rather you are just using a new list, which is not covered by the initial enum. There would need to be an additional mechanism where you can define all the possible items a list is allowed to have and a subset of that would be valid as well.
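The additional mechanism mentioned here could amount to a subset check rather than an exact match. A minimal Python sketch, assuming the allowed axis names are available as a plain list (the function name `check_axes` is hypothetical):

```python
def check_axes(used_axes, allowed_axes):
    """Return the axis names used in the file that the definition does not list."""
    allowed = set(allowed_axes)
    return [name for name in used_axes if name not in allowed]


allowed = ["energy", "kx", "ky", "kz", "angular0", "angular1"]
used = ["energy", "kx", "ky", "kz"]

# Comparing the whole list against the enumerated item fails ...
exact_match = used == allowed
# ... whereas a per-item subset check accepts it.
unknown = check_axes(used, allowed)
```

An empty `unknown` list would mean every axis in the file is covered by the definition, even though the list as a whole differs from the enumerated one.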

> Finally, I presume this is only meant for application definitions, since that is the only place where placeholder names are made explicit.

Technically, one could also use NXdata in another base class and give explicit names for the inherited DATA and AXISNAME concepts. This doesn't exist anywhere (I think), but technically, it is possible.

@lukaspie lukaspie linked an issue Mar 12, 2025 that may be closed by this pull request
<field name="temperature" type="NX_NUMBER"/>
<field name="voltage" type="NX_NUMBER"/>
<field name="current" type="NX_NUMBER">
<field name="temperature" type="NX_NUMBER" extends="/NXdata/AXISNAME"/>
Contributor

An alternative would be to specify @temperature_indices and @voltage_indices to signify their corresponding fields are AXISNAME instances

Contributor

@signal="current" may be too restrictive for an app def!

Contributor Author

> An alternative would be to specify @temperature_indices and @voltage_indices to signify their corresponding fields are AXISNAME instances

That works for AXISNAME, but how about DATA? Would be nice to have a consistent way of assigning fields to one or the other.
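As a rough illustration of why the `_indices` approach only covers one side, here is a Python sketch that infers AXISNAME fields from declared `*_indices` attributes. The XML fragment is a simplified, namespace-free assumption rather than a complete NXDL file, and `axes_from_indices` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Simplified fragment (assumption): no namespace, no types on the attributes.
NXDL = """
<group type="NXdata">
    <attribute name="temperature_indices"/>
    <attribute name="voltage_indices"/>
    <field name="temperature" type="NX_NUMBER"/>
    <field name="voltage" type="NX_NUMBER"/>
    <field name="current" type="NX_NUMBER"/>
</group>
"""


def axes_from_indices(group):
    """Mark a field as AXISNAME if a matching <name>_indices attribute is declared."""
    suffix = "_indices"
    declared = {
        attr.get("name")[: -len(suffix)]
        for attr in group.findall("attribute")
        if attr.get("name", "").endswith(suffix)
    }
    return {
        f.get("name"): ("AXISNAME" if f.get("name") in declared else None)
        for f in group.findall("field")
    }


roles = axes_from_indices(ET.fromstring(NXDL))
```

Here `current` stays unresolved: the absence of an `_indices` attribute does not by itself say that a field is a DATA.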

> @signal="current" may be too restrictive for an app def!

Exactly my point 👍

@lukaspie
Contributor Author

I was asked to give a use case as to why this new attribute (or, as I said above, any other solution to distinguish the inheritance of fields in NXdata) is needed.

Just to be clear, here is again the problem that we have: given that the two all-uppercase terms AXISNAME and DATA are both of type NX_NUMBER, if you have a field (say, called NXmpes/NXdata/energy) in a specialized NXdata group that is also of type NX_NUMBER, it is impossible to tell on the data modelling level (again, when no data is present) whether that energy field shall be a specialization of the AXISNAME or the DATA field and from which of the two it shall inherit all its properties (incl. the different semantic meaning as well as the different attributes).

Now, why is that important? Both in the NeXusOntology as well as any other tool using the NeXus definitions to build a data schema, we interpret the NeXus definitions before there is any data in the system. That is, we build a whole tree of all NeXus concepts that are defined in the application definitions and base classes, including all their inheritance. For the example above, the energy field is attached to the group NXmpes/NXdata. This group is a specialization of the base class NXdata. What we want to know is the complete inheritance of the energy field, i.e., which field in NXdata it is specializing.

In our example, in the RDM system NOMAD, we actually build the whole NeXus tree upon initialization of the system to have a consistent data schema in the system (we call this schema the NOMAD metainfo). Here's a screenshot:
image

With the current way the NeXus definitions are written, it is impossible to understand at this point (again, without having any data yet) whether NXmpes/NXdata/energy shall inherit from (i.e., be a specialization of) AXISNAME or DATA. When we later read in NeXus files and find the energy field in an NXmpes entry, we can of course infer that it is an AXISNAME if it is stored in the @axes attribute. But, at that point, we must know if energy can even be an AXISNAME or if it is always meant to be used as a DATA instead.

Therefore, we want to be able to know this on the data modelling level, i.e., just from reading the XML files. Note that this is really the only case where such ambiguity exists in NeXus because nowhere else do we have two fields with nameType="any" on the same level.
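To show what "just from reading the XML files" could look like, here is a small Python sketch that reads the parent concept of each field from the proposed `extends` attribute. The fragment is a simplified, namespace-free assumption; the value `/NXdata/AXISNAME` follows the diff in this PR, while `/NXdata/DATA` is assumed by analogy, and `parent_concepts` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Simplified fragment (assumption): no namespace, attribute as proposed in this PR.
NXDL = """
<group type="NXdata">
    <field name="temperature" type="NX_NUMBER" extends="/NXdata/AXISNAME"/>
    <field name="voltage" type="NX_NUMBER" extends="/NXdata/AXISNAME"/>
    <field name="current" type="NX_NUMBER" extends="/NXdata/DATA"/>
    <field name="energy" type="NX_NUMBER"/>
</group>
"""


def parent_concepts(group):
    """Map each field name to its declared parent concept (None if undeclared)."""
    return {f.get("name"): f.get("extends") for f in group.findall("field")}


parents = parent_concepts(ET.fromstring(NXDL))
# Fields without a declaration remain ambiguous at the schema level.
unresolved = [name for name, parent in parents.items() if parent is None]
```

No instance data is involved: the assignment is known as soon as the definition has been parsed, and fields like `energy` that lack the declaration can be flagged.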

The way I see it, there are multiple possible solutions to this problem:

  1. Use a new attribute to the fields in NXdata groups (as suggested in this PR).
  2. Always explicitly require that if a named field is supposed to be an AXISNAME, you need to also name the attribute AXISNAME_indices in the application definition (this was the suggestion from @PeterC-DLS). Note that with this suggestion, it is still impossible to say that a named field is supposed to inherit from DATA instead.
  3. Any other solution based on currently implemented possibilities, like adding the @axis attribute to that field in the XML file, to indicate that this is supposed to be an instance of AXISNAME.

One thing that doesn't work IMO: on the application definition level, enumerate the @signal and @axes attributes with all possible signals or axes. In the case of the application definition NXmpes, we have 10 possible axes (energy, kx, ky, kz, ...). We want to have them there with their names (to control the vocabulary), but they will only sometimes be used, depending on the use case. So such an enumeration (as originally suggested by @rayosborn above) would also not work.

@lukaspie
Contributor Author

lukaspie commented Mar 24, 2025

In any case, whatever the solution is, there will need to be quite a few changes to existing files that make use of named fields in NXdata groups. Here's a (hopefully) complete list of occurrences of NXdata in applications/base classes that have explicitly named fields:

In addition, there are several new application definitions (NXapm, NXem, NXmpes, NXxps, etc.) currently under discussion that have explicitly named fields within an NXdata group.

@phyy-nx
Contributor

phyy-nx commented Mar 28, 2025

Tricky problems here. Maybe instead of extends, it could be implements? And maybe it's on the same line instead of a separate line?

Instead of

image

The rendering could be

image

With a link to AXISNAME or DATA in NXdata?

@lukaspie lukaspie changed the title from "extends attribute to fields to clarify if a field in a specified NXdata is a AXISNAME or DATA" to "extends attribute to clarify if a field in a specified NXdata is a AXISNAME or DATA" Apr 1, 2025
@lukaspie
Contributor Author

lukaspie commented Apr 1, 2025

> Tricky problems here. Maybe instead of extends, it could be implements? And maybe it's on the same line instead of a separate line?
>
> Instead of
>
> image
>
> The rendering could be
>
> image
>
> With a link to AXISNAME or DATA in NXdata?

Since extends already has a meaning in NeXus, I thought it would be good to use the same keyword. Such a named AXISNAME does exactly that: it extends the original definition of AXISNAME, in the same way that a class extends another class. That being said, I see that this may be confusing (this was also brought up in one of the recent Telcos), so maybe another name would be better. Any name for this attribute is fine with me; it could be any of extends, implements, specializes. I like your implements suggestion.

Regarding the rendering, I agree that having it on the same line looks much cleaner. I like your suggestion and have implemented it here directly (see below). However, we already have the linking to the parent concept using the notation, see here for NXdata. If AXISNAME and DATA are correctly resolved, this would also work for energy and absorbed_beam, so maybe it is not even needed to provide a special rendering? Not sure.

image

@benajamin
Contributor

If I'm understanding correctly, you want to define the independent (AXISNAME) and dependent (DATA) variables from the application definition alone. You are trying to achieve this by adding a helper flag. This is not part of the core mission of NeXus (writing and reading data), although I'm still somewhat sympathetic to your issue. I'll need to think about this some more...

@phyy-nx
Contributor

phyy-nx commented Aug 25, 2025

From Telco 08/25/25: this is actually a normalization problem. The issue is that AXISNAME has type NX_CHAR_OR_NUMBER and DATA has type NX_NUMBER, which leads to an ambiguity. If AXISNAME were a table in a database, properly indexed, and DATA were a table in a database, properly indexed, and NXdata had references to those two tables, then there is no ambiguity and the problem goes away. But this is a larger discussion in the whole context of NeXus.

In the meantime, we have a workaround for this specific case in NXdata, where DATA fields are listed in @signal and @auxiliary_signals, and AXISNAME fields are stored in either @axes or AXISNAME_indices. So in practice, there is no actual ambiguity (provided folks use these fields and attributes correctly).

In conclusion, there's more discussion here, and it's a workaroundable problem at present, so we punt this to the next release. Thanks for the discussion.

@phyy-nx phyy-nx modified the milestones: NXDL 2025, NXDL 2026 Aug 25, 2025

Labels

awaiting input, NIAC vote needed (PR needs an approving vote from NIAC before merge), nxdl-syntax


Development

Successfully merging this pull request may close these issues.

Clarify if a field in a specified NXdata is a AXISNAME or DATA

6 participants