Skip to content

Conversation

abrahamwolk
Copy link

@abrahamwolk abrahamwolk commented Jul 23, 2025

This pull request intends to fix a bug encountered in Phoebus that can be reproduced using the following steps:

  1. Create two OPIs, OPI A and OPI B, each with a Text Update widget connected to the same PV.
  2. Run OPI A.
  3. Block TCP connections to the PV.
  4. Wait for the Text Update widget in "OPI A" to display "disconnected". (The issue is not reproduced if one reloads the OPI instead of waiting.) Screenshot:
screenshot1
  1. Run OPI B.
  2. Now the Text Update widget in OPI B displays alarm border of the state "Invalid" with the value "NaN". This is incorrect, since instead alarm border of the state "disconnected" should be displayed. Screenshot (it's the second PV from the top that is relevant):
screenshot2

It seems that the "Invalid" state that is displayed comes from the constant org.epics.vtype.Alarm.DISCONNECTED, and this pull request changes the constant to have alarm severity AlarmSeverity.UNDEFINED instead of AlarmSeverity.INVALID.

Based on the following comment, this seems to be the intention of the code:
https://github.com/abrahamwolk/epicsCoreJava/blob/25d0346c92868d5d76aa0781a5e70f78ab55117c/epics-vtype/vtype/src/main/java/org/epics/vtype/Alarm.java#L158-L166

The EPICS documentation, on the other hand, does not mention an alarm severity UNDEFINED: https://docs.epics-controls.org/en/latest/process-database/EPICS_Process_Database_Concepts.html#alarm-specification. Since a disconnection is a fact about the client and not a status reported over the EPICS protocol, I think it makes sense for a disconnection to have alarm severity UNDEFINED where UNDEFINED means it is not defined in the EPICS specification.

Perhaps it would be a good idea to rename UNDEFINED to DISCONNECTED? UNDEFINED only seems to be used for this purpose in the code.

Changes required for Phoebus
While this change does fix the alarm border to show DISCONNECTED instead of INVALID, the TextUpdate widget now displays NaN instead of <PV-Name> when a PV is not connected, as well as more information in the tooltip:
screenshot3

I will create a separate pull request to https://github.com/ControlSystemStudio/phoebus to solve this. Update: I have created a pull request to the Phoebus project to solve this: ControlSystemStudio/phoebus#3474

…rm severity "Undefined" instead of "Invalid".
@kasemir
Copy link
Contributor

kasemir commented Jul 23, 2025

DISCONNECTED ... INVALID vs. UNDEFINED ... not defined in the EPICS specification.

EPICS records have a severity in their SEVR field that can be OK, MINOR, MAJOR, INVALID.
Channel access has the same severity options.
A status of DISCONNECTED goes with an INVALID severity, which indeed seems better than the other options OK, MINOR, MAJOR.

PVAccess has severities OK, MINOR, MAJOR, INVALID, UNDEFINED.
It retrospect, that was a stupid idea.
The IOC never creates an UNDEFINED state. When you have an INP link to a missing record, you get SEVR=INVALID and STAT=LINK. When I run 'pvmonitor' and then stop the IOC, it just shows '' without further detail, so not sure which severity should be associated with that state:

demo:ai1 2025-07-23 09:13:55.873  6 MINOR DEVICE HIGH 
demo:ai1 2025-07-23 09:13:56.873  7 MINOR DEVICE HIGH 
demo:ai1 <Disconnect>

The VType uses the same severity options as PVAccess, because that's the widest range of options.
Maybe we should have gone with the common set of options, just OK, MINOR, MAJOR, INVALID, because it's unclear when to use UNDEFINED.

This looks like a worthwhile attempt, "Right now we think disconnected should go with UNDEFINED", but whatever you do might be considered wrong the next time around we look at this.

@abrahamwolk
Copy link
Author

abrahamwolk commented Jul 23, 2025

The way I think about it is that the EPICS alarm severity indicates problems on the IOC. In the relevant case here, I think that INVALID indicates that the PV-value cannot be trusted for some reason. I think that in general, an EPICS alarm severity communicates additional information about the received value.

In contrast, the fact that Phoebus (or another client) cannot establish a connection is a fact about the client, not about a value received from the PV. For this reason, I think it makes sense that the status DISCONNECTED cannot be represented as an EPICS alarm severity. That said, it is still a status that the client must handle, and I do not think it is correct to generate the alarm severity INVALID for when a disconnection has occurred: INVALID indicates a problem on the IOC.

A better solution that to represent disconnection as an Alarm would probably be to use a sum-type of the form Either<ClientSideErrorCondition, VType> or similar to represent potentially received values instead of by VType. This way client-side errors could be represented by instances of ClientSideErrorCondition. However, this would probably be a non-trivial refactoring.

@kasemir, Do you have any alternative suggestions to fix the issue described in the description of the pull request?

@shroffk
Copy link
Contributor

shroffk commented Jul 23, 2025

I do think there is some value in trying to create a convention/standard around distinct meanings for INVALID and UNDEFINED.

While I agree that these terms are nebulous enough that no matter what there will always be someone who misinterprets the convention, I think the linking UNDEFINED to client side issues is fine by me.

A better solution that to represent disconnection as an Alarm would probably be to use a sum-type of the form Either<ClientSideErrorCondition, VType> or similar to represent potentially received values instead of by VType. This way client-side errors could be represented by instances of ClientSideErrorCondition. However, this would probably be a non-trivial refactoring.

This seems like a lot of change and a potential for further confusion.

@anjohnson
Copy link
Member

The term UNDEFINED could lead to some user confusion because the IOC does has an alarm status of Undefined (spelled UDF), which has historically appeared with a severity of INVALID but can now be configured separately for each record, so UDF/MINOR or even UDF/NO_ALARM are possible values from a connected PV.

I'm not making an argument either way, just providing a little more info.

@abrahamwolk
Copy link
Author

Looking at usages of the constant AlarmSeverity.UNDEFINED in the source code of Phoebus, it does seem that Phoebus uses the constant AlarmSeverity.UNDEFINED to represent different statuses, not just a PV being disconnected.

And, of course, I don't know how other EPICS clients relying on epicsCoreJava use the constant AlarmSeverity.UNDEFINED. Changing the constant Alarm.DISCONNECTED to have the alarm severity AlarmSeverity.UNDEFINED may have unintended consequences.

That said, the logic for displaying alarm borders around widgets in Phoebus assumes that AlarmSeverity.UNDEFINED means disconnected: https://github.com/ControlSystemStudio/phoebus/blob/73fd7a67780a30d308cb139a5bdb9c41c413f659/app/display/representation-javafx/src/main/java/org/csstudio/display/builder/representation/javafx/widgets/RegionBaseRepresentation.java#L122-L149

I have now updated the associated the pull request to Phoebus with a fix of the bug that doesn't depend on this pull request. This commit, in particular, fixes the observed issue: ControlSystemStudio/phoebus@b62cc94

Since a change to epicsCoreJava may have unintended consequences, I think it's better to fix the issue in Phoebus.

@abrahamwolk abrahamwolk reopened this Jul 29, 2025
@abrahamwolk
Copy link
Author

@shroffk suggested that I re-open the pull request (ControlSystemStudio/phoebus#3474 (comment)).

Should we document the meaning of UNDEFINED somehow? And perhaps make sure that Phoebus follows the intended meaning?

My own thinking is still that sum types of the form Either<ClientSideErrorCondition, VType> or similar (in other words, more general versions of Optional<VType>) are the "correct" way to represent the possibility of client-side (error-) statuses in addition to values/statuses received from the IOC (and which are described in the EPICS documentation). This would also allow for type-safe handling of the different possibilities that can occur. But this would, of course, be a breaking change of the API of epicsCoreJava, and may not be practically implementable.

@kasemir
Copy link
Contributor

kasemir commented Jul 29, 2025

You'll never be wrong when you document the meaning of UNDEFINED and check that implementations follow that meaning.

As for changing anything, I'd rather simplify.
Human performance guidelines as well as "The Alarm Management Handbook" by Hollifield & Habibi suggest that people can understand 3 alarm seventies.
OK, Minor, Major.
Fine, Worrisome, Disastrous.
Any additional alarm severity like not-quite-perfect or bad-but-not-terrible only complicates matters without adding practical value.
EPICS has been successful with these 3 alarm severities ... in a properly operating system.
For technical reasons, we might not always know what the current value and thus alarm level is.
A record was never processed, there's a network disconnect, the device failed to answer, there was a checksum error in the response, ... INVALID covers all these.
So we have 3 true alarm severities OK, Minor, Major, and INVALID added for technical reasons in case we don't know the OK/Min/Maj state.

Splitting out UNDEFINED as a designated severity was a bad idea.
Undefined, disconnected, checksum error, timeout, discrepancy, .. can all be indicated via the status message with a common severity of INVALID.

==> I would suggest simplifying the VType severity to OK, Minor, Major, Invalid. Perfect match to the original EPICS record and Channel Access severities. Yes, PVAccess might report "UNDEFINED" in its severity, so we map that to INVALID as a VType severity.

@abrahamwolk
Copy link
Author

abrahamwolk commented Aug 1, 2025

You'll never be wrong when you document the meaning of UNDEFINED and check that implementations follow that meaning.

I agree that it would be good to document the meaning of UNDEFINED. However, I think the meaning is not clear. Step 1 would be to clarify the meaning, I think.

As for changing anything, I'd rather simplify.
Human performance guidelines as well as "The Alarm Management Handbook" by Hollifield & Habibi suggest that people can understand 3 alarm seventies.

I agree that this makes sense from an UI perspective, since it prevents confusion and increases situational awareness. However, that doesn't prevent the underlying implementation to distinguish between more states in order to handle them in the appropriate way for each case.

OK, Minor, Major.
Fine, Worrisome, Disastrous.
Any additional alarm severity like not-quite-perfect or bad-but-not-terrible only complicates matters without adding practical value.
EPICS has been successful with these 3 alarm severities ... in a properly operating system.

Phoebus displays five severities right now: OK, Minor, Major, Invalid, and Disconnected. I originally created this pull request because a widget could display "Invalid" when in fact "Disconnected" should be displayed.

For technical reasons, we might not always know what the current value and thus alarm level is.
A record was never processed, there's a network disconnect, the device failed to answer, there was a checksum error in the response, ... INVALID covers all these.

I don't think Invalid covers a network disconnect. According to [1], Invalid means that the IOC is reporting that there is a problem with the data.

[1] https://docs.epics-controls.org/en/latest/process-database/EPICS_Process_Database_Concepts.html#alarm-specification

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants