@tshalvi
Contributor

@tshalvi tshalvi commented Sep 30, 2025

Why I did it

Currently, if an EEPROM read or write attempt fails, it is not retried. To make EEPROM access more robust and reliable, this PR introduces a retry mechanism for both read and write operations.

Work item tracking
  • Microsoft ADO (number only):

How I did it

Added retry attempts for EEPROM read/write operations when the failure is specifically due to an I²C error (errno EIO). In such cases, the operation is retried up to 5 times with 100 ms intervals. For any other failure type, no additional attempts are made.
Introduced corresponding NOTICE logs to indicate retry attempts and successes.
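The retry policy described above could be sketched roughly as follows. This is only an illustration, not the code from this PR: `with_eio_retry`, `MAX_RETRIES`, `RETRY_INTERVAL_S`, and the `notice` callback are hypothetical names, and the sketch assumes "up to 5 times" means one initial attempt plus at most five retries.

```python
import errno
import time

MAX_RETRIES = 5          # "up to 5 times" per the description (assumed: retries after the first attempt)
RETRY_INTERVAL_S = 0.1   # 100 ms between attempts, per the description


def with_eio_retry(operation, notice=print, sleep=time.sleep):
    """Run operation(), retrying only when it fails with errno EIO.

    Hypothetical helper: `operation` is a zero-argument callable that performs
    one EEPROM read or write and raises OSError on failure.
    Returns (result, None) on success or (None, errno) on failure.
    """
    retries = 0
    while True:
        try:
            result = operation()
        except OSError as e:
            # Any non-EIO failure, or an exhausted retry budget, ends the loop.
            if e.errno != errno.EIO or retries >= MAX_RETRIES:
                return None, e.errno
            retries += 1
            notice(f"EEPROM access failed with EIO, retry {retries}/{MAX_RETRIES}")
            sleep(RETRY_INTERVAL_S)
            continue
        if retries:
            notice(f"EEPROM access succeeded after {retries} retries")
        return result, None
```

Passing `notice` and `sleep` as parameters is a choice made for this sketch so that the policy can be exercised without real hardware or real delays.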

How to verify it

Manual testing.

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211

Tested branch (Please provide the tested image version)

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

@tshalvi tshalvi requested a review from lguohan as a code owner September 30, 2025 15:28
@r12f

r12f commented Oct 1, 2025

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@tshalvi
Contributor Author

tshalvi commented Oct 5, 2025

@microsoft-github-policy-service agree company="NVIDIA"

@tshalvi tshalvi requested a review from prgeor October 6, 2025 11:45
except (OSError, IOError) as e:
    return False, ctypes.get_errno()

logger.log_debug(
Contributor

@tshalvi can we move this debug log to the beginning so that we don't miss the log if there is an error at line 674?

Contributor Author

I’d prefer to keep the debug log after the write, since the goal is to record the actual completed write.
Moving it before the write would log an attempted write, even if it later fails, which could be misleading.
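To make the ordering concrete, here is a minimal sketch of a write helper that logs only after the write has completed. It is not the function under review: `write_bytes` and the `log_debug` callback are hypothetical, and the sketch reports `e.errno` from the caught exception rather than using `ctypes.get_errno()`.

```python
def write_bytes(path, offset, data, log_debug=print):
    """Hypothetical write helper returning (ok, errno)."""
    try:
        # Unbuffered binary read/write access, so a failure surfaces inside the try block.
        with open(path, "r+b", buffering=0) as f:
            f.seek(offset)
            f.write(data)
    except OSError as e:
        # Failure path: nothing is logged here, so the debug log never
        # claims a write that did not happen.
        return False, e.errno
    # Reached only when the write completed without raising.
    log_debug(f"wrote {len(data)} bytes at offset {offset} to {path}")
    return True, 0
```

With this ordering, every debug entry corresponds to a completed write. A failed attempt is visible only through the returned errno, which is the trade-off raised in the question above.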

if utils.read_int_from_file(presence_sysfs) != 1:
    return False
- eeprom_raw = self._read_eeprom(0, 1, log_on_error=False)
+ eeprom_raw = self.read_eeprom(0, 1, log_on_error=True)
Contributor

@tshalvi why do we need to mix hardware presence (which is what get_presence() is meant to return) with an I2C read? What is the motivation behind this?

Contributor Author

The motivation is that get_presence() returning True (presence = 1) only indicates that the hardware module is physically detected. It doesn’t necessarily mean that the module’s EEPROM is ready for access. The expectation is to report present=True only when both the hardware is connected and the EEPROM is ready and accessible.

Contributor

@prgeor, there are other threads in xcvrd and other processes, such as thermalctld, which call get_presence() before querying the EEPROM. So if we only check the presence sysfs without checking EEPROM readiness, it will cause errors in those threads and processes.
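The two-stage check discussed in this thread might look roughly like the sketch below. It is an illustration under assumptions, not the platform code: `is_module_present` and both callables are hypothetical stand-ins for the sysfs presence read and the one-byte EEPROM read, and treating a `None` read result as "EEPROM not ready" is assumed rather than taken from the diff.

```python
def is_module_present(read_presence_flag, read_first_eeprom_byte):
    """Report presence only if the module is detected AND its EEPROM responds.

    Hypothetical helper. `read_presence_flag` returns the integer from the
    presence sysfs file; `read_first_eeprom_byte` returns the byte read from
    offset 0, or None when the read fails (assumed convention).
    """
    # Stage 1: hardware detection. If the module is absent, skip the I2C read.
    if read_presence_flag() != 1:
        return False
    # Stage 2: EEPROM readiness. A failed read means callers cannot use it yet.
    return read_first_eeprom_byte() is not None
```

Under this scheme, a caller that receives `True` can go on to query the EEPROM without first hitting a module that is detected but not yet readable.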

@tshalvi tshalvi requested a review from prgeor October 8, 2025 10:30
@r12f

r12f commented Oct 9, 2025

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@prgeor
Contributor

prgeor commented Oct 18, 2025

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@r12f r12f merged commit 6ebf9d0 into Azure:202412 Oct 19, 2025
10 checks passed