@tshalvi
Contributor

@tshalvi tshalvi commented Sep 1, 2025

Why I did it

Currently, if an EEPROM read or write attempt fails, it is not retried. To make EEPROM access more robust and reliable, this PR introduces a retry mechanism for both read and write operations.

Work item tracking
  • Microsoft ADO (number only):

How I did it

Added retry attempts for EEPROM read/write operations when the failure is specifically due to an I²C error (errno EIO). In such cases, the operation is retried up to 50 times with 100 ms intervals. For any other failure type, no additional attempts are made.
Introduced corresponding NOTICE logs to indicate retry attempts and successes.
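Below is a minimal Python sketch of the retry policy described above. It is for illustration only and does not reproduce the PR's code. The function name `eeprom_op_with_retry`, the `op` callable, and the `log_notice` hook are hypothetical. The sketch assumes that `MAX_ATTEMPTS = 50` counts total attempts, including the first try, and that an I²C failure surfaces as an `OSError` with `errno.EIO`. After the last failure it re-raises the error. The real platform code may report failure differently, for example by returning `None`.

```python
import errno
import time

# Values from the PR description: up to 50 attempts, 100 ms apart.
MAX_ATTEMPTS = 50
RETRY_INTERVAL_SEC = 0.1


def eeprom_op_with_retry(op, log_notice=print, max_attempts=MAX_ATTEMPTS,
                         interval=RETRY_INTERVAL_SEC, sleep=time.sleep):
    """Run op() and retry only when it fails with EIO (hypothetical helper).

    `op` is a zero-argument callable that performs one EEPROM read or write
    and raises OSError on failure. `log_notice` stands in for a NOTICE-level
    logger.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            result = op()
        except OSError as e:
            # Non-EIO errors are not treated as transient, and the last
            # failed attempt is not followed by another wait: re-raise.
            if e.errno != errno.EIO or attempt == max_attempts:
                raise
            log_notice(f"EEPROM access failed with EIO, retrying "
                       f"(attempt {attempt}/{max_attempts})")
            sleep(interval)
            continue
        if attempt > 1:
            # Report that a retry (not the first attempt) succeeded.
            log_notice(f"EEPROM access succeeded on attempt {attempt}")
        return result
```

The loop exits as soon as one attempt succeeds. Any other errno fails immediately. Because `sleep` is injectable, a test can skip the real 100 ms waits.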

How to verify it

Manual testing.

Which release branch to backport (provide reason below if selected)

  • 202205
  • 202211
  • 202305
  • 202311
  • 202405
  • 202411
  • 202505

Tested branch (Please provide the tested image version)

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

@tshalvi tshalvi requested a review from lguohan as a code owner September 1, 2025 19:35
@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

liat-grozovik previously approved these changes Sep 2, 2025
@liat-grozovik
Collaborator

@prgeor FYI.

CMIS_MCI_EEPROM_OFFSET = 2
CMIS_MCI_MASK = 0b00001100

MAX_ATTEMPTS = 50
Contributor


@tshalvi 50 retries for an I2C read/write seems like too much. Won't this mask the real HW/driver issue and keep it from being debugged? A reasonable retry count would be 2 or 3.

Contributor Author


There are 50 retry attempts, each spaced 100 ms apart. Once one succeeds, the process stops and no further retries are performed. Could you please clarify what the concern is here?

Contributor


@tshalvi In the worst case, retrying one I2C transaction could take 50 * 100 msec = 5 secs. This retry will simply mask the actual issue. That's why I said we should investigate anything that needs more than 3 retries.

Contributor


@tshalvi What's the motivation for 50 retries, as opposed to 100 or 1000? How did you arrive at the upper bound of 50?
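The small sketch below puts numbers on the bound being debated. It computes the worst-case time spent sleeping between attempts when every attempt fails with EIO, at the 100 ms spacing from the PR description. It makes three assumptions: the attempt count includes the first try, there is no sleep after the final failure, and the duration of the I²C transactions themselves is ignored. Real worst-case latency is therefore somewhat higher. `worst_case_sleep_ms` is a hypothetical helper.

```python
def worst_case_sleep_ms(max_attempts: int, interval_ms: int) -> int:
    """Total sleep time if all attempts fail (no sleep after the last one)."""
    # N attempts are separated by N - 1 waits.
    return max(max_attempts - 1, 0) * interval_ms


for attempts in (3, 50, 100, 1000):
    seconds = worst_case_sleep_ms(attempts, 100) / 1000
    print(f"{attempts:>4} attempts -> {seconds:.1f} s")
# prints:
#    3 attempts -> 0.2 s
#   50 attempts -> 4.9 s
#  100 attempts -> 9.9 s
# 1000 attempts -> 99.9 s
```

The bound grows linearly with the attempt count. For 50 attempts it lands at roughly the 5 s figure raised in the review.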

@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@dgsudharsan dgsudharsan requested a review from prgeor September 23, 2025 04:02
@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Contributor

@prgeor prgeor left a comment


@tshalvi @moshemos This is the master PR. As discussed, please raise a separate PR for the 202412 branch.

@r12f
Contributor

r12f commented Sep 29, 2025

Hi @prgeor, should we wait for the master PR to be merged before merging the 202412 one?

@prgeor
Contributor

prgeor commented Oct 1, 2025

Hi @prgeor, should we wait for the master PR to be merged before merging the 202412 one?

@r12f No. The 202412 PR is separate. Master needs a proper fix in the platform code.

@r12f
Contributor

r12f commented Oct 19, 2025

The 202412 PR was approved by PrinceG and has now been merged: Azure/sonic-buildimage-msft#1679.
