yongxin927 commented Mar 17, 2025

…ks in some cases

Why I did it

In the current notification mechanism, the NotificationConsumer reads notification messages in bulk and stores them in an internal queue. However, when the various Orch consumers receive notifications, they retrieve only one notification from the queue at a time, so under sustained load the queue fills faster than it drains. This risks unbounded memory growth, which behaves like a memory leak. If the device is hit by a large number of FDB move or link up/down flapping events, orchagent's memory consumption increases significantly and can eventually lead to a system crash or hang.
(Tested on SONiC 202111, where orchagent's memory consumption reached 10GB and continued to increase.)
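
To make the growth pattern concrete, here is a minimal model of the behavior described above. It is a simplified sketch, not code from swss-common: the function name `backlogAfter`, the fixed burst size, and the placeholder message string are all assumptions chosen for illustration.

```cpp
#include <cstddef>
#include <queue>
#include <string>

// Hypothetical model (not swss-common code): on every wakeup the producer
// side enqueues `burst` notifications, while the consumer side pops only one.
std::size_t backlogAfter(std::size_t wakeups, std::size_t burst) {
    std::queue<std::string> pending;
    for (std::size_t w = 0; w < wakeups; ++w) {
        // Bulk enqueue: everything that arrived is stored at once.
        for (std::size_t i = 0; i < burst; ++i) {
            pending.push("port_state_change");  // placeholder payload
        }
        // Single dequeue: only one notification is processed per wakeup.
        if (!pending.empty()) {
            pending.pop();
        }
    }
    return pending.size();  // number of notifications still held in memory
}
```

In this model the backlog grows by `burst - 1` entries per wakeup, so any burst size above one makes the queue grow without bound for as long as the event storm lasts.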

How I did it

Replaced the std::queue that backs the internal queue of the NotificationConsumer class with a new queue that deduplicates messages. When an incoming message duplicates one that is already queued, the NotificationConsumer no longer enqueues it again and instead moves the existing message to the end of the queue. This significantly reduces the number of duplicate notification messages, and both lookup and insertion in the new queue take O(1) time.
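
One way such a queue could be built is a linked list for ordering plus a hash map for lookup. The sketch below is an assumption about the general technique, not the class added by this PR: the name `DedupQueue`, its methods, and the choice to treat each notification as an opaque string are hypothetical. With a hash map the O(1) bound is average-case.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical illustration of a deduplicating FIFO; not the PR's actual class.
class DedupQueue {
public:
    // Returns true if msg was newly enqueued, false if it was a duplicate
    // (in which case the existing entry is moved to the end of the queue).
    bool push(const std::string& msg) {
        auto it = index_.find(msg);
        if (it != index_.end()) {
            // Relink the existing node at the tail; nothing is allocated.
            order_.splice(order_.end(), order_, it->second);
            return false;
        }
        order_.push_back(msg);
        index_.emplace(msg, std::prev(order_.end()));
        return true;
    }

    // Removes and returns the oldest entry. The queue must not be empty.
    std::string pop() {
        assert(!order_.empty());
        std::string msg = std::move(order_.front());
        index_.erase(msg);
        order_.pop_front();
        return msg;
    }

    bool empty() const { return order_.empty(); }
    std::size_t size() const { return order_.size(); }

private:
    std::list<std::string> order_;  // FIFO order of pending messages
    std::unordered_map<std::string, std::list<std::string>::iterator> index_;
};
```

Pushing "a", "b", then "a" again leaves two entries, and they pop in the order "b", "a". Memory use is bounded by the number of distinct pending messages rather than by the raw event rate.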

How I verify it

  1. Ran 10,000,000 link flaps on SONiC 202111; the virtual memory usage of orchagent was about 700M to 900M.

  2. Generated 7,000 FDB moves per second on SONiC 202111; the virtual memory usage of orchagent was about 397M.

A related enhancement for this PR is in swss:
sonic-net/sonic-swss#3560

…ks in some cases

Why I did it

In the current notification mechanism, the NotificationConsumer reads notification messages in bulk and stores them in an internal queue.
However, when the various Orch consumers receive notifications, they retrieve only one notification from the queue at a time.
This risks unbounded memory growth, which behaves like a memory leak. If the device is hit by a large number of FDB move or link up/down flapping events,
orchagent's memory consumption increases significantly
and can eventually lead to a system crash or hang.
(Tested on SONiC 202111, where orchagent's memory consumption reached 10GB and continued to increase.)

How I did it

Replaced the std::queue that backs the internal queue of the NotificationConsumer class with a new queue that deduplicates messages.
When an incoming message duplicates one that is already queued, the NotificationConsumer no longer enqueues it again and
instead moves the existing message to the end of the queue (as a penalty for DDoS).
This significantly reduces the number of duplicate notification messages, and both lookup and insertion in the new queue take O(1) time.

@mssonicbld
Collaborator

/azp run

Azure Pipelines successfully started running 1 pipeline(s).
