[swss-common] Enhanced NotificationConsumer queue to avoid memory leaks in some cases #989
Why I did it
In the current notification mechanism, NotificationConsumer reads notification messages in bulk into an internal queue. However, when the various Orch consumers handle notifications, they retrieve only one notification from that queue at a time. If notifications arrive faster than they are consumed, the queue grows without bound, which in practice behaves like a memory leak. When a device is hit by a large number of FDB move events or link up/down flaps, orchagent memory consumption rises significantly and can eventually lead to a system crash or hang.
(Observed on SONiC 202111, where orchagent memory consumption reached 10 GB and kept increasing.)
How I did it
Replaced the std::queue used as the internal queue of the NotificationConsumer class with a new queue. When an incoming message duplicates one that is already queued, NotificationConsumer no longer enqueues a second copy; instead, it moves the existing message to the end of the queue. This significantly reduces the number of duplicate notification messages, and both lookup and insertion in the new queue take O(1) time.
How I verify it
Ran 10,000,000 link flap tests on SONiC 202111; orchagent virtual memory usage was about 700 MB to 900 MB.
Generated 7,000 FDB moves per second on SONiC 202111; orchagent virtual memory usage was about 397 MB.
A companion enhancement for this PR is in sonic-swss:
sonic-net/sonic-swss#3560