
Decision log plugin: don't drop events when reaching buffer limit, slow down and retry #7454

Open
sspaink opened this issue Mar 18, 2025 · 1 comment


sspaink commented Mar 18, 2025

What is the underlying problem you're trying to solve?

The decision log plugin manages a buffer. By default this buffer has an "unlimited" size, so valid logged events will only be dropped if OPA crashes due to running out of memory (worst-case scenario). To prevent an OOM crash, the user can configure a buffer size limit with buffer_size_limit_bytes. Then, if the buffer fills up, the oldest events are dropped to make room for new incoming events (it works as a circular buffer). The problem with dropping events when the limit is reached is that it sacrifices auditability for latency. This is an issue in scenarios where every event is critical and should never be dropped, where slowing down is preferable to losing data.

In an upcoming PR a new buffer type will be introduced that doesn't support an "unlimited" size and also defaults to dropping events when the limit is reached. Both the new and the current buffer would benefit from a way to keep all incoming events.

Describe the ideal solution

A new configuration option to change the behavior so that an incoming event is never dropped; instead, the plugin keeps retrying until there is room in the buffer. Possibly also a configuration option controlling how long to retry before giving up.

Ideally there would also be a way to communicate back to the log producer with an error code (for example, HTTP 429 Too Many Requests) so that the producer can either slow down or start sending events to a different OPA instance.

Additional Context

#5724


stale bot commented Apr 17, 2025

This issue has been automatically marked as inactive because it has not had any activity in the last 30 days. Although currently inactive, the issue could still be considered and actively worked on in the future. More details about the use-case this issue attempts to address, the value provided by completing it or possible solutions to resolve it would help to prioritize the issue.

@stale stale bot added the inactive label Apr 17, 2025