-
Notifications
You must be signed in to change notification settings - Fork 1.5k
[AI-6101] Add support for customizable cache keys for persistent cache #21316
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The following files, which will be shipped with the agent, were modified in this PR and You can ignore this if you are sure the changes in this PR do not require QA. Otherwise, consider removing the label. List of modified files that will be shipped with the agent
|
Codecov Report❌ Patch coverage is Additional details and impacted files🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job trying to satisfy customer needs! I left a few comments. Thanks for pinging me on this one 🙏
Overall, I'm not convinced that the concept of cache key types is warranted. Rather, you could take a much simpler approach by defining a method that returns the expected persistent cache key prefix (defaulting to the check ID) which could be overridden. The return value would then be cached via a check initialization scheme or otherwise, and that static prefix would be used ubiquitously.
datadog_checks_base/datadog_checks/base/utils/cache_key/manager.py
Outdated
Show resolved
Hide resolved
datadog_checks_base/datadog_checks/base/utils/cache_key/__init__.py
Outdated
Show resolved
Hide resolved
datadog_checks_base/datadog_checks/base/utils/cache_key/config_set.py
Outdated
Show resolved
Hide resolved
datadog_checks_base/datadog_checks/base/utils/cache_key/config_set.py
Outdated
Show resolved
Hide resolved
datadog_checks_base/datadog_checks/base/utils/cache_key/manager.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more things, I'll take a look again tomorrow!
the value to store | ||
""" | ||
datadog_agent.write_persistent_cache(self._persistent_cache_id(key), value) | ||
datadog_agent.write_persistent_cache(key, value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still a breaking change.
datadog_checks_base/datadog_checks/base/utils/cache_key/full_config.py
Outdated
Show resolved
Hide resolved
datadog_checks_base/datadog_checks/base/utils/cache_key/base.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great change overall! 🚀
I only have a couple of nit picks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In regards to the key/base_key question, here is a suggestion to enforce the overriding of base_key
instead of key
.
My understanding is that the key
method should never be overridden, therefore the final
decorator can be used.
Another suggestion: rename base_key
to key_suffix
to be more explicit about the role of the method in the cache key construction.
IIUC, key()
already implements the cache invalidation between different check IDs. Then the key_suffix
allows for more granular control over the invalidation logic.
datadog_checks_base/datadog_checks/base/utils/cache_key/base.py
Outdated
Show resolved
Hide resolved
datadog_checks_base/datadog_checks/base/utils/cache_key/full_config.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's my final recommendation. It will reduce the diff by 80% and incorporates @dkirov-dd's idea about the more semantically correct and intuitive key_suffix
naming (although I flip it).
Let's create a new method on the base class which customers may override:
def get_persistent_cache_id(self) -> str:
"""
This ID is used to form the final prefix of all persistent cache key lookups.
"""
# The last component of the check ID is the hash of the instance configuration
return self.check_id.split(':')[-1]
There will be another method that becomes a check initialization which will set a private property:
# __init__
self.__persistent_cache_key_prefix: str | None = None
def __set_persistent_cache_key_prefix(self) -> None:
# Remove the instance hash from the check ID
namespace = ':'.join(self.check_id.split(':')[:-1])
self.__persistent_cache_key_prefix = f'{namespace}:{self.get_persistent_cache_id()}_'
Finally, change the API calls:
def read_persistent_cache(self, key: str) -> str:
return datadog_agent.read_persistent_cache(f'{self.__persistent_cache_key_prefix}{key}')
def write_persistent_cache(self, key: str, value: str) -> str:
return datadog_agent.write_persistent_cache(f'{self.__persistent_cache_key_prefix}{key}', value)
The benefits of this approach are that:
- There is no longer any overhead for accessing the cache with the proper key prefix. All computation happens once when the integration starts.
- Customers retain complete control without having to create a dedicated class just to return a string. They now have 2 options:
- We can create a simple utility for them to use within the overridden method.
- They can override the method to return an empty string and rely exclusively on the
name
option of each instance.
@ofek , answering to your suggestion. I understand that your point is to remove the the need of a class, the rest is essentially the same except for when the initialization happens (maybe?). I just realized we have a way of running delayed initialization in the check through the About not having a class, I disagree here. The intention of having a class is to allow reusable cache invalidation mechanisms. Partners often have the same way they want to invalidate log cursors. Having a way of encapsulating that logic in a reusable isolated way would be a great benefit for them. The option of not having a class means that the check itself would be littered with logic for cache invalidation logic. It would force developers to create their own custom version of the check if they want to reuse it when the only specialization they want is the logic for cache invalidation. I really think that encapsulating the logic into a class is a better choice here. The overhead we are talking about here is:
It is true that there is an overhead, but string concatenation is orders of magnitude below the typical latency of check operations to get the logs/metrics which is mostly IO bound. In any case, even though the overhead here when compared with the operations a check normally does is pretty small I appreciate it still exists. I have reshaped the implementation to have the best of both worlds:
I think this addresses your concern about performance while at the same time allowing encapsulating different cache invalidation strategies on their own context. Let me know what you think. Note: I extracted the |
I no longer have time to devote to reviewing this but please consider the core benefit of my final recommendation which you didn't seem to mention. The key prefix is purely an implementation detail and shouldn't be exposed to users. The novel approach in my last comment was for the customer to define an ID and the prefix is handled automatically. In my suggestion there is no need for separate classes, the concept of cache invalidation, nor does the customer even need to spend minutes reading docs and grokking the details. We can just provide them with a method that does the hash like you mentioned and if they have custom logic it can really be a single very simple function that computes an ID, whatever that means within the context of the integration. They also would be able to simplify their approach and just rely on instance's edit: Put more plainly, I think this will be very confusing for customers because I'm intimately familiar with how to write integrations and even I cannot articulate how to use the current approach without re-reading the code in the files tab 😅 |
I really cannot see how having the logic encapsulated in classes is more complex than the option you propose. I see clear benefits in this approach but I also see appreciate that the entry point for a developer to define custom invalidation mechanisms for the cache is easier with your approach. I also have the impression that if a developer is considering modifying the default behavior of the consistent cache, they are already exposed to the inner workings of the persistent caching mechanism. However, I see a great benefit on your approach and that is that the API is more flexible. With an API like the one you propose, a developer can decide to encapsulate, or not, any logic as complex as it can be and then just return the ID that identifies the particular check instance in the persistent cache. I have updated the PR and provide the config set method as part of the Thank you both for the time reviewing the PR! I really appreciate it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me now, just final suggestion. Thanks again!
#21316) * Add support for customizable cache keys to store log cursor * Add changelog * Remove PersistentCacheKey lefotver from AgentCheck * Fix docstring mistakes * Reduce cache key scope to checks with the same name * Add test to cover restart behavior * Address most of the comments on the PR * Fix imports in tests * Replace manual sorthing by mutable_hash utility * Update changelog message * Remove CacheKeyManager * Reword docstrings * Finalize update of ConfigSet * Reframe implementation towards a invalidation strategy more than the full key derivation * Fix module renaming and keep tests dry * Fix typo in prefix method * Simplify implementation to use persistent_cache_id * Simplify id namespace calculation (cherry picked from commit fc7560f)
#21316) * Add support for customizable cache keys to store log cursor * Add changelog * Remove PersistentCacheKey lefotver from AgentCheck * Fix docstring mistakes * Reduce cache key scope to checks with the same name * Add test to cover restart behavior * Address most of the comments on the PR * Fix imports in tests * Replace manual sorthing by mutable_hash utility * Update changelog message * Remove CacheKeyManager * Reword docstrings * Finalize update of ConfigSet * Reframe implementation towards a invalidation strategy more than the full key derivation * Fix module renaming and keep tests dry * Fix typo in prefix method * Simplify implementation to use persistent_cache_id * Simplify id namespace calculation fc7560f
DataDog#21316) * Add support for customizable cache keys to store log cursor * Add changelog * Remove PersistentCacheKey lefotver from AgentCheck * Fix docstring mistakes * Reduce cache key scope to checks with the same name * Add test to cover restart behavior * Address most of the comments on the PR * Fix imports in tests * Replace manual sorthing by mutable_hash utility * Update changelog message * Remove CacheKeyManager * Reword docstrings * Finalize update of ConfigSet * Reframe implementation towards a invalidation strategy more than the full key derivation * Fix module renaming and keep tests dry * Fix typo in prefix method * Simplify implementation to use persistent_cache_id * Simplify id namespace calculation fc7560f
#21316) (#21351) * Add support for customizable cache keys to store log cursor * Add changelog * Remove PersistentCacheKey lefotver from AgentCheck * Fix docstring mistakes * Reduce cache key scope to checks with the same name * Add test to cover restart behavior * Address most of the comments on the PR * Fix imports in tests * Replace manual sorthing by mutable_hash utility * Update changelog message * Remove CacheKeyManager * Reword docstrings * Finalize update of ConfigSet * Reframe implementation towards a invalidation strategy more than the full key derivation * Fix module renaming and keep tests dry * Fix typo in prefix method * Simplify implementation to use persistent_cache_id * Simplify id namespace calculation (cherry picked from commit fc7560f) Co-authored-by: Juanpe Araque <[email protected]> Co-authored-by: Steven Yuen <[email protected]>
What does this PR do?
Original implementation
The implementation outlined here is the original idea when the PR was opened. After review, the implementation has changed quite a bit.
This PR adds support for customizable cache keys to be used with the agent persistent cache. This is implemented in the following way:
CacheKey
abstract class that can be extended to implement any cache invalidation logic needed.AgentCheck
has been modified to rely on aCacheKey
implementation to provide the key for the persistent cache internally. Usingcheck_id
is now the default behavior through theFullConfigCacheKey
implementation but can be overriden when necessary.logs_persistent_cache_key
can be overriden by a developer that wants to modify what is theCacheKey
used to handle the logs cursor.CacheKey
that is to simple, the impact of key clashing is reduced to the same kind of check.CacheKey
interface that work as an example and to provide funcionality:FullConfigCacheKey
is equivalent to the previous behavior. It relies on thecheck_id
that includes a digest of the full configuration of the check. This default behavior makes this change backward compatible with the usage ofsend_log
andget_log_cursor
, both of them using persistent cache.ConfigSetCacheKey
implementation invalidates the cache only when a subset of configuration options have been updated. This provides a similar level of separation between different checks but allowing some configuration options to not invalidate the cache.CacheKeyManager
allows multiple cache keys to coexist in theAgentCheck
for differnet purposes. At the moment we only use theCacheKey
internally to manage the log cursor. However, if we want to include something else we can register a new type ofCacheKeyType
enum and maintain the new one in the manager without having to modify how theAgentCheck
behaves.Note: the manager forces to have each cache key represented by an enum to avoid relying on simple string keys per cache key. While the intention is to allow to personalize the cache invalidation mechanism, we do not want potentially arbitrary cache keys being register which could trigger cache misses if the string key is mistakenly generated.
This PR adds support for customizing the persistent cache prefix used when caching any data in the
AgentCheck
. This is done by allowing the developer to override thepersistent_cache_id
method. This method returns the ID that identifies the specific check instance in the Agent persistent cache.When a given check returns the same persistent cache ID, the cache keys generated will be stable. By default, the persistent cache ID is the digest of the full check configuration. Any update in the configuration will generate a different ID, causing the cache to be invalidated. This is the current behavior before introducing this feature, which ensures backward compatibility.
Motivation
Until now, integrations that wanted to emit logs and be consistent between agent restarts would have logs missing or duplicated if the user changed their configuration. This would invalidate the cache and there was limited control over the last cursor.
The intention of this change is to provide a backward compatible solution that allows developers to modify the cache invalidation behavior in an easy way.
Review checklist (to be filled by reviewers)
qa/skip-qa
label if the PR doesn't need to be tested during QA.backport/<branch-name>
label to the PR and it will automatically open a backport PR once this one is merged