@@ -74,7 +74,14 @@ public void warnOnFailedChecks(
       @Observes Startup event,
       Instance<ProductionReadinessCheck> checks,
       ReadinessConfiguration config) {
-    List<Error> errors = checks.stream().flatMap(check -> check.getErrors().stream()).toList();
+    List<Error> errors =
+        checks.stream()
+            .flatMap(check -> check.getErrors().stream())
+            .filter(
+                error ->
+                    config.ignoreOffendingProperties().stream()
+                        .noneMatch(prop -> prop.equalsIgnoreCase(error.offendingProperty())))
Contributor

This approach LGTM in general. However, WDYT about adding a new "ID" property to Error, e.g. error.getId().

The ID could be a static constant for cases where there could only ever be one Error instance per "check" code (e.g. checkUserPrincipalMetricTag) or it could be a deterministic (hash) function of the "type" plus some parameters (e.g. checkInsecureStorageSettings could produce IDs like storage-17af38 and storage-46fq98).

The idea is that admin users should suppress specific error instances, but not "ranges" of errors. This way, if an admin user suppresses one particular check case, new checks will still be visible when Polaris adds them. The value of error.offendingProperty() may still be too broad in some cases.

The "hash" part being deterministic will allow admin users to propagate the same configuration to all their deployment environments. At the same time, it is not easy to guess, which will force the admin user to review what exactly needs to be suppressed. Also, if the meaning of the error changes, we can change the ID, and it will force the admin users to reassess the implications (and re-suppress).

WDYT?
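
To make the proposal concrete, here is a minimal sketch of how such IDs might be derived. Everything in it is an assumption for illustration: the `ErrorIds` class, the `derive` helper, the constant's value, and the choice of SHA-256 truncated to six hex characters are not part of this PR or of Polaris.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Hypothetical helper; not part of the PR.
final class ErrorIds {
  private ErrorIds() {}

  // Static ID for a check that can only ever produce one Error instance (value is made up).
  static final String USER_PRINCIPAL_METRIC_TAG = "metrics-user-principal-tag";

  /** Derives an ID of the form "<type>-<6 hex chars>" from the type and the error's parameters. */
  static String derive(String type, String... params) {
    try {
      MessageDigest digest = MessageDigest.getInstance("SHA-256");
      digest.update(type.getBytes(StandardCharsets.UTF_8));
      for (String param : params) {
        digest.update((byte) 0); // separator, so ("ab", "c") and ("a", "bc") hash differently
        digest.update(param.getBytes(StandardCharsets.UTF_8));
      }
      String hex = HexFormat.of().formatHex(digest.digest());
      // Only the inputs feed the hash, so the ID is identical in every deployment environment.
      return type + "-" + hex.substring(0, 6);
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("SHA-256 should be available on every Java platform", e);
    }
  }
}
```

With this sketch, `ErrorIds.derive("storage", "s3", "bucket-a")` returns the same `storage-xxxxxx` value on every host, and changing any parameter (or deliberately changing the type string when the meaning of the error changes) yields a different ID.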

Contributor

> The idea is that admin users should suppress specific error instances

What would this flow look like, though? Is this to support cases like "I want to allow setting config X, but not Y, and I want to allow setting config Z to A or B but not to C"? I fear we are at risk of overengineering this a bit. As it is, only admins have access to these configs.

Contributor

From my POV, this is not so much about config A=X or A=Y, but more about "Polaris detected something dangerous about X". Now, if the admin user suppresses this warning, I do not want the suppression to automatically hide future warnings about "dangerous Y".

It may be related to some specific config, but it may not be. I can imagine running as the root OS user falls under the same category of auto-detectable issues.
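
A small sketch of the difference in granularity, under stated assumptions: `Issue` is a stand-in for the PR's `Error` type with a hypothetical `id` field added, and the property name and IDs are made up. `filterByProperty` mirrors the filter in this PR; `filterById` is the alternative being discussed.

```java
import java.util.List;
import java.util.Set;

final class SuppressionSketch {
  // Stand-in for the PR's Error type; the `id` field is the hypothetical addition.
  record Issue(String id, String offendingProperty, String message) {}

  // Mirrors the filter in this PR: drops every issue whose property is listed (case-insensitive).
  static List<Issue> filterByProperty(List<Issue> issues, Set<String> ignoredProperties) {
    return issues.stream()
        .filter(
            issue ->
                ignoredProperties.stream()
                    .noneMatch(prop -> prop.equalsIgnoreCase(issue.offendingProperty())))
        .toList();
  }

  // Alternative under discussion: drops only the issues whose exact ID is listed.
  static List<Issue> filterById(List<Issue> issues, Set<String> ignoredIds) {
    return issues.stream().filter(issue -> !ignoredIds.contains(issue.id())).toList();
  }

  public static void main(String[] args) {
    // Two different problems detected on the same (made-up) property.
    Issue known = new Issue("storage-17af38", "polaris.example.storage", "dangerous X");
    Issue added = new Issue("storage-9c02de", "polaris.example.storage", "dangerous Y");
    List<Issue> issues = List.of(known, added);

    // Property-based suppression hides both issues: prints 0
    System.out.println(filterByProperty(issues, Set.of("POLARIS.EXAMPLE.STORAGE")).size());
    // ID-based suppression keeps the newly added "dangerous Y": prints 1
    System.out.println(filterById(issues, Set.of("storage-17af38")).size());
  }
}
```

An issue that is not tied to any config property (such as running as the root OS user) would have nothing for the property-based filter to match on, which is another argument for keying suppression on an ID.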

Contributor Author

@fivetran-kostaszoumpatianos Sep 11, 2025

Filtering based on error IDs would also fit well with the naming change that you propose, @eric-maynard: we could then call the ignore method ignoreSelectedIssues, since we would have a way of filtering by issue. Maybe the parameter hash is a bit too much, though. For me, I would be ok deactivating a check altogether if I know that I have a dangerous config there. It depends on how cautious we want to be.

Contributor

I'd be ok without the parameter hash for a start. However, having a concise but non-predictable error ID is important, I think. That is to say, a user suppressing a particular error must first observe the error. It should not be easy to suppress something "proactively" :) At the same time, the error ID should not depend on the runtime environment (e.g. it should be the same in all k8s pods). WDYT?

+            .toList();
     if (!errors.isEmpty()) {
       var utf8 = Charset.defaultCharset().equals(StandardCharsets.UTF_8);
       var warning = utf8 ? WARNING_SIGN_UTF_8 : WARNING_SIGN_PLAIN;
@@ -21,6 +21,7 @@
 import io.quarkus.runtime.annotations.StaticInitSafe;
 import io.smallrye.config.ConfigMapping;
 import io.smallrye.config.WithDefault;
+import java.util.Set;

 @StaticInitSafe
 @ConfigMapping(prefix = "polaris.readiness")
@@ -33,4 +34,11 @@ public interface ReadinessConfiguration {
    */
   @WithDefault("false")
   boolean ignoreSevereIssues();
+
+  /**
+   * Set of properties that, if found to be misconfigured, will be ignored when determining the
+   * production readiness.
+   */
+  @WithDefault("{}")
+  Set<String> ignoreOffendingProperties();
Contributor

Why the asymmetry between this and ignoreSevereIssues? IIUC this is basically a subset of severe (?) issues that the admin wants to configure the readiness check to ignore. Maybe ignoreSelectIssues?

Contributor Author

Because technically the check is based on the offending property and not the issue type. Maybe ignoreIssuesForSelectedOffendingProperties? Or is that too much?

}