Conversation

@jackiehanyang
Collaborator

Description

  • Introducing a new Insights API
    • POST /_plugins/_anomaly_detection/insights/_start - Start insights job
    • GET /_plugins/_anomaly_detection/insights/_status - Get insights job status
    • GET /_plugins/_anomaly_detection/insights/_results - Get latest insights results
    • POST /_plugins/_anomaly_detection/insights/_stop - Stop insights job
  • Introducing ml-commons metrics correlation as a runtime dependency
    • Sends anomaly results to the ml-commons metrics correlation algorithm for analysis
    • Writes the analysis results to the insights-results index
    • The frontend reads from this index to display insights on the dashboard
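
For readers who want to try the four endpoints listed above, here is a minimal sketch that builds (but does not send) one request per endpoint with the JDK's `java.net.http` API. The host `http://localhost:9200`, the empty request bodies, and the class name `InsightsApiRequests` are assumptions for illustration; the PR description does not say whether `_start` accepts a body or which authentication the cluster requires.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;

/** Builds, without sending, one request per Insights endpoint from the PR description. */
final class InsightsApiRequests {
    // Assumption: a local cluster without TLS or auth; adjust for a real deployment.
    static final String BASE_URL = "http://localhost:9200";
    // Path prefix taken from the PR description.
    static final String INSIGHTS_URI = "/_plugins/_anomaly_detection/insights";

    static HttpRequest start()   { return post("/_start"); }
    static HttpRequest status()  { return get("/_status"); }
    static HttpRequest results() { return get("/_results"); }
    static HttpRequest stop()    { return post("/_stop"); }

    private static HttpRequest post(String operation) {
        // Assumption: the operation needs no request body.
        return HttpRequest.newBuilder(URI.create(BASE_URL + INSIGHTS_URI + operation))
            .POST(HttpRequest.BodyPublishers.noBody())
            .build();
    }

    private static HttpRequest get(String operation) {
        return HttpRequest.newBuilder(URI.create(BASE_URL + INSIGHTS_URI + operation))
            .GET()
            .build();
    }

    public static void main(String[] args) {
        for (HttpRequest request : List.of(start(), status(), results(), stop())) {
            System.out.println(request.method() + " " + request.uri().getPath());
        }
    }
}
```

Each request could then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` against a cluster that has the plugin installed.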

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@kaituo
Collaborator

kaituo commented Nov 13, 2025

CI failed due to the JaCoCo changes in build.gradle. Not sure how to fix it. One naive way is to add the correlation request, response, and action classes in AD to avoid the ml-commons dependency.

* What went wrong:
Execution failed for task ':jacocoTestCoverageVerification'.
> A failure occurred while executing org.gradle.internal.jacoco.JacocoCoverageAction
   > Rule violated for class org.opensearch.ad.AnomalyDetectorRunner: branches covered ratio is 0.35, but expected minimum is 0.60
     Rule violated for class org.opensearch.ad.AnomalyDetectorRunner: lines covered ratio is 0.47, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.util.ModelUtil: branches covered ratio is 0.32, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.util.ModelUtil: lines covered ratio is 0.48, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.util.DataUtil: lines covered ratio is 0.72, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.feature.AbstractRetriever: branches covered ratio is 0.55, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.feature.AbstractRetriever: lines covered ratio is 0.63, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.feature.SearchFeatureDao: branches covered ratio is 0.28, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.feature.SearchFeatureDao: lines covered ratio is 0.59, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.rest.handler.ModelValidationActionHandler: branches covered ratio is 0.00, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.rest.handler.ModelValidationActionHandler: lines covered ratio is 0.00, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.rest.handler.ConfigUpdateConfirmer: branches covered ratio is 0.06, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.rest.handler.ConfigUpdateConfirmer: lines covered ratio is 0.19, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.rest.handler.AggregationPrep: branches covered ratio is 0.36, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.rest.handler.AggregationPrep: lines covered ratio is 0.40, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.rest.handler.IntervalCalculation: branches covered ratio is 0.06, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.rest.handler.IntervalCalculation: lines covered ratio is 0.18, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.rest.handler.LatestTimeRetriever: branches covered ratio is 0.00, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.rest.handler.LatestTimeRetriever: lines covered ratio is 0.00, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.rest.handler.IntervalCalculation.IntervalRecommendationListener: branches covered ratio is 0.37, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.rest.handler.IntervalCalculation.IntervalRecommendationListener: lines covered ratio is 0.55, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.ratelimit.ColdStartWorker: branches covered ratio is 0.45, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.ratelimit.ColdStartWorker: lines covered ratio is 0.73, but expected minimum is 0.75
     Rule violated for class org.opensearch.ad.transport.SuggestAnomalyDetectorParamTransportAction: branches covered ratio is 0.00, but expected minimum is 0.60
     Rule violated for class org.opensearch.ad.transport.SuggestAnomalyDetectorParamTransportAction: lines covered ratio is 0.11, but expected minimum is 0.75
     Rule violated for class org.opensearch.ad.transport.ADSuggestName: branches covered ratio is 0.00, but expected minimum is 0.60
     Rule violated for class org.opensearch.ad.transport.ADSuggestName: lines covered ratio is 0.57, but expected minimum is 0.75
     Rule violated for class org.opensearch.ad.transport.ADResultProcessor: branches covered ratio is 0.00, but expected minimum is 0.60
     Rule violated for class org.opensearch.ad.transport.ADResultProcessor: lines covered ratio is 0.61, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.model.InitProgressProfile: branches covered ratio is 0.50, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.model.IntervalTimeConfiguration: branches covered ratio is 0.50, but expected minimum is 0.60
     Rule violated for class org.opensearch.ad.ml.MLCommonsClient: lines covered ratio is 0.62, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.rest.RestValidateAction: branches covered ratio is 0.00, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.rest.RestValidateAction: lines covered ratio is 0.26, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.rest.RestJobAction: branches covered ratio is 0.00, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.rest.RestJobAction: lines covered ratio is 0.25, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.rest.AbstractSearchAction: lines covered ratio is 0.60, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.transport.SuggestConfigParamResponse: branches covered ratio is 0.59, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.transport.SuggestConfigParamRequest: branches covered ratio is 0.50, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.transport.SuggestConfigParamRequest: lines covered ratio is 0.68, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.transport.SuggestConfigParamResponse.Builder: lines covered ratio is 0.57, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.transport.handler.IndexMemoryPressureAwareResultHandler: branches covered ratio is 0.54, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.transport.handler.IndexMemoryPressureAwareResultHandler: lines covered ratio is 0.68, but expected minimum is 0.75
     Rule violated for class org.opensearch.ad.ratelimit.ADSaveResultStrategy: branches covered ratio is 0.43, but expected minimum is 0.60
     Rule violated for class org.opensearch.ad.ratelimit.ADSaveResultStrategy: lines covered ratio is 0.53, but expected minimum is 0.75

Collaborator

@kaituo kaituo left a comment

partial review


log.info("Built correlation input: {} metrics × {} buckets", input.getNumMetrics(), input.getNumBuckets());

log.info("Matrix contents: {}", input.getMatrix());
Collaborator

The matrix can be huge and may expose metric/entity values. Recommend DEBUG with truncation or only log dimensions.

Collaborator Author

added it for testing purpose, will clean up
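
One way to act on the suggestion above is to log only the dimensions at INFO and keep a capped preview for DEBUG. The sketch below assumes the matrix is a `double[][]` with one row per metric; the actual type returned by `input.getMatrix()` is not shown in this conversation, and `MatrixLogSummary` and `MAX_PREVIEW_VALUES` are hypothetical names.

```java
import java.util.Arrays;

/** Produces log-safe summaries of a correlation matrix instead of its full contents. */
final class MatrixLogSummary {
    // Hypothetical cap on how many values of the first row a DEBUG preview may show.
    static final int MAX_PREVIEW_VALUES = 5;

    /** Dimensions only: safe for INFO because no metric or entity values are exposed. */
    static String dimensions(double[][] matrix) {
        int rows = matrix.length;
        int cols = rows == 0 ? 0 : matrix[0].length;
        return rows + " metrics x " + cols + " buckets";
    }

    /** Truncated preview of the first row; still exposes a few values, so keep it at DEBUG. */
    static String preview(double[][] matrix) {
        if (matrix.length == 0) {
            return "[]";
        }
        double[] firstRow = matrix[0];
        int shown = Math.min(firstRow.length, MAX_PREVIEW_VALUES);
        String head = Arrays.toString(Arrays.copyOf(firstRow, shown));
        int omitted = firstRow.length - shown;
        return omitted > 0 ? head + " (+" + omitted + " more values in row 0)" : head;
    }

    public static void main(String[] args) {
        double[][] matrix = { { 1, 2, 3, 4, 5, 6, 7 }, { 0, 0, 1, 0, 0, 2, 0 } };
        System.out.println(dimensions(matrix));
        System.out.println(preview(matrix));
    }
}
```

With Log4j, the preview call would typically sit behind `log.isDebugEnabled()` so the string is only built when DEBUG is on.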

Comment on lines 165 to 166
executionStartTime = executionEndTime.minus(24, ChronoUnit.HOURS);
// executionStartTime = executionEndTime.minus(intervalAmount, intervalUnit);
Collaborator

Did you mean to use the commented-out line instead of the hard-coded 24 hours?

Collaborator Author

oops, changed it for testing purpose, will clean up
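
For reference, the commented-out line derives the window from the job interval rather than a fixed 24 hours. A minimal standalone sketch of that computation follows; `ExecutionWindow` and `startOf` are hypothetical names, and the parameter types are assumed to be `long` and `ChronoUnit`. Note that `Instant.minus` only accepts units up to `DAYS`; estimated units such as `MONTHS` throw `UnsupportedTemporalTypeException`.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

/** Computes the execution window start from the job interval instead of a fixed 24 hours. */
final class ExecutionWindow {
    static Instant startOf(Instant executionEndTime, long intervalAmount, ChronoUnit intervalUnit) {
        // Same arithmetic as the commented-out line in the diff.
        return executionEndTime.minus(intervalAmount, intervalUnit);
    }

    public static void main(String[] args) {
        Instant end = Instant.parse("2025-11-14T00:00:00Z");
        System.out.println(startOf(end, 30, ChronoUnit.MINUTES));
    }
}
```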


InjectSecurity injectSecurity = new InjectSecurity(jobParameter.getName(), settings, localClient.threadPool().getThreadContext());
try {
injectSecurity.inject(user, roles);
Collaborator

A normal user cannot query a system index. Please add security tests.

// Insights job
// ======================================
// The Insights job name
public static final String INSIGHTS_JOB_NAME = "insights_job";
Collaborator

How about changing it to ad_insights_job in case we need a forecasting job later?

Collaborator

@kaituo kaituo left a comment

partial review

return ImmutableList
.of(
// Start insights job
new ReplacedRoute(
Collaborator

You don't need ReplacedRoute as this is a new API. TimeSeriesAnalyticsPlugin.AD_BASE_URI alone is enough.

builder.startObject();

// Task metadata
builder.field("task_id", "task_" + ADCommonName.INSIGHTS_JOB_NAME + "_" + UUID.randomUUID().toString());
Collaborator

Why do you need a task id? The AD task id is the doc id in the state index.


if (parts.length > 1) {
String seriesKey = parts[1];
seriesKeys.add(seriesKey);
Collaborator

Is the entities set redundant with seriesKeys set?

// Use MAX score if multiple anomalies in same bucket
double currentScore = bucketScores.getOrDefault(bucketIndex, 0.0);
double newScore = anomaly.getAnomalyScore();
bucketScores.put(bucketIndex, Math.max(currentScore, newScore));
Collaborator

@kaituo kaituo Nov 14, 2025

Should we consider the interval? Our anomalies are interval anomalies. We can assign the anomaly score to every bucket that overlaps the current interval [data start, data end]. If you have already done this, can you point me to the code? I cannot find it.
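
A rough sketch of the interval-aware variant the comment describes: rather than scoring a single bucket, the score is recorded (still using MAX) in every bucket that overlaps the anomaly's data interval. This is an illustration under assumptions, not the PR's code: `IntervalBucketScores` and all parameter names are hypothetical, buckets are assumed to be fixed-width and indexed from the window start, and the data end is treated as exclusive.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.TreeMap;

/** Spreads an interval anomaly's score over every bucket its data interval overlaps. */
final class IntervalBucketScores {
    static void addAnomaly(
        Map<Integer, Double> bucketScores,
        Instant windowStart,
        Duration bucketSize,
        int numBuckets,
        Instant dataStart,
        Instant dataEnd,
        double score
    ) {
        long bucketMillis = bucketSize.toMillis();
        long startOffset = Duration.between(windowStart, dataStart).toMillis();
        long endOffset = Duration.between(windowStart, dataEnd).toMillis();
        // Treat dataEnd as exclusive, but keep a zero-length anomaly in its own bucket.
        long lastOffset = Math.max(startOffset, endOffset - 1);
        if (lastOffset < 0) {
            return; // the anomaly ended before the window started
        }
        int first = (int) Math.max(0L, Math.floorDiv(startOffset, bucketMillis));
        int last = (int) Math.min((long) numBuckets - 1, Math.floorDiv(lastOffset, bucketMillis));
        for (int bucket = first; bucket <= last; bucket++) {
            // Keep the MAX score when several anomalies touch the same bucket.
            bucketScores.merge(bucket, score, Double::max);
        }
    }

    public static void main(String[] args) {
        Map<Integer, Double> scores = new TreeMap<>();
        Instant windowStart = Instant.parse("2025-11-13T00:00:00Z");
        Duration oneMinute = Duration.ofMinutes(1);
        addAnomaly(scores, windowStart, oneMinute, 10,
            Instant.parse("2025-11-13T00:01:30Z"), Instant.parse("2025-11-13T00:04:00Z"), 0.8);
        addAnomaly(scores, windowStart, oneMinute, 10,
            Instant.parse("2025-11-13T00:03:00Z"), Instant.parse("2025-11-13T00:05:00Z"), 0.5);
        System.out.println(scores);
    }
}
```

In the example, the first anomaly covers buckets 1 through 3 and the second covers buckets 3 and 4, so bucket 3 keeps the larger of the two scores.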

Comment on lines +44 to +48
"fields": {
"raw": {
"type": "keyword",
"ignore_above": 32766
}
Collaborator

do you need keyword?

Collaborator

@kaituo kaituo left a comment

partial review

Comment on lines +71 to +75
handleStartOperation(request, listener);
} else if (request.isStatusOperation()) {
handleStatusOperation(request, listener);
} else if (request.isStopOperation()) {
handleStopOperation(request, listener);
Collaborator

Do you need to stash context before accessing job index (system index)? Please add security tests.

.sort("generated_at", SortOrder.DESC)
);

client.search(searchRequest, ActionListener.wrap(searchResponse -> {
Collaborator

Do you need to add backend role filtering before search? Please add security tests with backend role filtering on.


private static final Logger log = LogManager.getLogger(InsightsJobProcessor.class);

private static InsightsJobProcessor INSTANCE;
Collaborator

try {
injectSecurity.inject(user, roles);

localClient
Collaborator

We should verify whether the customer has changed the mapping before writing. If so, report an error, stop the job, and stop writing.
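
The check requested above could take roughly the following shape: compare the field types the job expects with those found on the live index, and refuse to write if any differ. This is a simplified sketch over plain maps, not the plugin's implementation. `MappingCheck` and `findMismatches` are hypothetical names, and the field types in the example are assumed, although `generated_at` and `task_id` do appear in the diff excerpts in this conversation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Compares expected field types with the types found on the live index. */
final class MappingCheck {
    /** Returns one message per expected field that is missing or has a different type. */
    static List<String> findMismatches(Map<String, String> expectedTypes, Map<String, String> actualTypes) {
        List<String> mismatches = new ArrayList<>();
        for (Map.Entry<String, String> expected : expectedTypes.entrySet()) {
            String actual = actualTypes.get(expected.getKey());
            if (!expected.getValue().equals(actual)) {
                mismatches.add(expected.getKey() + ": expected " + expected.getValue()
                    + ", found " + (actual == null ? "nothing" : actual));
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Assumed types, for illustration only.
        Map<String, String> expected = Map.of("generated_at", "date", "task_id", "keyword");
        Map<String, String> actual = Map.of("generated_at", "text", "task_id", "keyword");
        List<String> mismatches = findMismatches(expected, actual);
        if (!mismatches.isEmpty()) {
            // In the job, this is where an error would be reported and writing stopped.
            System.out.println("Mapping changed, not writing: " + mismatches);
        }
    }
}
```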

Instant.now(),
lockDurationSeconds,
user,
ADCommonName.INSIGHTS_RESULT_INDEX_ALIAS,
Collaborator

Can you add this index to ADIndex? That would be consistent with the other indexes.


Labels

backport 2.x
infra: Changes to infrastructure, testing, CI/CD, pipelines, etc.


2 participants