Skip to content

Conversation

aheev
Copy link

@aheev aheev commented Aug 20, 2025

@github-actions github-actions bot added triage PRs from the community tools labels Aug 20, 2025
@aheev
Copy link
Author

aheev commented Aug 20, 2025

@AndrewJSchofield can you please review this?

@aheev aheev changed the title KAFKA-19487: Improving consistency of command-line arguments for consumer performance tests KAFKA-19624: Improving consistency of command-line arguments for consumer performance tests Aug 20, 2025
@AndrewJSchofield AndrewJSchofield added ci-approved and removed triage PRs from the community labels Aug 22, 2025
@brandboat
Copy link
Member

ping @aheev, could you please fix the conflicts?

# Conflicts:
#	tools/src/main/java/org/apache/kafka/tools/ConsumerPerformance.java
#	tools/src/test/java/org/apache/kafka/tools/ConsumerPerformanceTest.java
@aheev
Copy link
Author

aheev commented Aug 25, 2025

ping @aheev, could you please fix the conflicts?

done

Copy link
Member

@brandboat brandboat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch, left some comments below.

@aheev aheev requested a review from brandboat August 25, 2025 06:27
@aheev
Copy link
Author

aheev commented Aug 25, 2025

FAILED ❌ RestoreIntegrationTest > "shouldInvokeUserDefinedGlobalStateRestoreListener(boolean).useNewProtocol=false"

This test seems to be flaky. It succeeds on my machine and fails sometimes. Also the changes are not even related to the test

Just tested. Same issue on trunk too

Copy link
Collaborator

@m1a2st m1a2st left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this patch, some comments left

@AndrewJSchofield AndrewJSchofield self-requested a review August 25, 2025 14:42
Copy link
Member

@AndrewJSchofield AndrewJSchofield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. Please add some tests for the variation combinations of new/old options to make sure the rules in the KIP are correctly implemented.

Copy link
Member

@brandboat brandboat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update, some minor comments.

Copy link
Member

@brandboat brandboat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

Copy link
Member

@AndrewJSchofield AndrewJSchofield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from adding --command-property to both kafka-consumer-perf-test.sh and kafka-share-consumer-perf-test.sh, this looks good to me.

@aheev
Copy link
Author

aheev commented Sep 2, 2025

Overall, LGTM. Could you run the e2e test and share the results?

What do you mean by e2e test?

@m1a2st
Copy link
Collaborator

m1a2st commented Sep 2, 2025

Since you have modified the e2e test file consumer_performance.py, you can validate it by running the following command to ensure the test still works:
TC_PATHS="tests/kafkatest/benchmarks/core/benchmark_test.py" bash tests/docker/run_tests.sh

FYI: https://github.com/apache/kafka/blob/trunk/tests/README.md

@aheev
Copy link
Author

aheev commented Sep 2, 2025

Since you have modified the e2e test file consumer_performance.py, you can validate it by running the following command to ensure the test still works: TC_PATHS="tests/kafkatest/benchmarks/core/benchmark_test.py" bash tests/docker/run_tests.sh

FYI: https://github.com/apache/kafka/blob/trunk/tests/README.md

I will run them after andrew's comments are resolved

@AndrewJSchofield
Copy link
Member

@aheev Please resolve conflicts

Copy link
Member

@AndrewJSchofield AndrewJSchofield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. Once we have a green build, I'm ready to merge this.

@aheev
Copy link
Author

aheev commented Sep 3, 2025

Thanks for the PR. Once we have a green build, I'm ready to merge this.

Shouldn't we wait for this? I am trying to run the tests, but running into some issues

Since you have modified the e2e test file consumer_performance.py, you can validate it by running the following command to ensure the test still works:
TC_PATHS="tests/kafkatest/benchmarks/core/benchmark_test.py" bash tests/docker/run_tests.sh

# Conflicts:
#	tools/src/main/java/org/apache/kafka/tools/ConsumerPerformance.java
@AndrewJSchofield
Copy link
Member

Thanks for the PR. Once we have a green build, I'm ready to merge this.

Shouldn't we wait for this? I am trying to run the tests, but running into some issues

Since you have modified the e2e test file consumer_performance.py, you can validate it by running the following command to ensure the test still works:
TC_PATHS="tests/kafkatest/benchmarks/core/benchmark_test.py" bash tests/docker/run_tests.sh

Yes, we should. I'll wait then.

@aheev
Copy link
Author

aheev commented Sep 4, 2025

After a lot of struggle I managed to run the test by using a different JDK(will file a ticket for this later). Here are the results @m1a2st . Three tests failed, all of which are related to producer-perf
benchmark_test.log

@m1a2st
Copy link
Collaborator

m1a2st commented Sep 5, 2025

Here are the results @m1a2st . Three tests failed, all of which are related to producer-perf

Thanks, Could you file a ticket to trace this issue?

# Conflicts:
#	tools/src/main/java/org/apache/kafka/tools/ShareConsumerPerformance.java
@aheev
Copy link
Author

aheev commented Sep 5, 2025

Here are the results @m1a2st . Three tests failed, all of which are related to producer-perf

Thanks, Could you file a ticket to trace this issue?

https://issues.apache.org/jira/browse/KAFKA-19672

@aheev
Copy link
Author

aheev commented Sep 5, 2025

This test seems to be flaky and not related to the changes. It succeeded in a couple of runs in my local. @AndrewJSchofield should we trigger a CI re-run?

FAILED ❌ SmokeTestDriverIntegrationTest > "shouldWorkWithRebalance(boolean, boolean, boolean).stateUpdaterEnabled=false, processingThreadsEnabled=false, streamsProtocolEnabled=true"
FAILED ❌ SmokeTestDriverIntegrationTest > "shouldWorkWithRebalance(boolean, boolean, boolean).stateUpdaterEnabled=true, processingThreadsEnabled=false, streamsProtocolEnabled=true"
FAILED ❌ SmokeTestDriverIntegrationTest > "shouldWorkWithRebalance(boolean, boolean, boolean).stateUpdaterEnabled=true, processingThreadsEnabled=true, streamsProtocolEnabled=true"
Found 2 flaky test failures:
FLAKY ⚠️  SmokeTestDriverIntegrationTest > "shouldWorkWithRebalance(boolean, boolean, boolean).stateUpdaterEnabled=true, processingThreadsEnabled=true, streamsProtocolEnabled=false"
FLAKY ⚠️  SmokeTestDriverIntegrationTest > "shouldWorkWithRebalance(boolean, boolean, boolean).stateUpdaterEnabled=true, processingThreadsEnabled=false, streamsProtocolEnabled=false"

@AndrewJSchofield
Copy link
Member

@aheev I would merge latest changes into this branch. I find the test fails on your branch, and passes on trunk. I can't see that this PR would cause the tests to fail, so I'd update the branch and let the CI run again.

@aheev
Copy link
Author

aheev commented Sep 7, 2025

@Yunyung ConsumerPerformanceService and ShareConsumerPerformanceService are used in benchmark_test.py(which uses DEV_BRANCH and test_performance_services.py(which uses LATEST_2_1 and DEV_BRANCH)

  • ShareConsumerPerformanceService runs only in test_performance_services.py runs on branches >= 4.1. Hence, we can ignore this
  • As for ConsumerPerformanceService, I have added changes, but, while running the test, ProducerPerformanceService, which is used to produce data in turn consumed by ConsumerPerformanceService, fails, thereby blocking the ConsumerPerformanceService to consume on LATEST_2_1. It fails on trunk too
[INFO:2025-09-07 09:01:58,715]: RunnerClient: kafkatest.sanity_checks.test_performance_services.PerformanceServiceTest.test_version.version=2.1.1.metadata_quorum=ISOLATED_KRAFT: FAIL: Exception('ProducerPerformanceService-0-125869363841104-worker-1: Traceback (most recent call last):\n  File "/usr/local/lib/python3.10/dist-packages/ducktape/services/background_thread.py", line 36, in _protected_worker\n    self._worker(idx, node)\n  File "/opt/kafka-dev/tests/kafkatest/services/performance/producer_performance.py", line 130, in _worker\n    wait_until(lambda: self.alive(node), timeout_sec=20, err_msg="ProducerPerformance failed to start")\n  File "/usr/local/lib/python3.10/dist-packages/ducktape/utils/util.py", line 58, in wait_until\n    raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception\nducktape.errors.TimeoutError: ProducerPerformance failed to start\n')
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 351, in _do_run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 411, in run_test
    return self.test_context.function(self.test)
  File "/usr/local/lib/python3.10/dist-packages/ducktape/mark/_mark.py", line 438, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/opt/kafka-dev/tests/kafkatest/sanity_checks/test_performance_services.py", line 58, in test_version
    self.producer_perf.run()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/services/service.py", line 345, in run
    self.wait()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/services/background_thread.py", line 72, in wait
    self._propagate_exceptions()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/services/background_thread.py", line 103, in _propagate_exceptions
    raise Exception(self.errors)
Exception: ProducerPerformanceService-0-125869363841104-worker-1: Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/services/background_thread.py", line 36, in _protected_worker
    self._worker(idx, node)
  File "/opt/kafka-dev/tests/kafkatest/services/performance/producer_performance.py", line 130, in _worker
    wait_until(lambda: self.alive(node), timeout_sec=20, err_msg="ProducerPerformance failed to start")
  File "/usr/local/lib/python3.10/dist-packages/ducktape/utils/util.py", line 58, in wait_until
    raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception
ducktape.errors.TimeoutError: ProducerPerformance failed to start

I think we need to fix our test suite first
@AndrewJSchofield what is the way forward here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants