
Conversation

@il-kyun (Contributor) commented Aug 17, 2025

Summary

  1. Why: Per-batch replication throttle set/clear causes excessive AdminClient calls when many batches involve the same brokers, significantly slowing rebalances. This PR proposes a bulk mode that sets the replication bytes throttle once before inter-broker movements and clears it once after, reducing Admin overhead and improving execution time.
  2. What:
    • Add bulk.replication.throttle.enabled (default: true) to ExecutorConfig.
    • In Executor, when enabled (see the flow sketch after this list):
      • Set throttles once before inter-broker partition movements begin (filtering to existing topics).
      • Clear throttles once after all inter-broker partition movements complete, with summary logging (completed/aborted/dead/in-progress/aborting).
    • Keep existing per-batch behavior when disabled.
    • Update tests to cover both bulk and non-bulk paths.
    • Follow-ups: Improvements to the Kafka calls inside ReplicationThrottleHelper are being addressed in a separate PR; this PR focuses solely on bulk set/clear semantics.
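To make the proposed control flow concrete, here is a minimal sketch of bulk vs. per-batch throttling, as referenced in the list above. The generic callbacks stand in for Cruise Control's actual Executor and ReplicationThrottleHelper code; this does not claim to reproduce their real signatures.

```java
import java.util.List;
import java.util.function.Consumer;

// Minimal sketch of bulk vs. per-batch throttle handling (assumed shape,
// not the PR's actual code). The callbacks stand in for the real
// ReplicationThrottleHelper set/clear calls and batch execution.
public final class BulkThrottleFlowSketch {
  public static <T> void run(boolean bulkEnabled,
                             List<List<T>> batches,
                             List<T> allTasks,
                             Consumer<List<T>> setThrottles,
                             Consumer<List<T>> clearThrottles,
                             Consumer<List<T>> executeBatch) {
    if (bulkEnabled) {
      setThrottles.accept(allTasks);     // one Admin mutation before any moves
    }
    try {
      for (List<T> batch : batches) {
        if (!bulkEnabled) {
          setThrottles.accept(batch);    // legacy: set per batch
        }
        executeBatch.accept(batch);
        if (!bulkEnabled) {
          clearThrottles.accept(batch);  // legacy: clear per batch
        }
      }
    } finally {
      if (bulkEnabled) {
        clearThrottles.accept(allTasks); // one clear after all moves complete
      }
    }
  }
}
```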

Expected Behavior

  • With bulk.replication.throttle.enabled=true (see the declaration sketch after this list):
    • Throttles are applied once at the start of inter-broker moves and cleared once at the end.
    • Fewer AdminClient set/clear calls → faster rebalances.
    • No functional change to movement semantics; only fewer config mutations.
  • With bulk.replication.throttle.enabled=false:
    • Current per-batch set/clear behavior remains unchanged.

Actual Behavior

  • Throttles are currently set before each batch and cleared after each batch if no other tasks are active on the same broker.
  • When many batches affect the same brokers, the same throttle is repeatedly set/cleared.
  • Each Admin request can take tens of seconds (~20s observed locally), causing noticeable slowdowns; the example after this list shows the Admin round-trip behind each set/clear.
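To make the per-request cost concrete: Kafka exposes the replication throttle rate as a dynamic broker config, so each set or clear translates to an incrementalAlterConfigs round-trip per affected broker. A standalone example of a single set follows; the bootstrap address, broker id, and rate are arbitrary placeholders.

```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// One throttle "set" for one broker: a single incrementalAlterConfigs call
// on the broker's dynamic replication-throttle configs.
public class ThrottleSetExample {
  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    try (Admin admin = Admin.create(props)) {
      ConfigResource broker0 = new ConfigResource(ConfigResource.Type.BROKER, "0");
      AlterConfigOp setLeaderRate = new AlterConfigOp(
          new ConfigEntry("leader.replication.throttled.rate", "10485760"),
          AlterConfigOp.OpType.SET);
      AlterConfigOp setFollowerRate = new AlterConfigOp(
          new ConfigEntry("follower.replication.throttled.rate", "10485760"),
          AlterConfigOp.OpType.SET);
      admin.incrementalAlterConfigs(
              Map.of(broker0, List.of(setLeaderRate, setFollowerRate)))
          .all().get(); // one synchronous round-trip per config mutation
    }
  }
}
```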

Steps to Reproduce

  1. Prepare a rebalance plan with many inter-broker replica movements that repeatedly involve the same brokers.
  2. Run the rebalance with current behavior (bulk disabled) and observe repeated set/clear throttle Admin calls per batch (the watcher sketched after this list can help surface this churn).
  3. Measure end-to-end inter-broker phase duration.
  4. Enable bulk.replication.throttle.enabled=true and re-run to observe reduced Admin calls and improved completion time.
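One way to observe the churn in step 2 is to poll a broker's dynamic config and print every change to the throttle rate. A hypothetical watcher for a local test cluster (bootstrap address and broker id are assumptions):

```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;
import java.util.List;
import java.util.Objects;
import java.util.Properties;

// Polls one broker's dynamic config and prints every change to the
// leader-side replication throttle rate, making repeated set/clear visible.
public class ThrottleWatcher {
  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    try (Admin admin = Admin.create(props)) {
      ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, "0");
      String previous = "unset";
      while (true) {
        Config config = admin.describeConfigs(List.of(broker)).all().get().get(broker);
        ConfigEntry entry = config.get("leader.replication.throttled.rate");
        String current = (entry == null || entry.value() == null) ? "unset" : entry.value();
        if (!Objects.equals(previous, current)) {
          System.out.printf("%d leader.replication.throttled.rate=%s%n",
                            System.currentTimeMillis(), current);
          previous = current;
        }
        Thread.sleep(1_000L);
      }
    }
  }
}
```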

Additional evidence

  1. Environment (test settings):
    • concurrency.adjuster.max.partition.movements.per.broker=12
    • default.replica.movement.strategies=com.linkedin.kafka.cruisecontrol.executor.strategy.PrioritizeMinIsrWithOfflineReplicasStrategy,com.linkedin.kafka.cruisecontrol.executor.strategy.PrioritizeOneAboveMinIsrWithOfflineReplicasStrategy,com.linkedin.kafka.cruisecontrol.executor.strategy.PrioritizeSmallReplicaMovementStrategy,com.linkedin.kafka.cruisecontrol.executor.strategy.BaseReplicaMovementStrategy
  2. Scenario and results:
    • 900 partitions to rebalance (800 small, 100 large).
    • Before (bulk disabled): moving the 800 small partitions took over 1 hour 30 minutes.
    • After enabling bulk.replication.throttle.enabled: the 800 small partitions completed within a few minutes.
  3. Logs: added summary logs when clearing bulk throttles (Completed/Aborted/Dead/InProgress/Aborting counts); a sketch of the log shape follows this list.
  4. Follow-ups:
    • We are improving the Kafka call patterns inside ReplicationThrottleHelper in a separate PR; this PR intentionally limits scope to the bulk set/clear behavior.
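The summary logging mentioned in item 3 could look roughly like the following SLF4J sketch; the PR's exact message format and counters may differ.

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Illustrative shape of the clear-time summary log (wording assumed).
public final class ThrottleClearSummarySketch {
  private static final Logger LOG = LoggerFactory.getLogger(ThrottleClearSummarySketch.class);

  static void logClearSummary(int completed, int aborted, int dead,
                              int inProgress, int aborting) {
    LOG.info("Cleared bulk replication throttles. Tasks: completed={}, aborted={}, "
        + "dead={}, inProgress={}, aborting={}",
        completed, aborted, dead, inProgress, aborting);
  }
}
```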

Categorization

  • documentation
  • bugfix
  • new feature
  • refactor
  • security/CVE
  • other

This PR resolves #1972

@il-kyun (Contributor, Author) commented Aug 31, 2025

👋 @mhratson @CCisGG
I’ve submitted a PR for Cruise Control but haven’t received any reviews yet.
Would you mind taking a look when you get a chance? 🙏

@kyguy (Contributor) left a comment

Looks good @il-kyun! I made a quick pass to help get this in front of the maintainers.

Is there any specific reason why the bulk replication throttle mode shouldn't be enabled by default? It seems like it would be helpful for most rebalances.

@il-kyun (Contributor, Author) commented Sep 16, 2025

@kyguy Thanks for starting the review on my PR, I really appreciate it!

Is there any specific reason why the bulk replication throttle mode shouldn't be enabled by default? It seems like it would be helpful for most rebalances.

I initially set bulk.replication.throttle.enabled to false to avoid surprising changes in existing deployments and to preserve backward compatibility.
I agree: switching the default to true is better.

@il-kyun requested a review from @kyguy on September 16, 2025 16:02
@kyguy (Contributor) left a comment

Thanks for the updates! Just left some minor comments.

Similar to the related PR here: #2305, I wonder if it would be better to simply update the existing non-batching logic to this batching implementation, instead of making it configurable, to spare us the extra code complexity. I can't think of a reason why users would not want to batch requests like this. Anyway, I'll defer to the maintainers on that!

@il-kyun (Contributor, Author) commented Sep 19, 2025

Thanks for the updates! Just left some minor comments.

Similar to the related PR here: #2305, I wonder if it would be better to simply update the existing non-batching logic to this batching implementation, instead of making it configurable, to spare us the extra code complexity. I can't think of a reason why users would not want to batch requests like this. Anyway, I'll defer to the maintainers on that!

I also think this is useful.
The reason I kept it configurable for now is that I wasn't 100% sure whether replacing the existing non-batching logic directly might have side effects. If the maintainers and contributors are comfortable with it, I'm happy to update the PR so that batching simply replaces the old logic; that would indeed make the code simpler. #2304 #2305
