REP-6492 Switch to $sampleRate-style partitioning #128
+332
−74
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
$sample-based partitioning has proven problematic for some years now because it often creates highly-imbalanced partitions.
This changeset switches partitioning to use $sampleRate instead. Because this entails a full index scan it tends to be slower; we offset that by creating partition tasks immediately as we receive sampled partition boundaries rather than all at once at the end of the aggregation.
Because MongoDB 4.2 lacked $sampleRate (and $rand as well), the legacy partitioning logic remains for use with that server version.
Both legacy & $sampleRate partitioning are made to use
available
read concern andsecondaryPreferred
read preference. These aggregations don’t need consistency, but they benefit substantially from speed & minimizing workload on the primary.A few simplifications are made here as well. For example,
MongosyncID
is removed from the PartitionKey struct since it’s never actually relevant, and certain parameters to the legacy partitioner are made constant (since they were always used thus).