KAFKA-19789: Log an error when we get duplicate acquired offsets in ShareFetchResponse. #20752
Conversation
@ShivsundarR thanks for this patch!
    acquiredRecordList.add(new OffsetAndDeliveryCount(offset, acquiredRecords.deliveryCount()));
    if (!offsets.add(offset)) {
        log.error("Duplicate acquired record offset {} found in share fetch response for partition {}. " +
            "This indicates a broker processing issue.", offset, partition.topicPartition());
Just curious, are there any known issues that lead to duplicate offsets?
Yes, there was a broker-side issue when SharePartition was at capacity - https://issues.apache.org/jira/browse/KAFKA-19808. Due to this, we were getting duplicate offsets (with different delivery counts) in the ShareFetchResponse.
There are no current known issues, but there was previously an issue in the broker and adding logging would have made it quicker to get to the bottom of it.
    // Verify all offsets are unique
    Set<Long> offsetSet = new HashSet<>();
    for (ConsumerRecord<String, String> record : records) {
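For reference, a uniqueness check of this shape is typically completed along the following lines. This is a sketch, not the PR's actual test code, and the JUnit 5 assertion style is an assumption:

    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    import org.apache.kafka.clients.consumer.ConsumerRecord;

    import static org.junit.jupiter.api.Assertions.assertTrue;

    final class UniqueOffsetAssertion {
        // Set.add(...) returns false when the offset has already been seen,
        // so the first repeated offset fails the test.
        static void assertUniqueOffsets(List<ConsumerRecord<String, String>> records) {
            Set<Long> offsetSet = new HashSet<>();
            for (ConsumerRecord<String, String> record : records) {
                assertTrue(offsetSet.add(record.offset()),
                    "Duplicate offset found: " + record.offset());
            }
        }
    }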
I'm not sure if this covers the new behavior, since inFlightRecords already handles offset deduplication.
Yes, the logic around inFlightRecords ensures we do not send duplicate offsets to the application side, but the client does respond with a GAP acknowledgement to the broker for any duplicate offset.
kafka/clients/src/main/java/org/apache/kafka/clients/consumer/internals/ShareCompletedFetch.java
Line 219 in 2c44448:

    inFlightBatch.addGap(nextAcquired.offset);
Without deduplication, when the offset is encountered a second time, lastRecord.offset > nextAcquired.offset will be true (as nextAcquired will be an older offset), so the client acknowledges these offsets as GAPs, which hides the main issue.
As the broker is already in a bad state (duplication should never happen), we opted to log an error and ignore any duplicates on the client.
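To make that failure mode concrete, here is a condensed, self-contained model of the behavior described above; the variables and loop are simplified stand-ins for the real ShareCompletedFetch logic, not code from the client:

    import java.util.ArrayList;
    import java.util.List;

    public class GapAcknowledgementSketch {
        public static void main(String[] args) {
            long lastRecordOffset = 12;            // offset of the record just delivered
            long[] nextAcquiredOffsets = {10, 13}; // 10 duplicates an earlier batch

            List<Long> gapAcked = new ArrayList<>();
            for (long nextAcquired : nextAcquiredOffsets) {
                if (lastRecordOffset > nextAcquired) {
                    // A duplicate (older) acquired offset looks like a gap in
                    // the batch, so the pre-patch client acknowledged it as a
                    // GAP, masking the broker bug.
                    gapAcked.add(nextAcquired);
                }
            }
            System.out.println("Offsets wrongly acknowledged as GAP: " + gapAcked); // [10]
        }
    }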
Thanks for the PR. Just one initial comment from a first look.
    private List<OffsetAndDeliveryCount> buildAcquiredRecordList(List<ShareFetchResponseData.AcquiredRecords> partitionAcquiredRecords) {
        List<OffsetAndDeliveryCount> acquiredRecordList = new LinkedList<>();
        // Set to find duplicates in case of overlapping acquired records
        Set<Long> offsets = new HashSet<>();
I wonder if you could change the partitionAcquiredRecords into a LinkedHashMap or similar to combine the duplicate checking with the ordered iteration.
I had a look into making the acquiredRecordList (LinkedList<OffsetAndDeliveryCount>) into a LinkedHashMap. This would require code changes around listIterator; we would have to use map.entrySet().iterator() for rewinding to the start of the list.
And as we are doing sequential operations and not key-based lookups, it is probably better to keep it as a list?
I have changed it to an ArrayList instead of a LinkedList though, as it gives better iteration performance for build-once, iterate-many use cases.
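For reference, the LinkedHashMap variant discussed above might have looked roughly like this; it is a sketch of the alternative that was not adopted, not code from the patch:

    import java.util.LinkedHashMap;
    import java.util.Map;

    public class LinkedHashMapVariantSketch {
        public static void main(String[] args) {
            // Keyed by offset: insertion order is preserved, and putIfAbsent
            // returns the existing value when the offset was already recorded.
            Map<Long, Short> acquired = new LinkedHashMap<>();
            long[][] batches = {{5, 7, 1}, {7, 8, 2}}; // offset 7 is duplicated

            for (long[] batch : batches) {
                for (long offset = batch[0]; offset <= batch[1]; offset++) {
                    if (acquired.putIfAbsent(offset, (short) batch[2]) != null) {
                        System.err.println("Duplicate acquired offset: " + offset);
                    }
                }
            }
            // Rewinding to the start means map.entrySet().iterator(), which is
            // the awkwardness mentioned above versus List.listIterator().
            for (Map.Entry<Long, Short> e : acquired.entrySet()) {
                System.out.println("offset=" + e.getKey() + " deliveryCount=" + e.getValue());
            }
        }
    }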
    private List<OffsetAndDeliveryCount> buildAcquiredRecordList(List<ShareFetchResponseData.AcquiredRecords> partitionAcquiredRecords) {
-       List<OffsetAndDeliveryCount> acquiredRecordList = new LinkedList<>();
+       List<OffsetAndDeliveryCount> acquiredRecordList = new ArrayList<>();
By default, a new list will have space for 10 elements. Resizing is expensive. Maybe one optimisation would be to see how many offsets are in the first element of partitionAcquiredRecords, and use that number as the initial size of the list. In the case of only one batch of offsets, the list will be the correct size already. wdyt?
That makes sense. If most of the time the response is going to contain only 1 batch, we can avoid resizing. I have made the change. Thanks.
Thanks for the update. Just one more comment.
    private List<OffsetAndDeliveryCount> buildAcquiredRecordList(List<ShareFetchResponseData.AcquiredRecords> partitionAcquiredRecords) {
-       List<OffsetAndDeliveryCount> acquiredRecordList = new LinkedList<>();
+       // Setting the size of the array to the size of the first batch of acquired records. In case there is only 1 batch acquired, resizing would not happen.
+       int initialListSize = !partitionAcquiredRecords.isEmpty() ? (int) (partitionAcquiredRecords.get(0).lastOffset() -
In the case where partitionAcquiredRecords is empty, we can just make an empty list and return directly. We don't need to make the HashSet only to discard it unused, because the loop will not have any iterations.
Yes, makes sense. I have updated the code now.
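Putting the review feedback together, the final shape of the method is roughly the sketch below. The stand-in types and System.err logging are assumptions made to keep the example runnable; the real method works on ShareFetchResponseData.AcquiredRecords and logs through the client's SLF4J logger:

    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    public class BuildAcquiredRecordListSketch {

        // Stand-ins (assumptions) for the client's internal types.
        record AcquiredRecords(long firstOffset, long lastOffset, short deliveryCount) {}
        record OffsetAndDeliveryCount(long offset, short deliveryCount) {}

        static List<OffsetAndDeliveryCount> buildAcquiredRecordList(List<AcquiredRecords> partitionAcquiredRecords) {
            // Review point 1: return early on empty input instead of allocating
            // a HashSet that the empty loop would never use.
            if (partitionAcquiredRecords.isEmpty()) {
                return List.of();
            }
            // Review point 2: size the ArrayList from the first batch so the
            // common single-batch response never triggers a resize.
            AcquiredRecords first = partitionAcquiredRecords.get(0);
            int initialListSize = (int) (first.lastOffset() - first.firstOffset() + 1);
            List<OffsetAndDeliveryCount> acquiredRecordList = new ArrayList<>(initialListSize);
            // Set to find duplicates in case of overlapping acquired records.
            Set<Long> offsets = new HashSet<>();
            for (AcquiredRecords batch : partitionAcquiredRecords) {
                for (long offset = batch.firstOffset(); offset <= batch.lastOffset(); offset++) {
                    if (!offsets.add(offset)) {
                        // Review point 3 (the core of the patch): log and skip
                        // duplicates rather than acknowledging them as GAPs.
                        System.err.println("Duplicate acquired record offset " + offset
                            + " found in share fetch response.");
                        continue;
                    }
                    acquiredRecordList.add(new OffsetAndDeliveryCount(offset, batch.deliveryCount()));
                }
            }
            return acquiredRecordList;
        }

        public static void main(String[] args) {
            // Overlapping batches: offset 7 appears twice and is logged + skipped.
            System.out.println(buildAcquiredRecordList(List.of(
                new AcquiredRecords(5, 7, (short) 1),
                new AcquiredRecords(7, 8, (short) 2))));
        }
    }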
What
https://issues.apache.org/jira/browse/KAFKA-19789
ShareFetchResponse contained duplicate acquired records; this was a broker-side bug. The client previously did not surface this case and acknowledged with GAP type for any duplicate occurrence. With this change, the client logs an error and does not acknowledge the duplicate offsets, as the broker is already in a bad state.