Skip to content

GH-759: Get length of byte[] in TryCopyLastError #760

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Jul 1, 2025

Conversation

hnwyllmm
Copy link
Contributor

@hnwyllmm hnwyllmm commented May 20, 2025

What's Changed

We should get the length of byte[] by GetArrayLength, not strlen which may cause invalid memory access.

Closes #759.

Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to add a test?

@hnwyllmm
Copy link
Contributor Author

hnwyllmm commented May 21, 2025

The error message in CI doesn't seem to be related to my PR:

/opt/rh/devtoolset-10/root/usr/libexec/gcc/x86_64-redhat-linux/10/ld: cannot find -lutf8proc
  collect2: error: ld returned 1 exit status

linkage: https://github.com/apache/arrow-java/actions/runs/15140171347/job/42594091462?pr=760

Do you have some suggestions?

@hnwyllmm
Copy link
Contributor Author

Is it possible to add a test?

I'll try to add a test case in integration_tests.py.

@kou
Copy link
Member

kou commented May 21, 2025

The error message in CI doesn't seem to be related to my PR:

/opt/rh/devtoolset-10/root/usr/libexec/gcc/x86_64-redhat-linux/10/ld: cannot find -lutf8proc
  collect2: error: ld returned 1 exit status

linkage: https://github.com/apache/arrow-java/actions/runs/15140171347/job/42594091462?pr=760

Do you have some suggestions?

Could you open a separated issue for it?
Let's fix it as a separated task.

@lidavidm
Copy link
Member

lidavidm commented May 21, 2025

Is it possible to add a test?

I'll try to add a test case in integration_tests.py.

Do you need an integration test? IMO, you should be able to export a stream, then immediately import it, all within Java.

@hnwyllmm
Copy link
Contributor Author

The error message in CI doesn't seem to be related to my PR:

/opt/rh/devtoolset-10/root/usr/libexec/gcc/x86_64-redhat-linux/10/ld: cannot find -lutf8proc
  collect2: error: ld returned 1 exit status

linkage: https://github.com/apache/arrow-java/actions/runs/15140171347/job/42594091462?pr=760
Do you have some suggestions?

Could you open a separated issue for it? Let's fix it as a separated task.

issue: #767

@kou
Copy link
Member

kou commented May 24, 2025

Thanks. (We've fixed this problem in apache/arrow. Sorry for not sharing it here...)

@hnwyllmm
Copy link
Contributor Author

Is it possible to add a test?

I'll try to add a test case in integration_tests.py.

Do you need an integration test? IMO, you should be able to export a stream, then immediately import it, all within Java.

I add a unittest but I think it can't make sure TryCopyLastError works correctly, so I suggest it should be removed.
We can read memory after the error string address in most cases. We usually check such problems by sanity tools.

@lidavidm lidavidm changed the title fix get length of byte[] in TryCopyLastError GH-759: Get length of byte[] in TryCopyLastError May 25, 2025

This comment has been minimized.

@lidavidm
Copy link
Member

Can't the test validate that the exception has the right error message after making it through the FFI boundary?

@lidavidm lidavidm added the bug-fix PRs that fix a big. label May 25, 2025
@github-actions github-actions bot added this to the 18.4.0 milestone May 25, 2025
@hnwyllmm
Copy link
Contributor Author

Can't the test validate that the exception has the right error message after making it through the FFI boundary?

PTAL.
I use a real exception message in the ExceptionTest.

root.setRowCount(4);
batches.add(unloader.getRecordBatch());

final String exceptionMessage = "java.lang.RuntimeException: Error occurred while getting next schema root.\n\tat org.apache.arrow.adapter.jdbc.ArrowVectorIterator.next(ArrowVectorIterator.java:205)\n\tat com.oceanbase.external.jdbc.JdbcScanner.loadNextBatch(JdbcScanner.java:73)\n\tat org.apache.arrow.c.ArrayStreamExporter$ExportedArrayStreamPrivateData.getNext(ArrayStreamExporter.java:72)\nCaused by: java.lang.RuntimeException: Error occurred while consuming data.\n\tat org.apache.arrow.adapter.jdbc.ArrowVectorIterator.consumeData(ArrowVectorIterator.java:127)\n\tat org.apache.arrow.adapter.jdbc.ArrowVectorIterator.load(ArrowVectorIterator.java:178)\n\tat org.apache.arrow.adapter.jdbc.ArrowVectorIterator.next(ArrowVectorIterator.java:198)\n\t... 2 more\nCaused by: java.lang.OutOfMemoryError: Java heap space\n";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Um, this is a little excessive. Can we just use a short canary string?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

}
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to assert that we did indeed get an exception.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DONE

Comment on lines 97 to 99
assertThat(eMessage.length()).isGreaterThan(expectExceptionMessage.length() + 1);
assertThat(eMessage.substring(eMessage.length() - expectExceptionMessage.length() - 1, eMessage.length() - 1))
.isEqualTo(expectExceptionMessage);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just use contains...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we must make sure the string in the exception is the same with we throws.
"abc\x037" contains "abc", but it's not correct and we want to catch the bug.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...then you just want endsWith?

Copy link
Contributor Author

@hnwyllmm hnwyllmm Jun 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I use endsWith to simplified the code.

Comment on lines 90 to 102
for (Object batch : batches) {
try {
reader.loadNextBatch();
} catch (Exception e) {
assertThat(exceptionThrowed).isEqualTo(null);
final String eMessage = e.getMessage();
// 1 for '}', ref to CDataJniException
assertThat(eMessage.length()).isGreaterThan(expectExceptionMessage.length() + 1);
assertThat(eMessage.substring(eMessage.length() - expectExceptionMessage.length() - 1, eMessage.length() - 1))
.isEqualTo(expectExceptionMessage);
exceptionThrowed = e;
continue;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use assertThrows.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DONE.

@lidavidm
Copy link
Member

ah right, CI is broken...

@lidavidm
Copy link
Member

Regardless, there are Checkstyle failures


Error:  /build/c/src/test/java/org/apache/arrow/c/ExceptionTest.java:30:8: Unused import: java.util.Objects. [UnusedImports]
Error:  /build/c/src/test/java/org/apache/arrow/c/ExceptionTest.java:33:8: Unused import: org.apache.arrow.c.jni.CDataJniException. [UnusedImports]
Error:  /build/c/src/test/java/org/apache/arrow/c/ExceptionTest.java:36:8: Unused import: org.apache.arrow.vector.IntVector. [UnusedImports]
Error:  /build/c/src/test/java/org/apache/arrow/c/ExceptionTest.java:39:8: Unused import: org.apache.arrow.vector.VectorUnloader. [UnusedImports]

@hnwyllmm
Copy link
Contributor Author

Regardless, there are Checkstyle failures


Error:  /build/c/src/test/java/org/apache/arrow/c/ExceptionTest.java:30:8: Unused import: java.util.Objects. [UnusedImports]
Error:  /build/c/src/test/java/org/apache/arrow/c/ExceptionTest.java:33:8: Unused import: org.apache.arrow.c.jni.CDataJniException. [UnusedImports]
Error:  /build/c/src/test/java/org/apache/arrow/c/ExceptionTest.java:36:8: Unused import: org.apache.arrow.vector.IntVector. [UnusedImports]
Error:  /build/c/src/test/java/org/apache/arrow/c/ExceptionTest.java:39:8: Unused import: org.apache.arrow.vector.VectorUnloader. [UnusedImports]

sorry for that. fixed.

@lidavidm
Copy link
Member

There's still more:

Error:  Failed to execute goal com.diffplug.spotless:spotless-maven-plugin:2.44.4:check (spotless-check) on project arrow-c-data: The following files had format violations:
Error:      src/test/java/org/apache/arrow/c/ExceptionTest.java
Error:          @@ -51,7 +51,7 @@
Error:           ····final·List<Object>·batches·=·new·ArrayList<>();
Error:           
Error:           ····try·(BufferAllocator·allocator·=·new·RootAllocator();
Error:          -·········VectorSchemaRoot·root·=·VectorSchemaRoot.create(schema,·allocator))·{
Error:          +········VectorSchemaRoot·root·=·VectorSchemaRoot.create(schema,·allocator))·{
Error:           
Error:           ······final·String·exceptionMessage·=·"This·is·a·message·for·testing·exception.";
Error:           
Error:          @@ -66,7 +66,7 @@
Error:           ······ArrowReader·source·=·new·ExceptionMemoryArrowReader(allocator,·schema,·batches);
Error:           
Error:           ······try·(final·ArrowArrayStream·stream·=·ArrowArrayStream.allocateNew(allocator);
Error:          -···········final·VectorSchemaRoot·importRoot·=·VectorSchemaRoot.create(schema,·allocator))·{
Error:          +··········final·VectorSchemaRoot·importRoot·=·VectorSchemaRoot.create(schema,·allocator))·{
Error:           ········final·VectorLoader·loader·=·new·VectorLoader(importRoot);
Error:           ········Data.exportArrayStream(allocator,·source,·stream);
Error:           
Error:          @@ -85,10 +85,7 @@
Error:           ····private·final·DictionaryProvider·provider;
Error:           ····private·int·nextBatch;
Error:           
Error:          -····ExceptionMemoryArrowReader(
Error:          -········BufferAllocator·allocator,
Error:          -········Schema·schema,
Error:          -········List<Object>·batches)·{
Error:          +····ExceptionMemoryArrowReader(BufferAllocator·allocator,·Schema·schema,·List<Object>·batches)·{
Error:           ······super(allocator);
Error:           ······this.schema·=·schema;
Error:           ······this.batches·=·batches;
Error:  Run 'mvn spotless:apply' to fix these violations.

@hnwyllmm
Copy link
Contributor Author

Thanks for your patient. It seems the failed CI test cases are not related with this PR now. Please take a look.

@lidavidm
Copy link
Member

lidavidm commented Jul 1, 2025

Ok. I'll merge this, but the CI really needs to be fixed...I just don't have time for arrow-java these days :/

@lidavidm lidavidm merged commit 43b6b6c into apache:main Jul 1, 2025
20 of 24 checks passed
@hnwyllmm hnwyllmm deleted the fix-copyerror branch July 2, 2025 06:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug-fix PRs that fix a big.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

coredump in TryCopyLastError
3 participants