Contributor

@tmater tmater commented Aug 20, 2025

This patch fixes the createViewWithCustomMetadataLocation tests for cloudTest tasks. The original test was generating temp directories internally, causing cloudTests to fail with BadRequestException instead of the expected ForbiddenException.

Changes:

  • Switched to Hadoop's Path (java.nio.file.Path collapses repeated slashes, e.g. s3://bucket/path -> s3:/bucket/path)
  • Made base classes abstract to avoid running them
  • Implemented createViewWithCustomMetadataLocation to allow passing a custom location

Testing:

  • Verified locally

@github-project-automation github-project-automation bot moved this to PRs In Progress in Basic Kanban Board Aug 20, 2025
@tmater tmater marked this pull request as ready for review August 20, 2025 09:00

/** Runs PolarisRestCatalogViewIntegrationTest on AWS. */
-public class PolarisRestCatalogViewAwsIntegrationTest
+public abstract class PolarisRestCatalogViewAwsIntegrationTestBase
Member

Since we're renaming this type anyway, let's use the protocol name rather than the vendor name.

Suggested change
-public abstract class PolarisRestCatalogViewAwsIntegrationTestBase
+public abstract class PolarisRestCatalogViewS3IntegrationTestBase

Contributor Author

Shall I change these 3 (non-view test bases) as well in one go:

  • PolarisRestCatalogAwsIntegrationTestBase
  • PolarisRestCatalogAzureIntegrationTestBase
  • PolarisRestCatalogGcpIntegrationTestBase

Also the 6 implementations:

  • RestCatalogAwsIT
  • RestCatalogAzureIT
  • RestCatalogGcpIT
  • RestCatalogViewAwsIT
  • RestCatalogViewAzureIT
  • RestCatalogViewGcpIT

Member

I'd support it, yea

Member

I mean, we support MinIO and a bunch of other S3-compatible systems, not only AWS. Sure, we could argue about GCS or ADLS. However, GCS and ADLS are specific, whereas a vendor name or "product suite" name is not.

Contributor Author

@tmater tmater Aug 24, 2025

Makes sense, renamed the classes and related parts.

@@ -20,13 +20,12 @@

import java.util.List;
import java.util.Optional;
import java.util.stream.Stream;
import org.apache.hadoop.fs.Path;
Member

Hm, sure, this is just a test, but could we avoid Hadoop dependencies?

Contributor Author

I reached for Hadoop Path because Iceberg does something "funky" with metadataFileLocation() that resembles Hadoop Path behavior: it normalizes file:///... to file:/..., but keeps s3://... as s3://....
For the assertion later I couldn't use java.nio.file.Path because it collapses the double slash after the scheme and produces an incorrect s3:/... path.

I dug a bit more and found a reference to Hadoop Path in Iceberg's LocationProviders:24. Maybe that is how they do it as well, but I could not find the exact place where the location gets normalized.

I’m open to other approaches, but the only solution that came to mind was writing a custom assertion, which I’m also happy with.
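
For illustration, the slash-collapsing behavior described above can be reproduced with the JDK alone. This is a minimal sketch, not code from the patch: the class name PathSlashDemo and the sample location are made up, and the java.nio.file result assumes the default file system of a Unix-like platform (on Windows, Paths.get rejects the colon instead).

```java
import java.net.URI;
import java.nio.file.Paths;

// Hypothetical demo class, not part of the Polaris or Iceberg code base.
final class PathSlashDemo {

  // Round-trips a location through java.nio.file.Path. On Unix-like file systems
  // the path parser collapses repeated slashes, so "s3://" becomes "s3:/".
  static String viaNioPath(String location) {
    return Paths.get(location).toString();
  }

  // Round-trips the same location through java.net.URI, which keeps the "//"
  // that introduces the authority (the bucket, for S3 locations).
  static String viaUri(String location) {
    return URI.create(location).toString();
  }

  public static void main(String[] args) {
    String location = "s3://bucket/path";
    System.out.println(viaNioPath(location)); // s3:/bucket/path on Linux/macOS
    System.out.println(viaUri(location));     // s3://bucket/path
  }
}
```

The comparison suggests that any assertion built on java.nio.file.Path has to strip the scheme first or compare only the path portion.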

Member

Argh - the damn file URI scheme is so ambiguous (IMHO, because of multiple representations for the same thing, legacy representations, risk of interpreting those wrong).

If it's just about eliminating all host related parts, I'd say let's just use a regex to "fix" the paths?
Something like

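// Matches "file:" followed by any number of slashes; group 1 is the remainder.
protected static final Pattern FILE_LOCATION_PATTERN =
  Pattern.compile("file:/*(.*)");

// Normalizes file locations to the "file:/..." form; other schemes pass through.
protected String fixFileUri(String location) {
  var m = FILE_LOCATION_PATTERN.matcher(location);
  return m.matches() ? "file:/" + m.group(1) : location;
}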

WDYT? Would that work?
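
To make the proposal easier to evaluate, here is a self-contained sketch that runs the suggested helper against a few sample locations. The wrapper class FileUriDemo and the sample paths are assumptions for illustration; the pattern and the replacement logic are taken from the suggestion above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class so the suggested helper can be run on its own.
final class FileUriDemo {

  // "file:" followed by any number of slashes; group 1 is the rest of the location.
  static final Pattern FILE_LOCATION_PATTERN = Pattern.compile("file:/*(.*)");

  // Rewrites file locations to the single-slash form and leaves other schemes untouched.
  static String fixFileUri(String location) {
    Matcher m = FILE_LOCATION_PATTERN.matcher(location);
    return m.matches() ? "file:/" + m.group(1) : location;
  }

  public static void main(String[] args) {
    System.out.println(fixFileUri("file:///tmp/view/metadata.json")); // file:/tmp/view/metadata.json
    System.out.println(fixFileUri("file:/tmp/view/metadata.json"));   // file:/tmp/view/metadata.json
    System.out.println(fixFileUri("s3://bucket/view/metadata.json")); // s3://bucket/view/metadata.json
  }
}
```

Because the slashes after the scheme are optional in the pattern, a location such as file:tmp would also be rewritten to file:/tmp, which may or may not be desirable for the tests.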

Contributor

@adutra adutra Aug 20, 2025

Just FYI, in Polaris we have:

  • org.apache.polaris.core.storage.StorageLocation, which standardizes file URIs to "file:///"
  • but InMemoryStorageIntegration does the opposite 😅 :

Member

Hm - maybe let's ignore the special handling of the file scheme and just assume that the location is "correct" if supplied.

Contributor Author

@tmater tmater Aug 24, 2025

Good point, I used java.nio.file.Path to remove the scheme.

tmater added 3 commits August 28, 2025 09:42
Comment on lines +185 to +187
protected String getCustomMetadataLocationDir() {
  return "";
}
Contributor

Suggested change
-protected String getCustomMetadataLocationDir() {
-  return "";
-}
+protected abstract String getCustomMetadataLocationDir();
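
A rough sketch of what the abstract hook buys: without a default body, a storage-specific subclass cannot silently inherit an empty location. The class names ViewTestBaseSketch and S3ViewTestSketch, the customMetadataLocation helper, and the bucket path are all hypothetical and only stand in for the real test classes.

```java
// Hypothetical, stripped-down stand-ins for the test base and one storage-specific subclass.
abstract class ViewTestBaseSketch {

  // No default: every concrete test class has to state where custom metadata goes.
  protected abstract String getCustomMetadataLocationDir();

  // Example consumer of the hook; the file name passed in is illustrative only.
  String customMetadataLocation(String fileName) {
    return getCustomMetadataLocationDir() + "/" + fileName;
  }
}

final class S3ViewTestSketch extends ViewTestBaseSketch {

  @Override
  protected String getCustomMetadataLocationDir() {
    return "s3://bucket/polaris_it_custom"; // made-up location
  }

  public static void main(String[] args) {
    // Prints s3://bucket/polaris_it_custom/v1.metadata.json
    System.out.println(new S3ViewTestSketch().customMetadataLocation("v1.metadata.json"));
  }
}
```

With this shape, forgetting the override in a new subclass becomes a compile error rather than a test that runs against an empty path.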

@@ -86,6 +86,8 @@ public abstract class PolarisRestCatalogViewIntegrationBase extends ViewCatalogT
private static PolarisApiEndpoints endpoints;
private static PolarisClient client;
private static ManagementApi managementApi;
protected static final String POLARIS_IT_SUBDIR = "polaris_it";
protected static final String POLARIS_IT_CUSTOM_SUBDIR = "polaris_it_custom";
Contributor

This constant seems unused.

protected boolean shouldSkip() {
  return Stream.of(BASE_LOCATION, TENANT_ID).anyMatch(Strings::isNullOrEmpty);
protected String getCustomMetadataLocationDir() {
  return StorageUtil.concatFilePrefixes(BASE_LOCATION, POLARIS_IT_SUBDIR, File.separator);
Contributor

Shouldn't this be:

Suggested change
-return StorageUtil.concatFilePrefixes(BASE_LOCATION, POLARIS_IT_SUBDIR, File.separator);
+return StorageUtil.concatFilePrefixes(BASE_LOCATION, POLARIS_IT_CUSTOM_SUBDIR, File.separator);
