Conversation

raghuvanshraj
Contributor

… parquet ingestion/search

Description

[Describe what this change achieves]

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Contributor

github-actions bot commented Oct 6, 2025

❌ Gradle check result for 2aa6905: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Comment on lines 80 to 81
/modules/parquet-data-format/src/main/rust/target
/modules/parquet-data-format/src/main/resources/native/
Contributor

Can we name this codec-parquet or data-format-parquet?

Comment on lines +9 to +16
apply plugin: 'opensearch.opensearchplugin'

opensearchplugin {
    name = 'dataformat-csv'
    description = 'CSV data format plugin for OpenSearch DataFusion'
    classname = 'org.opensearch.datafusion.csv.CsvDataFormatPlugin'
    hasNativeController = false
}
Contributor

Assuming we can get rid of this?

Comment on lines +39 to +49

// TODO : move to vectorized exec specific plugin
@Override
public Optional<Map<org.opensearch.vectorized.execution.search.DataFormat, DataSourceCodec>> getDataSourceCodecs() {
    Map<org.opensearch.vectorized.execution.search.DataFormat, DataSourceCodec> codecs = new HashMap<>();
    CsvDataSourceCodec csvDataSourceCodec = new CsvDataSourceCodec();
    // TODO : version it correctly - similar to lucene codecs?
    codecs.put(csvDataSourceCodec.getDataFormat(), new CsvDataSourceCodec());
    return Optional.of(codecs);
    // return Optional.empty();
}
Contributor

Based on the TODO comment, it looks like this will move to core and be common to all plugins that register as a data source.
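To make the suggestion concrete, here is a minimal sketch of what a shared, core-side collection step might look like if every data source plugin exposed its codecs through one common hook. All types below (`DataFormat`, `DataSourceCodec`, `DataSourcePlugin`, `pluginFor`, `collectCodecs`) are hypothetical stand-ins written for illustration; they are not the PR's classes or an existing OpenSearch API, and the duplicate-rejection policy is an assumption.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class DataSourceCodecRegistrySketch {

    /** Hypothetical stand-in for the PR's DataFormat type. */
    public enum DataFormat { CSV, PARQUET }

    /** Hypothetical stand-in for the PR's DataSourceCodec type. */
    public interface DataSourceCodec {
        DataFormat getDataFormat();
    }

    /** Hypothetical common hook; plugins that are not data sources register nothing. */
    public interface DataSourcePlugin {
        default Optional<Map<DataFormat, DataSourceCodec>> getDataSourceCodecs() {
            return Optional.empty();
        }
    }

    /** Hypothetical helper: builds a plugin that registers exactly one codec. */
    public static DataSourcePlugin pluginFor(DataFormat format) {
        DataSourceCodec codec = () -> format;
        return new DataSourcePlugin() {
            @Override
            public Optional<Map<DataFormat, DataSourceCodec>> getDataSourceCodecs() {
                return Optional.of(Map.of(codec.getDataFormat(), codec));
            }
        };
    }

    /** Merges the codecs of all plugins in one place, rejecting duplicate formats. */
    public static Map<DataFormat, DataSourceCodec> collectCodecs(List<? extends DataSourcePlugin> plugins) {
        Map<DataFormat, DataSourceCodec> all = new HashMap<>();
        for (DataSourcePlugin plugin : plugins) {
            Map<DataFormat, DataSourceCodec> codecs = plugin.getDataSourceCodecs().orElse(Map.of());
            for (Map.Entry<DataFormat, DataSourceCodec> entry : codecs.entrySet()) {
                if (all.putIfAbsent(entry.getKey(), entry.getValue()) != null) {
                    throw new IllegalStateException("Duplicate codec for data format: " + entry.getKey());
                }
            }
        }
        return all;
    }

    public static void main(String[] args) {
        Map<DataFormat, DataSourceCodec> codecs =
            collectCodecs(List.of(pluginFor(DataFormat.CSV), pluginFor(DataFormat.PARQUET)));
        System.out.println("Registered formats: " + codecs.keySet());
    }
}
```

With a hook like this, each format plugin would only override `getDataSourceCodecs()`, and the merging logic would live once in core rather than in every plugin.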

Comment on lines +51 to +60
@Override
public <T extends DataFormat> IndexingExecutionEngine<T> indexingEngine(MapperService mapperService, ShardPath shardPath) {
    if (CsvDataFormat.class.equals(getDataFormatType())) {
        @SuppressWarnings("unchecked")
        IndexingExecutionEngine<T> engine = (IndexingExecutionEngine<T>) new CsvEngine();
        return engine;
    }
    throw new IllegalArgumentException("Unsupported data format type: " + getDataFormatType());
}

Contributor

Bukhtawar commented Oct 8, 2025

I was wondering whether this would need something like an EngineFactory that takes an EngineConfig as input, so that it can support a remote directory.
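As a rough illustration of that idea, the sketch below shows a factory that receives one config object and decides from it whether the engine targets a local shard path or a remote directory. Everything here is an assumption made for illustration: `EngineConfig`, `EngineFactory`, `IndexingExecutionEngine`, the `remoteDirectory` field, and `CSV_ENGINE_FACTORY` are simplified hypothetical stand-ins, not the PR's types or the real OpenSearch classes of the same names.

```java
import java.util.Optional;

public class EngineFactorySketch {

    /** Hypothetical stand-in for the PR's IndexingExecutionEngine. */
    public interface IndexingExecutionEngine {
        String describe();
    }

    /** Hypothetical config bundle; a real engine config would carry far more state. */
    public static final class EngineConfig {
        private final String shardPath;
        private final String remoteDirectory; // null when the shard is local-only (assumption)

        public EngineConfig(String shardPath, String remoteDirectory) {
            this.shardPath = shardPath;
            this.remoteDirectory = remoteDirectory;
        }

        public String shardPath() {
            return shardPath;
        }

        public Optional<String> remoteDirectory() {
            return Optional.ofNullable(remoteDirectory);
        }
    }

    /** Hypothetical factory: the engine is built from the config, not from loose arguments. */
    @FunctionalInterface
    public interface EngineFactory {
        IndexingExecutionEngine newEngine(EngineConfig config);
    }

    /** Example factory that prefers the remote directory when one is configured. */
    public static final EngineFactory CSV_ENGINE_FACTORY = config -> {
        String target = config.remoteDirectory()
            .map(remote -> "remote:" + remote)
            .orElse("local:" + config.shardPath());
        return () -> "csv-engine@" + target;
    };

    public static void main(String[] args) {
        EngineConfig localOnly = new EngineConfig("/data/shard0", null);
        EngineConfig withRemote = new EngineConfig("/data/shard0", "s3://bucket/shard0");
        System.out.println(CSV_ENGINE_FACTORY.newEngine(localOnly).describe());
        System.out.println(CSV_ENGINE_FACTORY.newEngine(withRemote).describe());
    }
}
```

The design point is that new inputs, such as a remote directory, can be added to the config object without changing the signature that each data format plugin implements.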

2 participants