Adding parquet-data-format, dataformat-csv, engine-datafusion for e2e… #19537
base: feature/datafusion
… parquet ingestion/search
❌ Gradle check result for 2aa6905: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
/modules/parquet-data-format/src/main/rust/target
/modules/parquet-data-format/src/main/resources/native/
Can we name this codec-parquet or data-format-parquet?
apply plugin: 'opensearch.opensearchplugin'

opensearchplugin {
  name = 'dataformat-csv'
  description = 'CSV data format plugin for OpenSearch DataFusion'
  classname = 'org.opensearch.datafusion.csv.CsvDataFormatPlugin'
  hasNativeController = false
}
Assuming we can get rid of this?
// TODO : move to vectorized exec specific plugin
@Override
public Optional<Map<org.opensearch.vectorized.execution.search.DataFormat, DataSourceCodec>> getDataSourceCodecs() {
    Map<org.opensearch.vectorized.execution.search.DataFormat, DataSourceCodec> codecs = new HashMap<>();
    CsvDataSourceCodec csvDataSourceCodec = new CsvDataSourceCodec();
    // TODO : version it correctly - similar to lucene codecs?
    codecs.put(csvDataSourceCodec.getDataFormat(), new CsvDataSourceCodec());
    return Optional.of(codecs);
    // return Optional.empty();
}
Based on the TODO comment, it looks like this would be moved to core and be common to all plugins that register as a data source.
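To make the "move to core" idea concrete, here is a minimal sketch of a core-side registry that collects codecs from every data source plugin in one place. All of the types below (`DataFormat`, `DataSourceCodec`, `DataSourcePlugin`, `DataSourceCodecRegistry`) are hypothetical stand-ins written for this illustration; they only mirror the shape of `getDataSourceCodecs()` in the diff and are not the PR's or OpenSearch's actual classes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-ins for the PR's types; names and shapes are assumptions.
interface DataFormat {
    String name();
}

interface DataSourceCodec {
    DataFormat getDataFormat();
}

// Hypothetical extension point that a data source plugin would implement.
interface DataSourcePlugin {
    Optional<Map<DataFormat, DataSourceCodec>> getDataSourceCodecs();
}

// Core-side registry: gathers codecs from all plugins once and rejects duplicates.
final class DataSourceCodecRegistry {
    private final Map<String, DataSourceCodec> codecsByFormat = new HashMap<>();

    DataSourceCodecRegistry(List<? extends DataSourcePlugin> plugins) {
        for (DataSourcePlugin plugin : plugins) {
            plugin.getDataSourceCodecs().ifPresent(codecs -> codecs.forEach((format, codec) -> {
                // Key by format name so two plugins cannot both claim the same format.
                DataSourceCodec previous = codecsByFormat.putIfAbsent(format.name(), codec);
                if (previous != null) {
                    throw new IllegalStateException("Duplicate codec for data format: " + format.name());
                }
            }));
        }
    }

    // Returns the codec registered for the given format name, if any.
    Optional<DataSourceCodec> codecFor(String formatName) {
        return Optional.ofNullable(codecsByFormat.get(formatName));
    }
}
```

With something along these lines, each plugin would only declare its codecs, and the lookup and duplicate handling would live in one shared place rather than in every plugin.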
@Override
public <T extends DataFormat> IndexingExecutionEngine<T> indexingEngine(MapperService mapperService, ShardPath shardPath) {
    if (CsvDataFormat.class.equals(getDataFormatType())) {
        @SuppressWarnings("unchecked")
        IndexingExecutionEngine<T> engine = (IndexingExecutionEngine<T>) new CsvEngine();
        return engine;
    }
    throw new IllegalArgumentException("Unsupported data format type: " + getDataFormatType());
}
I was wondering if this would require something like an EngineFactory that takes an EngineConfig as input, so that it can support a remote directory.
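A rough sketch of what such a factory could look like, assuming the config carries an optional remote directory location. Every type here (`SketchEngineConfig`, `SketchIndexingEngine`, `SketchEngineFactory`, `CsvSketchEngineFactory`) is hypothetical and invented for illustration; none of them is the existing OpenSearch `EngineConfig` or `EngineFactory`.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// All types below are hypothetical sketches, not the PR's or OpenSearch's actual classes.

// Minimal configuration passed to the factory; a real engine config would carry far more.
final class SketchEngineConfig {
    private final Path shardDataPath;
    private final Optional<String> remoteDirectoryUri;

    SketchEngineConfig(Path shardDataPath, Optional<String> remoteDirectoryUri) {
        this.shardDataPath = shardDataPath;
        this.remoteDirectoryUri = remoteDirectoryUri;
    }

    Path shardDataPath() {
        return shardDataPath;
    }

    Optional<String> remoteDirectoryUri() {
        return remoteDirectoryUri;
    }
}

interface SketchIndexingEngine {
    String describe();
}

// Factory a plugin would expose instead of constructing its engine inline.
interface SketchEngineFactory {
    SketchIndexingEngine newEngine(SketchEngineConfig config);
}

final class CsvSketchEngineFactory implements SketchEngineFactory {
    @Override
    public SketchIndexingEngine newEngine(SketchEngineConfig config) {
        // Prefer the remote directory when one is configured, else fall back to the local shard path.
        String target = config.remoteDirectoryUri()
            .map(uri -> "remote:" + uri)
            .orElseGet(() -> "local:" + config.shardDataPath());
        return () -> "csv engine writing to " + target;
    }
}
```

The point of the indirection is that `indexingEngine(...)` would no longer call `new CsvEngine()` directly; the storage location would be decided by the config passed in.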
Description
[Describe what this change achieves]
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.