Parquet variant intermediate conversion #145
base: variant_logical_type
Conversation
```cpp
//! NOTE: we initialize these to 0 because some rows will not set them, if the row is NULL
//! To simplify the implementation we just allocate 'dictionary.Size()' keys for each row
for (idx_t i = 0; i < keys_offset; i++) {
	keys_selvec.set_index(i, 0);
}
```
I think this can be removed; it's copied from `to_variant.cpp`, but in this case we actually know that the keys are used.
```
array_of_struct_of_variants	STRUCT(v VARIANT)[]	YES	NULL	NULL	NULL
struct_of_array_of_variants	STRUCT(v VARIANT[])	YES	NULL	NULL	NULL
```

```
# FIXME: RemapStruct is not supported for VARIANT, I don't think it can be, since the schema is part of the data
```
This is old; it has been resolved already.
…n the DatabaseFilePathManager
…cription_get_column_type
Right now duckdb will always set `RULE_LAUNCH_COMPILE` when either ccache or sccache is found. `RULE_LAUNCH_COMPILE` is [not meant for](https://cmake.org/cmake/help/latest/prop_gbl/RULE_LAUNCH_COMPILE.html) external use:

> *"This property is intended for internal use by ctest(1). Projects and developers should use the `<LANG>_COMPILER_LAUNCHER` target properties or the associated `CMAKE_<LANG>_COMPILER_LAUNCHER` variables instead."*

Using these vars plays nicer with upstreams that depend on DuckDB. This fixes e.g. duckdb/duckdb-python#100.
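The documented replacement the commit points to would look roughly like this (a sketch, not DuckDB's actual CMake; `CCACHE_PROGRAM` is a hypothetical variable name):

```cmake
# Instead of setting the internal RULE_LAUNCH_COMPILE global property...
find_program(CCACHE_PROGRAM ccache)
if(CCACHE_PROGRAM)
  # ...use the documented per-language launcher variables.
  set(CMAKE_C_COMPILER_LAUNCHER "${CCACHE_PROGRAM}")
  set(CMAKE_CXX_COMPILER_LAUNCHER "${CCACHE_PROGRAM}")
endif()
```

These variables initialize the corresponding `<LANG>_COMPILER_LAUNCHER` target properties, so downstream projects that embed DuckDB can override them per target.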
…ermediate_conversion
This allows settings to be set prior to connecting to the initial database. Syntax is:

```json
"settings": [
    {"name": "storage_compatibility_version", "value": "latest"}
]
```
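Embedded in a full config file, that block might look like the following (the surrounding file layout is a hypothetical illustration; only the `settings` key and its entry come from the commit message):

```json
{
    "settings": [
        {"name": "storage_compatibility_version", "value": "latest"}
    ]
}
```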
…m the hotpath here
Correctly handle identifiers for `IngestionMode::CREATE`
…#19420) Follow-up from duckdb#19393 There are a number of issues still caused by serializing CTE nodes - this PR makes it so that we only serialize CTE nodes when MATERIALIZED is explicitly defined, and serialize only the CommonTableExpressionMap otherwise. In addition, we never deserialize CTENodes anymore - and always reconstruct them from the CommonTableExpressionMap.
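Concretely, under the change described above only the explicitly materialized form would still be serialized as a CTE node (illustrative SQL; the queries themselves are not from the PR):

```sql
-- Explicit MATERIALIZED: serialized as a CTE node.
WITH t AS MATERIALIZED (SELECT 42 AS i)
SELECT i FROM t;

-- No modifier: only the CommonTableExpressionMap is serialized,
-- and the CTE node is reconstructed from it on deserialization.
WITH t AS (SELECT 42 AS i)
SELECT i FROM t;
```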
This is a follow-up to duckdb#19368. In the current configuration of the file path it will fail in CI - this PR fixes it. We also might need to bump httpfs again, since we'll keep getting the previous state of this test case from the currently bumped version, as [it happens now](https://github.com/duckdb/duckdb/actions/runs/18546600528/job/52866519080#step:6:5028).
Includes duckdb#19420

This PR adds forwards compatibility testing to the CI. Example usage of the script:

```shell
python3 scripts/test_storage_compatibility.py --versions "1.2.1|1.3.2" --new-unittest build/release/test/unittest
```

The script works by running the unittester (compiled with the latest version) with the config `test/configs/storage_compatibility.json` for each test with a `CREATE TABLE` or `CREATE VIEW` statement. The provided config then runs the test in `--force-storage` mode, but with a fixed database path (`bwc_storage_test.db`). This gives us a database file that has a number of tables / views in it as created in the test. For each version specified, we then try to read all tables and views stored within that file. If any of the queries fail, we report an error **if** the original CLI can successfully query the table / view. The reason we also do this is that there might be invalid views in the file, e.g. views that refer to files that have since been deleted. In order to get the older versions, our install script is used (i.e. we run `curl https://install.duckdb.org | DUCKDB_VERSION=1.3.2 sh` for each included version).
…non-default schemas. (duckdb#19363)

### Description

Query `information_schema.tables` to get each table's schema instead of parsing SQL. Emit `CREATE SCHEMA` statements before tables so dumps are replayable.

- Added `getTableSchema()` to look up a table's schema from the catalog
- Use qualified names for `TableColumnList()` and SELECT queries
- Emit `CREATE SCHEMA IF NOT EXISTS` for non-main schemas
- Fixed error code check (`SQLITE_DONE` is success, not error)

### Testing

**Verified manually:**

```sql
CREATE SCHEMA other;
CREATE TABLE other.t(a INT);
.dump
-- Now outputs: CREATE SCHEMA IF NOT EXISTS other; ... COMMIT;
```

Added unit tests in `test_shell_basics.py`:
- Empty and populated tables in non-default schemas
- Multiple schemas simultaneously
- Quoted identifiers with special chars
- `CREATE TABLE IF NOT EXISTS` variant

Fixes: duckdb#19264
I've extended the `README` a bit, and also added schemata for the json files. In follow-up PRs, I think we should add the optional parameters to the files, to then nicely generate the headers including "use instead" and "destroy with" documentation. Reworked my initial take on the schemata a bit after I found similar work in @Maxxen's PR here: duckdb#19186
…x_nulls_last (duckdb#19434) I also realized that the slow_test was not able to catch this, even though it should have. fixes: duckdblabs/duckdb-internal/issues/6270
Based on duckdb#19229 and duckdb#19302 (these should be merged first and then I'll merge this with main and undraft). This PR introduces the `SET` and `RESET` statements, which are transformed in `extension/autocomplete/transformer/transform_set.cpp`
Follow up to: duckdb#17992 Replaces all `const string` with `DuckDB::String` in the operators.
…ct Hash Join (duckdb#19332) Started from duckdb#19274 This PR makes the cached hashes of string dictionaries thread-safe by adding a `mutex`, so that we can use `Vector` in parallel. We don't do this anywhere in the code base, so to also test this functionality, I've changed the perfect hash join to emit dictionary vectors that have a size and identifier (just like the string dictionaries that come from our storage). Emitting these allows further dictionary-based optimizations during execution, such as our dictionary aggregation and dictionary functions. This seemed to give minor speedups in the regression test on my fork, but it was quite small so it might be just noise.
duckdb#19436) The following are now enabled by default:

- `MetricsType::WAITING_TO_ATTACH_LATENCY`
- `MetricsType::ATTACH_LOAD_STORAGE_LATENCY`
- `MetricsType::ATTACH_REPLAY_WAL_LATENCY`
- `MetricsType::CHECKPOINT_LATENCY`

Follow up to:
- duckdb#19367
…ermediate_conversion
…ake inequality operations work
DO NOT MERGE
This PR is just to be able to check the diff between these two branches correctly