
Conversation

@Tishj Tishj (Owner) commented Aug 20, 2025

DO NOT MERGE

This PR is just to be able to check the diff between these two branches correctly

//! NOTE: we initialize these to 0 because some rows will not set them, if the row is NULL
//! To simplify the implementation we just allocate 'dictionary.Size()' keys for each row
for (idx_t i = 0; i < keys_offset; i++) {
	keys_selvec.set_index(i, 0);
}
Tishj (Owner, Author):

I think this can be removed; it was copied from to_variant.cpp, but in this case we actually know that the keys are used.

array_of_struct_of_variants STRUCT(v VARIANT)[] YES NULL NULL NULL
struct_of_array_of_variants STRUCT(v VARIANT[]) YES NULL NULL NULL

# FIXME: RemapStruct is not supported for VARIANT, I don't think it can be, since the schema is part of the data
Tishj (Owner, Author):

This is old; it has already been resolved.

Tishj and others added 28 commits October 10, 2025 12:48
Right now DuckDB will always set `RULE_LAUNCH_COMPILE` when either
ccache or sccache is found. `RULE_LAUNCH_COMPILE` is [not meant
for](https://cmake.org/cmake/help/latest/prop_gbl/RULE_LAUNCH_COMPILE.html)
external use:

> *"This property is intended for internal use by ctest(1). Projects and
developers should use the <LANG>_COMPILER_LAUNCHER target properties or
the associated CMAKE_<LANG>_COMPILER_LAUNCHER variables instead."*

Using these vars plays nicer with upstreams that depend on DuckDB. This
fixes e.g. duckdb/duckdb-python#100.
This allows settings to be set, including prior to connecting to the initial
database. The syntax is:

```
  "settings": [
    {"name": "storage_compatibility_version", "value": "latest"}
  ]
```
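For comparison, a setting that does not have to be in place before the initial database is opened can also be applied with a regular SQL statement once connected; the config entry above is useful precisely because it takes effect earlier. A minimal sketch, assuming the same setting is also settable at runtime:

```sql
-- applied only after a connection exists, unlike the config entry above
SET storage_compatibility_version = 'latest';
```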
Mytherin and others added 30 commits October 16, 2025 17:26
Correctly handle identifiers for `IngestionMode::CREATE`
…#19420)

Follow-up from duckdb#19393

There are a number of issues still caused by serializing CTE nodes. This PR
makes it so that we only serialize CTE nodes when MATERIALIZED is explicitly
specified, and serialize only the CommonTableExpressionMap otherwise. In
addition, we never deserialize CTENodes anymore; instead, we always
reconstruct them from the CommonTableExpressionMap.
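As an illustration of the two cases (a sketch; the queries themselves are arbitrary, the comments describe the serialization behavior this PR introduces):

```sql
-- MATERIALIZED is explicitly specified: the CTE node itself is serialized
WITH t AS MATERIALIZED (SELECT 42 AS i)
SELECT * FROM t;

-- no modifier: only the CommonTableExpressionMap entry is serialized
WITH t AS (SELECT 42 AS i)
SELECT * FROM t;
```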
This is a follow-up to duckdb#19368

In the current configuration of the file path, it will fail in CI; this PR
fixes it.

We also might need to bump httpfs again, since we'll keep getting the previous
state of this test case from the currently bumped version, as [happens
now](https://github.com/duckdb/duckdb/actions/runs/18546600528/job/52866519080#step:6:5028).
Includes duckdb#19420

This PR adds forwards compatibility testing to the CI. Example usage of
the script:

```
python3 scripts/test_storage_compatibility.py --versions "1.2.1|1.3.2" --new-unittest build/release/test/unittest
```

The script works by running the unittester (compiled with the latest
version) with the config `test/configs/storage_compatibility.json` for
each test with a `CREATE TABLE` or `CREATE VIEW` statement. The provided
config then runs the test in `--force-storage` mode, but with a fixed
database path (`bwc_storage_test.db`). This then gives us a database
file that contains the tables and views created in the test.

For each specified version, we then try to read all tables and views
stored within that file. If any of the queries fail, we report an error
**only if** the original CLI can successfully query the table / view. The
reason for this extra check is that there might be invalid views in the file,
e.g. views that refer to files that have since been deleted.
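Roughly, the check performed with each older version amounts to something like the following (a sketch, not the script's exact queries; it assumes the older CLI opens `bwc_storage_test.db` directly, and the object names are hypothetical):

```sql
-- enumerate what the test run stored in the file
SELECT schema_name, table_name FROM duckdb_tables();
SELECT schema_name, view_name FROM duckdb_views();

-- then attempt to read each of them, e.g.:
SELECT * FROM main.some_table;
SELECT * FROM main.some_view;
```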

In order to get the older versions, our install script is used (i.e. we
run `curl https://install.duckdb.org | DUCKDB_VERSION=1.3.2 sh` for
each included version).
…non-default schemas. (duckdb#19363)

### Description

Query `information_schema.tables` to get each table's schema instead of
parsing SQL. Emit `CREATE SCHEMA` statements before tables so dumps are
replayable.

- Added getTableSchema() to look up the schema from the catalog (a sketch of this lookup follows the list below)
- Use qualified names for TableColumnList() and SELECT queries
- Emit `CREATE SCHEMA IF NOT EXISTS` for non-main schemas
- Fixed the error code check (SQLITE_DONE is success, not an error)
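
A minimal sketch of the schema lookup described above, assuming a table named `t` (the shell's actual query may differ):

```sql
SELECT table_schema
FROM information_schema.tables
WHERE table_name = 't';
```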

### Testing

**Verified manually:**
```sql
CREATE SCHEMA other;
CREATE TABLE other.t(a INT);
.dump
-- Now outputs: CREATE SCHEMA IF NOT EXISTS other; ... COMMIT;
```
Added unit tests in `test_shell_basics.py`:
- Empty and populated tables in non-default schemas
- Multiple schemas simultaneously
- Quoted identifiers with special chars
- CREATE TABLE IF NOT EXISTS variant


Fixes: duckdb#19264
I've extended the `README` a bit, and also added schemata for the JSON
files.

In follow-up PRs, I think we should add the optional parameters to the
files, so that we can then nicely generate the headers, including the
"use instead" and "destroy with" documentation.

Reworked my initial take on the schemata a bit after I found similar
work in @Maxxen's PR here: duckdb#19186
…x_nulls_last (duckdb#19434)

I also realized that the slow_test was not able to catch this, even
though it should have.

fixes: duckdblabs/duckdb-internal/issues/6270
Based on duckdb#19229 and
duckdb#19302 (these should be merged
first and then I'll merge this with main and undraft).

This PR introduces the `SET` and `RESET` statements, which are
transformed in `extension/autocomplete/transformer/transform_set.cpp`.
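For reference, the statement forms being handled look like this (generic examples; the setting names are arbitrary):

```sql
SET threads = 4;
SET GLOBAL memory_limit = '8GB';
RESET threads;
RESET GLOBAL memory_limit;
```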
Follow-up to: duckdb#17992

Replaces all `const string` with `DuckDB::String` in the operators.
…ct Hash Join (duckdb#19332)

Started from duckdb#19274

This PR makes the cached hashes of string dictionaries thread-safe by
adding a `mutex`, so that we can use `Vector` in parallel. We don't do
this anywhere in the code base, so to also test this functionality, I've
changed the perfect hash join to emit dictionary vectors that have a
size and identifier (just like the string dictionaries that come from
our storage). Emitting these allows further dictionary-based
optimizations during execution, such as our dictionary aggregation and
dictionary functions.

This seemed to give minor speedups in the regression test on my fork,
but the effect was quite small, so it might just be noise.
duckdb#19436)

The following are now enabled by default
- `MetricsType::WAITING_TO_ATTACH_LATENCY`
- `MetricsType::ATTACH_LOAD_STORAGE_LATENCY`
- `MetricsType::ATTACH_REPLAY_WAL_LATENCY`
- `MetricsType::CHECKPOINT_LATENCY`

Follow-up to:
- duckdb#19367