Conversation

@YuweiXiao
Contributor

This PR introduces external table support, allowing users to persist a view of an external file queried through DuckDB's readers (read_csv, read_parquet, and read_json).

Previously, users had to embed file locations and options directly in each query and use the r['xx'] syntax for column references. External tables simplify this by defining the file path and reader options once at CREATE time, enabling clean SELECT statements without the r['xx'] syntax. This also opens the door to access control on external files, such as fine-grained permissions like column-level visibility for different users.

CREATE TABLE Syntax

CREATE TABLE external_csv () USING duckdb WITH (
    duckdb_external_location = '../../data/iris.csv',
    duckdb_external_format = 'csv',
    duckdb_external_options = '{"header": true}'
);

-- Query like a regular table
SELECT * FROM external_csv;
SELECT "sepal.length" FROM external_csv;

-- Raw SQL way
SELECT r['sepal.length'] FROM read_csv('../../data/iris.csv');

Features

  • DDL Support: CREATE TABLE, DROP TABLE, ALTER TABLE ... RENAME
  • Auto Schema Inference: Column names and types inferred by DuckDB, persisted in Postgres catalog
  • Lazy loading: External tables are dynamically loaded as DuckDB views only when referenced in queries
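A sketch of the DDL surface listed above, reusing the syntax from the CREATE TABLE example earlier in this PR (the Parquet file path is illustrative, and the ALTER/DROP forms are assumed to behave like ordinary Postgres DDL against catalog metadata only):

```sql
-- Register an external Parquet file once; the schema is inferred by DuckDB
-- and persisted in the Postgres catalog.
CREATE TABLE external_parquet () USING duckdb WITH (
    duckdb_external_location = '../../data/iris.parquet',
    duckdb_external_format = 'parquet'
);

-- Rename and drop only touch catalog metadata; the backing file is untouched.
ALTER TABLE external_parquet RENAME TO iris_archive;
DROP TABLE iris_archive;
```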

@visardida

visardida commented Oct 11, 2025

This is a much needed improvement @YuweiXiao , thank you!

Does this implementation of external table support handle partitioned Parquet datasets for example, when using wildcard paths or recursive directory patterns such as:

read_parquet('/path/to/data/**/*.parquet')

In other words, if I create an external table pointing to a directory of Parquet partitions, will it automatically discover and read all matching files, or does it only support a single file path per table definition?

@YuweiXiao
Contributor Author

> Does this implementation of external table support handle partitioned Parquet datasets for example, when using wildcard paths or recursive directory patterns such as:
>
> read_parquet('/path/to/data/**/*.parquet')
>
> In other words, if I create an external table pointing to a directory of Parquet partitions, will it automatically discover and read all matching files, or does it only support a single file path per table definition?

Yes. The external table tracks the path and read options in the Postgres catalog, and file listing is triggered for each query. In theory, all functionality supported by the read_xxxx functions should also be available through external tables.

@JelteF
Collaborator

JelteF commented Oct 13, 2025

Thanks for the work on this! I also had something like this in mind, but I was thinking about using FOREIGN TABLES instead of table access methods for this. So I'm wondering why you went this route instead. (not saying that one is really better than the other, but I'm wondering what tradeoffs you considered)

@YuweiXiao
Contributor Author

> Thanks for the work on this! I also had something like this in mind, but I was thinking about using FOREIGN TABLES instead of table access methods for this. So I'm wondering why you went this route instead. (not saying that one is really better than the other, but I'm wondering what tradeoffs you considered)

Yes, FOREIGN TABLE would definitely work too. I didn't have a strong tradeoff in mind; mainly I wanted to reuse the existing codebase as much as possible, e.g., the DuckDB AM that's already properly hooked and the registered triggers.

I'll take another look at the FOREIGN TABLE approach, since it has a better semantic fit (i.e., a metadata-only table).

@JelteF
Collaborator

JelteF commented Oct 16, 2025

Thinking about it more, I do think FOREIGN TABLE is a better fit for this semantically. Because the CREATE TABLE command that you have now isn't actually creating the backing files. It's only registering some already existing external data in postgres.
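For comparison, a FOREIGN TABLE variant of the same registration might look like the following sketch. This is purely hypothetical syntax, not an interface this PR (or pg_duckdb) defines; the server name, wrapper name, and option keys are all illustrative:

```sql
-- Hypothetical FOREIGN TABLE sketch, not part of this PR.
CREATE SERVER duckdb_files FOREIGN DATA WRAPPER duckdb_fdw;

CREATE FOREIGN TABLE external_csv ()
    SERVER duckdb_files
    OPTIONS (
        location '../../data/iris.csv',
        format 'csv',
        reader_options '{"header": true}'
    );
```

The semantic advantage is the one noted above: a foreign table is explicitly a catalog-only registration of external data, so no backing storage is expected to be created or dropped with it.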

@YuweiXiao
Contributor Author

> Thinking about it more, I do think FOREIGN TABLE is a better fit for this semantically. Because the CREATE TABLE command that you have now isn't actually creating the backing files. It's only registering some already existing external data in postgres.

Yeah. I will open a discussion thread so we can define the SQL interface (usage) before implementation.

@AndrewJackson2020
Contributor

The above change (or a similar one using an FDW instead) would be great. One issue with the current syntax is that it does not play nicely with ORMs, which is a big annoyance for a lot of teams. I could also see a usage pattern with pg_duckdb where you keep "live" data in Postgres tables (or partitions) and "archive" data in S3/Parquet. It would be great to access both of these through a uniform interface.
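The live/archive pattern described above could be sketched as follows, using the external table syntax from this PR. All table names, columns, and the S3 path are illustrative, and this assumes the inferred schema of the archive matches the live table:

```sql
-- "Live" rows stay in a regular Postgres table.
CREATE TABLE events_live (id bigint, ts timestamptz, payload jsonb);

-- "Archive" rows live in Parquet on S3, registered once.
CREATE TABLE events_archive () USING duckdb WITH (
    duckdb_external_location = 's3://my-bucket/events/**/*.parquet',
    duckdb_external_format = 'parquet'
);

-- A view gives ORMs and applications one uniform interface over both.
CREATE VIEW events AS
    SELECT id, ts, payload FROM events_live
    UNION ALL
    SELECT id, ts, payload FROM events_archive;
```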

