External Tables: Persistent Postgres table for managing external data queried by DuckDB readers #951
Conversation
This is a much needed improvement @YuweiXiao, thank you! Does this implementation of external table support handle partitioned Parquet datasets, for example when using wildcard paths or recursive directory patterns like the ones sketched below?
In other words, if I create an external table pointing to a directory of Parquet partitions, will it automatically discover and read all matching files, or does it only support a single file path per table definition?
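For instance, patterns along these lines (a hedged illustration only; the bucket name, layout, and column are made up, and glob / `**` recursion are DuckDB reader features):

```sql
-- Single-level glob over a directory of Parquet partitions
SELECT r['event_id'] FROM read_parquet('s3://my-bucket/events/*.parquet') r;

-- Recursive pattern across nested partition directories
SELECT r['event_id'] FROM read_parquet('s3://my-bucket/events/**/*.parquet') r;
```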
Yes. The external table tracks the path and read_options in the pg catalog, and the file list is resolved on each query. Theoretically, everything supported by the underlying DuckDB readers is supported.
Thanks for the work on this! I also had something like this in mind, but I was thinking about using FOREIGN TABLEs instead of a table access method for this, so I'm wondering why you went this route instead. (Not saying that one is really better than the other, but I'm wondering what tradeoffs you considered.)
Yes, FOREIGN TABLE would definitely work too. I didn't have a strong tradeoff in mind; I mainly wanted to reuse the existing codebase as much as possible, e.g., the DuckDB AM that's already properly hooked and the registered triggers. I'll take another look at the FOREIGN TABLE approach, since it has a better semantic fit (i.e., a metadata-only table).
Thinking about it more, I do think FOREIGN TABLE is a better fit for this semantically, because the CREATE TABLE command that you have now isn't actually creating the backing files. It's only registering already existing external data in Postgres.
Yeah. I will start a discussion thread so we can define the SQL interface (usage) before the implementation.
The above change (or a similar change using FDW instead) would be great. One of the issues with the current syntax is that it does not play nicely with ORMs, which is a big annoyance for a lot of teams. I could also see a usage pattern with pg_duckdb where you keep "live data" in Postgres tables (or partitions) and "archive data" on S3/Parquet. It would be great to be able to access both of these through a uniform interface, as in the sketch below.
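A minimal sketch of that "live + archive" pattern, assuming the external table support from this PR; every table and column name here is invented for illustration:

```sql
-- "Live" data stays in a regular Postgres heap table; "archive" data lives in
-- Parquet on S3 behind an external table (assumed to already exist).
CREATE VIEW orders_all AS
    SELECT order_id, amount, created_at FROM orders_live     -- regular Postgres table
    UNION ALL
    SELECT order_id, amount, created_at FROM orders_archive; -- external table over Parquet

-- Both tiers are then queried through one uniform interface.
SELECT sum(amount) FROM orders_all WHERE created_at >= DATE '2024-01-01';
```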
This PR introduces external table support, allowing users to persist, as a Postgres table, a view over external files queried through DuckDB's readers (`read_csv`, `read_parquet`, and `read_json`). Previously, users had to embed file locations and options directly in queries and use the `r[xx]` syntax for column references. External tables simplify this by defining file paths and reader options once at CREATE time, enabling clean SELECT statements without the `r[xx]` syntax. This also opens the door to access control on external files, such as fine-grained permissions like column-level visibility for different users.
CREATE TABLE Syntax
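The first query below shows today's inline-reader style that the description refers to; the CREATE TABLE that follows is only a sketch of the idea, and the option names, the WITH clause, and the explicit column list are assumptions, not the syntax finalized in this PR:

```sql
-- Today: reader call and r['...'] column references embedded in every query.
SELECT r['customer_id'], r['amount']
FROM read_parquet('s3://my-bucket/sales/*.parquet') r;

-- Sketch of an external table (option names are assumptions, not this PR's syntax):
-- path and reader options are declared once at CREATE time.
CREATE TABLE sales_external (customer_id bigint, amount numeric)
    USING duckdb
    WITH (path = 's3://my-bucket/sales/*.parquet', reader = 'read_parquet');

-- Clean SELECT, no r['...'] syntax needed.
SELECT customer_id, amount FROM sales_external WHERE amount > 100;
```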
Features
`CREATE TABLE`, `DROP TABLE`, `ALTER TABLE ... RENAME`
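A brief usage sketch of the supported DDL, reusing the hypothetical table from the syntax example above:

```sql
-- Rename the external table (catalog-only operation).
ALTER TABLE sales_external RENAME TO sales_2023_external;

-- Drop the external table definition; presumably the underlying files are left in place,
-- since the table only registers already existing external data.
DROP TABLE sales_2023_external;
```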