Conversation

HuyNguyen7994

One of the most common use cases for DuckDB is running SQL tests locally. SQLMesh supports this natively through SQLGlot, but no alternative exists for dbt workflows. Global macros exist, but they don't cover many use cases. In my case, I need to transpile to_date in Postgres into strptime in DuckDB. This plugin facilitates that by intercepting the original compiled SQL (in Redshift, Postgres, or MySQL syntax) and transpiling it into DuckDB syntax.

@jwills
Collaborator

jwills commented Apr 23, 2025

Ah, nice to see someone taking another crack at this one @HuyNguyen7994 -- I've never been confident that this would work well enough in the general case to include it in the project, although maybe a plugin-based approach is the right way to experiment with it? See #22 for my prior attempts to do this.

Did you have a dbt project where you used sqlglot to do this transpiling and had it work well?

@HuyNguyen7994
Author

No, this is still experimental on my part too. In my case it's to transpile between different flavors of Postgres syntax, so my problem set is much smaller.

@HuyNguyen7994
Author

Somehow changing where the cursor is modified fixes all the errors ...

@calswbin

calswbin commented Sep 2, 2025

@HuyNguyen7994 are there any updates with this one?

@goobill

goobill commented Sep 22, 2025

Can I suggest changing this function slightly?

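from dbt.adapters.events.logging import AdapterLogger
from sqlglot import errors, transpile

# DuckDBCursorWrapper is assumed to be imported from the dbt-duckdb adapter
# (the import is not shown in this snippet).

logger = AdapterLogger("DuckDB:SQLGlot")

class SqlglotWrapper(DuckDBCursorWrapper):
    def __init__(self, cursor, sql_from: str):
        self.sql_from = sql_from
        self._cursor = cursor

    def execute(self, sql, bindings=None):
        try:
            # transpile() returns one string per statement in the input.
            transpiled = transpile(
                sql,
                read=self.sql_from,
                write="duckdb",
                identify=True,
                pretty=False,
                comments=False,
                unsupported_level=errors.ErrorLevel.IMMEDIATE,
                error_level=errors.ErrorLevel.IMMEDIATE,
            )
            sql = ";".join(transpiled)
        except errors.SqlglotError as err:
            # Fall back to the original SQL if it cannot be transpiled.
            logger.warning(err)

        # Pass through whatever the wrapped execute() returns.
        return super().execute(sql, bindings)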

If the SQL contains multiple statements, e.g. INSERT; SELECT;, the transpile call returns multiple SQL statements, which is why the results are joined with ";" before execution.
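
The join-and-fall-back behaviour described above can be sketched without sqlglot or dbt installed. This is only an illustration: `rewrite_sql` and `fake_transpile` are hypothetical names, and `fake_transpile` is a stand-in that merely mimics the shape of a transpiler's result (one output string per input statement) and raises `ValueError` where sqlglot would raise its own error type.

```python
import logging
from typing import Callable, List

logger = logging.getLogger("sqlglot_sketch")


def rewrite_sql(sql: str, transpile_fn: Callable[[str], List[str]]) -> str:
    """Return the transpiled statements joined by ';', or the original SQL on failure."""
    try:
        statements = transpile_fn(sql)
    except ValueError as err:  # stand-in for sqlglot's SqlglotError
        logger.warning("transpile failed, running original SQL: %s", err)
        return sql
    return ";".join(statements)


def fake_transpile(sql: str) -> List[str]:
    """Hypothetical stand-in for a transpiler: one output string per statement."""
    if "???" in sql:
        raise ValueError("cannot parse input")
    # Naive split for the demo only; a real parser also handles ';' inside strings.
    return [part.strip() for part in sql.split(";") if part.strip()]


# Two statements in, one ';'-joined string out.
print(rewrite_sql("INSERT INTO t VALUES (1); SELECT * FROM t;", fake_transpile))
# Unparseable input is passed through unchanged.
print(rewrite_sql("SELECT ???", fake_transpile))
```

Falling back to the original SQL keeps statements that are already valid DuckDB running, at the cost of surfacing transpile problems only as warnings.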

I also found that this doesn't work when the source dialect is Databricks SQL and dbt runs an incremental load.

dbt creates its own queries for the incremental load and wraps the column names in double quotes. When these queries are transpiled, the double quotes are converted to single quotes, which causes a runtime error in DuckDB.

14:25:45    Runtime Error in model stg_events (models/example/stg_events.sql)
  Parser Error: syntax error at or near "'event_time'"
  
  LINE 1: INSERT INTO dev.main.stg_events ('event_time') (SELECT 'event_time' FROM stg_events__dbt_tmp...

For example

INSERT INTO dev.main.stg_events ("event_time") (SELECT "event_time" FROM stg_events__dbt_tmp);
SELECT 'foo bar bob' AS example_text;
INSERT INTO Customers ("CustomerName", "ContactName", "Address", "City", "PostalCode", "Country")
VALUES ('Cardinal', 'Tom B. Erichsen', 'Skagen 21', 'Stavanger', '4006', 'Norway');

Would be transpiled to

INSERT INTO dev.main.stg_events ('event_time') (SELECT 'event_time' FROM stg_events__dbt_tmp);
SELECT 'foo bar bob' AS example_text;
INSERT INTO Customers ('CustomerName', 'ContactName', 'Address', 'City', 'PostalCode', 'Country') 
VALUES ('Cardinal', 'Tom B. Erichsen', 'Skagen 21', 'Stavanger', '4006', 'Norway')

I have a temporary fix:

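from sqlglot import exp, parse

def unquote_columns(expression: exp.Expression) -> exp.Expression:
    # Visit every node, but only touch the direct children of Schema/Select nodes.
    for node in expression.walk():
        if isinstance(node, (exp.Schema, exp.Select)) and node.expressions:
            for lit in node.expressions:
                # Clearing is_string makes the generator emit the value without quotes.
                # Note: a bare string literal in a SELECT list is unquoted as well.
                if isinstance(lit, exp.Literal) and lit.args.get("is_string"):
                    lit.set("is_string", False)
    return expression

# Fragment: presumably placed inside SqlglotWrapper.execute(), where self.sql_from is available.
expressions = parse(sql, dialect=self.sql_from)
transformed = [unquote_columns(expr) for expr in expressions]
sql = ";".join(expr.sql(dialect=self.sql_from) for expr in transformed)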

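This targets only the literals that sit directly inside Schema() and Select() nodes. The parsed tree for the example above shows the double-quoted column names arriving as string literals: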

[Insert(
   this=Schema(
     this=Table(
       this=Identifier(this=stg_events, quoted=False),
       db=Identifier(this=main, quoted=False),
       catalog=Identifier(this=dev, quoted=False)),
     expressions=[
       Literal(this='event_time', is_string=True)]),
   expression=Subquery(
     this=Select(
       expressions=[
         Literal(this='event_time', is_string=True)],
       from=From(
         this=Table(
           this=Identifier(this=stg_events__dbt_tmp, quoted=False)))))),
 Select(
   expressions=[
     Alias(
       this=Literal(this='foo bar bob', is_string=True),
       alias=Identifier(this=example_text, quoted=False))]),
 Insert(
   this=Schema(
     this=Table(
       this=Identifier(this=Customers, quoted=False)),
     expressions=[
       Literal(this='CustomerName', is_string=True),
       Literal(this='ContactName', is_string=True),
       Literal(this='Address', is_string=True),
       Literal(this='City', is_string=True),
       Literal(this='PostalCode', is_string=True),
       Literal(this='Country', is_string=True)]),
   expression=Values(
     expressions=[
       Tuple(
         expressions=[
           Literal(this='Cardinal', is_string=True),
           Literal(this='Tom B. Erichsen', is_string=True),
           Literal(this='Skagen 21', is_string=True),
           Literal(this='Stavanger', is_string=True),
           Literal(this='4006', is_string=True),
           Literal(this='Norway', is_string=True)])]))]

I'm not sure this is the correct solution, as it seems a bit hacky, but it does seem to be a problem with dbt and transpiling to DuckDB.

I'll add it to my own local version of this plugin, but I'm curious whether anyone else has run into this problem and whether there is a better way to solve it.

@goobill

goobill commented Sep 22, 2025

@jwills maybe the PR should be split so that only the cursor = plugin.modify_cursor(cursor) hook is merged here and the SQLGlot plugin goes into a separate PR?

This would at least allow local versions of the SQLGlot plugin to be used.

For instance,

default:
  outputs:
    dev:
      type: duckdb
      path: /tmp/dbt.duckdb
      plugins:
        - module: local/plugins/sqlglot
          config:
            sql_from: databricks
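
A rough sketch of what such a split could enable: a small cursor wrapper that rewrites SQL before delegating, plus a plugin object exposing the modify_cursor hook named above. Everything here is an assumption for illustration. `RewritingCursor`, `LocalRewritePlugin`, and `FakeCursor` are hypothetical names, the hook's signature is guessed from the one-line call in this thread, and no dbt-duckdb classes are used.

```python
from typing import Any, Callable, List, Optional


class RewritingCursor:
    """Wraps a DB-API-style cursor and rewrites SQL before execute()."""

    def __init__(self, cursor: Any, rewrite: Callable[[str], str]):
        self._cursor = cursor
        self._rewrite = rewrite

    def __getattr__(self, name: str) -> Any:
        # Forward everything except execute() to the wrapped cursor.
        return getattr(self._cursor, name)

    def execute(self, sql: str, bindings: Optional[list] = None) -> Any:
        sql = self._rewrite(sql)
        if bindings is None:
            return self._cursor.execute(sql)
        return self._cursor.execute(sql, bindings)


class LocalRewritePlugin:
    """Hypothetical plugin; modify_cursor mirrors the hook proposed in this PR."""

    def __init__(self, rewrite: Callable[[str], str]):
        self._rewrite = rewrite

    def modify_cursor(self, cursor: Any) -> RewritingCursor:
        return RewritingCursor(cursor, self._rewrite)


class FakeCursor:
    """Test double that records the SQL it is asked to run."""

    def __init__(self) -> None:
        self.executed: List[str] = []

    def execute(self, sql: str, bindings: Optional[list] = None) -> "FakeCursor":
        self.executed.append(sql)
        return self

    def fetchall(self) -> list:
        return []


raw = FakeCursor()
# The rewrite here only tags the statement; a real plugin would transpile it.
plugin = LocalRewritePlugin(lambda sql: "/* rewritten */ " + sql)
cursor = plugin.modify_cursor(raw)
cursor.execute("SELECT 1")
print(raw.executed)
```

Keeping the rewrite function injectable means the hook itself carries no dependency on any particular transpiler, which matches the idea of merging only the hook and leaving SQLGlot to local plugins.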

@jwills
Collaborator

jwills commented Sep 22, 2025

@goobill yeah I would be good with a standalone PR that had a cursor plugin; I'm very wary of supporting the general SQL transpilation stuff in this project directly given how many different potential issues it raises for us, but I am happy to support folks who want to experiment with doing it on their own!
