Add SQLGlot Plugin to transpile from any SQL dialect into DuckDb #544
base: master
Ah, nice to see someone taking another crack at this one @HuyNguyen7994 -- I've never been confident that this would work well enough in the general case to include it in the project, although maybe a plugin-based approach is the right way to experiment with it? See #22 for my prior attempts to do this. Did you have a dbt project where you used sqlglot to do this transpiling and had it work well? |
No, this is still also experimental on my part. In my case it's to transpile between different flavors of Postgres syntax, so my problem set is much smaller. |
Somehow changing where the cursor is modified fixes all the errors ... |
@HuyNguyen7994 are there any updates with this one? |
Can I suggest changing this function slightly:

from dbt.adapters.events.logging import AdapterLogger
from sqlglot import errors, transpile

# DuckDBCursorWrapper is the cursor wrapper class provided by dbt-duckdb.
logger = AdapterLogger("DuckDB:SQLGlot")

class SqlglotWrapper(DuckDBCursorWrapper):
    def __init__(self, cursor, sql_from: str):
        self.sql_from = sql_from
        self._cursor = cursor

    def execute(self, sql, bindings=None):
        try:
            # transpile() returns one string per statement in the input
            transpiled = transpile(
                sql,
                read=self.sql_from,
                write="duckdb",
                identify=True,
                pretty=False,
                comments=False,
                unsupported_level=errors.ErrorLevel.IMMEDIATE,
                error_level=errors.ErrorLevel.IMMEDIATE
            )
            sql = ";".join(transpiled)
        except errors.SqlglotError as err:
            # On failure, log and fall through to run the original SQL unchanged
            logger.warning(err)
        super().execute(sql, bindings)

If the SQL contains multiple statements, e.g. INSERT; SELECT;, the transpile command will return multiple SQL statements. I also found that running this with Databricks SQL and a dbt incremental load wouldn't work. dbt creates its own queries for the incremental load but wraps the columns in double quotes. When this is transpiled, these double quotes get converted to single quotes, which then causes a runtime error on DuckDB.
For example:

INSERT INTO dev.main.stg_events ("event_time") (SELECT "event_time" FROM stg_events__dbt_tmp);
SELECT 'foo bar bob' AS example_text;
INSERT INTO Customers ("CustomerName", "ContactName", "Address", "City", "PostalCode", "Country")
VALUES ('Cardinal', 'Tom B. Erichsen', 'Skagen 21', 'Stavanger', '4006', 'Norway');

Would be transpiled to:

INSERT INTO dev.main.stg_events ('event_time') (SELECT 'event_time' FROM stg_events__dbt_tmp);
SELECT 'foo bar bob' AS example_text;
INSERT INTO Customers ('CustomerName', 'ContactName', 'Address', 'City', 'PostalCode', 'Country')
VALUES ('Cardinal', 'Tom B. Erichsen', 'Skagen 21', 'Stavanger', '4006', 'Norway')

I have a temporary fix:

from sqlglot import exp, parse

def unquote_columns(expression: exp.Expression) -> exp.Expression:
    # Clear the string flag on literals that sit directly in a column list or SELECT list
    for node in expression.walk():
        if isinstance(node, (exp.Schema, exp.Select)) and node.expressions:
            for lit in node.expressions:
                if isinstance(lit, exp.Literal) and lit.args.get("is_string"):
                    lit.set("is_string", False)
    return expression

# Inside execute(), before transpiling:
expressions = parse(sql, dialect=self.sql_from)
transformed = [unquote_columns(expr) for expr in expressions]
sql = ";".join(expr.sql(dialect=self.sql_from) for expr in transformed)

This targets the literals in Schema() and Select() objects.
I'm not sure this is the correct solution, as it seems a bit hacky, but it appears to be a problem with dbt and transpiling to DuckDB. I'll add it to my own local version of this plugin, but I'm curious whether anyone else has run into this and whether there is a better way of solving it. |
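The fallback behaviour of the `execute` wrapper discussed above (rejoin the transpiled statements, or run the original SQL if transpiling fails) can be tried without dbt or SQLGlot installed. The following is only a minimal sketch under stated assumptions: `RecordingCursor`, `TranspilingCursor`, and `split_statements` are hypothetical stand-ins invented for illustration, not part of dbt-duckdb or SQLGlot, and the toy transpiler signals failure with `ValueError` where the real code catches `errors.SqlglotError`.

```python
from typing import Callable, List


class RecordingCursor:
    """Hypothetical stand-in for a database cursor; it only records the SQL it receives."""

    def __init__(self) -> None:
        self.executed: List[str] = []

    def execute(self, sql, bindings=None):
        self.executed.append(sql)
        return self


class TranspilingCursor:
    """Rewrites SQL before execution and falls back to the original SQL on failure."""

    def __init__(self, cursor, transpile_fn: Callable[[str], List[str]]) -> None:
        self._cursor = cursor
        self._transpile = transpile_fn

    def execute(self, sql, bindings=None):
        try:
            # The transpiler returns one string per statement, so rejoin them.
            sql = ";".join(self._transpile(sql))
        except ValueError as err:
            # Assumption: this toy transpiler raises ValueError on failure.
            print(f"transpile failed, running original SQL: {err}")
        return self._cursor.execute(sql, bindings)


def split_statements(sql: str) -> List[str]:
    """Toy 'transpiler': naive split on ';' (it would break on semicolons inside strings)."""
    parts = [part.strip() for part in sql.split(";") if part.strip()]
    if not parts:
        raise ValueError("no statements found")
    return parts


cursor = RecordingCursor()
wrapped = TranspilingCursor(cursor, split_statements)
wrapped.execute("INSERT INTO t VALUES (1); SELECT 1;")  # two statements, rejoined
wrapped.execute("   ")  # transpiler fails, original SQL is passed through
print(cursor.executed)
```

Injecting the transpile function keeps the wrapper testable in isolation; a real plugin would presumably pass a function built on SQLGlot's `transpile` instead of the toy splitter.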
@jwills maybe the PR should be split where only This would at least allow local versions of SQLGlot plugin to be used. For instance,

default:
  outputs:
    dev:
      type: duckdb
      path: /tmp/dbt.duckdb
      plugins:
        - module: local/plugins/sqlglot
          config:
            sql_from: databricks |
@goobill yeah I would be good with a standalone PR that had a cursor plugin; I'm very wary of supporting the general SQL transpilation stuff in this project directly given how many different potential issues it raises for us, but I am happy to support folks who want to experiment with doing it on their own! |
One of the most common use cases for DuckDb is to run SQL tests locally. SQLMesh supports this natively with SQLGlot, but no alternative exists for the dbt workflow. There are global macros, but they don't cover many use cases. In my case it's to transpile to_date in postgres into strptime in duckdb. This plugin facilitates that by intercepting the original compiled SQL (in redshift, postgres, or mysql) and transpiling it into duckdb syntax.