Skip to content

The optimizer incorrectly refactored a EXISTS subquery into a join with an empty CTE #6181

@dllggyx

Description

@dllggyx

Hi, I encountered a non-equivalence issue with the code below.

def rewrite(schema_ddl, raw_sql, target_dialect='mysql'):
    schema = sqlglot.MappingSchema(dialect=target_dialect)
    create_expressions = sqlglot.parse(schema_ddl, read=target_dialect)

    # Iterate over each CREATE expression
    for create_expr in create_expressions:
        if isinstance(create_expr, exp.Create) and create_expr.kind == 'TABLE':
            table_expression = create_expr.this

            # Extract column definitions
            columns = {}
            if isinstance(table_expression, exp.Schema):
                for column_def in table_expression.expressions:
                    if isinstance(column_def, exp.ColumnDef):
                        col_name = column_def.this.name
                        col_type = column_def.kind.sql(dialect=target_dialect)
                        columns[col_name] = col_type
                table_name = table_expression.this
            else:
                table_name = table_expression

            schema.add_table(table_name, columns)

    expression = sqlglot.parse_one(raw_sql, read=target_dialect)
    optimized_expression = optimize(expression, schema=schema, dialect=target_dialect)
    optimized_sql = optimized_expression.sql(dialect=target_dialect, pretty=True)

    print(optimized_sql)

I ran the code with the following inputs:

schema_ddl:

DROP TABLE IF EXISTS `t0`;
CREATE TABLE `t0` (
  `c0` decimal(10,0) DEFAULT NULL,
  `c1` text,
  `c2` mediumtext 
) ;
INSERT INTO `t0` VALUES (NULL,'F','0.7663902271443832');

DROP TABLE IF EXISTS `t1`;
CREATE TABLE `t1` (
  `c0` tinytext
) ;
INSERT INTO `t1` VALUES (NULL),(NULL),(NULL),(NULL);

raw_sql:

SELECT ALL  EXISTS (SELECT 1 wHERE FALSE) AS ref0 FROM t1, t0 GROUP BY t0.c2;

Then I got the optimized sql:

WITH `_u_0` AS (
  SELECT
    1 AS `1`
  WHERE
    FALSE
)
SELECT
  MAX(`_u_0`.`1`) AS `ref0`
FROM `t1` AS `t1`
CROSS JOIN `_u_0` AS `_u_0`
CROSS JOIN `t0` AS `t0`
GROUP BY
  `t0`.`c2` 

The execution result of raw sql in MySQL 8.0.4:

+------+
| ref0 |
+------+
|    0 |
+------+
1 row in set (0.002 sec)

The result of optimized sql:

Empty set (0.002 sec)

The optimized SQL's execution result is empty because the CTE is always empty and contains no rows.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions