
Conversation

@ueshin (Member) commented Sep 12, 2025

What changes were proposed in this pull request?

Supports complex types on observations.

Why are the changes needed?

Observations did not previously support complex types.

For example:

>>> observation = Observation("struct")
>>> df = spark.range(10).observe(
...     observation,
...     F.struct(F.count(F.lit(1)).alias("rows"), F.max("id").alias("maxid")).alias("struct"),
... )
classic:
>>> df.collect()
[Row(id=0), Row(id=1), Row(id=2), Row(id=3), Row(id=4), Row(id=5), Row(id=6), Row(id=7), Row(id=8), Row(id=9)]
>>> observation.get
{'struct': JavaObject id=o61}
connect:
>>> df.collect()
Traceback (most recent call last):
...
pyspark.errors.exceptions.base.PySparkTypeError: [UNSUPPORTED_LITERAL] Unsupported Literal 'struct {
...

Does this PR introduce any user-facing change?

Yes, complex types are now available on observations.

>>> df.collect()
[Row(id=0), Row(id=1), Row(id=2), Row(id=3), Row(id=4), Row(id=5), Row(id=6), Row(id=7), Row(id=8), Row(id=9)]
>>>
>>> observation.get
{'struct': Row(rows=10, maxid=9)}
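Under the hood, the client has to convert the nested literal returned by the server into Python values recursively. A minimal, dependency-free sketch of that idea, using hypothetical stand-in classes rather than the real generated Spark Connect proto messages (and a dict instead of the Row that PySpark actually builds):

```python
from dataclasses import dataclass
from typing import Any, List

# Hypothetical stand-ins for the Spark Connect literal messages; the real
# ones are generated from spark/connect/expressions.proto.
@dataclass
class StructLiteral:
    names: List[str]
    values: List[Any]

@dataclass
class ArrayLiteral:
    elements: List[Any]

def to_value(lit: Any) -> Any:
    """Recursively convert a literal into a plain Python value.

    PySpark builds a Row for structs; a dict keeps this sketch
    self-contained.
    """
    if isinstance(lit, StructLiteral):
        return {name: to_value(v) for name, v in zip(lit.names, lit.values)}
    if isinstance(lit, ArrayLiteral):
        return [to_value(e) for e in lit.elements]
    return lit  # primitive literals pass through unchanged

# The observed metric from the example above, as a nested literal:
metric = StructLiteral(["rows", "maxid"], [10, 9])
print(to_value(metric))  # {'rows': 10, 'maxid': 9}
```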

How was this patch tested?

Added the related tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@ueshin (Member Author) commented Sep 12, 2025

cc @heyihong

@ueshin ueshin requested a review from zhengruifeng September 12, 2025 01:52
@@ -436,11 +442,48 @@ def _to_value(
assert dataType is None or isinstance(dataType, DayTimeIntervalType)
return DayTimeIntervalType().fromInternal(literal.day_time_interval)
elif literal.HasField("array"):
elementType = proto_schema_to_pyspark_data_type(literal.array.element_type)
if literal.array.HasField("data_type"):
Member
Curious: when does data_type exist and when does it not?

@ueshin (Member Author)

For Array, for example, data_type was added and element_type was deprecated:

https://github.com/apache/spark/blob/master/sql/connect/common/src/main/protobuf/spark/connect/expressions.proto#L227-L244

but the creator of the message may or may not follow the deprecation, so this code should be able to handle both cases.

Same for Map and Struct.
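The both-cases handling described above could look roughly like this. The stub class below only mimics the proto HasField interface for illustration; it is an assumption, not the actual generated spark.connect.Expression.Literal.Array message:

```python
class ArrayLiteralStub:
    """Hypothetical stand-in for a generated proto message with an
    optional new data_type field alongside a deprecated element_type."""

    def __init__(self, element_type=None, data_type=None):
        self.element_type = element_type  # deprecated field
        self.data_type = data_type        # newer field, may be absent

    def HasField(self, name: str) -> bool:
        # Real proto messages expose HasField for optional fields.
        return getattr(self, name) is not None


def resolve_element_type(array_literal):
    # Prefer the new data_type field when the sender populated it,
    # but fall back to the deprecated element_type otherwise.
    if array_literal.HasField("data_type"):
        return array_literal.data_type
    return array_literal.element_type


print(resolve_element_type(ArrayLiteralStub(element_type="int")))  # int
print(resolve_element_type(
    ArrayLiteralStub(element_type="int", data_type="bigint")))     # bigint
```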

@heyihong (Contributor) commented Sep 13, 2025

To minimize changes, it is not necessary to support these new data type fields. They are still under development and not fully stabilized yet. Currently, the new data type fields are only used in the requests from the Spark Connect Scala Client.

If we really want to support them, I think we need to consider:

  • How can we enable the new data type fields in the responses while maintaining backward compatibility?
  • How is this code path tested?

@ueshin (Member Author) commented Sep 13, 2025

Sure, I'll revert the changes and leave it to you.
