feat(datasets): Enrich databricks connect error message #1039

star-yar · 2025-03-14T20:01:23Z

Description

Resolves #1038

Development notes

Catches the error raised by databricks-connect and re-raises it with suggestions on how to resolve it.
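This is a minimal sketch of the catch-and-re-raise pattern described above, for illustration only. The PR's actual diff is not shown here, so several details are assumptions: the helper name `get_spark_with_hint`, the hint text, and catching a broad `Exception`, since the concrete exception class raised by databricks-connect is not stated. The session factory is passed in as a callable so the pattern can be exercised without a cluster.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

# Hypothetical hint text; the wording used in the PR is not reproduced here.
_HINT = (
    "Failed to create a Spark session through databricks-connect. "
    "Check that databricks-connect is configured for your workspace, "
    "or create the session up front (for example in a Kedro hook)."
)


def get_spark_with_hint(create_session: Callable[[], T]) -> T:
    """Call `create_session` and re-raise any failure with a resolution hint.

    Hypothetical helper: `create_session` stands in for whatever builds the
    session (e.g. `DatabricksSession.builder.getOrCreate`); the real code may differ.
    """
    try:
        return create_session()
    except Exception as exc:  # assumption: the concrete exception type is unknown
        # Chain with `from exc` so the original traceback is preserved.
        raise RuntimeError(f"{_HINT}\nOriginal error: {exc}") from exc
```

Chaining with `from exc` keeps the original databricks-connect error visible underneath the enriched message, so no diagnostic information is lost.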

Checklist

Opened this PR as a 'Draft Pull Request' if it is work-in-progress
Updated the documentation to reflect the code changes
Updated jsonschema/kedro-catalog-X.XX.json if necessary
Added a description of this change in the relevant RELEASE.md file
Added tests to cover my changes (no tests exist for get_spark, so I'm marking this as done)
Received approvals from at least half of the TSC (required for adding a new, non-experimental dataset)

Signed-off-by: star-yar <[email protected]>

star-yar · 2025-03-14T22:31:42Z

Any ideas why this might fail? Not sure this is caused by my PR, wdyt?

@merelcht

merelcht · 2025-03-17T09:12:22Z

Any ideas why this might fail? Not sure this is caused by my PR, wdyt?

@merelcht

No, this isn't related to your PR. This failure started showing up on our main branch too. We'll look into fixing it.

merelcht · 2025-03-20T10:57:09Z

@star-yar, the test failures are resolved, but now there's a coverage issue which is related to the changes you made. Can you add tests to cover the behaviour?

star-yar · 2025-03-28T14:33:33Z

Before implementing such a change, I wanted to share some additional findings.

We should not invoke Databricks Connect here. I think the newer Databricks Connect is intended for establishing the connection once and then retrieving the created session via pyspark.sql.SparkSession.

So, in the context of a Kedro project, the user should create a hook that establishes the connection via Databricks Connect. The catalog code should then always get the existing Spark session rather than create one (meaning we remove the databricks-connect invocation completely), relying on the user to create the connection first.
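Here is a sketch of the hook-based setup described above. It assumes Kedro's `after_context_created` hook spec and the `DatabricksSession.builder.getOrCreate()` entry point from databricks-connect. The class name `DatabricksConnectHooks` and the `session_factory` parameter are hypothetical; the factory is injectable only so the sketch can run without a workspace. The comment's premise, that datasets can later retrieve this session through `pyspark.sql.SparkSession`, is assumed here rather than verified.

```python
from typing import Any, Callable, Optional

try:
    from kedro.framework.hooks import hook_impl
except ImportError:  # fallback so this sketch can be imported without Kedro installed
    def hook_impl(func):
        return func


def _default_session_factory() -> Any:
    # Imported lazily so the module loads even where databricks-connect is absent.
    from databricks.connect import DatabricksSession

    return DatabricksSession.builder.getOrCreate()


class DatabricksConnectHooks:
    """Hypothetical hook that opens the Databricks Connect session once per run."""

    def __init__(self, session_factory: Callable[[], Any] = _default_session_factory):
        self._session_factory = session_factory
        self.session: Optional[Any] = None

    @hook_impl
    def after_context_created(self, context) -> None:
        # Runs once the Kedro context exists, before any dataset is loaded,
        # so datasets can later fetch the already-created session.
        self.session = self._session_factory()
```

It would be registered in the project's `settings.py`, e.g. `HOOKS = (DatabricksConnectHooks(),)`, mirroring how the Kedro docs register their PySpark hook.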

So maybe we should document this approach somewhere, and have the datasets rely on the Spark session already existing instead of handling its creation. This will change the error message we output.

Current workflow (no Spark session pre-initialised)

flowchart LR
    n1["kedro session creates"]
    n2["dataset invoked"]
    n3["dataset creates session through databricks-connect/pyspark"]
    n1 --> n2 --> n3

Suggested workflow (no Spark session pre-initialised)

flowchart LR
    n1["kedro session creates"]
    n2["dataset invoked"]
    n3["dataset gets session through pyspark*"]
    n1 --> n2 --> n3

* If databricks-connect is installed, it will complain that you're trying to create a session through pyspark; we handle this by raising an error that tells the user to first create a hook that creates the session.
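On the dataset side, the footnote's behaviour could look roughly like the following sketch. It takes a slightly different route from the footnote: instead of catching databricks-connect's complaint, it checks for a missing session up front and raises an error that points the user at the hook. `SparkSessionNotInitialisedError` and `get_existing_session` are hypothetical names. The active-session lookup is passed in as a callable; in practice it might be `pyspark.sql.SparkSession.getActiveSession`, which returns `None` when no session is active. Whether that lookup also sees a session created through databricks-connect is an assumption.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class SparkSessionNotInitialisedError(RuntimeError):
    """Hypothetical error raised when no session was created up front."""


def get_existing_session(get_active: Callable[[], Optional[T]]) -> T:
    """Return the already-active session instead of building a new one.

    Hypothetical helper: `get_active` stands in for a lookup such as
    `SparkSession.getActiveSession`, returning None when nothing is active.
    """
    session = get_active()
    if session is None:
        # Point the user at the hook-based setup rather than creating a session here.
        raise SparkSessionNotInitialisedError(
            "No active Spark session found. Create one before running the "
            "pipeline, e.g. in an `after_context_created` Kedro hook that calls "
            "DatabricksSession.builder.getOrCreate()."
        )
    return session
```

A dataset's load method would then call something like `get_existing_session(SparkSession.getActiveSession)` and never build a session itself.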
@merelcht


star-yar · 2025-04-16T14:57:58Z

Any thoughts, @merelcht?

merelcht · 2025-05-30T11:28:57Z

Hi @star-yar, I'm really sorry for the late response.

You raise a very good point. In fact, this is what we recommend when using spark (without databricks) in the docs: https://docs.kedro.org/en/stable/integrations/pyspark_integration.html#initialise-a-sparksession-using-a-hook

We've had an open issue to rewrite our Spark datasets forever (#135), but never got round to it, so the Spark-related datasets have just evolved through contributions rather than with a proper architecture in mind.

We're currently working on a major release, but afterwards we can put this back on our priority list. You're also more than welcome to make a contribution if you have time.

star-yar force-pushed the feature/extend-the-dbx-connect-error-message branch from e990a37 to 480ac00 March 14, 2025 20:04

star-yar changed the title ~~feat:(datasets): Enrich databricks connect error message~~ feat(datasets): Enrich databricks connect error message Mar 14, 2025

star-yar force-pushed the feature/extend-the-dbx-connect-error-message branch 4 times, most recently from 27ff177 to 15d06fd March 14, 2025 22:03

star-yar added 3 commits March 14, 2025 18:12

Extend err message

22293b4

Signed-off-by: star-yar <[email protected]>

Update RELEASE.md

25a4877

Signed-off-by: star-yar <[email protected]>

Refactor

78aa265

Signed-off-by: star-yar <[email protected]>

star-yar force-pushed the feature/extend-the-dbx-connect-error-message branch from 67a5016 to da3b0b2 March 14, 2025 22:12

star-yar marked this pull request as draft March 14, 2025 22:13

Add missing re-raise

b962af8

Signed-off-by: star-yar <[email protected]>

star-yar force-pushed the feature/extend-the-dbx-connect-error-message branch from da3b0b2 to b962af8 March 14, 2025 22:15

Format

1f6f183

Signed-off-by: star-yar <[email protected]>

star-yar marked this pull request as ready for review March 17, 2025 13:36

Merge branch 'main' into feature/extend-the-dbx-connect-error-message

aa0fc38

Merge branch 'main' into feature/extend-the-dbx-connect-error-message

916b1a4

Signed-off-by: Ankita Katiyar <[email protected]>
