Skip to content

Commit 1fd1133

Browse files
[SDP] UnresolvedFlow and Python/SQL APIs
1 parent 6ee83a8 commit 1fd1133

File tree

4 files changed

+62
-13
lines changed

4 files changed

+62
-13
lines changed

docs/declarative-pipelines/GraphRegistrationContext.md

Lines changed: 32 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -35,6 +35,32 @@ toDataflowGraph: DataflowGraph
3535

3636
* `PipelinesHandler` is requested to [start a pipeline run](PipelinesHandler.md#startRun)
3737

38+
### assertNoDuplicates { #assertNoDuplicates }
39+
40+
```scala
41+
assertNoDuplicates(
42+
qualifiedTables: Seq[Table],
43+
validatedViews: Seq[View],
44+
qualifiedFlows: Seq[UnresolvedFlow]): Unit
45+
```
46+
47+
`assertNoDuplicates`...FIXME
48+
49+
### assertFlowIdentifierIsUnique { #assertFlowIdentifierIsUnique }
50+
51+
```scala
52+
assertFlowIdentifierIsUnique(
53+
flow: UnresolvedFlow,
54+
datasetType: DatasetType,
55+
flows: Seq[UnresolvedFlow]): Unit
56+
```
57+
58+
`assertFlowIdentifierIsUnique` throws an `AnalysisException` if the given [UnresolvedFlow](UnresolvedFlow.md)'s identifier is used by multiple flows (among the given `flows`):
59+
60+
```text
61+
Flow [flow_name] was found in multiple datasets: [dataset_names]
62+
```
63+
3864
## Tables { #tables }
3965

4066
`GraphRegistrationContext` creates an empty registry of [Table](Table.md)s when [created](#creating-instance).
@@ -78,9 +104,9 @@ registerFlow(
78104
`registerFlow` is used when:
79105

80106
* `PipelinesHandler` is requested to [define a flow](PipelinesHandler.md#defineFlow)
81-
* `SqlGraphRegistrationContext` is requested to [process the following SQL commands](SqlGraphRegistrationContext.md#processSqlQuery):
82-
* [CreateFlowCommand](../logical-operators/CreateFlowCommand.md)
83-
* [CreateMaterializedViewAsSelect](../logical-operators/CreateMaterializedViewAsSelect.md)
84-
* [CreateView](../logical-operators/CreateView.md)
85-
* [CreateStreamingTableAsSelect](../logical-operators/CreateStreamingTableAsSelect.md)
86-
* [CreateViewCommand](../logical-operators/CreateViewCommand.md)
107+
* `SqlGraphRegistrationContext` is requested to [handle the following logical commands](SqlGraphRegistrationContext.md#processSqlQuery):
108+
* [CreateFlowCommand](SqlGraphRegistrationContext.md#CreateFlowCommand)
109+
* [CreateMaterializedViewAsSelect](SqlGraphRegistrationContext.md#CreateMaterializedViewAsSelect)
110+
* [CreateView](SqlGraphRegistrationContext.md#CreateView)
111+
* [CreateStreamingTableAsSelect](SqlGraphRegistrationContext.md#CreateStreamingTableAsSelect)
112+
* [CreateViewCommand](SqlGraphRegistrationContext.md#CreateViewCommand)

docs/declarative-pipelines/PipelinesHandler.md

Lines changed: 16 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -45,11 +45,15 @@ handlePipelinesCommand(
4545

4646
### CREATE_DATAFLOW_GRAPH { #CREATE_DATAFLOW_GRAPH }
4747

48-
`handlePipelinesCommand` [creates a dataflow graph](#createDataflowGraph) and sends the graph ID back.
48+
[handlePipelinesCommand](#handlePipelinesCommand) creates a [dataflow graph](#createDataflowGraph) and sends the graph ID back.
49+
50+
### DROP_DATAFLOW_GRAPH { #DROP_DATAFLOW_GRAPH }
51+
52+
[handlePipelinesCommand](#handlePipelinesCommand)...FIXME
4953

5054
### DEFINE_DATASET { #DEFINE_DATASET }
5155

52-
`handlePipelinesCommand` prints out the following INFO message to the logs:
56+
[handlePipelinesCommand](#handlePipelinesCommand) prints out the following INFO message to the logs:
5357

5458
```text
5559
Define pipelines dataset cmd received: [cmd]
@@ -59,7 +63,7 @@ Define pipelines dataset cmd received: [cmd]
5963

6064
### <span id="DefineFlow"> DEFINE_FLOW { #DEFINE_FLOW }
6165

62-
`handlePipelinesCommand` prints out the following INFO message to the logs:
66+
[handlePipelinesCommand](#handlePipelinesCommand) prints out the following INFO message to the logs:
6367

6468
```text
6569
Define pipelines flow cmd received: [cmd]
@@ -69,14 +73,18 @@ Define pipelines flow cmd received: [cmd]
6973

7074
### START_RUN { #START_RUN }
7175

72-
`handlePipelinesCommand` prints out the following INFO message to the logs:
76+
[handlePipelinesCommand](#handlePipelinesCommand) prints out the following INFO message to the logs:
7377

7478
```text
7579
Start pipeline cmd received: [cmd]
7680
```
7781

7882
`handlePipelinesCommand` [starts a pipeline run](#startRun).
7983

84+
### DEFINE_SQL_GRAPH_ELEMENTS { #DEFINE_SQL_GRAPH_ELEMENTS }
85+
86+
[handlePipelinesCommand](#handlePipelinesCommand)...FIXME
87+
8088
## Start Pipeline Run { #startRun }
8189

8290
```scala
@@ -153,6 +161,9 @@ defineFlow(
153161
sparkSession: SparkSession): Unit
154162
```
155163

164+
??? note "DEFINE_FLOW Pipeline Command"
165+
`defineFlow` is used to handle [DEFINE_FLOW](#DEFINE_FLOW).
166+
156167
`defineFlow` looks up the [GraphRegistrationContext](DataflowGraphRegistry.md#getDataflowGraphOrThrow) for the given `flow` (or throws a `SparkException` if not found).
157168

158169
!!! note "Implicit Flows"
@@ -164,3 +175,4 @@ defineFlow(
164175
`defineFlow` reports an `AnalysisException` if the given `flow` is not an implicit flow, but is defined with a multi-part identifier.
165176

166177
In the end, `defineFlow` [registers a flow](GraphRegistrationContext.md#registerFlow).
178+

docs/declarative-pipelines/SqlGraphRegistrationContext.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -37,7 +37,7 @@ processSqlFile(
3737

3838
* `PipelinesHandler` is requested to [defineSqlGraphElements](PipelinesHandler.md#defineSqlGraphElements)
3939

40-
### Process Single SQL Query { #processSqlQuery }
40+
### Process Single Logical Command { #processSqlQuery }
4141

4242
```scala
4343
processSqlQuery(

docs/declarative-pipelines/UnresolvedFlow.md

Lines changed: 13 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,17 @@
1+
---
2+
hide:
3+
- toc
4+
---
5+
16
# UnresolvedFlow
27

3-
`UnresolvedFlow` is a [Flow](Flow.md).
8+
`UnresolvedFlow` is a [Flow](Flow.md) that represents flows in the following Python and SQL transformations in [Spark Declarative Pipelines](index.md):
9+
10+
* [register_flow](GraphElementRegistry.md#register_flow) in PySpark's decorators
11+
* [CREATE FLOW ... AS INSERT INTO ... BY NAME](../logical-operators/CreateFlowCommand.md)
12+
* [CREATE MATERIALIZED VIEW](../logical-operators/CreateMaterializedViewAsSelect.md)
13+
* [CREATE STREAMING TABLE ... AS](../logical-operators/CreateStreamingTableAsSelect.md)
14+
* [CREATE VIEW](../logical-operators/CreateView.md) and the other variants of [CREATE VIEW](../logical-operators/CreateViewCommand.md)
415

516
`UnresolvedFlow` is registered to a [GraphRegistrationContext](GraphRegistrationContext.md) with [registerFlow](GraphRegistrationContext.md#registerFlow).
617

@@ -23,7 +34,7 @@
2334
`UnresolvedFlow` is created when:
2435

2536
* `PipelinesHandler` is requested to [define a flow](PipelinesHandler.md#defineFlow)
26-
* `SqlGraphRegistrationContext` is requested to [handle the following pipeline commands](SqlGraphRegistrationContext.md#processSqlQuery):
37+
* `SqlGraphRegistrationContext` is requested to [handle the following logical commands](SqlGraphRegistrationContext.md#processSqlQuery):
2738
* [CreateFlowCommand](SqlGraphRegistrationContext.md#CreateFlowCommand)
2839
* [CreateMaterializedViewAsSelect](SqlGraphRegistrationContext.md#CreateMaterializedViewAsSelect)
2940
* [CreateView](SqlGraphRegistrationContext.md#CreateView)

0 commit comments

Comments
 (0)