* [@create_streaming_table](./index.md#create_streaming_table), [@table](./index.md#table), [@materialized_view](./index.md#materialized_view), [@temporary_view](./index.md#temporary_view) decorators are used
docs/declarative-pipelines/GraphRegistrationContext.md
Eventually, `GraphRegistrationContext` [becomes a DataflowGraph](#toDataflowGraph).

## toDataflowGraph { #toDataflowGraph }

```scala
toDataflowGraph: DataflowGraph
```

`toDataflowGraph` creates a new [DataflowGraph](DataflowGraph.md) with the [tables](#tables), [views](#views), [sinks](#sinks) and [flows](#flows) fully-qualified, resolved, and de-duplicated.

??? note "AnalysisException"
    `toDataflowGraph` reports an `AnalysisException` when this `GraphRegistrationContext` is [empty](#isPipelineEmpty).

---

`toDataflowGraph` is used when:

* `PipelinesHandler` is requested to [start a pipeline run](PipelinesHandler.md#startRun)
### isPipelineEmpty { #isPipelineEmpty }

```scala
isPipelineEmpty: Boolean
```

`isPipelineEmpty` is `true` when this pipeline (this `GraphRegistrationContext`) is empty, i.e., all of the following are met:

1. No [tables](#tables) registered
1. No [PersistedView](PersistedView.md)s registered (among the [views](#views))
1. No [sinks](#sinks) registered
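The emptiness check above can be sketched in plain Scala (a simplified model with hypothetical types; the real registries hold `Table`, `View` and `Sink` elements):

```scala
// Minimal sketch of isPipelineEmpty (hypothetical, simplified types).
// A pipeline is empty only when there are no tables, no persisted views
// (temporary views do not count) and no sinks.
object IsPipelineEmptySketch {
  sealed trait View
  final case class TemporaryView(name: String) extends View
  final case class PersistedView(name: String) extends View

  final case class GraphRegistrationContext(
      tables: Seq[String],
      views: Seq[View],
      sinks: Seq[String]) {

    def isPipelineEmpty: Boolean =
      tables.isEmpty &&
        !views.exists(_.isInstanceOf[PersistedView]) &&
        sinks.isEmpty
  }
}
```

Note that a pipeline with only temporary views still counts as empty, which is why registering a `TemporaryView` alone does not prevent the `AnalysisException` in `toDataflowGraph`.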
### assertNoDuplicates { #assertNoDuplicates }

`assertNoDuplicates` asserts that no flow is registered in more than one dataset, reporting an error otherwise:

```text
Flow [flow_name] was found in multiple datasets: [dataset_names]
```
## Tables { #tables }

`GraphRegistrationContext` creates an empty registry of [Table](Table.md)s when [created](#creating-instance).

A new [Table](Table.md) is added when `GraphRegistrationContext` is requested to [register a table](#registerTable).

## Views { #views }

`GraphRegistrationContext` creates an empty registry of [View](View.md)s when [created](#creating-instance).

## Sinks { #sinks }

`GraphRegistrationContext` creates an empty registry of [Sink](Sink.md)s when [created](#creating-instance).

## Flows { #flows }

`GraphRegistrationContext` creates an empty registry of [UnresolvedFlow](UnresolvedFlow.md)s when [created](#creating-instance).

## Register Flow { #registerFlow }

```scala
registerFlow(
  flowDef: UnresolvedFlow): Unit
```

`registerFlow` adds the given [UnresolvedFlow](UnresolvedFlow.md) to the [flows](#flows) registry.

---

`registerFlow` is used when:

* `PipelinesHandler` is requested to [define a flow](PipelinesHandler.md#defineFlow)
* `SqlGraphRegistrationContext` is requested to [process the following logical commands](SqlGraphRegistrationContext.md#processSqlQuery):
    * [CREATE FLOW ... AS INSERT INTO ... BY NAME](../logical-operators/CreateFlowCommand.md)
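The registries and `registerFlow` can be sketched as follows (a simplified model with hypothetical types; the real registries hold `Table`, `View`, `Sink` and `UnresolvedFlow` elements):

```scala
import scala.collection.mutable

// Sketch of GraphRegistrationContext's registries (hypothetical, simplified).
// Each registry starts empty; registerFlow simply appends to the flows registry.
object RegisterFlowSketch {
  final case class UnresolvedFlow(identifier: String)

  final class GraphRegistrationContext {
    val tables = mutable.ListBuffer.empty[String]
    val views = mutable.ListBuffer.empty[String]
    val sinks = mutable.ListBuffer.empty[String]
    val flows = mutable.ListBuffer.empty[UnresolvedFlow]

    def registerFlow(flowDef: UnresolvedFlow): Unit =
      flows += flowDef
  }
}
```

Registration only accumulates definitions; fully qualifying, resolving and de-duplicating them is deferred to `toDataflowGraph`.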
docs/declarative-pipelines/PipelinesHandler.md
| Command | Description | Python Client |
|-----------------|-------------|-----------|
| `CREATE_DATAFLOW_GRAPH` | [Creates a new dataflow graph](#CREATE_DATAFLOW_GRAPH) | [pyspark.pipelines.spark_connect_pipeline](spark_connect_pipeline.md#create_dataflow_graph) |
| `DROP_DATAFLOW_GRAPH` | [Drops a pipeline](#DROP_DATAFLOW_GRAPH) | |
| `DEFINE_OUTPUT` | [Defines an output](#DEFINE_OUTPUT) (a table, a materialized view, a temporary view or a sink) | [SparkConnectGraphElementRegistry](SparkConnectGraphElementRegistry.md#register_output) |
| `DEFINE_FLOW` | [Defines a flow](#DEFINE_FLOW) | [SparkConnectGraphElementRegistry](SparkConnectGraphElementRegistry.md#register_flow) |
| `START_RUN` | [Starts a pipeline run](#START_RUN) | [pyspark.pipelines.spark_connect_pipeline](spark_connect_pipeline.md#start_run) |
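The command dispatch in the table above can be sketched as a match on the command type (hypothetical string-based model; the real code matches on `proto.PipelineCommand` command-type cases):

```scala
// Sketch of handlePipelinesCommand's dispatch (hypothetical, simplified).
// Each supported command type is routed to its handler; anything else fails.
object HandleCommandSketch {
  def handlePipelinesCommand(commandType: String): String = commandType match {
    case "CREATE_DATAFLOW_GRAPH" => "createDataflowGraph"
    case "DROP_DATAFLOW_GRAPH"   => "dropDataflowGraph"
    case "DEFINE_OUTPUT"         => "defineOutput"
    case "DEFINE_FLOW"           => "defineFlow"
    case "START_RUN"             => "startRun"
    case other =>
      throw new UnsupportedOperationException(s"$other is not supported")
  }
}
```

The handler names on the right-hand side mirror the section anchors in this page; the exact method routing is an illustration, not the literal implementation.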
??? note "START_RUN Pipeline Command"
    `startRun` is used to handle the [START_RUN](#START_RUN) pipeline command.

`startRun` finds the [GraphRegistrationContext](GraphRegistrationContext.md) by `dataflowGraphId` in the [DataflowGraphRegistry](DataflowGraphRegistry.md) (in the given `SessionHolder`).
## Create Dataflow Graph { #createDataflowGraph }

```scala
createDataflowGraph(
  ...
  spark: SparkSession): String
```

??? note "CREATE_DATAFLOW_GRAPH Pipeline Command"
    `createDataflowGraph` is used to handle the [CREATE_DATAFLOW_GRAPH](#CREATE_DATAFLOW_GRAPH) pipeline command.

`createDataflowGraph` gets the catalog (from the given `CreateDataflowGraph`, if defined in the [pipeline specification file](index.md#pipeline-specification-file)) or prints out an INFO message to the logs and uses the current catalog instead. The database is resolved the same way:

```text
No default database was supplied. Falling back to the current database: [current
```

In the end, `createDataflowGraph` [creates a dataflow graph](DataflowGraphRegistry.md#createDataflowGraph) (in the session's [DataflowGraphRegistry](DataflowGraphRegistry.md)).
`defineSqlGraphElements` is used to handle the [DEFINE_SQL_GRAPH_ELEMENTS](#DEFINE_SQL_GRAPH_ELEMENTS) pipeline command.

`defineSqlGraphElements` [looks up the GraphRegistrationContext for the dataflow graph ID](DataflowGraphRegistry.md#getDataflowGraphOrThrow) (from the given `DefineSqlGraphElements` command and in the given `SessionHolder`).

`defineSqlGraphElements` creates a new [SqlGraphRegistrationContext](SqlGraphRegistrationContext.md) (for the `GraphRegistrationContext`) to [process the SQL definition file](SqlGraphRegistrationContext.md#processSqlFile).

## Define Output { #defineOutput }

```scala
defineOutput(
  output: proto.PipelineCommand.DefineOutput,
  sessionHolder: SessionHolder): TableIdentifier
```

??? note "DEFINE_OUTPUT Pipeline Command"
    `defineOutput` is used to handle the [DEFINE_OUTPUT](#DEFINE_OUTPUT) pipeline command.

`defineOutput` looks up the [GraphRegistrationContext](DataflowGraphRegistry.md#getDataflowGraphOrThrow) for the dataflow graph ID of the given `output` (or throws a `SparkException` if not found).

`defineOutput` branches off based on the `output` type:

| Output Type | Action |
|--------------|--------|
| `MATERIALIZED_VIEW` or `TABLE` | [Registers a table](GraphRegistrationContext.md#registerTable) |
| `TEMPORARY_VIEW` | [Registers a view](GraphRegistrationContext.md#registerView) |
| `SINK` | [Registers a sink](GraphRegistrationContext.md#registerSink) |

??? warning "IllegalArgumentException"
    For unknown types, `defineOutput` reports an `IllegalArgumentException`:

    ```text
    Unknown output type: [type]
    ```
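The branching above can be sketched as a match on the output type (a string-based simplification; the real code matches on the `DefineOutput` proto's output-type enum):

```scala
// Sketch of defineOutput's branching on the output type (hypothetical,
// simplified). Tables and materialized views share the table-registration
// path; unknown types are rejected with an IllegalArgumentException.
object DefineOutputSketch {
  def defineOutput(outputType: String): String = outputType match {
    case "MATERIALIZED_VIEW" | "TABLE" => "registerTable"
    case "TEMPORARY_VIEW"              => "registerView"
    case "SINK"                        => "registerSink"
    case other =>
      throw new IllegalArgumentException(s"Unknown output type: $other")
  }
}
```

The returned names correspond to the `GraphRegistrationContext` registration methods in the Action column; the real `defineOutput` invokes them rather than returning names.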
## Define Flow { #defineFlow }

```scala
defineFlow(
  ...
```

??? note "DEFINE_FLOW Pipeline Command"
    `defineFlow` is used to handle the [DEFINE_FLOW](#DEFINE_FLOW) pipeline command.

`defineFlow` looks up the [GraphRegistrationContext](DataflowGraphRegistry.md#getDataflowGraphOrThrow) for the given `flow` (or throws a `SparkException` if not found).

`defineFlow` [creates a flow identifier](GraphIdentifierManager.md#parseTableIdentifier) (for the `flow` name).

??? warning "AnalysisException"
    `defineFlow` reports an `AnalysisException` if the given `flow` is not an implicit flow, but is defined with a multi-part identifier.

In the end, `defineFlow` [registers a flow](GraphRegistrationContext.md#registerFlow) (with a proper [FlowFunction](FlowAnalysis.md#createFlowFunctionFromLogicalPlan)).
`Sink` is an [extension](#contract) of the [GraphElement](GraphElement.md) and [Output](Output.md) abstractions for [pipeline sinks](#implementations) that can define their [write format](#format) and [options](#options).

## Contract

### Format { #format }

```scala
format: String
```

Used when:

* `PipelinesHandler` is requested to [define a sink (output)](PipelinesHandler.md#defineOutput)
* `SinkWrite` is requested to [start a stream](SinkWrite.md#startStream)

### Options { #options }

```scala
options: Map[String, String]
```

Used when:

* `PipelinesHandler` is requested to [define a sink (output)](PipelinesHandler.md#defineOutput)
* `SinkWrite` is requested to [start a stream](SinkWrite.md#startStream)
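The two abstract members above can be sketched as a plain Scala trait (a standalone simplification; the real `Sink` also extends `GraphElement` and `Output`, which are omitted here):

```scala
// Sketch of the Sink contract (hypothetical, simplified: the GraphElement
// and Output parents are left out). Implementations supply a write format
// and the options used to start the stream.
object SinkSketch {
  trait Sink {
    def format: String
    def options: Map[String, String]
  }

  // A hypothetical Kafka-style sink as an illustration (names are made up).
  final case class KafkaSink(topic: String) extends Sink {
    def format: String = "kafka"
    def options: Map[String, String] = Map("topic" -> topic)
  }
}
```

A concrete sink thus carries everything `SinkWrite` needs to start a stream: the format selects the data source and the options parameterize it.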