Skip to content

Commit 095cdb5

Browse files
[SDP] Processing SQL libraries + ONCE flows
1 parent e26f7bb commit 095cdb5

File tree

8 files changed

+83
-11
lines changed

8 files changed

+83
-11
lines changed

docs/declarative-pipelines/Flow.md

Lines changed: 33 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,35 @@
11
# Flow
22

3-
`Flow` is...FIXME
3+
`Flow` is an [extension](#contract) of the [GraphElement](GraphElement.md) abstraction for [flows](#implementations) in dataflow graphs.
4+
5+
Flows must be successfully analyzed, thus resolved, in order to determine whether they are streaming or not.
6+
7+
## Contract (Subset)
8+
9+
### once { #once }
10+
11+
```scala
12+
once: Boolean
13+
```
14+
15+
Indicates whether this is a **ONCE flow** or not. ONCE flows can only be run once per full refresh.
16+
17+
* ONCE flows are planned for execution as [AppendOnceFlow](AppendOnceFlow.md)s ([FlowResolver](FlowResolver.md#convertResolvedToTypedFlow))
18+
* ONCE flows are marked as IDLE when `TriggeredGraphExecution` is requested to start flows in [topologicalExecution](TriggeredGraphExecution.md#topologicalExecution).
19+
* [ONCE flows must be batch (not streaming)](GraphValidations.md#validateFlowStreamingness).
20+
* For ONCE flows or when the [logical plan for the flow](ResolvedFlow.md#df) is streaming, `GraphElementTypeUtils` considers a [ResolvedFlow](ResolvedFlow.md) as a `STREAMING_TABLE` (in [getDatasetTypeForMaterializedViewOrStreamingTable](GraphElementTypeUtils.md#getDatasetTypeForMaterializedViewOrStreamingTable)).
21+
22+
Default: `false`
23+
24+
Used when:
25+
26+
* `TriggeredGraphExecution` is requested to [topologicalExecution](TriggeredGraphExecution.md#topologicalExecution)
27+
* `PipelinesErrors` is requested to [checkStreamingErrorsAndRetry](PipelinesErrors.md#checkStreamingErrorsAndRetry) (to skip ONCE flows with no exception)
28+
* `GraphValidations` is requested to [validateFlowStreamingness](GraphValidations.md#validateFlowStreamingness)
29+
* `GraphElementTypeUtils` is requested to [getDatasetTypeForMaterializedViewOrStreamingTable](GraphElementTypeUtils.md#getDatasetTypeForMaterializedViewOrStreamingTable)
30+
* `FlowResolver` is requested to [convertResolvedToTypedFlow](FlowResolver.md#convertResolvedToTypedFlow)
31+
32+
## Implementations
33+
34+
* [ResolutionCompletedFlow](ResolutionCompletedFlow.md)
35+
* [UnresolvedFlow](UnresolvedFlow.md)
Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
# GraphElement
2+
3+
`GraphElement` is...FIXME
Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,17 @@
1+
# GraphElementTypeUtils
2+
3+
## getDatasetTypeForMaterializedViewOrStreamingTable { #getDatasetTypeForMaterializedViewOrStreamingTable }
4+
5+
```scala
6+
getDatasetTypeForMaterializedViewOrStreamingTable(
7+
flowsToTable: Seq[ResolvedFlow]): DatasetType
8+
```
9+
10+
`getDatasetTypeForMaterializedViewOrStreamingTable`...FIXME
11+
12+
---
13+
14+
`getDatasetTypeForMaterializedViewOrStreamingTable` is used when:
15+
16+
* `GraphValidations` is requested to [validateUserSpecifiedSchemas](GraphValidations.md#validateUserSpecifiedSchemas)
17+
* `SchemaInferenceUtils` is requested to [inferSchemaFromFlows](SchemaInferenceUtils.md#inferSchemaFromFlows)
Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
# GraphValidations
2+
3+
`GraphValidations` is...FIXME
Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
# PipelinesErrors
2+
3+
`PipelinesErrors` is...FIXME
Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
# SchemaInferenceUtils
2+
3+
`SchemaInferenceUtils` is...FIXME

docs/declarative-pipelines/SqlGraphRegistrationContext.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -9,13 +9,13 @@
99
`SqlGraphRegistrationContext` is created when:
1010

1111
* `PipelinesHandler` is requested to [handle DEFINE_SQL_GRAPH_ELEMENTS command](PipelinesHandler.md#handlePipelinesCommand) (and [defineSqlGraphElements](PipelinesHandler.md#defineSqlGraphElements))
12-
* `SqlGraphRegistrationContext` is requested to [processSqlFile](#processSqlFile)
12+
* `SqlGraphRegistrationContext` is requested to [process a SQL file](#processSqlFile)
1313

1414
## SqlGraphRegistrationContextState { #context }
1515

1616
When [created](#creating-instance), `SqlGraphRegistrationContext` creates a [SqlGraphRegistrationContextState](SqlGraphRegistrationContextState.md) (with the [defaultCatalog](GraphRegistrationContext.md#defaultCatalog), the [defaultDatabase](GraphRegistrationContext.md#defaultDatabase) and the [defaultSqlConf](GraphRegistrationContext.md#defaultSqlConf)).
1717

18-
## Process SQL Definition File { #processSqlFile }
18+
## Process SQL File { #processSqlFile }
1919

2020
```scala
2121
processSqlFile(

docs/declarative-pipelines/UnresolvedFlow.md

Lines changed: 19 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -1,19 +1,14 @@
1-
---
2-
hide:
3-
- toc
4-
---
5-
61
# UnresolvedFlow
72

8-
`UnresolvedFlow` is a [Flow](Flow.md) that represents flows in the following Python and SQL transformations in [Spark Declarative Pipelines](index.md):
3+
`UnresolvedFlow` is a [Flow](Flow.md) that represents a flow in the Python and SQL transformations in [Spark Declarative Pipelines](index.md):
94

105
* [register_flow](GraphElementRegistry.md#register_flow) in PySpark's decorators
116
* [CREATE FLOW ... AS INSERT INTO ... BY NAME](../logical-operators/CreateFlowCommand.md)
127
* [CREATE MATERIALIZED VIEW](../logical-operators/CreateMaterializedViewAsSelect.md)
138
* [CREATE STREAMING TABLE ... AS](../logical-operators/CreateStreamingTableAsSelect.md)
149
* [CREATE VIEW](../logical-operators/CreateView.md) and the other variants of [CREATE VIEW](../logical-operators/CreateViewCommand.md)
1510

16-
`UnresolvedFlow` is registered to a [GraphRegistrationContext](GraphRegistrationContext.md) with [registerFlow](GraphRegistrationContext.md#registerFlow).
11+
`UnresolvedFlow` is registered to a [GraphRegistrationContext](GraphRegistrationContext.md) with [register a flow](GraphRegistrationContext.md#registerFlow).
1712

1813
`UnresolvedFlow` is analyzed and resolved to [ResolvedFlow](ResolvedFlow.md) (by [FlowResolver](FlowResolver.md#attemptResolveFlow) when [DataflowGraph](DataflowGraph.md) is requested to [resolve](DataflowGraph.md#resolve)).
1914

@@ -28,7 +23,7 @@ hide:
2823
* <span id="func"> `FlowFunction`
2924
* <span id="queryContext"> `QueryContext`
3025
* <span id="sqlConf"> SQL Config
31-
* <span id="once"> `once` flag
26+
* [once](#once) flag
3227
* <span id="origin"> `QueryOrigin`
3328

3429
`UnresolvedFlow` is created when:
@@ -40,3 +35,19 @@ hide:
4035
* [CreateView](SqlGraphRegistrationContext.md#CreateView)
4136
* [CreateStreamingTableAsSelect](SqlGraphRegistrationContext.md#CreateStreamingTableAsSelect)
4237
* [CreateViewCommand](SqlGraphRegistrationContext.md#CreateViewCommand)
38+
39+
### once Flag { #once }
40+
41+
`UnresolvedFlow` is given the [once](Flow.md#once) flag when [created](#creating-instance).
42+
43+
`once` flag is disabled (`false`) explicitly for the following:
44+
45+
* [CreateFlowHandler](SqlGraphRegistrationContext.md#CreateFlowHandler)
46+
* [CreateMaterializedViewAsSelectHandler](SqlGraphRegistrationContext.md#CreateMaterializedViewAsSelectHandler)
47+
* [CreatePersistedViewCommandHandler](SqlGraphRegistrationContext.md#CreatePersistedViewCommandHandler)
48+
* [CreateStreamingTableAsSelectHandler](SqlGraphRegistrationContext.md#CreateStreamingTableAsSelectHandler)
49+
* [CreateTemporaryViewHandler](SqlGraphRegistrationContext.md#CreateTemporaryViewHandler)
50+
* `PipelinesHandler` is requested to [define a flow](PipelinesHandler.md#defineFlow)
51+
52+
!!! note "No ONCE UnresolvedFlows"
53+
It turns out that all `UnresolvedFlow`s created are not [ONCE flows](Flow.md#once).

0 commit comments

Comments
 (0)