Commit b8a2c89

[SDP] DatasetManager and (table and persistent view) materialization
1 parent 1fd1133 commit b8a2c89

1 file changed, +46 -2 lines changed

docs/declarative-pipelines/DatasetManager.md

Lines changed: 46 additions & 2 deletions
@@ -1,6 +1,11 @@
 # DatasetManager
 
-!!! note "Scala object"
+`DatasetManager` is a global manager to [materialize datasets](#materializeDatasets) (tables and persistent views).
+
+!!! note ""
+    **Materialization** is the process of publishing tables to the session [TableCatalog](../connector/catalog/TableCatalog.md) and persistent views to the [SessionCatalog](../SessionCatalog.md).
+
+??? note "Scala object"
     `DatasetManager` is an `object` in Scala which means it is a class that has exactly one instance (itself).
     A Scala `object` is created lazily when it is referenced for the first time.
 
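The lazy creation of a Scala `object` described in the note can be observed with a small sketch. `LazyDemo`, `Manager` and `initialized` are hypothetical names used for illustration only; they are not part of Spark or of `DatasetManager`.

```scala
object LazyDemo {
  // Flipped by Manager's initializer so the moment of creation can be observed
  var initialized: Boolean = false

  object Manager {
    // The object body runs once, the first time Manager is referenced
    initialized = true

    def name: String = "manager"
  }
}
```

Reading `LazyDemo.initialized` before any reference to `LazyDemo.Manager` gives `false`; after the first call to `LazyDemo.Manager.name` it gives `true`.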

@@ -22,6 +27,8 @@ materializeDatasets(
 
 For every table to be materialized, `materializeDatasets` [materializeTable](#materializeTable).
 
+In the end, `materializeDatasets` [materializeViews](#materializeViews).
+
 ---
 
 `materializeDatasets` is used when:
@@ -52,12 +59,49 @@ Materializing metadata for table [identifier].
 
 For an existing table, `materializeTable` wipes data out (`TRUNCATE TABLE`) if it is `isFullRefresh` or the table is not [streaming](Table.md#isStreamingTable).
 
-For an existing table, `materializeTable` requests the `TableCatalog` to [alterTable](../connector/catalog/TableCatalog.md#alterTable) if there are any changes in the schema or table properties.
+For an existing table, `materializeTable` requests the `TableCatalog` to [alter the table](../connector/catalog/TableCatalog.md#alterTable) if there are any changes in the schema or table properties.
 
 Unless created already, `materializeTable` requests the `TableCatalog` to [create the table](../connector/catalog/TableCatalog.md#createTable).
 
 In the end, `materializeTable` requests the `TableCatalog` to [load the materialized table](../connector/catalog/TableCatalog.md#loadTable) and returns the given [Table](Table.md) back (with the [normalized table storage path](Table.md#normalizedPath) updated to the `location` property of the materialized table).
 
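The `materializeTable` steps described above (truncate, alter, create, load) can be sketched against a toy in-memory catalog. This is a simplified model under stated assumptions, not Spark code: `TableSpec`, `InMemoryCatalog` and `MaterializeTableSketch` are hypothetical stand-ins, a schema is reduced to a list of column names, and table properties are left out.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for a pipeline table (not the Spark class)
final case class TableSpec(name: String, schema: Seq[String], isStreaming: Boolean)

// Hypothetical toy catalog that records the requests it receives
final class InMemoryCatalog {
  val tables = mutable.Map.empty[String, Seq[String]] // table name -> schema
  val log = mutable.ArrayBuffer.empty[String]

  def truncate(name: String): Unit = { log += s"truncate $name" }
  def alter(name: String, schema: Seq[String]): Unit = { tables(name) = schema; log += s"alter $name" }
  def create(name: String, schema: Seq[String]): Unit = { tables(name) = schema; log += s"create $name" }
  def load(name: String): Seq[String] = { log += s"load $name"; tables(name) }
}

object MaterializeTableSketch {
  def materialize(catalog: InMemoryCatalog, table: TableSpec, isFullRefresh: Boolean): Seq[String] = {
    catalog.tables.get(table.name) match {
      case Some(existingSchema) =>
        // Existing table: wipe data on full refresh or for non-streaming tables
        if (isFullRefresh || !table.isStreaming) catalog.truncate(table.name)
        // Existing table: alter only when the schema changed
        if (existingSchema != table.schema) catalog.alter(table.name, table.schema)
      case None =>
        // Not created yet: create it
        catalog.create(table.name, table.schema)
    }
    // In the end: load the materialized table
    catalog.load(table.name)
  }
}
```

Materializing a new table records `create` followed by `load`; materializing it again as a non-streaming table with a changed schema records `truncate`, `alter` and `load`.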

+### Materialize Views { #materializeViews }
+
+```scala
+materializeViews(
+  virtualizedConnectedGraphWithTables: DataflowGraph,
+  context: PipelineUpdateContext): Unit
+```
+
+`materializeViews` requests the given [DataflowGraph](DataflowGraph.md) for the [persisted views](DataflowGraph.md#persistedViews) to materialize (_publish or refresh_).
+
+!!! note "Publish (Materialize) Views"
+    To publish a view, all its input sources must exist in the metastore.
+    If a Persisted View target reads another Persisted View source, the source must be published first.
+
+`materializeViews`...FIXME
+
+For each view to be persisted (with no pending inputs), `materializeViews` [materializes the view](#materializeView).
+
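The publishing constraint in the note (a source view must be published before any view that reads it) amounts to processing views in dependency order. The sketch below is one possible way to compute such an order and is not taken from the Spark sources; `ViewOrderSketch`, `publishOrder` and the `deps` map (view name to the persisted views it reads) are hypothetical.

```scala
object ViewOrderSketch {
  // deps: view name -> names of the other persisted views it reads
  def publishOrder(deps: Map[String, Set[String]]): List[String] = {
    @annotation.tailrec
    def loop(pending: Map[String, Set[String]], published: List[String]): List[String] =
      if (pending.isEmpty) published.reverse
      else {
        val done = published.toSet
        // Views with no pending inputs are ready to be published in this round
        val ready = pending.collect { case (view, inputs) if inputs.subsetOf(done) => view }.toList.sorted
        require(ready.nonEmpty, s"cyclic or missing view inputs: ${pending.keys.mkString(", ")}")
        // published is kept in reverse order, so prepend this round reversed
        loop(pending -- ready, ready.reverse ::: published)
      }
    loop(deps, Nil)
  }
}
```

For a chain where `c` reads `b` and `b` reads `a`, `publishOrder` returns `a`, `b`, `c`, so each source is published before its reader.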
+#### Materialize View { #materializeView }
+
+```scala
+materializeView(
+  view: View,
+  flow: ResolvedFlow,
+  spark: SparkSession): Unit
+```
+
+`materializeView` [executes a CreateViewCommand logical command](../logical-operators/CreateViewCommand.md).
+
+---
+
+`materializeView` creates a [CreateViewCommand](../logical-operators/CreateViewCommand.md) logical command (as a `PersistedView` with `allowExisting` and `replace` flags enabled).
+
+`materializeView` requests the given [ResolvedFlow](ResolvedFlow.md) for the [QueryContext](ResolutionCompletedFlow.md#queryContext) to set the current catalog and current database, if defined, in the session [CatalogManager](../connector/catalog/CatalogManager.md).
+
+In the end, `materializeView` [executes the CreateViewCommand](../logical-operators/CreateViewCommand.md#run).
+
 ## constructFullRefreshSet { #constructFullRefreshSet }
 
 ```scala
