# DatasetManager

`DatasetManager` is a global manager that [materializes datasets](#materializeDatasets) (tables and persistent views).

!!! note ""
    **Materialization** is the process of publishing tables to the session [TableCatalog](../connector/catalog/TableCatalog.md) and persistent views to the [SessionCatalog](../SessionCatalog.md).

??? note "Scala object"
    `DatasetManager` is an `object` in Scala, which means it is a class that has exactly one instance (itself).
    A Scala `object` is created lazily, when it is referenced for the first time.
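
    A minimal sketch of the pattern (the object name below is made up for illustration and is not part of the Spark codebase):

    ```scala
    object LazySingleton {
      // The body runs once, when the object is referenced for the first time.
      println("initializing LazySingleton")
      val createdAt: Long = System.currentTimeMillis()
    }

    // The first reference triggers initialization; later references reuse the same instance.
    LazySingleton.createdAt
    ```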

For every table to be materialized, `materializeDatasets` [materializes the table](#materializeTable).

In the end, `materializeDatasets` [materializes the views](#materializeViews).

---

`materializeDatasets` is used when:

For an existing table, `materializeTable` wipes the data out (`TRUNCATE TABLE`) when a full refresh is requested (`isFullRefresh`) or the table is not [streaming](Table.md#isStreamingTable).

For an existing table, `materializeTable` requests the `TableCatalog` to [alter the table](../connector/catalog/TableCatalog.md#alterTable) if there are any changes in the schema or table properties.

Unless created already, `materializeTable` requests the `TableCatalog` to [create the table](../connector/catalog/TableCatalog.md#createTable).

In the end, `materializeTable` requests the `TableCatalog` to [load the materialized table](../connector/catalog/TableCatalog.md#loadTable) and returns the given [Table](Table.md) back (with the [normalized table storage path](Table.md#normalizedPath) updated to the `location` property of the materialized table).
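
The catalog steps above can be sketched with Spark's `TableCatalog` API. The helper below is a simplified illustration rather than the actual `DatasetManager` code: the identifier, schema, and the single table property are made up, and truncation of existing data is left out.

```scala
import java.util.Collections

import org.apache.spark.sql.connector.catalog.{Identifier, Table, TableCatalog, TableChange}
import org.apache.spark.sql.connector.expressions.Transform
import org.apache.spark.sql.types.StructType

// Hypothetical helper: make sure a table exists in the given catalog with the
// declared metadata and return the materialized table.
def ensureMaterialized(
    catalog: TableCatalog,
    ident: Identifier,
    schema: StructType): Table = {
  if (catalog.tableExists(ident)) {
    // Existing table: apply metadata changes (here, a single table property).
    catalog.alterTable(ident, TableChange.setProperty("pipelines.materialized", "true"))
  } else {
    // Missing table: create it with the declared schema (no partitioning, no properties).
    catalog.createTable(ident, schema, Array.empty[Transform], Collections.emptyMap[String, String]())
  }
  // Load the materialized table back from the catalog.
  catalog.loadTable(ident)
}
```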

`materializeViews` requests the given [DataflowGraph](DataflowGraph.md) for the [persisted views](DataflowGraph.md#persistedViews) to materialize (_publish or refresh_).

!!! note "Publish (Materialize) Views"
    To publish a view, all of its input sources must exist in the metastore.
    If a Persisted View reads another Persisted View, the source view must be published first.
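
    For example (with hypothetical view and table names, and `spark` as an active `SparkSession`), publishing in the order below works, whereas publishing `v2` before `v1` would fail because `v1` would not exist in the metastore yet:

    ```scala
    // v1 has to be published first since v2 reads from it.
    spark.sql("CREATE OR REPLACE VIEW v1 AS SELECT * FROM base_table")
    spark.sql("CREATE OR REPLACE VIEW v2 AS SELECT * FROM v1")
    ```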

`materializeViews`...FIXME

For each view to be persisted (with no pending inputs), `materializeViews` [materializes the view](#materializeView).

#### Materialize View { #materializeView }

```scala
materializeView(
  view: View,
  flow: ResolvedFlow,
  spark: SparkSession): Unit
```

`materializeView` [executes a CreateViewCommand logical command](../logical-operators/CreateViewCommand.md).

---

`materializeView` creates a [CreateViewCommand](../logical-operators/CreateViewCommand.md) logical command (as a `PersistedView` with the `allowExisting` and `replace` flags enabled).

`materializeView` requests the given [ResolvedFlow](ResolvedFlow.md) for the [QueryContext](ResolutionCompletedFlow.md#queryContext) to set the current catalog and current database, if defined, in the session [CatalogManager](../connector/catalog/CatalogManager.md).

In the end, `materializeView` [executes the CreateViewCommand](../logical-operators/CreateViewCommand.md#run).
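
At the session level, the effect described above roughly corresponds to switching to the flow's database and issuing a create-or-replace-view statement. The snippet below is only a user-facing approximation with made-up names; internally, `materializeView` builds and runs the `CreateViewCommand` directly instead of going through the SQL parser.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.active

// The current catalog and database come from the flow's QueryContext;
// here the database is simply hard-coded for the example.
spark.catalog.setCurrentDatabase("demo_db")

// Publish (create or replace) the persisted view in the session catalog.
spark.sql("CREATE OR REPLACE VIEW demo_view AS SELECT id FROM demo_table")
```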