1 change: 1 addition & 0 deletions docs/180-microsoft-integration/020-purview.md
@@ -25,6 +25,7 @@ The following table provides a list of integration features.
| Synchronize Purview Glossary Terms To CluedIn Vocabularies | Create CluedIn vocabularies from Purview glossary terms. |
| Synchronize Streams | Create and update "Cluedin Entity", "Cluedin Stream Process", and "Cluedin Organization Provider" entities on Purview lineages that link CluedIn streams and connectors. |
| Synchronize Crawlers And Enrichers | Create and update "Cluedin Organization Provider", "Cluedin Crawl Process", "Cluedin Enrich Process", "Cluedin Ingest Process", "Cluedin Dataset", "Cluedin Map Process", and "Cluedin Entity" entities on Purview lineages that link Purview assets to CluedIn data sets and matching CluedIn Entity Types. |
| Synchronize Data Products | Create CluedIn DataSources from Purview Data Products and Data Assets. |

We have the following **assumptions** about the customers' Microsoft Azure setup:

100 changes: 0 additions & 100 deletions docs/180-microsoft-integration/purview/010-intro.md

This file was deleted.

35 changes: 35 additions & 0 deletions docs/180-microsoft-integration/purview/010-setup-credentials.md
@@ -0,0 +1,35 @@
---
layout: cluedin
title: Setup credentials
parent: Microsoft Purview Integration
grand_parent: Microsoft Integration
permalink: /microsoft-integration/purview/setup-credentials
nav_order: 010
has_children: false
tags: ["integration", "microsoft", "azure", "purview", "credentials"]
---
## On this page
{: .no_toc .text-delta }
- TOC
{:toc}

## Credentials

To connect CluedIn to Microsoft Purview, you need to provide the Microsoft Purview account and Service Principal account information.

In CluedIn, on the navigation pane, go to **Administration** > **Settings**, and then scroll down to find the **Purview** section.

### Enter Microsoft Purview credentials

- **Base URL** – `https://{accountName}.purview.azure.com`, where `accountName` is your Purview account name. For example, if your Purview account name is ContosoPurview, the base URL is `https://contosopurview.purview.azure.com`. For more details about the Purview account, see [Microsoft documentation](https://docs.microsoft.com/en-us/azure/purview/create-catalog-portal#open-the-microsoft-purview-governance-portal).
- Another way to find the Base URL value is to go to your Purview account home page and view the **Resource JSON**.
![Purview Resource JSON](./media/purview-resource-json.png)
- **Client ID** – the **Application (client) ID** value on the **Overview** tab of the app registration that your organization has dedicated to accessing the Purview account on behalf of CluedIn.

- **Client Secret** – a secret from the **Certificates & secrets** section of the same app registration.

- **Tenant ID** – the **Directory (tenant) ID** value on the **Overview** tab of the same app registration.

![Input Microsoft Purview credentials](./media/settings.png)

For more information on where to find the values for client ID and tenant ID, see [Microsoft documentation](https://learn.microsoft.com/en-us/entra/identity-platform/quickstart-register-app#register-an-application).
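
As a rough illustration of how the four values fit together, the following Python sketch builds the base URL from an account name and reports which settings were left blank. The `PurviewSettings` class and the `purview_base_url` helper are hypothetical names used only for this example; they are not part of CluedIn or of any Microsoft SDK, and the IDs are placeholders.

```python
from dataclasses import dataclass


def purview_base_url(account_name: str) -> str:
    """Build the base URL from a Purview account name (lowercased, as in the example above)."""
    name = account_name.strip().lower()
    if not name:
        raise ValueError("account name must not be empty")
    return f"https://{name}.purview.azure.com"


@dataclass(frozen=True)
class PurviewSettings:
    """Hypothetical container mirroring the four fields of the Purview settings section."""
    base_url: str
    client_id: str
    client_secret: str
    tenant_id: str

    def missing_fields(self) -> list[str]:
        """Return the names of the settings that are empty or whitespace only."""
        return [name for name, value in vars(self).items() if not value.strip()]


settings = PurviewSettings(
    base_url=purview_base_url("ContosoPurview"),
    client_id="00000000-0000-0000-0000-000000000000",  # placeholder, not a real ID
    client_secret="",                                    # left blank on purpose
    tenant_id="11111111-1111-1111-1111-111111111111",  # placeholder, not a real ID
)
print(settings.base_url)
print(settings.missing_fields())
```

A check like this can catch a blank secret or a mistyped account name before the values are pasted into the **Purview** settings section.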
@@ -0,0 +1,39 @@
---
layout: cluedin
title: Sync Datasources
parent: Microsoft Purview Integration
grand_parent: Microsoft Integration
permalink: /microsoft-integration/purview/features/sync-datasources
nav_order: 010
has_children: false
tags: ["integration", "microsoft", "azure", "purview", "collection", "sync", "datasources"]
---
## On this page
{: .no_toc .text-delta }
- TOC
{:toc}

## Sync Purview Assets as CluedIn DataSources

![Settings Sync Datasources](../media/settings-sync-datasources.png)

When this feature is enabled, CluedIn fetches all Microsoft Purview asset entities from Purview to create data source groups and their respective data sources. The data source groups can be viewed in **Integrations** > **Data Sources**. The CluedIn Purview integration components create Purview assets under a single root collection.

![Sync Datasources to CluedIn](../media/sync-datasources-to-cluedin.png)

The `Keywords` value in the setting above is the name of the glossary term that is tagged on the assets.

![Sync Datasources Glossary Term](../media/sync-datasources-glossary-term.png)
![Sync Datasources Assets](../media/sync-datasources-assets.png)
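
The keyword matching described above can be pictured with a minimal Python sketch. It assumes each asset carries a list of glossary term names and selects only the assets tagged with one of the configured keywords. The dictionary layout, the `select_assets` helper, the sample term names, and the case-insensitive comparison are all assumptions made for illustration; they do not describe the actual CluedIn or Purview data model.

```python
def select_assets(assets, keywords):
    """Keep assets tagged with at least one glossary term named in `keywords`."""
    # Normalize the keywords once; blank entries are ignored.
    wanted = {k.strip().lower() for k in keywords if k.strip()}
    return [
        asset for asset in assets
        if wanted & {term.lower() for term in asset.get("terms", [])}
    ]


# Hypothetical assets with the glossary terms tagged on each of them.
assets = [
    {"name": "customers.csv", "terms": ["CluedInSource"]},
    {"name": "orders.csv", "terms": ["Finance"]},
    {"name": "scratch.csv", "terms": []},
]
selected = select_assets(assets, ["CluedInSource"])
print([a["name"] for a in selected])
```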

## Dataset Lineage

CluedIn creates a lineage when one or more data sets are created within a data source associated with a Purview asset entity previously created via the **Sync CluedIn Data Sources** feature. The **Ingest Data** process displays data flow from the Microsoft Purview asset entity to a newly created or updated Microsoft Purview data set entity. The Microsoft Purview data set entity represents the CluedIn data set with its populated column names.

When a CluedIn data set has at least one property mapped to a CluedIn entity type, both the CluedIn entity and the **Map to Entity** process are created in Purview. The **Map to Entity** process connects the CluedIn data set to the CluedIn entity type on the asset's lineage tab.

The following image shows an example of a data set lineage.

![Example of a Data Set lineage](../media/dataset_lineage.png)

Background processes in CluedIn detect changes in CluedIn data sets and their respective mapping. These changes are synchronized with the existing Microsoft Purview data set assets.
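
One way to picture the lineage described in this section is as a short chain of edges. The Python sketch below is only a mental model: the `dataset_lineage` helper and the tuple layout are hypothetical, and the sample names are made up. Only the process names **Ingest Data** and **Map to Entity** come from the text above.

```python
def dataset_lineage(asset, dataset, entity_type=None):
    """Return lineage edges as (source, process, target) tuples."""
    # The Ingest Data process always links the Purview asset to the data set.
    edges = [(asset, "Ingest Data", dataset)]
    # The Map to Entity process exists only once the data set has a mapping.
    if entity_type is not None:
        edges.append((dataset, "Map to Entity", entity_type))
    return edges


unmapped = dataset_lineage("customers.csv", "customers data set")
mapped = dataset_lineage("customers.csv", "customers data set", "/Customer")
for source, process, target in mapped:
    print(f"{source} --[{process}]--> {target}")
```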
@@ -0,0 +1,30 @@
---
layout: cluedin
title: Auto-Map CluedIn DataSets
parent: Microsoft Purview Integration
grand_parent: Microsoft Integration
permalink: /microsoft-integration/purview/features/auto-map-datasets
nav_order: 020
has_children: false
tags: ["integration", "microsoft", "azure", "purview", "collection", "auto-map", "dataset"]
---
## On this page
{: .no_toc .text-delta }
- TOC
{:toc}

## Auto-Map CluedIn Data Sets

![Settings Auto-Map Datasets](../media/settings-auto-map-datasets.png)

This feature auto-maps data sets to a vocabulary that matches the Purview asset's glossary term. It applies to data sets that are tied to a Purview asset that has Purview glossary terms assigned either to the asset itself or to its schema columns.

The Purview glossary terms used by the Purview assets must first be added as CluedIn vocabularies. If the vocabulary (made from Purview glossary terms) is available, the data set is automatically mapped to the right vocabulary.

If the CluedIn data set column name matches an existing vocabulary key of the given vocabulary, the data type of that vocabulary key is used instead of the type from the Purview entity's schema. If the column name does not match any existing vocabulary key, CluedIn creates a new vocabulary key in the vocabulary, and its data type is determined from the Purview entity's schema.

The default data type is text when no suitable data type in CluedIn is found.
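
The rules above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the integration's actual code: the `resolve_key_types` helper, the `PURVIEW_TO_CLUEDIN` type table, and the type names `Text`, `Integer`, and `DateTime` are hypothetical stand-ins.

```python
# Hypothetical mapping from Purview schema types to CluedIn data types.
PURVIEW_TO_CLUEDIN = {"string": "Text", "int": "Integer", "datetime": "DateTime"}


def resolve_key_types(columns, vocabulary_keys):
    """Resolve a data type for every data set column.

    columns: {column name: Purview schema type}
    vocabulary_keys: {existing vocabulary key name: data type}
    Returns (resolved type per column, newly created keys).
    """
    resolved, created = {}, {}
    for column, purview_type in columns.items():
        if column in vocabulary_keys:
            # An existing vocabulary key wins over the Purview schema type.
            resolved[column] = vocabulary_keys[column]
        else:
            # A new key is created; unknown schema types fall back to Text.
            data_type = PURVIEW_TO_CLUEDIN.get(purview_type.lower(), "Text")
            created[column] = data_type
            resolved[column] = data_type
    return resolved, created


columns = {"Age": "int", "Id": "int", "Notes": "variant"}
existing_keys = {"Age": "Text"}
resolved, created = resolve_key_types(columns, existing_keys)
print(resolved)
print(created)
```

In this example, `Age` keeps the type of its existing vocabulary key even though the Purview schema says `int`, while `Id` and `Notes` become new keys.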

The sync interval applies to this feature.

## Requirements
@@ -0,0 +1,21 @@
---
layout: cluedin
title: Poll Datasources
parent: Microsoft Purview Integration
grand_parent: Microsoft Integration
permalink: /microsoft-integration/purview/features/poll-datasources
nav_order: 030
has_children: false
tags: ["integration", "microsoft", "azure", "purview", "collection", "poll", "datasources"]
---
## On this page
{: .no_toc .text-delta }
- TOC
{:toc}


## Poll CluedIn Data Sources

![Settings Poll Datasources](../media/settings-poll-datasources.png)

Unlike **Sync CluedIn Data Sources**, this feature updates the existing data set entities on Microsoft Purview lineages without syncing new Microsoft Purview data sources back to CluedIn. Data quality metrics for the associated data sources in CluedIn are also synced back to Microsoft Purview lineages this way.
@@ -0,0 +1,34 @@
---
layout: cluedin
title: Sync CluedIn Crawlers and Enrichers
parent: Microsoft Purview Integration
grand_parent: Microsoft Integration
permalink: /microsoft-integration/purview/features/sync-crawlers-and-enrichers
nav_order: 040
has_children: false
tags: ["integration", "microsoft", "azure", "purview", "collection", "sync", "crawlers", "enrichers"]
---
## On this page
{: .no_toc .text-delta }
- TOC
{:toc}


## Sync CluedIn Crawlers and Enrichers

![Settings Sync Crawlers and Enrichers](../media/settings-sync-crawlers-and-enrichers.png)

This feature creates or updates crawler and enricher lineages in Purview. The **DataSource** provider types (data imported via files, endpoints, or databases) are handled by the **Sync CluedIn Data Sources** feature.

The following image shows an example of a crawler lineage.

![Example of a Crawler lineage](../media/crawler_lineage.png)

When a crawler imports clues into CluedIn, this feature creates a lineage from the crawler provider to the entity types of the CluedIn entities via the **Crawl** process.

The following image shows an example of an enricher lineage.

![Example of an Enricher lineage](../media/enricher_lineage.png)

When an enricher enriches an entity, this feature creates a lineage from the enricher provider to the entity types of the CluedIn entities via the **Enrich** process.

@@ -0,0 +1,32 @@
---
layout: cluedin
title: Sync CluedIn Streams
parent: Microsoft Purview Integration
grand_parent: Microsoft Integration
permalink: /microsoft-integration/purview/features/sync-streams
nav_order: 050
has_children: false
tags: ["integration", "microsoft", "azure", "purview", "collection", "sync", "streams"]
---
## On this page
{: .no_toc .text-delta }
- TOC
{:toc}

## Sync CluedIn Streams

![Settings Sync Streams](../media/settings-sync-streams.png)

A background process in CluedIn synchronizes streams and their respective export connectors as assets in Purview. These assets show the outbound lineage from CluedIn entity types to the export target.

The following image shows the details of a CluedIn stream named `Customer Golden Record` whose export target is configured with the target name `Customer`.

![CluedIn Streams](../media/cluedin-stream.png)

The following image shows the full lineage from the Purview asset to the CluedIn stream.

![Full Lineage](../media/sync-streams-asset-lineage.png)

The CluedIn stream is created as a process asset, and the export target is created as an entity asset.

![Stream and Export Target Asset](../media/sync-streams-process-and-export-target.png)
@@ -0,0 +1,64 @@
---
layout: cluedin
title: Sync Purview Glossaries to CluedIn Vocabularies
parent: Microsoft Purview Integration
grand_parent: Microsoft Integration
permalink: /microsoft-integration/purview/features/sync-purview-glossaries-to-vocab
nav_order: 050
has_children: false
tags: ["integration", "microsoft", "azure", "purview", "collection", "sync", "glossary", "vocabulary"]
---
## On this page
{: .no_toc .text-delta }
- TOC
{:toc}

This synchronization feature allows the import of Microsoft Purview glossaries as CluedIn vocabularies. If there are matching CluedIn vocabularies, they will be updated; otherwise, new CluedIn vocabularies are created for the incoming Microsoft Purview glossaries.

## Synchronization Settings

![Settings Sync Purview Glossaries to CluedIn Vocabularies](../media/settings-sync-purview-glossaries-to-vocab.png)

#### **Sync Purview glossaries to CluedIn vocabularies**
When this feature is enabled, the job looks through the Purview glossaries and creates CluedIn vocabularies for the terms that meet the requirements.

#### **Glossary To Vocabulary Attribute Filter**

A key-value pair of property name and value used to filter the glossary terms. `CluedInVocab` is an additional property from the `CluedIn Template` term template.

![Purview Glossary Term Setup](../media/purview-glossary-term-setup.png)

#### **Glossary To Vocabulary Term Pattern**

A regex pattern used to filter the glossary terms by name or nickname.

#### **Glossary To Vocabulary Template Name Pattern**

A regex pattern used to filter the glossary terms by template name.
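
To make the two pattern settings more concrete, here is a small Python sketch of how such filters could behave. It assumes that an empty pattern disables a filter, that a pattern may match anywhere in the value, and that a term must satisfy every non-empty pattern. These semantics, the `term_passes` helper, and the dictionary fields are assumptions for illustration; the integration's actual matching rules may differ.

```python
import re


def term_passes(term, term_pattern="", template_pattern=""):
    """Return True when the term satisfies every non-empty pattern."""
    if term_pattern:
        # The term pattern is tried against both the name and the nickname.
        names = [term.get("name", ""), term.get("nick_name", "")]
        if not any(re.search(term_pattern, n) for n in names):
            return False
    if template_pattern:
        # The template pattern is tried against each template name of the term.
        templates = term.get("templates", [])
        if not any(re.search(template_pattern, t) for t in templates):
            return False
    return True


# Hypothetical glossary terms.
terms = [
    {"name": "Customer", "nick_name": "Cust", "templates": ["CluedIn Template"]},
    {"name": "Invoice", "nick_name": "Inv", "templates": ["System default"]},
]
kept = [t["name"] for t in terms if term_passes(t, r"^Cust", r"CluedIn")]
print(kept)
```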

#### **Vocabulary Prefix**

Assign a custom value for the Vocabulary Key prefix.

![Vocabulary Configuration Page - Prefix](../media/cluedin-vocabulary-configuration.png)

#### **EntityType Prefix**

Assign a custom value for the EntityType prefix, which identifies the entity types created by the Purview integration.

![EntityType Configuration Page - Prefix](../media/cluedin-entitytype-configuration.png)

## Sample Previews
In this example, a glossary term `Customer` is created with the child glossary terms `Name`, `HouseholdId`, `Age`, `Id`, `Phone`, and `Notes`.

![Purview child glossary terms](../media/purview-child-glossary-terms.png)

The child glossary terms are assigned to the Purview asset schema.

![Asset glossary term mapping](../media/asset-glossary-term-mapping.png)

After synchronization completes, the `Customer` vocabulary is created with `Name` as its only vocabulary key.

![CluedIn vocabulary from glossary terms](../media/cluedin-vocab-from-glossary-terms.png)

This is because `Customer` and `Name` are the only glossary terms whose `CluedInVocab` property is set to `true`.
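
The outcome of this sample can be reproduced with a short Python sketch of the attribute filter. It is inferred from the example above and is not the integration's actual code: the `build_vocabulary` helper and the dictionary layout are hypothetical, and storing the property value as the string `"true"` is an assumption.

```python
def build_vocabulary(parent, children, attribute_filter):
    """Build a vocabulary from a parent term and the children that pass the filter."""
    key, value = attribute_filter

    def passes(term):
        return term.get("attributes", {}).get(key) == value

    # A parent term that does not pass the filter produces no vocabulary.
    if not passes(parent):
        return None
    return {
        "vocabulary": parent["name"],
        "keys": [child["name"] for child in children if passes(child)],
    }


customer = {"name": "Customer", "attributes": {"CluedInVocab": "true"}}
children = [
    {"name": "Name", "attributes": {"CluedInVocab": "true"}},
    {"name": "HouseholdId", "attributes": {}},
    {"name": "Age", "attributes": {}},
    {"name": "Id", "attributes": {}},
    {"name": "Phone", "attributes": {}},
    {"name": "Notes", "attributes": {}},
]
vocabulary = build_vocabulary(customer, children, ("CluedInVocab", "true"))
print(vocabulary)
```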