CluedIn-io/CluedIn.Connector.AzureDataLake

CluedIn.Connector.AzureDataLake

About CluedIn

CluedIn is the Cloud-native Master Data Management Platform that brings data teams together, enabling them to deliver the foundation of high-quality, trusted data that empowers everyone to make a difference.

We're different because we use enhanced data management techniques like Graph and Zero Upfront Modelling to reduce the time taken to prepare data and deliver insight by as much as 80%. Installed in as little as 20 minutes from the Azure Marketplace, CluedIn is fully integrated with Microsoft Purview and the full Microsoft Fabric suite, making it the preferred choice for Azure customers.

To learn more about CluedIn, contact the team today.

https://www.cluedin.com

Development

Parquet File Output

Microsoft support for Parquet format varies across products:

  1. GUID. One Lake and Open Mirroring do not support the GUID type, so GUID values have to be serialized as strings.
  2. Array of Strings.
    1. One Lake accepts the file without an error, but the preview and the SQL Analytics endpoint do not work.
    2. Open Mirroring raises an error, so array columns cannot be added to the export.
    3. For One Lake and Azure Data Lake, extra columns are added to support use cases where the values are needed. These columns are named in the form X_String.
  3. Characters in column names. One Lake and Open Mirroring only support alphanumeric characters and spaces. No other characters are supported, including the dot used in vocabulary keys, so they have to be escaped.
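
The column name restriction in item 3 can be illustrated with a minimal Python sketch. This is not the connector's actual implementation: the function name `escape_column_name`, the ASCII-only reading of "alphanumeric", and the choice of a space as the replacement character are all assumptions made for illustration.

```python
import re

# Assumption: "alphanumeric" means ASCII letters and digits only.
_UNSUPPORTED = re.compile(r"[^A-Za-z0-9 ]")


def escape_column_name(name: str, replacement: str = " ") -> str:
    """Replace every character that is not alphanumeric or a space.

    Hypothetical helper: the real connector may use a different
    escaping scheme. Replacing with a space is lossy, so two distinct
    vocabulary keys could map to the same escaped name.
    """
    return _UNSUPPORTED.sub(replacement, name)


if __name__ == "__main__":
    # The dot in a vocabulary key is not supported and gets replaced.
    print(escape_column_name("user.firstName"))
```

A reversible scheme (for example, encoding each unsupported character as a fixed-width code) would avoid collisions, at the cost of less readable column names.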

The table below shows what the connectors will output.

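| Item            | Array of String | JSON String | GUID as String | Escape Vocabulary Keys |
|-----------------|-----------------|-------------|----------------|------------------------|
| Azure Data Lake | YES             | YES         | OPTIONAL       | OPTIONAL               |
| One Lake        | YES             | YES         | YES            | YES                    |
| Open Mirroring  | NO              | YES         | YES            | YES                    |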
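
As a rough illustration, the output matrix can be modeled as data and applied to a single value. The sketch below is hypothetical and does not reflect the connector's code: `OutputRules`, `RULES`, and `shape_value` are invented names, the OPTIONAL settings for Azure Data Lake are modeled as switched off, and the `_String` column is assumed to hold the array encoded as JSON.

```python
import json
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class OutputRules:
    array_of_string: bool
    json_string: bool
    guid_as_string: bool
    escape_vocabulary_keys: bool


# Assumption: OPTIONAL entries for Azure Data Lake are shown as disabled.
RULES = {
    "Azure Data Lake": OutputRules(True, True, False, False),
    "One Lake": OutputRules(True, True, True, True),
    "Open Mirroring": OutputRules(False, True, True, True),
}


def shape_value(column: str, value, rules: OutputRules) -> dict:
    """Return the output column(s) produced for one input value."""
    if isinstance(value, uuid.UUID):
        # Serialize GUIDs as strings where the target lacks a GUID type.
        return {column: str(value) if rules.guid_as_string else value}
    if isinstance(value, list):
        out = {}
        if rules.array_of_string:
            out[column] = value
        if rules.json_string:
            # Assumed naming: X_String, holding the array as a JSON string.
            out[f"{column}_String"] = json.dumps(value)
        return out
    return {column: value}


if __name__ == "__main__":
    print(shape_value("Tags", ["a", "b"], RULES["One Lake"]))
    print(shape_value("Tags", ["a", "b"], RULES["Open Mirroring"]))
```

Keeping the per-target differences in one lookup table means the export logic itself stays the same for every target.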