Welcome! Content Understanding is a solution that analyzes and comprehends various media content, such as documents, images, audio, and video, transforming it into structured, organized, and searchable data.
- The samples in this repository default to the latest preview API version: `2025-05-01-preview`.
- More samples for the new functionality in Preview.2 (`2025-05-01-preview`) will be added to this repo soon.
- As of 2025/05, `2025-05-01-preview` is only available in the regions documented in Content Understanding region and language support.
- To access the sample code for version `2024-12-01-preview`, check out the corresponding Git tag `2024-12-01-preview`, or download it directly from the release page.
👉 If you are looking for .NET samples, check out this repo
You can run the sample in GitHub Codespaces or in your local environment. For a smoother, hassle-free experience, we recommend starting with Codespaces.
You can run this repo virtually by using GitHub Codespaces, which will open a web-based VS Code in your browser.
Once you click the link above, please follow the steps below to set up the Codespace.
- Create a new Codespace by selecting the main branch, your preferred region for the Codespace, and the 2-core machine type, as shown in the screenshot below.
- Once the Codespace is ready, open the terminal and follow the instructions in the "Configure Azure AI service resource" section to set up a valid Content Understanding resource.
Make sure the following tools are installed:

- Azure Developer CLI (`azd`)
- Git (and Git LFS, for downloading the sample files)
Make a new directory called `azure-ai-content-understanding-python` and clone this template into it using the `azd` CLI:

```shell
azd init -t azure-ai-content-understanding-python
```

You can also use `git` to clone the repository if you prefer:

```shell
git clone https://github.com/Azure-Samples/azure-ai-content-understanding-python.git
cd azure-ai-content-understanding-python
```
Important: If you use `git clone`, make sure to install Git LFS and run `git lfs pull` to download the sample files in the `data` directory:

```shell
git lfs install
git lfs pull
```
Set up the Dev Container environment

Install tools that support dev containers:

- Visual Studio Code: download and install Visual Studio Code.
- Dev Containers extension: in the VS Code extension marketplace, install the extension named "Dev Containers". (The extension was previously called "Remote - Containers", but has since been renamed and integrated into Dev Containers.)
- Docker: install Docker Desktop (available for Windows, macOS, and Linux). Docker is used to manage and run the container environment. Start Docker and ensure it is running in the background.
Open the project and start the Dev Container:

- Open the project folder with VS Code.
- Press `F1` or `Ctrl+Shift+P`, then type and select `Dev Containers: Reopen in Container`. Alternatively, click the green icon in the lower-left corner of VS Code and select "Reopen in Container".
- VS Code will automatically detect the `.devcontainer` folder, build the development container, and install the necessary dependencies.
- Make sure you have permission to grant roles under your subscription.
- Log in to Azure:

  ```shell
  azd auth login
  ```

  If the previous command doesn't work, try the following one and follow the on-screen instructions:

  ```shell
  azd auth login --use-device-code
  ```

- Set up the environment, following the prompts to choose a location:

  ```shell
  azd up
  ```
- Create an Azure AI Services resource.
- Go to Access Control (IAM) in the resource and grant yourself the role `Cognitive Services User`.
  - This is necessary even if you are the owner of the resource.
- Copy `notebooks/.env.sample` to `notebooks/.env`.
- Fill `AZURE_AI_ENDPOINT` with the endpoint of your Azure AI Services instance from the Azure portal.
- Log in to Azure:

  ```shell
  azd auth login
  ```
⚠️ Note: Using a subscription key works, but using a token provider with Azure Active Directory (AAD) is much safer and is highly recommended for production environments.
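To illustrate the difference between the two auth styles, the credentials end up in different request headers. The sketch below follows the common Azure pattern (`Ocp-Apim-Subscription-Key` for keys, `Authorization: Bearer` for AAD tokens); the helper function itself is hypothetical, not part of this repo:

```python
def auth_headers(subscription_key=None, bearer_token=None):
    """Build request headers for key-based or AAD token-based auth.

    A bearer token (e.g. obtained via azure-identity's DefaultAzureCredential)
    is preferred over a raw subscription key, especially in production.
    """
    if bearer_token:
        return {"Authorization": f"Bearer {bearer_token}"}
    if subscription_key:
        return {"Ocp-Apim-Subscription-Key": subscription_key}
    raise ValueError("Provide either a subscription key or a bearer token")
```

With a token provider, the token is refreshed and exchanged automatically, so no long-lived secret ever sits in your `.env` file.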
- Create an Azure AI Services resource.
- Copy `notebooks/.env.sample` to `notebooks/.env`:

  ```shell
  cp notebooks/.env.sample notebooks/.env
  ```

- Update `.env` with your credentials. Edit `notebooks/.env` and set the following values:

  ```
  AZURE_AI_ENDPOINT=https://<your-resource-name>.services.ai.azure.com/
  AZURE_AI_API_KEY=<your-azure-ai-api-key>
  ```

  Replace `<your-resource-name>` and `<your-azure-ai-api-key>` with your actual values. You can find them in your AI Services resource under Resource Management > Keys and Endpoint.
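The `.env` file is just plain `KEY=VALUE` lines. The notebooks load it for you, but as an illustration of the format, a minimal stdlib-only loader might look like this (a sketch mirroring typical python-dotenv behavior, not the repo's actual loading code):

```python
import os


def load_env(path):
    """Parse KEY=VALUE lines from a .env-style file into os.environ.

    Blank lines and '#' comments are skipped; variables that are already
    set in the environment are not overwritten.
    """
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```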
Navigate to the `notebooks` directory and select the sample notebook you are interested in. Since the Dev Container (in Codespaces or in your local environment) is pre-configured with the necessary environment, you can directly execute each step in the notebook.
- Select one of the notebooks in the notebooks/ directory. We recommend starting with "content_extraction.ipynb" to understand the basic concepts.
- Select Kernel
- Select Python Environment
- Run
Azure AI Content Understanding is a new Generative AI-based Azure AI service, designed to process/ingest content of any type (documents, images, audio, and video) into a user-defined output format. Content Understanding offers a streamlined process to reason over large amounts of unstructured data, accelerating time-to-value by generating an output that can be integrated into automation and analytical workflows.
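Under the hood, each notebook ultimately issues REST calls against your resource's endpoint. As a rough sketch of the request shape, a helper like the one below builds an analyze URL (the path and query layout here are assumptions based on the preview API; check the service reference for your version, and the analyzer id is illustrative):

```python
API_VERSION = "2025-05-01-preview"  # the version this repo defaults to


def analyze_url(endpoint, analyzer_id):
    """Build the :analyze URL for a given analyzer id (illustrative only)."""
    return (
        f"{endpoint.rstrip('/')}/contentunderstanding/analyzers/"
        f"{analyzer_id}:analyze?api-version={API_VERSION}"
    )
```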
File | Description |
---|---|
content_extraction.ipynb | Shows how the Content Understanding API can help you get semantic information from your file, for example OCR with tables in documents, audio transcription, and face analysis in video. |
field_extraction.ipynb | Shows how to create an analyzer to extract fields from your file, for example the invoice amount in a document, the number of people in an image, names mentioned in an audio recording, or a summary of a video. You can customize the fields by creating your own analyzer template. |
field_extraction_pro_mode.ipynb | Demonstrates how to use Pro mode in Azure AI Content Understanding to enhance your analyzer with multiple inputs and optional reference data. Pro mode is designed for advanced use cases, particularly those requiring multi-step reasoning and complex decision-making (for instance, identifying inconsistencies, drawing inferences, and making sophisticated decisions). |
classifier.ipynb | Demonstrates how to (1) create a classifier to categorize documents, (2) create a custom analyzer to extract specific fields, and (3) combine classifier and analyzers to classify, optionally split, and analyze documents in a flexible processing pipeline. |
conversational_field_extraction.ipynb | Shows how to efficiently analyze conversational audio data that has previously been transcribed with Content Understanding or Azure AI Speech, optimizing processing quality and allowing you to re-analyze data cost-efficiently. Based on the field_extraction.ipynb sample. |
analyzer_training.ipynb | If you want to further boost field-extraction performance, you can train the analyzer by providing a few labeled samples to the API. Note: this feature is currently available for the document scenario only. |
management.ipynb | Demonstrates how to create a minimal analyzer, list all the analyzers in your resource, and delete any analyzer you don't need. |
build_person_directory.ipynb | Demonstrates how to enroll people's faces from images and build a Person Directory. |
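The analyze calls behind these notebooks are long-running operations: the service accepts the request and you poll until the result is ready. A generic polling helper might look like this (a sketch; the `Succeeded`/`Failed` status names follow the common Azure long-running-operation convention, and `get_status` stands in for whatever fetches the operation's current state):

```python
import time


def poll_until_done(get_status, interval=2.0, timeout=60.0):
    """Repeatedly call get_status() until it reports a terminal state.

    get_status should return a dict with a 'status' key; 'Succeeded' and
    'Failed' are treated as terminal, anything else means keep waiting.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = get_status()
        if result.get("status") in ("Succeeded", "Failed"):
            return result
        time.sleep(interval)
    raise TimeoutError("operation did not reach a terminal state in time")
```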
Azure Search with Content Understanding
Azure Content Understanding with OpenAI
- Trademarks - This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.
- Data Collection - The software may collect information about you and your use of the software and send it to Microsoft. Microsoft may use this information to provide services and improve our products and services. You may turn off the telemetry as described in the repository. There are also some features in the software that may enable you and Microsoft to collect data from users of your applications. If you use these features, you must comply with applicable law, including providing appropriate notices to users of your applications together with a copy of Microsoft's privacy statement. Our privacy statement is located at https://go.microsoft.com/fwlink/?LinkID=824704. You can learn more about data collection and use in the help documentation and our privacy statement. Your use of the software operates as your consent to these practices.