feat: Add Azure OpenAI language model support and enhance README #38

JustStas · 2025-08-04T12:31:31Z

Description

Follows @HarvinderBhullar's proposal in #49.

Implement AzureOpenAILanguageModel inheriting from OpenAILanguageModel:

- Reuses all inference logic, only overrides client initialization
- Supports azure: and aoai: model_id prefixes (e.g., azure:gpt-4o)
- Accepts azure_endpoint, api_key, and api_version as parameters
- Auto-registered with priority 20 via registry decorator
- Add Azure OpenAI to provider documentation and examples
- Update .gitignore for cleaner temp file handling
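
The description above can be illustrated with a minimal, self-contained sketch. It does not use LangExtract's real registry or client classes: `register`, `resolve`, `OpenAIModel`, `AzureOpenAIModel`, and `_make_client` are hypothetical stand-ins, and stripping the prefix to obtain the deployment name is an assumption about how the provider treats the model_id. It only shows the general shape: a decorator registers prefix patterns with a priority, and the subclass overrides client construction while inheriting inference.

```python
import re
from typing import Callable

# Hypothetical registry: (priority, compiled pattern, provider class) entries.
_REGISTRY: list[tuple[int, re.Pattern[str], type]] = []


def register(*patterns: str, priority: int = 0) -> Callable[[type], type]:
  """Class decorator that registers a provider under regex patterns."""

  def decorator(cls: type) -> type:
    for pattern in patterns:
      _REGISTRY.append((priority, re.compile(pattern), cls))
    return cls

  return decorator


def resolve(model_id: str) -> type:
  """Returns the highest-priority provider whose pattern matches model_id."""
  matches = [(prio, cls) for prio, rx, cls in _REGISTRY if rx.search(model_id)]
  if not matches:
    raise ValueError(f"No provider registered for {model_id!r}")
  return max(matches, key=lambda m: m[0])[1]


@register(r"^gpt-", priority=10)
class OpenAIModel:
  """Stand-in for the OpenAI provider; infer() is shared by subclasses."""

  def __init__(self, model_id: str, **client_kwargs: str) -> None:
    self.model_id = model_id
    self.client = self._make_client(**client_kwargs)

  def _make_client(self, **kwargs: str) -> dict:
    # A real provider would construct an SDK client here.
    return {"kind": "openai", **kwargs}

  def infer(self, prompt: str) -> str:
    return f"[{self.client['kind']}:{self.model_id}] {prompt}"


@register(r"^azure:", r"^aoai:", priority=20)
class AzureOpenAIModel(OpenAIModel):
  """Overrides only client construction; inference is inherited."""

  def __init__(self, model_id: str, **client_kwargs: str) -> None:
    # Assumption: the text after the prefix is the Azure deployment name.
    deployment = model_id.split(":", 1)[1]
    super().__init__(deployment, **client_kwargs)

  def _make_client(self, **kwargs: str) -> dict:
    return {"kind": "azure", **kwargs}
```

In this sketch the priority only matters when more than one pattern matches the same model_id; the higher number wins.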

(Feature)
Fixes #49

How Has This Been Tested?

import os
import textwrap

import langextract as lx


def main() -> None:
  prompt = textwrap.dedent("""
      Extract main characters and brief emotions from the text.
      Use exact spans from the text for extraction_text.
      """)

  examples = [
      lx.data.ExampleData(
          text=(
              "ROMEO. But soft! What light through yonder window breaks? "
              "It is the east, and Juliet is the sun."
          ),
          extractions=[
              lx.data.Extraction(
                  extraction_class="character", extraction_text="ROMEO"
              ),
              lx.data.Extraction(
                  extraction_class="emotion",
                  extraction_text="But soft!",
                  attributes={"feeling": "awe"},
              ),
          ],
      )
  ]

  input_text = (
      "Lady Juliet gazed longingly at the stars, her heart aching for Romeo"
  )

  # Built-in provider approach: Use lx.extract with azure: prefix and language_model_params
  result = lx.extract(
      text_or_documents=input_text,
      prompt_description=prompt,
      examples=examples,
      model_id="azure:gpt-5",  # Prefix selects the Azure provider; the rest is the deployment name
      fence_output=True,  # Required for Azure OpenAI (like OpenAI)
      use_schema_constraints=False,  # Required for Azure OpenAI (like OpenAI)
      language_model_params={
          "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
          "api_key": os.environ["AZURE_OPENAI_API_KEY"],
          "api_version": "2024-07-01-preview",
      },
  )

  print(result)


if __name__ == "__main__":
  main()

Checklist:

- I have read and acknowledged Google's Open Source Code of Conduct.
- I have read the Contributing page, and I either signed the Google Individual CLA or am covered by my company's Corporate CLA.
- I have discussed my proposed solution with code owners in the linked issue(s) and we have agreed upon the general approach.
- I have made any needed documentation changes, or noted in the linked issue(s) that documentation elsewhere needs updating.
- I have added tests, or I have ensured existing tests cover the changes.
- I have followed Google's Python Style Guide and ran pylint over the affected code.

- Add GitHub Actions workflow for automated PyPI publishing via OIDC
- Configure trusted publishing environment for verified releases
- Update project metadata with proper URLs and license format
- Prepare for v1.0.0 stable release with production-ready automation

…s: (google#10)

* docs: clarify output_dir behavior in medication_examples.md
* Removed inline comment in medication example. Deleted an inline comment referencing the output directory in the save_annotated_documents.
* docs: add output_dir="." to all save_annotated_documents examples. Prevents confusion from default `test_output/...` by explicitly saving to current directory.
* build: add formatting & linting pipeline with pre-commit integration
* style: apply pyink, isort, and pre-commit formatting
* ci: enable format and lint checks in tox
* Add LangExtractError base exception for centralized error handling. Introduces a common base exception class that all library-specific exceptions inherit from, enabling users to catch all LangExtract errors with a single except clause.
* fix(ui): prevent current highlight border from being obscured

Co-authored-by: Leena Kamran
Co-authored-by: Akshay Goel

google-cla · 2025-08-04T12:31:36Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

- Gemini & OpenAI test suites with retry on transient errors
- CI: Separate job, Python 3.11 only, skips for forks
- Validates char_interval for all extractions
- Multilingual test xfail (issue google#13)

TODO: Remove xfail from multilingual test after tokenizer fix

…e#62)

- Add quickstart example and documentation for local LLM usage
- Include Docker setup with health checks and docker-compose
- Add integration tests and update CI pipeline
- Secure setup: localhost-only binding, containerized deployment

Signed-off-by: Akshay Goel

github-actions · 2025-08-07T05:29:29Z

⚠️ Branch Update Required

Your branch is 7 commits behind main. Please update your branch to ensure CI checks run with the latest code:

git fetch origin main
git merge origin/main
git push

Note: Enable "Allow edits by maintainers" to allow automatic updates.

github-actions · 2025-08-07T05:50:35Z

⚠️ Branch Update Required

Your branch is 8 commits behind main.

Please update your branch:

git fetch origin main
git merge origin/main
git push

Or use GitHub's "Update branch" button if available.

- Fix empty interval bug when newline falls at chunk boundary (issue google#71)
- Add concise comment explaining the fix logic
- Remove excessive/obvious comments from chunking tests
- Improve test docstring to be more descriptive and professional

…le (google#97)

Introduces a provider registry system enabling third-party providers to be dynamically registered and discovered through a plugin architecture. Users can now integrate custom LLM backends (Azure OpenAI, AWS Bedrock, custom inference servers) without modifying core LangExtract code.

Fixes google#80, google#67, google#54, google#49, google#48, google#53

Key Changes:

**Provider Registry** (`langextract/providers/registry.py`)
- Pattern-based registration with priority resolution
- Automatic discovery via Python entry points
- Lazy loading for performance

**Factory Enhancements** (`langextract/factory.py`)
- `ModelConfig` dataclass for structured configuration
- Explicit provider selection when patterns overlap
- Full backward compatibility maintained

**Plugin Example** (`examples/custom_provider_plugin/`)
- Complete working example with entry point configuration
- Shows how to create custom providers for any backend

**Documentation**
- Comprehensive provider system README with architecture diagrams
- Step-by-step plugin creation guide

**Dependencies**
- Move openai to optional dependencies
- Update tox.ini to include openai in test environments

**Lint Fixes**
- Add appropriate pylint suppressions for legitimate patterns
- Fix unused variable warnings in tests
- Address import and global statement warnings

No anticipated breakage - full backward compatibility maintained. Given significant internal changes to provider loading, issues should be reported if unexpected behavior is encountered.

github-actions · 2025-08-08T15:07:50Z

❌ Infrastructure File Protection

This PR modifies protected infrastructure files:

.gitignore (3 changes)

Only repository maintainers are allowed to modify infrastructure files (including .github/, build configuration, and repository documentation).

Note: If these are only formatting changes, please:

Revert changes to .github/ files
Use ./autoformat.sh to format only source code directories
Avoid running formatters on infrastructure files

If structural changes are necessary:

Open an issue describing the needed infrastructure changes
A maintainer will review and implement the changes if approved

For more information, see our Contributing Guidelines.

github-actions · 2025-08-08T15:13:25Z

❌ Infrastructure File Protection

This PR modifies protected infrastructure files:

.gitignore (3 changes)

Only repository maintainers are allowed to modify infrastructure files (including .github/, build configuration, and repository documentation).

Note: If these are only formatting changes, please:

Revert changes to .github/ files
Use ./autoformat.sh to format only source code directories
Avoid running formatters on infrastructure files

If structural changes are necessary:

Open an issue describing the needed infrastructure changes
A maintainer will review and implement the changes if approved

For more information, see our Contributing Guidelines.

github-actions · 2025-08-14T09:24:34Z

⚠️ Branch Update Required

Your branch is 20 commits behind main. Please update your branch to ensure CI checks run with the latest code:

git fetch origin main
git merge origin/main
git push

Note: Enable "Allow edits by maintainers" to allow automatic updates.

github-actions · 2025-08-22T02:32:09Z

⚠️ Branch Update Required

Your branch is 86 commits behind main. Please update your branch to ensure CI checks run with the latest code:

git fetch origin main
git merge origin/main
git push

Note: Enable "Allow edits by maintainers" to allow automatic updates.

github-actions · 2025-08-30T02:27:36Z

⚠️ Branch Update Required

Your branch is 98 commits behind main. Please update your branch to ensure CI checks run with the latest code:

git fetch origin main
git merge origin/main
git push

Note: Enable "Allow edits by maintainers" to allow automatic updates.

github-actions · 2025-09-07T02:30:42Z

⚠️ Branch Update Required

Your branch is 106 commits behind main. Please update your branch to ensure CI checks run with the latest code:

git fetch origin main
git merge origin/main
git push

Note: Enable "Allow edits by maintainers" to allow automatic updates.

github-actions · 2025-09-15T02:31:56Z

⚠️ Branch Update Required

Your branch is 107 commits behind main. Please update your branch to ensure CI checks run with the latest code:

git fetch origin main
git merge origin/main
git push

Note: Enable "Allow edits by maintainers" to allow automatic updates.

github-actions · 2025-09-22T02:32:06Z

⚠️ Branch Update Required

Your branch is 109 commits behind main. Please update your branch to ensure CI checks run with the latest code:

git fetch origin main
git merge origin/main
git push

Note: Enable "Allow edits by maintainers" to allow automatic updates.

github-actions · 2025-09-30T02:28:20Z

⚠️ Branch Update Required

Your branch is 110 commits behind main. Please update your branch to ensure CI checks run with the latest code:

git fetch origin main
git merge origin/main
git push

Note: Enable "Allow edits by maintainers" to allow automatic updates.

github-actions · 2025-10-08T02:28:14Z

⚠️ Branch Update Required

Your branch is 111 commits behind main. Please update your branch to ensure CI checks run with the latest code:

git fetch origin main
git merge origin/main
git push

Note: Enable "Allow edits by maintainers" to allow automatic updates.

aksg87 and others added 22 commits July 22, 2025 01:39

docs(pypi): Improve README display and badge reliability

2ce2399

- Switch from badge.fury.io to shields.io for working PyPI badge
- Convert relative paths to absolute GitHub URLs for PyPI compatibility
- Bump version to 0.1.3

Fix: Resolve libmagic ImportError (google#6)

e696a48

- Add pylibmagic>=0.5.0 dependency for bundled libraries
- Add [full] install option and pre-import handling
- Update README with troubleshooting and Docker sections
- Bump version to 1.0.1

Fixes google#6

docs: clarify output_dir behavior in medication_examples.md

5447637

Merge pull request google#11 from google/fix/libmagic-dependency-issue

9c47b34

Removed inline comment in medication example

175e075

Deleted an inline comment referencing the output directory in the save_annotated_documents.

Merge pull request google#15 from kleeena/docs/update-medication_exam…

9472099

…ples.md docs: clarify output_dir behavior in medication_examples.md

docs: add output_dir="." to all save_annotated_documents examples

e6c3dcd

Prevents confusion from default `test_output/...` by explicitly saving to current directory.

Merge pull request google#17 from google/fix/output-dir-consistency

1fb1f1d

docs: add output_dir="." to all save_annotated_documents examples

build: add formatting & linting pipeline with pre-commit integration

13fbd2c

style: apply pyink, isort, and pre-commit formatting

c8d2027

ci: enable format and lint checks in tox

146a095

Merge pull request google#24 from google/feat/code-formatting-pipeline

aa6da18

feat: add code formatting and linting pipeline

Add LangExtractError base exception for centralized error handling

ed65bca

Introduces a common base exception class that all library-specific exceptions inherit from, enabling users to catch all LangExtract errors with a single except clause.
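
A minimal sketch of the pattern this commit describes: one base class that every library-specific exception inherits from, so a single except clause catches them all. `LangExtractError` is defined locally here as a stand-in for the library's class, and the subclasses `ConfigError` and `InferenceError` and the helper `run_step` are hypothetical names chosen for illustration.

```python
from typing import Callable


class LangExtractError(Exception):
  """Local stand-in for the library's base exception."""


class ConfigError(LangExtractError):
  """Hypothetical subclass for configuration problems."""


class InferenceError(LangExtractError):
  """Hypothetical subclass for model call failures."""


def run_step(step: Callable[[], None]) -> str:
  """Runs a step and reports any library-specific error uniformly."""
  try:
    step()
  except LangExtractError as e:
    # Catches ConfigError, InferenceError, and any future subclass.
    return f"{type(e).__name__}: {e}"
  return "ok"
```

Exceptions outside the hierarchy, such as a plain ValueError, are not swallowed by this handler and still propagate to the caller.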

Merge pull request google#26 from google/feat/exception-hierarchy

6c4508b

Add LangExtractError base exception for centralized error handling

fix: Remove LangFun and pylibmagic dependencies (v1.0.2)

8b85225

Fixes google#25 - Windows installation failure due to pylibmagic build requirements

Breaking change: LangFunLanguageModel removed. Use GeminiLanguageModel or OllamaLanguageModel instead.

Merge pull request google#28 from google/fix/remove-breaking-dep-langfun

88520cc

fix: Remove LangFun and pylibmagic dependencies to fix Windows installation and OpenAI SDK v1.x compatibility

Fix save_annotated_documents to handle string paths

75a6f12

- Modified save_annotated_documents to accept both pathlib.Path and string paths
- Convert string paths to Path objects before calling mkdir()
- This fixes the error when using output_dir='.' as shown in the README example
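
The fix described above amounts to normalizing the argument before calling `mkdir()`, since a plain `str` has no such method. A small sketch of that idea follows; `ensure_output_dir` is a hypothetical helper written for illustration, not the actual body of save_annotated_documents.

```python
import pathlib
from typing import Union


def ensure_output_dir(output_dir: Union[str, pathlib.Path]) -> pathlib.Path:
  """Accepts a str or Path, creates the directory, and returns a Path."""
  # pathlib.Path() is a no-op for Path inputs and converts str inputs.
  path = pathlib.Path(output_dir)
  path.mkdir(parents=True, exist_ok=True)
  return path
```

With this normalization, a call such as `ensure_output_dir(".")` behaves the same as `ensure_output_dir(pathlib.Path("."))`.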

Merge pull request google#29 from google/fix-save-annotated-documents…

a415b94

…-mkdir Fix save_annotated_documents to handle string paths

feat: Add OpenAI language model support

8289b3a

Merge pull request google#31 from google/feature/add-oai-inference

c8ef723

feat: Add OpenAI language model support

aksg87 added 2 commits August 4, 2025 10:24

Add PR template validation workflow (google#45)

dc61372

aksg87 changed the title ~~feat: Add Azure OpenAI language model support and enhance README~~ feat: Add Azure OpenAI language model support and enhance README Aug 4, 2025

aksg87 and others added 4 commits August 5, 2025 04:19

fix: Change OllamaLanguageModel parameter from 'model' to 'model_id' (g…

da771e6

…oogle#57) Fixes google#27

feat: Add CITATION.cff file for proper software citation

e83d5cf

chore: Bump version to 1.0.4 for release

a7ef0bd

- Ollama integration with Docker examples
- Fixed OllamaLanguageModel parameter name (model -> model_id)
- Added CI/CD tests for Ollama
- Updated documentation with consistent API examples

Add PR update automation workflows

1c3c1a2

Auto-updates PRs behind main, handles forks/conflicts gracefully, skips bot/draft PRs, monitors API limits

Fix workflow formatting

b60f0b2

- Apply end-of-file and whitespace fixes to workflows

aksg87 and others added 6 commits August 7, 2025 03:56

Bump version to 1.0.5

b3bff86

Remove duplicate exceptions.py from root directory (google#94)

f3c1553

The exceptions.py file existed in both the root directory and langextract/ directory with identical content. This removes the duplicate from the root to avoid confusion and maintain proper package structure.

Fix unicode escaping in example generation (google#98)

845258c

Update provider documentation

c8aa788

JustStas closed this Aug 8, 2025

JustStas force-pushed the azure-openai branch from b0b085f to c8aa788 Compare August 8, 2025 13:46

github-actions bot added the size/XS Pull request with less than 50 lines changed label Aug 8, 2025

JustStas reopened this Aug 8, 2025

github-actions bot added the size/S Pull request with 50-150 lines changed label Aug 8, 2025

fix: Resolve pylint issues for CI

6ae95c5

- Add import-error disable for optional openai dependency in azure_openai.py
- Remove useless too-many-instance-attributes suppression in inference.py

Fixes CI lint-src check failures

JustStas mentioned this pull request Aug 8, 2025

Plugin support for custom LLM providers now available #99

Open

aksg87 force-pushed the main branch from e36e455 to 3dff0d3 Compare August 21, 2025 01:43


6 participants
