Conversation

@JustStas JustStas commented Aug 4, 2025

Description

Follows @HarvinderBhullar's proposal in #49

  • Implement AzureOpenAILanguageModel, inheriting from OpenAILanguageModel
  • Reuse all inference logic; override only client initialization
  • Support the azure: and aoai: model_id prefixes (e.g., azure:gpt-4o)
  • Accept azure_endpoint, api_key, and api_version as parameters
  • Auto-register with priority 20 via the registry decorator
  • Add Azure OpenAI to the provider documentation and examples
  • Update .gitignore for cleaner temp file handling
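
The inheritance approach in the list above can be illustrated with a minimal, self-contained sketch. All names here (`BaseChatModel`, `AzureChatModel`, `_make_client`, `FakeClient`) are hypothetical stand-ins, not the actual LangExtract or OpenAI SDK classes. The only point is that the subclass replaces client construction and inherits inference unchanged.

```python
from dataclasses import dataclass


@dataclass
class FakeClient:
  """Hypothetical stand-in for an SDK client; records how it was configured."""
  kind: str
  settings: dict

  def complete(self, prompt: str) -> str:
    return f"[{self.kind}] {prompt}"


class BaseChatModel:
  """Hypothetical base model: owns inference, delegates client creation."""

  def __init__(self, model_id: str, api_key: str, **client_kwargs):
    self.model_id = model_id
    self._client = self._make_client(api_key=api_key, **client_kwargs)

  def _make_client(self, api_key: str, **kwargs) -> FakeClient:
    return FakeClient(kind="openai", settings={"api_key": api_key, **kwargs})

  def infer(self, prompt: str) -> str:
    # Shared inference logic; subclasses do not need to touch this.
    return self._client.complete(f"{self.model_id}: {prompt}")


class AzureChatModel(BaseChatModel):
  """Only client construction differs; infer() is inherited unchanged."""

  def _make_client(self, api_key: str, **kwargs) -> FakeClient:
    # Assumption: these two settings are required, mirroring the
    # azure_endpoint and api_version parameters named in the description.
    for name in ("azure_endpoint", "api_version"):
      if name not in kwargs:
        raise ValueError(f"missing required parameter: {name}")
    return FakeClient(kind="azure", settings={"api_key": api_key, **kwargs})
```

Keeping client creation in one overridable method means any later fix to the shared inference path would apply to both variants without duplication.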

(Feature)
Fixes #49

How Has This Been Tested?

import os
import textwrap

import langextract as lx


def main() -> None:
  prompt = textwrap.dedent("""
      Extract main characters and brief emotions from the text.
      Use exact spans from the text for extraction_text.
      """)

  examples = [
      lx.data.ExampleData(
          text=(
              "ROMEO. But soft! What light through yonder window breaks? "
              "It is the east, and Juliet is the sun."
          ),
          extractions=[
              lx.data.Extraction(
                  extraction_class="character", extraction_text="ROMEO"
              ),
              lx.data.Extraction(
                  extraction_class="emotion",
                  extraction_text="But soft!",
                  attributes={"feeling": "awe"},
              ),
          ],
      )
  ]

  input_text = (
      "Lady Juliet gazed longingly at the stars, her heart aching for Romeo"
  )

  # Built-in provider approach: pass an azure:-prefixed model_id to lx.extract
  # and supply the Azure settings via language_model_params.
  result = lx.extract(
      text_or_documents=input_text,
      prompt_description=prompt,
      examples=examples,
      # The azure: prefix selects the Azure provider; the remainder is the
      # deployment name.
      model_id="azure:gpt-5",
      fence_output=True,  # Required for Azure OpenAI (like OpenAI)
      use_schema_constraints=False,  # Required for Azure OpenAI (like OpenAI)
      language_model_params={
          "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
          "api_key": os.environ["AZURE_OPENAI_API_KEY"],
          "api_version": "2024-07-01-preview",
      },
  )

  print(result)


if __name__ == "__main__":
  main()

Checklist:

  • I have read and acknowledged Google's Open Source Code of conduct.
  • I have read the Contributing page, and I either signed the Google Individual CLA or am covered by my company's Corporate CLA.
  • I have discussed my proposed solution with code owners in the linked issue(s) and we have agreed upon the general approach.
  • I have made any needed documentation changes, or noted in the linked issue(s) that documentation elsewhere needs updating.
  • I have added tests, or I have ensured existing tests cover the changes.
  • I have followed Google's Python Style Guide and ran pylint over the affected code.

aksg87 and others added 22 commits July 22, 2025 01:39
- Switch from badge.fury.io to shields.io for working PyPI badge
- Convert relative paths to absolute GitHub URLs for PyPI compatibility
- Bump version to 0.1.3
- Add GitHub Actions workflow for automated PyPI publishing via OIDC
- Configure trusted publishing environment for verified releases
- Update project metadata with proper URLs and license format
- Prepare for v1.0.0 stable release with production-ready automation
- Add pylibmagic>=0.5.0 dependency for bundled libraries
- Add [full] install option and pre-import handling
- Update README with troubleshooting and Docker sections
- Bump version to 1.0.1

Fixes google#6
Deleted an inline comment referencing the  output directory in the save_annotated_documents.
…ples.md

docs: clarify output_dir behavior in medication_examples.md
Prevents confusion from default `test_output/...` by explicitly saving to current directory.
docs: add output_dir="." to all save_annotated_documents examples
feat: add code formatting and linting pipeline
Introduces a common base exception class that all library-specific exceptions inherit from, enabling users to catch all LangExtract errors with a single except clause.
Add LangExtractError base exception for centralized error handling
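
The single-except-clause benefit described in this commit can be sketched as follows. Only the `LangExtractError` name comes from the commit message; the subclasses and helper functions are hypothetical and exist purely for illustration.

```python
class LangExtractError(Exception):
  """Base class for all library-specific errors."""


class InferenceError(LangExtractError):
  """Hypothetical subclass, for illustration only."""


class ConfigError(LangExtractError):
  """Hypothetical subclass, for illustration only."""


def run(step: str) -> str:
  """Hypothetical helper that fails in different library-specific ways."""
  if step == "infer":
    raise InferenceError("model call failed")
  if step == "config":
    raise ConfigError("missing API key")
  return "ok"


def safe_run(step: str) -> str:
  try:
    return run(step)
  except LangExtractError as e:  # One clause covers every subclass.
    return f"caught {type(e).__name__}: {e}"
```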
Fixes google#25 - Windows installation failure due to pylibmagic build requirements

Breaking change: LangFunLanguageModel removed. Use GeminiLanguageModel or OllamaLanguageModel instead.
fix: Remove LangFun and pylibmagic dependencies to fix Windows installation and OpenAI SDK v1.x compatibility
- Modified save_annotated_documents to accept both pathlib.Path and string paths
- Convert string paths to Path objects before calling mkdir()
- This fixes the error when using output_dir='.' as shown in the README example
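
A minimal sketch of the path normalization this commit describes, assuming a hypothetical helper named `ensure_output_dir` (the real change lives inside save_annotated_documents and may differ in detail):

```python
import pathlib
from typing import Union


def ensure_output_dir(output_dir: Union[str, pathlib.Path]) -> pathlib.Path:
  """Hypothetical helper: normalize the argument, then create the directory."""
  # pathlib.Path() accepts both str and Path, so one conversion covers both.
  path = pathlib.Path(output_dir)
  # A plain str has no mkdir(); converting first avoids the AttributeError.
  path.mkdir(parents=True, exist_ok=True)
  return path
```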
…-mkdir

Fix save_annotated_documents to handle string paths
feat: Add OpenAI language model support
…s: (google#10)

* docs: clarify output_dir behavior in medication_examples.md

* Removed inline comment in medication example

Deleted an inline comment referencing the  output directory in the save_annotated_documents.

* docs: add output_dir="." to all save_annotated_documents examples

Prevents confusion from default `test_output/...` by explicitly saving to current directory.

* build: add formatting & linting pipeline with pre-commit integration

* style: apply pyink, isort, and pre-commit formatting

* ci: enable format and lint checks in tox

* Add LangExtractError base exception for centralized error handling

Introduces a common base exception class that all library-specific exceptions inherit from, enabling users to catch all LangExtract errors with a single except clause.

* fix(ui): prevent current highlight border from being obscured

---------

Co-authored-by: Leena Kamran
Co-authored-by: Akshay Goel

google-cla bot commented Aug 4, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

aksg87 added 2 commits August 4, 2025 10:24
- Gemini & OpenAI test suites with retry on transient errors
- CI: Separate job, Python 3.11 only, skips for forks
- Validates char_interval for all extractions
- Multilingual test xfail (issue google#13)

TODO: Remove xfail from multilingual test after tokenizer fix
@aksg87 aksg87 changed the title feat: Add Azure OpenAI language model support and enhance README feat: Add Azure OpenAI language model support and enhance README Aug 4, 2025
aksg87 and others added 4 commits August 5, 2025 04:19
…e#62)

- Add quickstart example and documentation for local LLM usage
- Include Docker setup with health checks and docker-compose
- Add integration tests and update CI pipeline
- Secure setup: localhost-only binding, containerized deployment

Signed-off-by: Akshay Goel
- Ollama integration with Docker examples
- Fixed OllamaLanguageModel parameter name (model -> model_id)
- Added CI/CD tests for Ollama
- Updated documentation with consistent API examples
Auto-updates PRs behind main, handles forks/conflicts gracefully,
skips bot/draft PRs, monitors API limits

github-actions bot commented Aug 7, 2025

⚠️ Branch Update Required

Your branch is 7 commits behind main. Please update your branch to ensure CI checks run with the latest code:

git fetch origin main
git merge origin/main
git push

Note: Enable "Allow edits by maintainers" to allow automatic updates.

- Apply end-of-file and whitespace fixes to workflows

github-actions bot commented Aug 7, 2025

⚠️ Branch Update Required

Your branch is 8 commits behind main.

Please update your branch:

git fetch origin main
git merge origin/main
git push

Or use GitHub's "Update branch" button if available.

aksg87 and others added 6 commits August 7, 2025 03:56
- Fix empty interval bug when newline falls at chunk boundary (issue google#71)
- Add concise comment explaining the fix logic
- Remove excessive/obvious comments from chunking tests
- Improve test docstring to be more descriptive and professional
The exceptions.py file existed in both the root directory and langextract/ directory with identical content. This removes the duplicate from the root to avoid confusion and maintain proper package structure.
…le (google#97)

Introduces a provider registry system enabling third-party providers to be dynamically registered and discovered through a plugin architecture. Users can now integrate custom LLM backends (Azure OpenAI, AWS Bedrock, custom inference servers) without modifying core LangExtract code.

Fixes google#80, google#67, google#54, google#49, google#48, google#53

Key Changes:

**Provider Registry** (`langextract/providers/registry.py`)
- Pattern-based registration with priority resolution
- Automatic discovery via Python entry points
- Lazy loading for performance

**Factory Enhancements** (`langextract/factory.py`)
- `ModelConfig` dataclass for structured configuration
- Explicit provider selection when patterns overlap
- Full backward compatibility maintained

**Plugin Example** (`examples/custom_provider_plugin/`)
- Complete working example with entry point configuration
- Shows how to create custom providers for any backend

**Documentation**
- Comprehensive provider system README with architecture diagrams
- Step-by-step plugin creation guide

**Dependencies**
- Move openai to optional dependencies
- Update tox.ini to include openai in test environments

**Lint Fixes**
- Add appropriate pylint suppressions for legitimate patterns
- Fix unused variable warnings in tests
- Address import and global statement warnings

No anticipated breakage - full backward compatibility maintained. Given significant internal changes to provider loading, issues should be reported if unexpected behavior is encountered.
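
The "pattern-based registration with priority resolution" idea from this commit message can be sketched in a few lines. This is an illustrative toy, not the code in `langextract/providers/registry.py`: the `register` and `resolve` functions, the provider class names, and the OpenAI priority of 10 are assumptions. Only the azure: and aoai: prefixes and the priority of 20 come from the PR description.

```python
import re
from typing import Callable, List, Tuple

# Each entry is (compiled pattern, priority, provider class); hypothetical layout.
_ENTRIES: List[Tuple[re.Pattern, int, type]] = []


def register(*patterns: str, priority: int = 0) -> Callable[[type], type]:
  """Class decorator that records the model_id patterns a provider handles."""
  def decorator(cls: type) -> type:
    for pattern in patterns:
      _ENTRIES.append((re.compile(pattern), priority, cls))
    return cls
  return decorator


def resolve(model_id: str) -> type:
  """Return the highest-priority provider whose pattern matches model_id."""
  matches = [(prio, cls) for pat, prio, cls in _ENTRIES if pat.search(model_id)]
  if not matches:
    raise ValueError(f"no provider registered for {model_id!r}")
  return max(matches, key=lambda m: m[0])[1]


@register(r"gpt-", priority=10)  # Assumed priority, for illustration.
class OpenAIProvider:
  pass


@register(r"^azure:", r"^aoai:", priority=20)
class AzureProvider:
  pass
```

In this sketch `azure:gpt-4o` matches both providers' patterns, and the higher priority decides in favor of `AzureProvider`, while a bare `gpt-4o` still resolves to `OpenAIProvider`.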
@JustStas JustStas closed this Aug 8, 2025
@github-actions github-actions bot added the size/XS Pull request with less than 50 lines changed label Aug 8, 2025
- Reuses all inference logic, only overrides client initialization
- Supports azure: and aoai: model_id prefixes (e.g., azure:gpt-4o)
- Accepts azure_endpoint, api_key, and api_version as parameters
- Auto-registered with priority 20 via registry decorator
- Add Azure OpenAI to provider documentation and examples
- Update .gitignore for cleaner temp file handling

Follows the inheritance proposal by HarvinderBhullar in issue google#49.
@JustStas JustStas reopened this Aug 8, 2025

github-actions bot commented Aug 8, 2025

Infrastructure File Protection

This PR modifies protected infrastructure files:

  • .gitignore (3 changes)

Only repository maintainers are allowed to modify infrastructure files (including .github/, build configuration, and repository documentation).

Note: If these are only formatting changes, please:

  1. Revert changes to .github/ files
  2. Use ./autoformat.sh to format only source code directories
  3. Avoid running formatters on infrastructure files

If structural changes are necessary:

  1. Open an issue describing the needed infrastructure changes
  2. A maintainer will review and implement the changes if approved

For more information, see our Contributing Guidelines.

@github-actions github-actions bot added the size/S Pull request with 50-150 lines changed label Aug 8, 2025
- Add import-error disable for optional openai dependency in azure_openai.py
- Remove useless too-many-instance-attributes suppression in inference.py

Fixes CI lint-src check failures

github-actions bot commented Sep 7, 2025

⚠️ Branch Update Required

Your branch is 106 commits behind main. Please update your branch to ensure CI checks run with the latest code:

git fetch origin main
git merge origin/main
git push

Note: Enable "Allow edits by maintainers" to allow automatic updates.


github-actions bot commented Oct 8, 2025

⚠️ Branch Update Required

Your branch is 111 commits behind main. Please update your branch to ensure CI checks run with the latest code:

git fetch origin main
git merge origin/main
git push

Note: Enable "Allow edits by maintainers" to allow automatic updates.

Labels

size/M Pull request with 150-600 lines changed
size/S Pull request with 50-150 lines changed
size/XS Pull request with less than 50 lines changed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

No AzureOpenAI integration

6 participants