feat: Add Azure OpenAI language model support and enhance README #38
- Switch from badge.fury.io to shields.io for working PyPI badge
- Convert relative paths to absolute GitHub URLs for PyPI compatibility
- Bump version to 0.1.3
- Add GitHub Actions workflow for automated PyPI publishing via OIDC
- Configure trusted publishing environment for verified releases
- Update project metadata with proper URLs and license format
- Prepare for v1.0.0 stable release with production-ready automation
- Add pylibmagic>=0.5.0 dependency for bundled libraries
- Add [full] install option and pre-import handling
- Update README with troubleshooting and Docker sections
- Bump version to 1.0.1

Fixes google#6
Deleted an inline comment referencing the output directory in the save_annotated_documents.
…ples.md docs: clarify output_dir behavior in medication_examples.md
Prevents confusion from default `test_output/...` by explicitly saving to current directory.
docs: add output_dir="." to all save_annotated_documents examples
feat: add code formatting and linting pipeline
Introduces a common base exception class that all library-specific exceptions inherit from, enabling users to catch all LangExtract errors with a single except clause.
Add LangExtractError base exception for centralized error handling
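A minimal, standalone sketch of the pattern this commit describes. It does not import the library: `LangExtractError` is re-declared here as a stand-in, and the subclasses `ConfigError` and `InferenceError`, as well as the `run` and `safe_run` helpers, are hypothetical names used only for illustration.

```python
class LangExtractError(Exception):
    """Stand-in for the library's common base exception."""


class ConfigError(LangExtractError):
    """Hypothetical subclass: invalid or missing configuration."""


class InferenceError(LangExtractError):
    """Hypothetical subclass: a model call failed."""


def run(step):
    """Raise a different library-specific error depending on the step."""
    if step == "config":
        raise ConfigError("missing API key")
    if step == "inference":
        raise InferenceError("model call failed")
    return "ok"


def safe_run(step):
    """Catch every library-specific error with one except clause."""
    try:
        return run(step)
    except LangExtractError as exc:
        return f"{type(exc).__name__}: {exc}"
```

Because both subclasses inherit from the base class, a caller who only cares that something in the library failed needs a single `except LangExtractError` and can still inspect the concrete type when needed.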
Fixes google#25: Windows installation failure due to pylibmagic build requirements.

Breaking change: LangFunLanguageModel removed. Use GeminiLanguageModel or OllamaLanguageModel instead.
fix: Remove LangFun and pylibmagic dependencies to fix Windows installation and OpenAI SDK v1.x compatibility
- Modified save_annotated_documents to accept both pathlib.Path and string paths
- Convert string paths to Path objects before calling mkdir()
- This fixes the error when using output_dir='.' as shown in the README example
…-mkdir Fix save_annotated_documents to handle string paths
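The fix described above amounts to normalising the argument before touching the filesystem. The sketch below illustrates that idea only; `ensure_output_dir` is a hypothetical helper, not the actual body of `save_annotated_documents`.

```python
import pathlib
import tempfile


def ensure_output_dir(output_dir):
    """Accept a str or pathlib.Path, create the directory, return a Path."""
    path = pathlib.Path(output_dir)  # no-op for Path input, converts str input
    path.mkdir(parents=True, exist_ok=True)
    return path


# Usage: both a plain string and a Path object are accepted.
with tempfile.TemporaryDirectory() as tmp:
    created = ensure_output_dir(tmp + "/annotated")  # string path
    same = ensure_output_dir(created)  # Path object, directory already exists
    both_exist = created.is_dir() and same == created
```

Calling `pathlib.Path()` on a value that is already a `Path` returns an equal path, so the conversion is safe to apply unconditionally.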
feat: Add OpenAI language model support
…s: (google#10)

* docs: clarify output_dir behavior in medication_examples.md
* Removed inline comment in medication example

  Deleted an inline comment referencing the output directory in the save_annotated_documents.
* docs: add output_dir="." to all save_annotated_documents examples

  Prevents confusion from default `test_output/...` by explicitly saving to current directory.
* build: add formatting & linting pipeline with pre-commit integration
* style: apply pyink, isort, and pre-commit formatting
* ci: enable format and lint checks in tox
* Add LangExtractError base exception for centralized error handling

  Introduces a common base exception class that all library-specific exceptions inherit from, enabling users to catch all LangExtract errors with a single except clause.
* fix(ui): prevent current highlight border from being obscured

---------

Co-authored-by: Leena Kamran <[email protected]>
Co-authored-by: Akshay Goel <[email protected]>
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up-to-date status, view the checks section at the bottom of the pull request.
- Gemini & OpenAI test suites with retry on transient errors
- CI: Separate job, Python 3.11 only, skips for forks
- Validates char_interval for all extractions
- Multilingual test xfail (issue google#13)

TODO: Remove xfail from multilingual test after tokenizer fix
…e#62)

- Add quickstart example and documentation for local LLM usage
- Include Docker setup with health checks and docker-compose
- Add integration tests and update CI pipeline
- Secure setup: localhost-only binding, containerized deployment

Signed-off-by: Akshay Goel <[email protected]>
- Ollama integration with Docker examples
- Fixed OllamaLanguageModel parameter name (model -> model_id)
- Added CI/CD tests for Ollama
- Updated documentation with consistent API examples
Auto-updates PRs behind main, handles forks/conflicts gracefully, skips bot/draft PRs, monitors API limits
Your branch is 7 commits behind.
git fetch origin main
git merge origin/main
git push
Note: Enable "Allow edits by maintainers" to allow automatic updates.
- Apply end-of-file and whitespace fixes to workflows
Your branch is 8 commits behind. Please update your branch:
git fetch origin main
git merge origin/main
git push
Or use GitHub's "Update branch" button if available.
- Fix empty interval bug when newline falls at chunk boundary (issue google#71)
- Add concise comment explaining the fix logic
- Remove excessive/obvious comments from chunking tests
- Improve test docstring to be more descriptive and professional
The exceptions.py file existed in both the root directory and langextract/ directory with identical content. This removes the duplicate from the root to avoid confusion and maintain proper package structure.
…le (google#97)

Introduces a provider registry system enabling third-party providers to be dynamically registered and discovered through a plugin architecture. Users can now integrate custom LLM backends (Azure OpenAI, AWS Bedrock, custom inference servers) without modifying core LangExtract code.

Fixes google#80, google#67, google#54, google#49, google#48, google#53

Key Changes:

**Provider Registry** (`langextract/providers/registry.py`)
- Pattern-based registration with priority resolution
- Automatic discovery via Python entry points
- Lazy loading for performance

**Factory Enhancements** (`langextract/factory.py`)
- `ModelConfig` dataclass for structured configuration
- Explicit provider selection when patterns overlap
- Full backward compatibility maintained

**Plugin Example** (`examples/custom_provider_plugin/`)
- Complete working example with entry point configuration
- Shows how to create custom providers for any backend

**Documentation**
- Comprehensive provider system README with architecture diagrams
- Step-by-step plugin creation guide

**Dependencies**
- Move openai to optional dependencies
- Update tox.ini to include openai in test environments

**Lint Fixes**
- Add appropriate pylint suppressions for legitimate patterns
- Fix unused variable warnings in tests
- Address import and global statement warnings

No anticipated breakage - full backward compatibility maintained. Given significant internal changes to provider loading, issues should be reported if unexpected behavior is encountered.
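To make "pattern-based registration with priority resolution" concrete, here is a small self-contained sketch of the general technique. It is not the code in `langextract/providers/registry.py`: the `register` decorator, the `resolve` function, the provider class names, the regex patterns, and the priority of 10 are all assumptions chosen for illustration. Only the priority of 20 for Azure mirrors a value mentioned elsewhere in this PR.

```python
import re

# Each entry: (compiled pattern, priority, provider class).
_REGISTRY = []


def register(*patterns, priority=0):
    """Class decorator that registers a provider under regex patterns."""
    def decorator(cls):
        for pattern in patterns:
            _REGISTRY.append((re.compile(pattern), priority, cls))
        return cls
    return decorator


def resolve(model_id):
    """Return the highest-priority provider whose pattern matches model_id."""
    matches = [(prio, cls) for pat, prio, cls in _REGISTRY if pat.search(model_id)]
    if not matches:
        raise ValueError(f"no provider registered for {model_id!r}")
    # On equal priority, max() keeps the earliest registration.
    return max(matches, key=lambda match: match[0])[1]


@register(r"gpt-", priority=10)  # hypothetical pattern and priority
class OpenAIProvider:
    pass


@register(r"^azure:", r"^aoai:", priority=20)  # overlaps with "gpt-" above
class AzureProvider:
    pass
```

With these registrations, `resolve("azure:gpt-4o")` matches both patterns and the higher priority selects `AzureProvider`, while `resolve("gpt-4o")` matches only the first pattern and returns `OpenAIProvider`.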
- Reuses all inference logic, only overrides client initialization
- Supports azure: and aoai: model_id prefixes (e.g., azure:gpt-4o)
- Accepts azure_endpoint, api_key, and api_version as parameters
- Auto-registered with priority 20 via registry decorator
- Add Azure OpenAI to provider documentation and examples
- Update .gitignore for cleaner temp file handling

Follows the inheritance approach proposed by HarvinderBhullar in issue google#49.
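The "reuse inference logic, override only client initialization" design can be sketched as below. This is an illustration under stated assumptions, not the PR's actual `azure_openai.py`: the two `Fake...Client` classes stand in for real SDK clients so the example runs without the optional openai dependency, `_make_client` is a hypothetical hook name, and stripping the `azure:`/`aoai:` prefix before the call is an assumption about how the prefixes are handled.

```python
class FakeOpenAIClient:
    """Stand-in for an OpenAI SDK client (assumption, for illustration)."""

    def __init__(self, api_key):
        self.api_key = api_key

    def complete(self, model, prompt):
        return f"[openai {model}] {prompt}"


class FakeAzureClient:
    """Stand-in for an Azure OpenAI SDK client (assumption, for illustration)."""

    def __init__(self, azure_endpoint, api_key, api_version):
        self.azure_endpoint = azure_endpoint
        self.api_key = api_key
        self.api_version = api_version

    def complete(self, model, prompt):
        return f"[azure {model}] {prompt}"


class OpenAIModel:
    """Base model: owns the inference logic."""

    def __init__(self, model_id, api_key, **client_kwargs):
        self.model_id = model_id
        self._client = self._make_client(api_key, **client_kwargs)

    def _make_client(self, api_key, **client_kwargs):
        # Hypothetical hook: subclasses override only this method.
        return FakeOpenAIClient(api_key)

    def infer(self, prompt):
        return self._client.complete(self.model_id, prompt)


class AzureOpenAIModel(OpenAIModel):
    """Inherits infer(); only the client construction differs."""

    PREFIXES = ("azure:", "aoai:")

    def __init__(self, model_id, api_key, azure_endpoint, api_version):
        for prefix in self.PREFIXES:
            if model_id.startswith(prefix):
                model_id = model_id[len(prefix):]  # assumed prefix handling
                break
        super().__init__(
            model_id, api_key,
            azure_endpoint=azure_endpoint, api_version=api_version,
        )

    def _make_client(self, api_key, **client_kwargs):
        return FakeAzureClient(api_key=api_key, **client_kwargs)
```

For example, `AzureOpenAIModel("azure:gpt-4o", "key", "https://example.invalid", "2024-02-01").infer("hi")` goes through the inherited `infer` method but is served by the Azure client; the endpoint and API version shown are placeholders.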
❌ Infrastructure File Protection
This PR modifies protected infrastructure files:
Only repository maintainers are allowed to modify infrastructure files (including …
Note: If these are only formatting changes, please:
If structural changes are necessary:
For more information, see our Contributing Guidelines.
- Add import-error disable for optional openai dependency in azure_openai.py
- Remove useless too-many-instance-attributes suppression in inference.py

Fixes CI lint-src check failures
Description
Follows @HarvinderBhullar's proposal in #49
(Feature)
Fixes #49
How Has This Been Tested?
Checklist:
- … Code of conduct.
- … Contributing page, and I either signed the Google Individual CLA or am covered by my company's Corporate CLA.
- … issue(s) and we have agreed upon the general approach.
- … issue(s) that documentation elsewhere needs updating.
- … Google's Python Style Guide and ran pylint over the affected code.