-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Fix security vulnerability and bugs in Ollama API integration #18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Fix security vulnerability and bugs in Ollama API integration #18
Conversation
- Switch from badge.fury.io to shields.io for working PyPI badge - Convert relative paths to absolute GitHub URLs for PyPI compatibility - Bump version to 0.1.3
- Add GitHub Actions workflow for automated PyPI publishing via OIDC - Configure trusted publishing environment for verified releases - Update project metadata with proper URLs and license format - Prepare for v1.0.0 stable release with production-ready automation
- Add pylibmagic>=0.5.0 dependency for bundled libraries - Add [full] install option and pre-import handling - Update README with troubleshooting and Docker sections - Bump version to 1.0.1 Fixes google#6
Deleted an inline comment referencing the output directory in the save_annotated_documents.
…ples.md docs: clarify output_dir behavior in medication_examples.md
Prevents confusion from default `test_output/...` by explicitly saving to current directory.
docs: add output_dir="." to all save_annotated_documents examples
The _ollama_query method was using 'num_thread' (singular) instead of 'num_threads' (plural) when setting the option in the API payload. This prevented the thread count from being properly passed to the Ollama server. This fix ensures consistency with: - The parameter name used throughout the codebase - The Ollama API documentation which expects 'num_threads' - Other occurrences in the same file (lines 202, 230, 283) Bug: The num_threads parameter was not being respected by Ollama due to the incorrect key name in the options dictionary.
This commit addresses multiple issues in the _ollama_query method: 1. Security: Fix potential SSRF vulnerability by properly validating URLs - Added URL scheme validation (only http/https allowed) - Added hostname validation to prevent empty URLs - Use urllib.parse.urljoin() for safe URL construction 2. Bug fixes: - Fixed typo: 'num_thread' -> 'num_threads' in options dictionary - Fixed using wrong variable: used constructed api_endpoint instead of model_url in requests.post() - Prevents double slashes in URLs when model_url ends with '/' These changes ensure: - Protection against Server-Side Request Forgery attacks - Correct parameter passing to Ollama API - Proper URL construction regardless of input format
feat: add code formatting and linting pipeline
Introduces a common base exception class that all library-specific exceptions inherit from, enabling users to catch all LangExtract errors with a single except clause.
Add LangExtractError base exception for centralized error handling
Fixes google#25 - Windows installation failure due to pylibmagic build requirements Breaking change: LangFunLanguageModel removed. Use GeminiLanguageModel or OllamaLanguageModel instead.
fix: Remove LangFun and pylibmagic dependencies to fix Windows installation and OpenAI SDK v1.x compatibility
- Modified save_annotated_documents to accept both pathlib.Path and string paths - Convert string paths to Path objects before calling mkdir() - This fixes the error when using output_dir='.' as shown in the README example
…-mkdir Fix save_annotated_documents to handle string paths
feat: Add OpenAI language model support
…s: (google#10) * docs: clarify output_dir behavior in medication_examples.md * Removed inline comment in medication example Deleted an inline comment referencing the output directory in the save_annotated_documents. * docs: add output_dir="." to all save_annotated_documents examples Prevents confusion from default `test_output/...` by explicitly saving to current directory. * build: add formatting & linting pipeline with pre-commit integration * style: apply pyink, isort, and pre-commit formatting * ci: enable format and lint checks in tox * Add LangExtractError base exception for centralized error handling Introduces a common base exception class that all library-specific exceptions inherit from, enabling users to catch all LangExtract errors with a single except clause. * fix(ui): prevent current highlight border from being obscured --------- Co-authored-by: Leena Kamran <[email protected]> Co-authored-by: Akshay Goel <[email protected]>
- Gemini & OpenAI test suites with retry on transient errors - CI: Separate job, Python 3.11 only, skips for forks - Validates char_interval for all extractions - Multilingual test xfail (issue google#13) TODO: Remove xfail from multilingual test after tokenizer fix
* Add base_url to OpenAILanguageModel * Github action lint is outdated, so adapting * Adding base_url to parameterized test * Lint fixes to inference_test.py
Bug: Workflows triggered on pull_request_target but checked for pull_request, causing all validations to be skipped. Fixed: - Event condition checks now match trigger type - Add manual revalidation workflow - Enable workflow_dispatch with PR number input
Manual validation results: Size: 17 lines Run ID: 16790883101 |
- Creates visible PR checks (pass/fail status) - Shows validation errors in status description (up to 140 chars) - Links to workflow run for full details - Maintains backward compatibility with comment reporting
Manual validation results: Size: 17 lines Run ID: 16791204550 |
Manual Validation ResultsStatus: ❌ Failed
Errors:
|
The workflow was comparing boolean true to string 'true', causing all validations to incorrectly show as failed even when all checks passed.
- revalidate-all-prs.sh: Triggers manual validation for all open PRs - add-size-labels.sh: Adds size labels (XS/S/M/L/XL) based on change count - add-new-checks.sh: Adds required status checks to branch protection These scripts require maintainer permissions and help manage PR workflows.
- Add type ignore comments for IPython imports - Fix return type annotation (remove unnecessary quotes) - Add _is_jupyter() to properly detect notebook environments - Replace lambda with def function for pylint compliance Fixes google#65
- Add format-check job that checks actual PR code, not merge commit - Validate formatting before expensive fork PR tests - Provide clear error messages when formatting fails Fixes false positives where incorrectly formatted PRs passed CI
Auto-updates PRs behind main, handles forks/conflicts gracefully, skips bot/draft PRs, monitors API limits
Your branch is 21 commits behind git fetch origin main
git merge origin/main
git push Note: Enable "Allow edits by maintainers" to allow automatic updates. |
- Apply end-of-file and whitespace fixes to workflows
- Fix empty interval bug when newline falls at chunk boundary (issue google#71) - Add concise comment explaining the fix logic - Remove excessive/obvious comments from chunking tests - Improve test docstring to be more descriptive and professional
No linked issues found. Please add the corresponding issues in the pull request description. |
Your branch is 25 commits behind git fetch origin main
git merge origin/main
git push Note: Enable "Allow edits by maintainers" to allow automatic updates. |
Your branch is 86 commits behind git fetch origin main
git merge origin/main
git push Note: Enable "Allow edits by maintainers" to allow automatic updates. |
Your branch is 98 commits behind git fetch origin main
git merge origin/main
git push Note: Enable "Allow edits by maintainers" to allow automatic updates. |
Your branch is 106 commits behind git fetch origin main
git merge origin/main
git push Note: Enable "Allow edits by maintainers" to allow automatic updates. |
Your branch is 107 commits behind git fetch origin main
git merge origin/main
git push Note: Enable "Allow edits by maintainers" to allow automatic updates. |
Your branch is 109 commits behind git fetch origin main
git merge origin/main
git push Note: Enable "Allow edits by maintainers" to allow automatic updates. |
Your branch is 110 commits behind git fetch origin main
git merge origin/main
git push Note: Enable "Allow edits by maintainers" to allow automatic updates. |
Your branch is 111 commits behind git fetch origin main
git merge origin/main
git push Note: Enable "Allow edits by maintainers" to allow automatic updates. |
Description
This PR addresses critical security vulnerabilities and functional bugs in the Ollama language model integration, specifically in the
_ollama_query
method withinlangextract/inference.py
.Issues Fixed
SSRF Vulnerability (Server-Side Request Forgery)
http
andhttps
schemes are allowed.urllib.parse.urljoin()
for safe URL construction.file://
,ftp://
, etc.Typo in Parameter Name
'num_thread'
used instead of'num_threads'
.'num_threads'
to match the Ollama API specification.Incorrect Variable in API Call
api_endpoint
, the code erroneously usedmodel_url
in the API request.api_endpoint
.Double-Slash in URLs
//api/generate
).urljoin()
which handles trailing slashes safely.Issue: #89
Changes Summary
Example Attack Prevention
Before (Vulnerable):