Fix security vulnerability and bugs in Ollama API integration #18

Mirza-Samad-Ahmed-Baig · 2025-08-02T14:47:53Z

Description

This PR addresses critical security vulnerabilities and functional bugs in the Ollama language model integration, specifically in the _ollama_query method within langextract/inference.py.

Issues Fixed

SSRF Vulnerability (Server-Side Request Forgery)

Problem: Unsafe URL construction using string concatenation allowed for potential SSRF attacks.
Fix:
- Only http and https schemes are allowed.
- Ensures hostname is present.
- Uses urllib.parse.urljoin() for safe URL construction.
- Disallows unsafe schemes like file://, ftp://, etc.

Typo in Parameter Name

Problem: Incorrect parameter 'num_thread' used instead of 'num_threads'.
Fix: Renamed to 'num_threads' to match the Ollama API specification.
Impact: Thread count was ignored during inference calls.

Incorrect Variable in API Call

Problem: After constructing api_endpoint, the code erroneously used model_url in the API request.
Fix: Corrected the call to use api_endpoint.
Impact: API requests were made to an incorrect URL path.

Double-Slash in URLs

Problem: String concatenation could result in malformed URLs (e.g., //api/generate).
Fix: Replaced with urljoin() which handles trailing slashes safely.

Issue: #89

Changes Summary

Improved security by preventing SSRF.
Corrected parameter usage for thread control.
Corrected URL routing to API endpoints.
Safer and cleaner URL construction using standard library.

Example Attack Prevention

Before (Vulnerable):

model_url = "file:///etc/passwd"
model_url = "http://internal-service.local"
model_url = "ftp://attacker.com"


## How Has This Been Tested?

- [x] Manual Testing: Verified the fix using various URL formats (with/without trailing slashes), invalid URL schemes, and malicious URL patterns.
- [x] Functional Verification: Confirmed that the `num_threads` parameter is passed correctly to the Ollama API.
- [x] Error Handling: Confirmed proper exceptions are raised for invalid or unsafe URLs.
- [x] Automated Testing: All existing unit tests were run to ensure no regressions.
- [x] Test Environment: Local development environment on Windows.
- [x] Test Command: `python -m tox`

## Checklist

- [x] My code follows the style guidelines of this project  
- [x] I have performed a self-review of my code  
- [x] I have commented the code, especially in hard-to-understand areas  
- [ ] I have updated the documentation if necessary  
- [x] My changes do not introduce new warnings  
- [x] I have added or updated tests that verify my changes  
- [x] All tests pass locally  
- [ ] Any dependent changes have been merged and published

- Switch from badge.fury.io to shields.io for working PyPI badge - Convert relative paths to absolute GitHub URLs for PyPI compatibility - Bump version to 0.1.3

- Add GitHub Actions workflow for automated PyPI publishing via OIDC - Configure trusted publishing environment for verified releases - Update project metadata with proper URLs and license format - Prepare for v1.0.0 stable release with production-ready automation

- Add pylibmagic>=0.5.0 dependency for bundled libraries - Add [full] install option and pre-import handling - Update README with troubleshooting and Docker sections - Bump version to 1.0.1 Fixes google#6

Deleted an inline comment referencing the output directory in the save_annotated_documents.

…ples.md docs: clarify output_dir behavior in medication_examples.md

Prevents confusion from default `test_output/...` by explicitly saving to current directory.

docs: add output_dir="." to all save_annotated_documents examples

The _ollama_query method was using 'num_thread' (singular) instead of 'num_threads' (plural) when setting the option in the API payload. This prevented the thread count from being properly passed to the Ollama server. This fix ensures consistency with: - The parameter name used throughout the codebase - The Ollama API documentation which expects 'num_threads' - Other occurrences in the same file (lines 202, 230, 283) Bug: The num_threads parameter was not being respected by Ollama due to the incorrect key name in the options dictionary.

This commit addresses multiple issues in the _ollama_query method: 1. Security: Fix potential SSRF vulnerability by properly validating URLs - Added URL scheme validation (only http/https allowed) - Added hostname validation to prevent empty URLs - Use urllib.parse.urljoin() for safe URL construction 2. Bug fixes: - Fixed typo: 'num_thread' -> 'num_threads' in options dictionary - Fixed using wrong variable: used constructed api_endpoint instead of model_url in requests.post() - Prevents double slashes in URLs when model_url ends with '/' These changes ensure: - Protection against Server-Side Request Forgery attacks - Correct parameter passing to Ollama API - Proper URL construction regardless of input format

feat: add code formatting and linting pipeline

Introduces a common base exception class that all library-specific exceptions inherit from, enabling users to catch all LangExtract errors with a single except clause.

Add LangExtractError base exception for centralized error handling

Fixes google#25 - Windows installation failure due to pylibmagic build requirements Breaking change: LangFunLanguageModel removed. Use GeminiLanguageModel or OllamaLanguageModel instead.

fix: Remove LangFun and pylibmagic dependencies to fix Windows installation and OpenAI SDK v1.x compatibility

- Modified save_annotated_documents to accept both pathlib.Path and string paths - Convert string paths to Path objects before calling mkdir() - This fixes the error when using output_dir='.' as shown in the README example

…-mkdir Fix save_annotated_documents to handle string paths

feat: Add OpenAI language model support

…s: (google#10) * docs: clarify output_dir behavior in medication_examples.md * Removed inline comment in medication example Deleted an inline comment referencing the output directory in the save_annotated_documents. * docs: add output_dir="." to all save_annotated_documents examples Prevents confusion from default `test_output/...` by explicitly saving to current directory. * build: add formatting & linting pipeline with pre-commit integration * style: apply pyink, isort, and pre-commit formatting * ci: enable format and lint checks in tox * Add LangExtractError base exception for centralized error handling Introduces a common base exception class that all library-specific exceptions inherit from, enabling users to catch all LangExtract errors with a single except clause. * fix(ui): prevent current highlight border from being obscured --------- Co-authored-by: Leena Kamran <[email protected]> Co-authored-by: Akshay Goel <[email protected]>

- Gemini & OpenAI test suites with retry on transient errors - CI: Separate job, Python 3.11 only, skips for forks - Validates char_interval for all extractions - Multilingual test xfail (issue google#13) TODO: Remove xfail from multilingual test after tokenizer fix

…oogle#57) Fixes google#27

* Add base_url to OpenAILanguageModel * Github action lint is outdated, so adapting * Adding base_url to parameterized test * Lint fixes to inference_test.py

Bug: Workflows triggered on pull_request_target but checked for pull_request, causing all validations to be skipped. Fixed: - Event condition checks now match trigger type - Add manual revalidation workflow - Enable workflow_dispatch with PR number input

github-actions · 2025-08-06T23:16:27Z

Manual validation results:

Size: 17 lines
Template: ✗
Linked issue: ✗

Run ID: 16790883101

- Creates visible PR checks (pass/fail status) - Shows validation errors in status description (up to 140 chars) - Links to workflow run for full details - Maintains backward compatibility with comment reporting

github-actions · 2025-08-06T23:37:26Z

Manual validation results:

Size: 17 lines
Template: ✗
Linked issue: ✗

Run ID: 16791204550

github-actions · 2025-08-06T23:40:58Z

Manual Validation Results

Status: ❌ Failed

Check	Status	Details
PR Size	✅	17 lines
Template	❌	Missing required sections
Linked Issue	❌	Missing Fixes/Closes #XXX

Errors:

❌ Missing PR template sections: # Description, Fixes #, # How Has This Been Tested?, # Checklist
❌ No linked issue found

View workflow run

The workflow was comparing boolean true to string 'true', causing all validations to incorrectly show as failed even when all checks passed.

- revalidate-all-prs.sh: Triggers manual validation for all open PRs - add-size-labels.sh: Adds size labels (XS/S/M/L/XL) based on change count - add-new-checks.sh: Adds required status checks to branch protection These scripts require maintainer permissions and help manage PR workflows.

- Add type ignore comments for IPython imports - Fix return type annotation (remove unnecessary quotes) - Add _is_jupyter() to properly detect notebook environments - Replace lambda with def function for pylint compliance Fixes google#65

- Add format-check job that checks actual PR code, not merge commit - Validate formatting before expensive fork PR tests - Provide clear error messages when formatting fails Fixes false positives where incorrectly formatted PRs passed CI

Auto-updates PRs behind main, handles forks/conflicts gracefully, skips bot/draft PRs, monitors API limits

github-actions · 2025-08-07T05:29:31Z

⚠️ Branch Update Required

Your branch is 21 commits behind main. Please update your branch to ensure CI checks run with the latest code:

git fetch origin main
git merge origin/main
git push

Note: Enable "Allow edits by maintainers" to allow automatic updates.

- Apply end-of-file and whitespace fixes to workflows

- Fix empty interval bug when newline falls at chunk boundary (issue google#71) - Add concise comment explaining the fix logic - Remove excessive/obvious comments from chunking tests - Improve test docstring to be more descriptive and professional

github-actions · 2025-08-07T12:38:29Z

No linked issues found. Please add the corresponding issues in the pull request description.
Use GitHub automation to close the issue when a PR is merged

github-actions · 2025-08-14T09:24:38Z

⚠️ Branch Update Required

Your branch is 25 commits behind main. Please update your branch to ensure CI checks run with the latest code:

git fetch origin main
git merge origin/main
git push

Note: Enable "Allow edits by maintainers" to allow automatic updates.

github-actions · 2025-08-22T02:32:10Z

⚠️ Branch Update Required

Your branch is 86 commits behind main. Please update your branch to ensure CI checks run with the latest code:

git fetch origin main
git merge origin/main
git push

Note: Enable "Allow edits by maintainers" to allow automatic updates.

github-actions · 2025-08-30T02:27:34Z

⚠️ Branch Update Required

Your branch is 98 commits behind main. Please update your branch to ensure CI checks run with the latest code:

git fetch origin main
git merge origin/main
git push

Note: Enable "Allow edits by maintainers" to allow automatic updates.

github-actions · 2025-09-07T02:30:43Z

⚠️ Branch Update Required

Your branch is 106 commits behind main. Please update your branch to ensure CI checks run with the latest code:

git fetch origin main
git merge origin/main
git push

Note: Enable "Allow edits by maintainers" to allow automatic updates.

github-actions · 2025-09-15T02:31:54Z

⚠️ Branch Update Required

Your branch is 107 commits behind main. Please update your branch to ensure CI checks run with the latest code:

git fetch origin main
git merge origin/main
git push

Note: Enable "Allow edits by maintainers" to allow automatic updates.

github-actions · 2025-09-22T02:32:08Z

⚠️ Branch Update Required

Your branch is 109 commits behind main. Please update your branch to ensure CI checks run with the latest code:

git fetch origin main
git merge origin/main
git push

Note: Enable "Allow edits by maintainers" to allow automatic updates.

github-actions · 2025-09-30T02:28:19Z

⚠️ Branch Update Required

Your branch is 110 commits behind main. Please update your branch to ensure CI checks run with the latest code:

git fetch origin main
git merge origin/main
git push

Note: Enable "Allow edits by maintainers" to allow automatic updates.

github-actions · 2025-10-07T02:28:20Z

⚠️ Branch Update Required

Your branch is 111 commits behind main. Please update your branch to ensure CI checks run with the latest code:

git fetch origin main
git merge origin/main
git push

Note: Enable "Allow edits by maintainers" to allow automatic updates.

aksg87 and others added 11 commits July 22, 2025 01:39

docs(pypi): Improve README display and badge reliability

2ce2399

- Switch from badge.fury.io to shields.io for working PyPI badge - Convert relative paths to absolute GitHub URLs for PyPI compatibility - Bump version to 0.1.3

Fix: Resolve libmagic ImportError (google#6)

e696a48

- Add pylibmagic>=0.5.0 dependency for bundled libraries - Add [full] install option and pre-import handling - Update README with troubleshooting and Docker sections - Bump version to 1.0.1 Fixes google#6

docs: clarify output_dir behavior in medication_examples.md

5447637

Merge pull request google#11 from google/fix/libmagic-dependency-issue

9c47b34

Removed inline comment in medication example

175e075

Deleted an inline comment referencing the output directory in the save_annotated_documents.

Merge pull request google#15 from kleeena/docs/update-medication_exam…

9472099

…ples.md docs: clarify output_dir behavior in medication_examples.md

docs: add output_dir="." to all save_annotated_documents examples

e6c3dcd

Prevents confusion from default `test_output/...` by explicitly saving to current directory.

Merge pull request google#17 from google/fix/output-dir-consistency

1fb1f1d

docs: add output_dir="." to all save_annotated_documents examples

Mirza-Samad-Ahmed-Baig changed the title ~~Fix ollama num threads typo~~ Fix security vulnerability and bugs in Ollama API integration Aug 2, 2025

aksg87 and others added 18 commits August 3, 2025 06:13

build: add formatting & linting pipeline with pre-commit integration

13fbd2c

style: apply pyink, isort, and pre-commit formatting

c8d2027

ci: enable format and lint checks in tox

146a095

Merge pull request google#24 from google/feat/code-formatting-pipeline

aa6da18

feat: add code formatting and linting pipeline

Add LangExtractError base exception for centralized error handling

ed65bca

Introduces a common base exception class that all library-specific exceptions inherit from, enabling users to catch all LangExtract errors with a single except clause.

Merge pull request google#26 from google/feat/exception-hierarchy

6c4508b

Add LangExtractError base exception for centralized error handling

fix: Remove LangFun and pylibmagic dependencies (v1.0.2)

8b85225

Fixes google#25 - Windows installation failure due to pylibmagic build requirements Breaking change: LangFunLanguageModel removed. Use GeminiLanguageModel or OllamaLanguageModel instead.

Merge pull request google#28 from google/fix/remove-breaking-dep-langfun

88520cc

fix: Remove LangFun and pylibmagic dependencies to fix Windows installation and OpenAI SDK v1.x compatibility

Fix save_annotated_documents to handle string paths

75a6f12

- Modified save_annotated_documents to accept both pathlib.Path and string paths - Convert string paths to Path objects before calling mkdir() - This fixes the error when using output_dir='.' as shown in the README example

Merge pull request google#29 from google/fix-save-annotated-documents…

a415b94

…-mkdir Fix save_annotated_documents to handle string paths

feat: Add OpenAI language model support

8289b3a

Merge pull request google#31 from google/feature/add-oai-inference

c8ef723

feat: Add OpenAI language model support

Merge branch 'google:main' into fix-ollama-num-threads-typo

0d76530

Add PR template validation workflow (google#45)

dc61372

Merge branch 'main' into fix-ollama-num-threads-typo

7fc809f

fix: Change OllamaLanguageModel parameter from 'model' to 'model_id' (g…

da771e6

…oogle#57) Fixes google#27

mariano and others added 2 commits August 6, 2025 10:01

Add base_url to OpenAILanguageModel (google#51)

234081e

* Add base_url to OpenAILanguageModel * Github action lint is outdated, so adapting * Adding base_url to parameterized test * Lint fixes to inference_test.py

Add commit status to revalidation workflow

6fb66cf

- Creates visible PR checks (pass/fail status) - Shows validation errors in status description (up to 140 chars) - Links to workflow run for full details - Maintains backward compatibility with comment reporting

Fix boolean comparison in revalidation workflow

47a251e

The workflow was comparing boolean true to string 'true', causing all validations to incorrectly show as failed even when all checks passed.

aksg87 added the size/XS Pull request with less than 50 lines changed label Aug 7, 2025

aksg87 and others added 4 commits August 6, 2025 21:13

Fix CI to validate PR branch formatting directly

e6dcc8e

- Add format-check job that checks actual PR code, not merge commit - Validate formatting before expensive fork PR tests - Provide clear error messages when formatting fails Fixes false positives where incorrectly formatted PRs passed CI

Add PR update automation workflows

1c3c1a2

Auto-updates PRs behind main, handles forks/conflicts gracefully, skips bot/draft PRs, monitors API limits

aksg87 and others added 7 commits August 7, 2025 01:34

Fix workflow formatting

b60f0b2

- Apply end-of-file and whitespace fixes to workflows

Minor changes

f888bd8

Merge branch 'fix-ollama-num-threads-typo'

8659ef3

Fix: Resolve merge conflict and update docstrings in inference.py

82c6644

Changes

ce0caa5

Merge branch 'main' into fix-ollama-num-threads-typo

792fd3e

aksg87 force-pushed the main branch from e36e455 to 3dff0d3 Compare August 21, 2025 01:43

Fix security vulnerability and bugs in Ollama API integration #18

Are you sure you want to change the base?

Fix security vulnerability and bugs in Ollama API integration #18

Uh oh!

Conversation

Mirza-Samad-Ahmed-Baig commented Aug 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Issues Fixed

SSRF Vulnerability (Server-Side Request Forgery)

Typo in Parameter Name

Incorrect Variable in API Call

Double-Slash in URLs

Changes Summary

Example Attack Prevention

Uh oh!

github-actions bot commented Aug 6, 2025

Uh oh!

github-actions bot commented Aug 6, 2025

Uh oh!

github-actions bot commented Aug 6, 2025

Manual Validation Results

Uh oh!

github-actions bot commented Aug 7, 2025

Uh oh!

github-actions bot commented Aug 7, 2025

Uh oh!

github-actions bot commented Aug 14, 2025

Uh oh!

github-actions bot commented Aug 22, 2025

Uh oh!

github-actions bot commented Aug 30, 2025

Uh oh!

github-actions bot commented Sep 7, 2025

Uh oh!

github-actions bot commented Sep 15, 2025

Uh oh!

github-actions bot commented Sep 22, 2025

Uh oh!

github-actions bot commented Sep 30, 2025

Uh oh!

github-actions bot commented Oct 7, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Mirza-Samad-Ahmed-Baig commented Aug 2, 2025 •

edited

Loading