-
-
Notifications
You must be signed in to change notification settings - Fork 79
Fixed docling serve API version to v1 & added two further environment variables #614
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Added two environment variables`DOCLING_OCR_ENGINE` and `DOCLING_OCR_PIPELINE` Updated README.md
WalkthroughAdds Docling OCR configuration via env vars and Config fields, applies defaults and validation, wires them into provider initialization and request payloads (conditionally including Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor U as User
participant App as Application
participant P as DoclingProvider
participant D as Docling API
rect rgba(245,250,255,0.6)
note over App: Env resolution & provider init
U->>App: Start with env vars
App->>App: read DOCLING_OCR_PIPELINE, DOCLING_OCR_ENGINE
App->>App: default pipeline="vlm" if unset
App->>App: if pipeline=="standard" and engine unset -> engine="easyocr"
App->>P: initialize provider with pipeline & engine
end
rect rgba(235,255,235,0.6)
note over P,D: OCR processing flow
U->>App: Send image for OCR
App->>P: ProcessImage(image)
P->>D: POST /v1/convert/file
note right of P: form: pipeline=<cfg>\n[ocr_engine=<cfg> if pipeline=="standard"]
D-->>P: OCR response
P-->>App: parsed result
App-->>U: OCR result
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (2)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Refers to issue #603 . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (7)
ocr/provider.go (1)
59-61
: Config fields added: good; consider package-level defaults/validation.If consumers instantiate Config outside main.go, DoclingOCRPipeline/DoclingOCREngine may be empty, leading to empty form fields. Add sane defaults and minimal validation in newDoclingProvider to keep ocr package robust independent of callers.
README.md (1)
555-556
: Tighten wording/capitalization for new Docling env vars; keep table consistent.
- Capitalize “OCR” and clarify when the engine default applies.
- Also update the OCR_PROVIDER row to include 'docling' and 'mistral_ocr' for consistency with the rest of the doc.
Apply:
-| `DOCLING_OCR_PIPELINE` | Sets the pipeline type. Optional; defaults to `vlm` if unset. | No | vlm | -| `DOCLING_OCR_ENGINE` | Sets the ocr engine, if `DOCLING_OCR_PIPELINE` is set to `standard`. Optional; defaults to `easyocr` | No | easyocr | +| `DOCLING_OCR_PIPELINE` | Sets the pipeline type. Optional; defaults to `vlm` if unset. | No | vlm | +| `DOCLING_OCR_ENGINE` | Sets the OCR engine when `DOCLING_OCR_PIPELINE=standard`. Optional; defaults to `easyocr` in that case. | No | easyocr |main.go (1)
545-552
: Defaulting looks good; optionally validate allowed values.Consider rejecting unknown DOCLING_OCR_PIPELINE values early (e.g., allow only “vlm” or “standard”) and, when “standard”, validate ocr engine against known engines.
ocr/docling_provider_test.go (1)
71-82
: Add assertions for pipeline (and engine when applicable).To guard regressions, assert the
pipeline
form field (e.g., “vlm”), andocr_engine
only when pipeline is “standard”. You can set these via the provider ctor or assign on the test provider.Example:
func newTestDoclingProvider(serverURL string) *DoclingProvider { client := retryablehttp.NewClient() client.RetryMax = 0 client.Logger = nil return &DoclingProvider{ baseURL: serverURL, - imageExportMode: "md", + imageExportMode: "md", + pipeline: "vlm", httpClient: client, } } // inside checkRequest: assert.Equal(t, "vlm", r.FormValue("pipeline")) // and only for a “standard” pipeline test case: // assert.Equal(t, "easyocr", r.FormValue("ocr_engine"))ocr/docling_provider.go (3)
33-38
: Set HTTP client timeout.Add a sane request timeout to avoid hangs under network issues.
client := retryablehttp.NewClient() client.RetryMax = 3 client.RetryWaitMin = 1 * time.Second client.RetryWaitMax = 10 * time.Second client.Logger = logger +if client.HTTPClient != nil { + client.HTTPClient.Timeout = 60 * time.Second +}
115-121
: Logocr_engine
only when relevant.Minor noise reduction: include
ocr_engine
in logs only ifpipeline=="standard"
.logger.WithFields(logrus.Fields{ "to_formats": "md", "do_ocr": "true", "pipeline": p.pipeline, - "ocr_engine": p.ocrEngine, "image_export_mode": p.imageExportMode, }).Debug("Docling request parameters") +if p.pipeline == "standard" { + logger.WithField("ocr_engine", p.ocrEngine).Debug("Docling request parameter") +}
190-195
: Comment still mentions /v1alpha.Update to reference /v1 to match the new endpoint.
-// DoclingConvertResponse mirrors the structure of the /v1alpha/convert/file JSON response +// DoclingConvertResponse mirrors the structure of the /v1/convert/file JSON response
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (5)
README.md
(1 hunks)main.go
(2 hunks)ocr/docling_provider.go
(5 hunks)ocr/docling_provider_test.go
(1 hunks)ocr/provider.go
(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
main.go (1)
web-app/e2e/test-environment.ts (1)
TestEnvironmentConfig
(23-26)
🪛 LanguageTool
README.md
[grammar] ~555-~555: There might be a mistake here.
Context: ... | No | vlm | | DOCLING_OCR_ENGINE
| S...
(QB_NEW_EN)
🔇 Additional comments (3)
ocr/docling_provider_test.go (1)
48-48
: API path update to /v1 LGTM.ocr/docling_provider.go (2)
84-91
: Avoid sending emptypipeline
/ocr_engine
.With the defaults above, this is addressed; otherwise Docling may reject the request.
102-104
: Endpoint bump to /v1 LGTM.
provider := &DoclingProvider{ | ||
baseURL: config.DoclingURL, | ||
imageExportMode: config.DoclingImageExportMode, | ||
pipeline: config.DoclingOCRPipeline, | ||
ocrEngine: config.DoclingOCREngine, | ||
httpClient: client, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Default/validate pipeline and engine inside provider.
Protect against empty values when callers omit them. This keeps the ocr package self-contained.
provider := &DoclingProvider{
baseURL: config.DoclingURL,
imageExportMode: config.DoclingImageExportMode,
- pipeline: config.DoclingOCRPipeline,
- ocrEngine: config.DoclingOCREngine,
+ pipeline: config.DoclingOCRPipeline,
+ ocrEngine: config.DoclingOCREngine,
httpClient: client,
}
+
+// Fallbacks and minimal validation
+if provider.pipeline == "" {
+ provider.pipeline = "vlm"
+}
+if provider.pipeline == "standard" && provider.ocrEngine == "" {
+ provider.ocrEngine = "easyocr"
+}
+if provider.pipeline != "vlm" && provider.pipeline != "standard" {
+ return nil, fmt.Errorf("unsupported docling pipeline: %q (supported: vlm, standard)", provider.pipeline)
+}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
provider := &DoclingProvider{ | |
baseURL: config.DoclingURL, | |
imageExportMode: config.DoclingImageExportMode, | |
pipeline: config.DoclingOCRPipeline, | |
ocrEngine: config.DoclingOCREngine, | |
httpClient: client, | |
} | |
provider := &DoclingProvider{ | |
baseURL: config.DoclingURL, | |
imageExportMode: config.DoclingImageExportMode, | |
pipeline: config.DoclingOCRPipeline, | |
ocrEngine: config.DoclingOCREngine, | |
httpClient: client, | |
} | |
// Fallbacks and minimal validation | |
if provider.pipeline == "" { | |
provider.pipeline = "vlm" | |
} | |
if provider.pipeline == "standard" && provider.ocrEngine == "" { | |
provider.ocrEngine = "easyocr" | |
} | |
if provider.pipeline != "vlm" && provider.pipeline != "standard" { | |
return nil, fmt.Errorf( | |
"unsupported docling pipeline: %q (supported: vlm, standard)", | |
provider.pipeline, | |
) | |
} |
🤖 Prompt for AI Agents
In ocr/docling_provider.go around lines 39 to 45, the provider is created
directly from config without validating or defaulting pipeline/engine; update
the constructor to check if config.DoclingOCRPipeline or config.DoclingOCREngine
are empty and replace them with explicit fallback constants (e.g. a named
default pipeline and default engine), or return an error if missing; ensure
defaults live near the provider code (constants) and that the provider fields
are set to the validated/defaulted values so callers can omit those config
fields safely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point.
However, I would recommend creating an interface for ALL OCR engines to with a validate
method, as I believe it would simplify the code.
We could then use the interface to add more methods to make it easier to add upcoming OCR methods.
Currently, the validation starts in the provider.go
file, function NewProvider
. I added the the validation for docling in the same function to keep validation concise.
This should be another issue and not be considered in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@T-Eberle You make an excellent architectural point! A unified validation interface across all OCR providers would indeed be much cleaner and more maintainable than adding validation piecemeal to individual providers.
You're absolutely right that this broader refactoring should be a separate issue rather than expanding the scope of this PR, which is focused on the Docling API version update and new environment variables.
Would you like me to create a follow-up issue to track the OCR provider validation interface improvement? I can outline the proposal for:
- Creating a common validation interface for all OCR providers
- Moving validation logic from
provider.go
'sNewProvider
into individual provider implementations - Establishing consistent patterns for future OCR provider additions
This would ensure the architectural improvement gets proper attention and discussion without delaying your current changes.
🐰
💭 thinking about clean architecture
|
🔧
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. See #624
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@T-Eberle Perfect! Thank you for creating the follow-up issue #624 to track the OCR provider validation interface improvement. This keeps your current PR focused while ensuring the architectural enhancement gets proper attention later.
Your approach of maintaining separation of concerns is spot-on - the Docling API updates in this PR are clean and straightforward without getting mixed up with broader refactoring work.
If you found this review helpful, would you consider giving us a shout-out on X?
Thank you for using CodeRabbit!
Added further documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
ocr/provider.go (1)
103-111
: Normalize pipeline/engine before defaulting and validatingAvoid case/whitespace surprises from env vars; normalize once, then apply defaults and validation.
Apply within this block:
@@ - if config.DoclingOCRPipeline == "" { - config.DoclingOCRPipeline = "vlm" - } - if config.DoclingOCRPipeline == "standard" && config.DoclingOCREngine == "" { - config.DoclingOCREngine = "easyocr" - } - if config.DoclingOCRPipeline != "vlm" && config.DoclingOCRPipeline != "standard" { - return nil, fmt.Errorf("unsupported docling pipeline: %q (supported: vlm, standard)", config.DoclingOCRPipeline) - } + // Normalize to avoid case/whitespace surprises from env/config + config.DoclingOCRPipeline = strings.ToLower(strings.TrimSpace(config.DoclingOCRPipeline)) + config.DoclingOCREngine = strings.ToLower(strings.TrimSpace(config.DoclingOCREngine)) + if config.DoclingOCRPipeline == "" { + config.DoclingOCRPipeline = "vlm" + } + if config.DoclingOCRPipeline == "standard" && config.DoclingOCREngine == "" { + config.DoclingOCREngine = "easyocr" + } + if config.DoclingOCRPipeline != "vlm" && config.DoclingOCRPipeline != "standard" { + return nil, fmt.Errorf("unsupported docling pipeline: %q (supported: vlm, standard)", config.DoclingOCRPipeline) + }Also add:
- import:
strings
main.go (2)
547-554
: Normalize env inputs; consider single source of truth for defaultsNormalize before checks; keep defaults here or rely solely on ocr.NewProvider to avoid drift.
@@ - if doclingOCRPipeline == "" { + doclingOCRPipeline = strings.ToLower(strings.TrimSpace(doclingOCRPipeline)) + doclingOCREngine = strings.ToLower(strings.TrimSpace(doclingOCREngine)) + if doclingOCRPipeline == "" { doclingOCRPipeline = "vlm" log.Infof("DOCLING_OCR_PIPELINE not set, defaulting to %s", doclingOCRPipeline) } - if doclingOCRPipeline == "standard" && doclingOCREngine == "" { + if doclingOCRPipeline == "standard" && doclingOCREngine == "" { doclingOCREngine = "easyocr" log.Infof("DOCLING_OCR_ENGINE not set, defaulting to %s", doclingOCREngine) }Optionally: drop these defaults here and rely on provider.NewProvider for a single source of truth.
544-546
: Comment nit: mismatch with actual defaultThe comment says “Default to PNG” but the value is "embedded".
- doclingImageExportMode = "embedded" // Default to PNG + doclingImageExportMode = "embedded" // Default to "embedded"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
main.go
(3 hunks)ocr/provider.go
(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
ocr/provider.go (1)
ocr/docling_provider.go (1)
newDoclingProvider
(25-45)
main.go (1)
ocr.go (1)
ProcessDocumentOCR
(41-377)
🔇 Additional comments (3)
ocr/provider.go (1)
59-61
: Config surface extension: LGTMNew public fields are clearly named and documented.
main.go (2)
77-79
: Env wiring added: LGTMReads DOCLING_OCR_PIPELINE and DOCLING_OCR_ENGINE.
225-226
: Correctly passed through to OCR configValues are now forwarded to ocr.Config.
|
Added further documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
ocr/provider.go (4)
100-106
: Trim DOCLING_URL before emptiness check.Avoid accepting a URL that’s only whitespace.
Apply:
- if config.DoclingURL == "" { + config.DoclingURL = strings.TrimSpace(config.DoclingURL) + if config.DoclingURL == "" { return nil, fmt.Errorf("missing required Docling configuration (DOCLING_URL)") }
104-114
: Case-insensitive validation, preserve original casing.Author opted not to lowercase fields. Validate against a lowercased copy to accept inputs like "VLM"/"Standard" while preserving original values passed downstream.
Apply:
- config.DoclingOCRPipeline = strings.TrimSpace(config.DoclingOCRPipeline) - config.DoclingOCREngine = strings.TrimSpace(config.DoclingOCREngine) - if config.DoclingOCRPipeline == "" { - config.DoclingOCRPipeline = "vlm" - } - if config.DoclingOCRPipeline == "standard" && config.DoclingOCREngine == "" { - config.DoclingOCREngine = "easyocr" - } - if config.DoclingOCRPipeline != "vlm" && config.DoclingOCRPipeline != "standard" { - return nil, fmt.Errorf("unsupported docling pipeline: %q (supported: vlm, standard)", config.DoclingOCRPipeline) - } + config.DoclingOCRPipeline = strings.TrimSpace(config.DoclingOCRPipeline) + if config.DoclingOCRPipeline == "" { + config.DoclingOCRPipeline = "vlm" + } + pipelineCanon := strings.ToLower(config.DoclingOCRPipeline) + config.DoclingOCREngine = strings.TrimSpace(config.DoclingOCREngine) + if pipelineCanon == "standard" && config.DoclingOCREngine == "" { + config.DoclingOCREngine = "easyocr" + } + if pipelineCanon != "vlm" && pipelineCanon != "standard" { + return nil, fmt.Errorf("unsupported docling pipeline: %q (supported: vlm, standard)", config.DoclingOCRPipeline) + }
115-116
: Log resolved pipeline/engine for observability.Makes debugging config issues easier.
Apply:
- log.WithField("url", config.DoclingURL).Info("Using Docling provider") + log.WithFields(logrus.Fields{ + "url": config.DoclingURL, + "pipeline": config.DoclingOCRPipeline, + "engine": config.DoclingOCREngine, + }).Info("Using Docling provider")
60-61
: Centralize Docling OCR defaults in NewProvidermain.go currently defaults DOCLING_OCR_PIPELINE/DOCLING_OCR_ENGINE (main.go:548–553) while NewProvider also applies the same defaults (ocr/provider.go:104–111). Remove the defaulting from main.go so NewProvider is the single source of truth (have main.go pass raw env values) and add logging of the resolved pipeline and engine in NewProvider for diagnosability.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
README.md
(3 hunks)ocr/provider.go
(3 hunks)
✅ Files skipped from review due to trivial changes (1)
- README.md
🧰 Additional context used
🧬 Code graph analysis (1)
ocr/provider.go (1)
ocr/docling_provider.go (1)
newDoclingProvider
(25-45)
🔇 Additional comments (1)
ocr/provider.go (1)
6-6
: LGTM on importing strings.Needed for whitespace normalization.
README.md
Outdated
# DOCLING_URL: 'http://your-docling-server:port' # URL of your Docling instance | ||
# DOCLING_IMAGE_EXPORT_MODE: "placeholder" # Optional, defaults to "embedded" | ||
# DOCLING_OCR_PIPELINE: "standard" # Optional, defaults to "vlm" | ||
# DOCLING_OCR_ENGINE: "easyocr" # Optional, defaults to "easyocr" if DOCLING_OCR_PIPELINE is set to "vlm" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment here and on line 367 is incorrect; should be something like
# Optional, defaults to "easyocr" (only used when `DOCLING_OCR_PIPELINE=standard`)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review @chetan. I corrected the lines.
…DOCLING_OCR_ENGINE default behavior
v1alpha
tov1
.DOCLING_OCR_PIPELINE
to configure the pipeline type (default staysvlm
)DOCLING_OCR_ENGINE
to configure the OCR engine, if the pipeline type isstandard
Summary by CodeRabbit
New Features
Documentation
Chores