Improves OCR UI and adds more options for LLMs #573
Conversation
- Introduces re-OCRing specific pages from the UI (partially answers icereed#223)
- Adds more options for LLM tweaking
- Allows setting temperature to work around a GPT-5 issue (fixes icereed#448)
- Adds OCR tweaks: configuring the Ollama model, showing per-page results, allowing re-OCR of a single page
- Fixes re-OCRing, plus assorted follow-up fixes
Walkthrough

Adds cancellable OCR jobs, per-page OCR result persistence and re-OCR with per-page cancellation, extends ProcessDocumentOCR to accept jobID and track total pages, introduces the OCRPageResult DB model, new HTTP endpoints for job/page control, and new env-driven LLM/Ollama configuration options. The UI is updated for per-page status and controls.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant UI
    participant API as HTTP Server
    participant Jobs as Job Worker
    participant OCR as OCR Processor
    participant DB
    UI->>API: POST /api/ocr/jobs (documentID)
    API-->>UI: 202 Accepted {job_id}
    Jobs->>Jobs: create cancellable context (jobCtx), store cancel
    Jobs->>OCR: ProcessDocumentOCR(jobCtx, docID, opts, jobID)
    OCR->>DB: Delete old per-page results (pre)
    loop per page
        OCR->>OCR: check ctx.Done()
        OCR->>OCR: call LLM/provider -> OCRResult{text, limitHit, genInfo}
        OCR->>DB: SaveSingleOcrPageResult(docID, pageIdx, text, ocrLimitHit, genInfo)
        OCR->>Jobs: update pages done/total
    end
    OCR-->>Jobs: final or partial text
    Jobs-->>API: update job status (completed/failed/cancelled)
    UI->>API: GET /api/jobs/:id/status
    API-->>UI: {status, pages_done, total_pages}
    UI->>API: GET /documents/:id/ocr_pages
    API-->>UI: per-page results
```
```mermaid
sequenceDiagram
    participant UI
    participant API as HTTP Server
    participant PageWorker as Per-Page Handler
    participant OCR as OCR Provider
    participant DB
    UI->>API: POST /documents/:id/ocr_pages/:pageIndex/reocr
    API->>PageWorker: create cancellable ctx for page, store cancel
    PageWorker->>OCR: ProcessImage(page)
    OCR-->>PageWorker: OCRResult{text, ocrLimitHit, genInfo}
    PageWorker->>DB: SaveSingleOcrPageResult(docID, pageIdx, ...)
    PageWorker-->>UI: 200 {text, ocrLimitHit, generationInfo}
    UI->>API: DELETE /documents/:id/ocr_pages/:pageIndex/reocr
    API-->>PageWorker: invoke cancel -> return 204 or 404
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Actionable comments posted: 14
🧹 Nitpick comments (15)
README.md (3)
539-543
: Clarify provider semantics for new Vision LLM knobs

Good additions. Two small clarifications will reduce user confusion:
- VISION_LLM_MAX_TOKENS: Different providers/models interpret “tokens” differently (output cap vs. total). Call this out explicitly for OpenAI vs. Ollama.
- OLLAMA_OCR_TOP_K: Note it only applies when the selected Ollama model supports top-k.
If you agree, consider amending the descriptions to mention provider-specific behavior.
875-879
: Nice troubleshooting guidance; add one guardrail

These tips are practical. Add a short note that increasing OLLAMA_CONTEXT_LENGTH increases KV cache memory usage on the Ollama server, which can cause OOM/restarts for large values on small hosts.
Also applies to: 886-888, 893-894
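To make the trade-off concrete, a conservative configuration for a small host might look like the following. The variable names are the ones this PR documents; the values are purely illustrative, not recommendations:

```shell
# Illustrative values; tune for your host.
# A larger context window grows the KV cache on the Ollama server.
OLLAMA_CONTEXT_LENGTH=8192   # context window for the Ollama OCR model
VISION_LLM_MAX_TOKENS=4096   # output token cap (provider-dependent semantics)
OLLAMA_OCR_TOP_K=40          # only honored by models that support top-k
VISION_LLM_TEMPERATURE=0.1   # low temperature favors deterministic OCR output
```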
161-166
: Fix inconsistencies and typos (Option numbering, provider table, Azure model note, LISTEN_INTERFACE)
- Duplicate numbering in LLM options (two “Option 3” entries).
- Provider Compatibility lists “Mistral OCR” as a provider, but it’s not described elsewhere; likely unintended. Consider removing that row unless you actually support a “mistral” OCR provider.
- Typo: “prebuild-layout” should be “prebuilt-layout”.
- LISTEN_INTERFACE description says “Network interface to listen on” but default is “8080” (a port). If the variable is indeed a port, clarify text (or rename if feasible in future).
Proposed edits:
```diff
-# Option 3: Ollama (Local)
+# Option 4: Ollama (Local)
```

```diff
-| **Mistral OCR** | ✅ | ✅ | ✅ |
+| (remove row or document provider) | | | |
```

```diff
-| `AZURE_DOCAI_OUTPUT_CONTENT_FORMAT` | Azure Document Intelligence output content format. Optional if using `azure` provider. Defaults to `text`. 'markdown' is the other option and it requires the 'prebuild-layout' model ID. | No | text |
+| `AZURE_DOCAI_OUTPUT_CONTENT_FORMAT` | Azure Document Intelligence output content format. Optional if using `azure` provider. Defaults to `text`. 'markdown' is the other option and it requires the 'prebuilt-layout' model ID. | No | text |
```

```diff
-| `LISTEN_INTERFACE` | Network interface to listen on. | No | 8080 |
+| `LISTEN_INTERFACE` | Address or port to listen on (e.g. `:8080` or `8080`). | No | 8080 |
```

Also applies to: 385-392, 547-547, 567-567
types.go (1)
128-129
: Update comment to include 'whole_pdf' modeThe codebase and README document an additional mode ("whole_pdf"). The comment here should reflect that to avoid drift.
Apply:
```diff
-	ProcessMode string // OCR processing mode: "image" (default) or "pdf"
+	ProcessMode string // OCR processing mode: "image" (default), "pdf", or "whole_pdf"
```

jobs.go (2)
134-144
: Bind jobCtx to an app/server root context

Using context.Background() breaks propagation of server shutdowns/timeouts into jobs. Prefer deriving jobCtx from a root context passed into the worker pool, so Stop/Shutdown can cascade properly.
Example change:
- Pass a root ctx into startWorkerPool and processJob
- Derive jobCtx from that root ctx
```diff
-func startWorkerPool(app *App, numWorkers int) {
+func startWorkerPool(ctx context.Context, app *App, numWorkers int) {
 	for i := 0; i < numWorkers; i++ {
 		go func(workerID int) {
 			logger.Infof("Worker %d started", workerID)
-			for job := range jobQueue {
+			for {
+				select {
+				case <-ctx.Done():
+					return
+				case job := <-jobQueue:
 					logger.Infof("Worker %d processing job: %s", workerID, job.ID)
-				processJob(app, job)
+					processJob(ctx, app, job)
+				}
 			}
 		}(i)
 	}
 }
```

```diff
-func processJob(app *App, job *Job) {
+func processJob(parentCtx context.Context, app *App, job *Job) {
 	jobStore.updateJobStatus(job.ID, "in_progress", "")
-	jobCtx, cancel := context.WithCancel(context.Background())
+	jobCtx, cancel := context.WithCancel(parentCtx)
 	...
 }
```
145-150
: Pre-run cleanup is helpful; consider conditional wipe for partial re-runs

Deleting all per-page results at job start is safe but can be wasteful for interrupted jobs or selective re-runs. Long term, consider keeping existing pages and only overwriting them as they're reprocessed.
local_db.go (1)
52-54
: AutoMigrate is fine; indexes are created from struct tags

With the composite unique tags in OCRPageResult, AutoMigrate will create the unique index. No extra DDL is needed.
If you want to be explicit and resilient to tag drift, you can assert the index exists post-migration and log a warning if not.
background_test.go (1)
43-49
: Mock signature update: LGTM

This aligns the mock with the new interface. Good.
Consider updating the unused TestApp.ProcessDocumentOCR helper below to the new signature as well to prevent future confusion:
```diff
-func (app *TestApp) ProcessDocumentOCR(ctx context.Context, documentID int, options OCROptions) (*ProcessedDocument, error) {
+func (app *TestApp) ProcessDocumentOCR(ctx context.Context, documentID int, options OCROptions, jobID string) (*ProcessedDocument, error) {
 	// Create a simple processed document with mock data
 	mockText := app.ocrProvider.(*mockOCRProvider).text
 	return &ProcessedDocument{
 		ID:   documentID,
 		Text: mockText,
 	}, nil
 }
```

ocr/llm_provider.go (1)
137-155
: Consider improving token counting accuracy for non-OpenAI models

The fallback token counting using `llms.CountTokens` may not be accurate for all models, particularly for Ollama and Mistral models, which may use different tokenization schemes. While the code acknowledges this with a comment, it could lead to incorrect `OcrLimitHit` flags.

Consider adding model-specific token counting or documenting this limitation more prominently:

```diff
 // Fallback: count tokens using langchaingo (might not be accurate for all models)
 if tokenCount < 0 {
+	// Note: This may not be accurate for Ollama/Mistral models
+	log.Debugf("Using fallback token counting for model %s", p.model)
 	tokenCount = llms.CountTokens(p.model, text)
 }
```

main.go (1)
238-273
: Consider validating temperature and top-k ranges

While the code handles parsing errors for the Vision LLM configuration values, it doesn't validate that the values are within acceptable ranges. For example, temperature should typically be between 0.0 and 2.0, and top-k should be positive.
Add range validation for better user feedback:
```diff
 if tempStr := os.Getenv("VISION_LLM_TEMPERATURE"); tempStr != "" {
 	if parsed, err := strconv.ParseFloat(tempStr, 64); err == nil {
+		if parsed < 0.0 || parsed > 2.0 {
+			log.Warnf("VISION_LLM_TEMPERATURE value %f is outside typical range [0.0, 2.0]", parsed)
+		}
 		visionLlmTemperature = &parsed
 	} else {
 		log.Warnf("Invalid VISION_LLM_TEMPERATURE value: %v, ignoring", err)
 	}
 }
 if topKStr := os.Getenv("OLLAMA_OCR_TOP_K"); topKStr != "" {
 	if parsed, err := strconv.Atoi(topKStr); err == nil {
+		if parsed <= 0 {
+			log.Warnf("OLLAMA_OCR_TOP_K must be positive, got %d, ignoring", parsed)
+		} else {
 			ollamaOcrTopK = &parsed
+		}
 	} else {
 		log.Warnf("Invalid OLLAMA_OCR_TOP_K value: %v, ignoring", err)
 	}
 }
```

background.go (1)
213-217
: Consider documenting the jobID parameter purpose

The empty string `""` passed as the fourth parameter to `ProcessDocumentOCR` appears to be a jobID. Consider adding a comment to clarify why an empty jobID is used in the background processing context.

Add a clarifying comment:

```diff
 if app.docProcessor != nil {
 	// Use injected processor if available
+	// Empty jobID since background processing doesn't track jobs
 	processedDoc, err = app.docProcessor.ProcessDocumentOCR(ctx, document.ID, options, "")
 } else {
 	// Use the app's own implementation if no processor is injected
+	// Empty jobID since background processing doesn't track jobs
 	processedDoc, err = app.ProcessDocumentOCR(ctx, document.ID, options, "")
 }
```

ocr.go (1)
310-321
: Consider handling JSON marshaling errors more gracefully.

While the error from JSON marshaling is silently ignored (which is acceptable), consider logging it for debugging purposes, especially since this is generation metadata that might be important for troubleshooting.

```diff
 var genInfoJSON string
 if result.GenerationInfo != nil {
-	if b, err := json.Marshal(result.GenerationInfo); err == nil {
+	if b, err := json.Marshal(result.GenerationInfo); err != nil {
+		pageLogger.WithError(err).Debug("Failed to marshal GenerationInfo")
+	} else {
 		genInfoJSON = string(b)
 	}
 }
```

web-app/src/ExperimentalOCR.tsx (3)
35-43
: Add error handling for network failures in stopOCRJob.

The function catches errors but only sets a generic error message. Consider providing more specific error feedback to help users understand what went wrong.

```diff
 const stopOCRJob = async () => {
   if (!jobId) return;
   try {
     await axios.post(`/api/ocr/jobs/${jobId}/stop`);
     setJobStatus('cancelled');
   } catch (err) {
-    setError('Failed to stop OCR job.');
+    const errorMessage = axios.isAxiosError(err)
+      ? `Failed to stop OCR job: ${err.response?.data?.message || err.message}`
+      : 'Failed to stop OCR job.';
+    setError(errorMessage);
   }
 };
```
111-122
: Improve error handling for JSON parsing.

The current implementation silently falls back to raw text if JSON parsing fails. Consider logging the parse error for debugging.

```diff
 let parsedResult: OCRCombinedResult | null = null;
 try {
   parsedResult = JSON.parse(response.data.result);
 } catch (e) {
+  console.warn('Failed to parse OCR result as JSON, using raw text:', e);
   setOcrResult(response.data.result);
   return;
 }
```
215-221
: DELETE request error handling could be improved.

The cancellation DELETE request failure is only logged to console. Consider showing this error to the user.

```diff
 try {
   await axios.delete(`/api/documents/${documentId}/ocr_pages/${pageIdx}/reocr`);
   console.log(`Cancellation request sent for page ${pageIdx}`);
 } catch (err) {
   console.error(`Failed to send cancellation request for page ${pageIdx}:`, err);
+  setReOcrErrors((prev) => ({
+    ...prev,
+    [pageIdx]: "Failed to cancel Re-OCR request.",
+  }));
 }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
- `go.sum` is excluded by `!**/*.sum`
📒 Files selected for processing (13)

- README.md (4 hunks)
- app_http_handlers.go (4 hunks)
- background.go (1 hunks)
- background_test.go (1 hunks)
- jobs.go (3 hunks)
- local_db.go (4 hunks)
- main.go (5 hunks)
- ocr.go (7 hunks)
- ocr/llm_provider.go (4 hunks)
- ocr/provider.go (2 hunks)
- types.go (2 hunks)
- web-app/src/ExperimentalOCR.tsx (8 hunks)
- web-app/src/ocrStatus.ts (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (8)
- types.go (1): ProcessedDocument (ocr.go, 20-26)
- background_test.go (2): OCROptions (types.go, 123-129); ProcessedDocument (ocr.go, 20-26)
- local_db.go (1): OCRPageResult (app_http_handlers.go, 280-284)
- jobs.go (1): DeleteOcrPageResults (local_db.go, 153-155)
- web-app/src/ExperimentalOCR.tsx (5): OCRPageResult (local_db.go, 25-34); OCRPageResult (app_http_handlers.go, 280-284); OCRJobStatus (1-1), ClientStatus (2-2), mapJobStatus (10-21), getStatusViewOptions (23-44) (web-app/src/ocrStatus.ts); Document (web-app/src/DocumentProcessor.tsx, 9-15); displayOCRComparison (web-app/e2e/mistral-ocr-processing.spec.ts, 17-88)
- app_http_handlers.go (2): App (main.go, 166-184); GetOcrPageResults (143-147), OCRPageResult (25-34), SaveSingleOcrPageResult (116-141) (local_db.go)
- ocr.go (4): App (main.go, 166-184); OCROptions (types.go, 123-129); OCRResult (ocr/provider.go, 14-29); SaveSingleOcrPageResult (local_db.go, 116-141)
- ocr/llm_provider.go (1): OCRResult (ocr/provider.go, 14-29)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: test
🔇 Additional comments (20)
README.md (1)
165-166
: Docs addition for OLLAMA_CONTEXT_LENGTH looks good

Clear and actionable. This will help users tune memory vs. context trade-offs.
jobs.go (3)
163-173
: Cancellation handling looks correct

Checking jobCtx.Err() to classify user-initiated cancellations is appropriate. Logging and status transitions are clear.
14-21
: Re-OCR cancellers properly registered and cleaned up

I've confirmed that `reOcrCancellers` is both registered and cleaned up to prevent leaks:

- In app_http_handlers.go:334-339, you lock, cancel any existing entry, and store the new `cancelReOcr` under `cancelKey`.
- The deferred cleanup at app_http_handlers.go:341-345 removes the entry once processing completes.
- In the cancellation endpoint (app_http_handlers.go:400-404), you lock, delete the entry, and then invoke the cancel func if it existed.

No further action needed.
22-33
: TotalPages is already populated in the OCR pipeline; no change needed

The `TotalPages` field is assigned in the OCR routines before any status update:

- ocr.go:198 and ocr.go:254 set `job.TotalPages = totalPdfPages`
- The HTTP handler in app_http_handlers.go:195 returns it as `"total_pages"`

Since the field is correctly initialized and exposed to the UI, you can ignore the original suggestion.
Likely an incorrect or invalid review comment.
local_db.go (1)
143-147
: LGTM: retrieval ordered by page_index

Ascending order matches the expected page display in the UI.
ocr/provider.go (1)
45-51
: LGTM!

The new configuration fields for Vision LLM settings and Ollama OCR-specific settings are well-structured and properly documented.
ocr/llm_provider.go (3)
22-30
: LGTM!

The LLMProvider struct fields are properly extended to include the new configuration options for max tokens, temperature, and the Ollama-specific top-k setting.
113-123
: LGTM!

The dynamic construction of call options based on configuration is well-implemented, properly checking for non-zero maxTokens and non-nil pointers before adding options.
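The "pointer means optional" pattern praised here is worth spelling out: a nil pointer distinguishes "not configured" from "explicitly set to zero", so only configured values are turned into call options. A minimal sketch of the idea; `CallOption`, `callConfig`, and the `with*` constructors are illustrative stand-ins, not the real langchaingo API:

```go
package main

import "fmt"

// callConfig collects the optional generation settings.
type callConfig struct {
	maxTokens   int
	temperature *float64
	topK        *int
}

// CallOption mutates a callConfig, mimicking functional-options APIs.
type CallOption func(*callConfig)

func withMaxTokens(n int) CallOption { return func(c *callConfig) { c.maxTokens = n } }
func withTemperature(t float64) CallOption {
	return func(c *callConfig) { c.temperature = &t }
}
func withTopK(k int) CallOption { return func(c *callConfig) { c.topK = &k } }

// buildOptions mirrors the reviewed logic: append an option only when the
// corresponding setting was actually configured.
func buildOptions(maxTokens int, temperature *float64, topK *int) []CallOption {
	var opts []CallOption
	if maxTokens > 0 {
		opts = append(opts, withMaxTokens(maxTokens))
	}
	if temperature != nil {
		opts = append(opts, withTemperature(*temperature))
	}
	if topK != nil {
		opts = append(opts, withTopK(*topK))
	}
	return opts
}

func main() {
	temp := 0.2
	opts := buildOptions(4096, &temp, nil) // topK left unset
	fmt.Println(len(opts))                 // two options configured
}
```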
189-196
: LGTM!

The Ollama client creation properly handles the optional context length configuration.
app_http_handlers.go (2)
233-244
: LGTM!

The stop OCR job handler is well-implemented with proper mutex protection and appropriate HTTP status codes.
334-345
: LGTM!

The cancellation mechanism for per-page re-OCR is properly implemented with mutex protection and cleanup in defer.
main.go (1)
899-910
: LGTM!

The Ollama LLM client creation properly shares the context length configuration logic with the Vision LLM path.
background.go (2)
224-227
: LGTM!

Good defensive programming: properly handling the nil case for `processedDoc`.
236-241
: LGTM!

The inline function for conditionally adding OCR complete tags is well-implemented and clear.
ocr.go (3)
44-54
: LGTM! Job tracking is well-integrated.

The addition of the `jobID` parameter to the function signature and the conditional logging with the job ID is properly implemented.
267-276
: Good implementation of cancellation handling.

The cancellation check with context.Done() is correctly placed before processing each page, and partial results are appropriately returned when cancelled.
195-201
: All jobStore accesses are properly synchronized
- jobStore is initialized at package level (`var jobStore = &JobStore{jobs: make(map[string]*Job)}`).
- JobStore embeds a `sync.RWMutex` and wraps every map access:
  - addJob, updateJobStatus, updatePagesDone use `Lock()`/`Unlock()`.
  - getJob, GetAllJobs use `RLock()`/`RUnlock()`.
- The direct map writes in ocr.go (lines 195-201) are already guarded by `jobStore.Lock()`/`Unlock()`.

No further changes needed.
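For readers unfamiliar with the embedded-mutex store described above, a minimal sketch of the shape (field and method names are illustrative, simplified from the project's actual JobStore):

```go
package main

import (
	"fmt"
	"sync"
)

type Job struct {
	ID         string
	Status     string
	PagesDone  int
	TotalPages int
}

// JobStore embeds sync.RWMutex so every map access can be guarded.
type JobStore struct {
	sync.RWMutex
	jobs map[string]*Job
}

func (s *JobStore) addJob(j *Job) {
	s.Lock()
	defer s.Unlock()
	s.jobs[j.ID] = j
}

func (s *JobStore) updatePagesDone(id string, done int) {
	s.Lock()
	defer s.Unlock()
	if j, ok := s.jobs[id]; ok {
		j.PagesDone = done
	}
}

func (s *JobStore) getJob(id string) (Job, bool) {
	s.RLock()
	defer s.RUnlock()
	j, ok := s.jobs[id]
	if !ok {
		return Job{}, false
	}
	return *j, true // return a copy so callers never race on the stored value
}

func main() {
	store := &JobStore{jobs: make(map[string]*Job)}
	store.addJob(&Job{ID: "a", Status: "in_progress", TotalPages: 3})
	store.updatePagesDone("a", 2)
	j, _ := store.getJob("a")
	fmt.Println(j.PagesDone, j.TotalPages) // 2 3
}
```

Returning a copy from `getJob` (rather than the `*Job` pointer) is the detail that keeps read-side callers safe once the lock is released.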
web-app/src/ocrStatus.ts (1)
1-44
: LGTM! Well-structured status management.

The type definitions and status view logic are clean and well-organized. The `getStatusViewOptions` function provides a clear mapping between statuses and UI states.

web-app/src/ExperimentalOCR.tsx (2)
8-13
: Excellent implementation of per-page OCR functionality!

The per-page OCR results display with re-OCR capabilities is well-structured. The UI properly handles loading states, errors, and provides clear visual feedback with token limit indicators and generation info tooltips.
Also applies to: 28-29, 157-387
105-108
: Race condition in per-page results fetching.

The comparison `newPagesDone > lastFetchedPagesDoneRef.current` could lead to missed updates if multiple status checks occur rapidly. Consider using a more robust synchronization mechanism.

```diff
-if (newPagesDone > lastFetchedPagesDoneRef.current) {
+// Always fetch if pages have been processed and we're not at the same count
+if (newPagesDone > 0 && newPagesDone !== lastFetchedPagesDoneRef.current) {
   await fetchPerPageResults();
   lastFetchedPagesDoneRef.current = newPagesDone;
 }
```

Likely an incorrect or invalid review comment.
```go
func (app *App) getOCRPagesHandler(c *gin.Context) {
	id := c.Param("id")
	parsedID, err := strconv.Atoi(id)
	if err != nil {
		c.JSON(http.StatusBadRequest, gin.H{"error": "Invalid document ID"})
		return
	}

	dbResults, err := GetOcrPageResults(app.Database, parsedID)
	if err != nil {
		c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to fetch OCR page results"})
		return
	}

	type OCRPageResult struct {
		Text           string                 `json:"text"`
		OcrLimitHit    bool                   `json:"ocrLimitHit"`
		GenerationInfo map[string]interface{} `json:"generationInfo,omitempty"`
	}

	var pages []OCRPageResult
	for _, res := range dbResults {
		var genInfo map[string]interface{}
		if res.GenerationInfo != "" {
			_ = json.Unmarshal([]byte(res.GenerationInfo), &genInfo)
		}
		pages = append(pages, OCRPageResult{
			Text:           res.Text,
			OcrLimitHit:    res.OcrLimitHit,
			GenerationInfo: genInfo,
		})
	}

	c.JSON(http.StatusOK, gin.H{
		"pages": pages,
	})
}
```
Add error handling for JSON unmarshaling
The JSON unmarshaling of `GenerationInfo` at line 290 silently ignores errors. While this may be intentional for backward compatibility, it could hide data corruption issues.
Add error logging for failed JSON unmarshaling:
```diff
 var genInfo map[string]interface{}
 if res.GenerationInfo != "" {
-	_ = json.Unmarshal([]byte(res.GenerationInfo), &genInfo)
+	if err := json.Unmarshal([]byte(res.GenerationInfo), &genInfo); err != nil {
+		log.Warnf("Failed to unmarshal GenerationInfo for doc %d page %d: %v", parsedID, i, err)
+	}
 }
```
Committable suggestion skipped: line range outside the PR's diff.
```go
imagePaths, _, err := app.Client.DownloadDocumentAsImages(c.Request.Context(), parsedID, limitOcrPages)
if err != nil || pageIdx < 0 || pageIdx >= len(imagePaths) {
	c.JSON(http.StatusBadRequest, gin.H{"error": "Invalid page index or failed to download images"})
	return
}
```
🛠️ Refactor suggestion
Verify image index bounds before file access

The code downloads images and then checks if `pageIdx` is within bounds, but it reads the file before performing adequate validation. While the bounds check exists, the error message combines two different error conditions.
Separate the error conditions for clarity:
```diff
 imagePaths, _, err := app.Client.DownloadDocumentAsImages(c.Request.Context(), parsedID, limitOcrPages)
-if err != nil || pageIdx < 0 || pageIdx >= len(imagePaths) {
-	c.JSON(http.StatusBadRequest, gin.H{"error": "Invalid page index or failed to download images"})
+if err != nil {
+	c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to download images"})
+	return
+}
+if pageIdx < 0 || pageIdx >= len(imagePaths) {
+	c.JSON(http.StatusBadRequest, gin.H{"error": fmt.Sprintf("Invalid page index %d, document has %d pages", pageIdx, len(imagePaths))})
 	return
 }
```
```go
var genInfoJSON string
if result.GenerationInfo != nil {
	if b, err := json.Marshal(result.GenerationInfo); err == nil {
		genInfoJSON = string(b)
	}
}
```
Add error handling for JSON marshaling
The JSON marshaling at line 367 could fail, but errors are silently ignored, which could result in an empty `genInfoJSON` being saved.
Handle JSON marshaling errors:
```diff
 var genInfoJSON string
 if result.GenerationInfo != nil {
-	if b, err := json.Marshal(result.GenerationInfo); err == nil {
+	if b, err := json.Marshal(result.GenerationInfo); err != nil {
+		log.Errorf("Failed to marshal GenerationInfo for doc %d page %d: %v", parsedID, pageIdx, err)
+	} else {
 		genInfoJSON = string(b)
 	}
 }
```
```go
type OCRPageResult struct {
	ID             uint   `gorm:"primaryKey"`
	DocumentID     int    `gorm:"index;not null"`
	PageIndex      int    `gorm:"not null"`
	Text           string `gorm:"size:1048576"`
	OcrLimitHit    bool
	GenerationInfo string `gorm:"type:TEXT"`
	CreatedAt      time.Time
	UpdatedAt      time.Time
}
```
🛠️ Refactor suggestion
Add a composite unique index on (document_id, page_index)
Without a uniqueness constraint, concurrent upserts can create duplicate rows. Add a composite unique index so updates are deterministic and performant.
```diff
 type OCRPageResult struct {
-	ID         uint `gorm:"primaryKey"`
-	DocumentID int  `gorm:"index;not null"`
-	PageIndex  int  `gorm:"not null"`
+	ID         uint `gorm:"primaryKey"`
+	DocumentID int  `gorm:"not null;uniqueIndex:idx_doc_page"`
+	PageIndex  int  `gorm:"not null;uniqueIndex:idx_doc_page"`
 	Text           string `gorm:"size:1048576"`
 	OcrLimitHit    bool
 	GenerationInfo string `gorm:"type:TEXT"`
 	CreatedAt      time.Time
 	UpdatedAt      time.Time
 }
```
```go
// SaveSingleOcrPageResult saves or updates the OCR result for a single page, including GenerationInfo as JSON
func SaveSingleOcrPageResult(db *gorm.DB, docID int, pageIdx int, text string, ocrLimitHit bool, generationInfoJSON string) error {
	var result OCRPageResult
	tx := db.Where("document_id = ? AND page_index = ?", docID, pageIdx).First(&result)
	if tx.Error == nil {
		result.Text = text
		result.OcrLimitHit = ocrLimitHit
		result.GenerationInfo = generationInfoJSON
		return db.Save(&result).Error
	} else if tx.Error != nil {
		log.Debugf("SaveSingleOcrPageResult: db.First error: %v (is gorm.ErrRecordNotFound: %v)", tx.Error, errors.Is(tx.Error, gorm.ErrRecordNotFound))
		if errors.Is(tx.Error, gorm.ErrRecordNotFound) {
			result = OCRPageResult{
				DocumentID:     docID,
				PageIndex:      pageIdx,
				Text:           text,
				OcrLimitHit:    ocrLimitHit,
				GenerationInfo: generationInfoJSON,
			}
			return db.Create(&result).Error
		} else {
			log.Errorf("Unexpected DB error in SaveSingleOcrPageResult: %v", tx.Error)
			return tx.Error
		}
	}
	return nil
}
```
🛠️ Refactor suggestion
Replace find-then-save with an atomic upsert
Current flow can race under concurrency and costs an extra query. Prefer a single INSERT ... ON CONFLICT DO UPDATE keyed by (document_id, page_index).
```diff
-func SaveSingleOcrPageResult(db *gorm.DB, docID int, pageIdx int, text string, ocrLimitHit bool, generationInfoJSON string) error {
-	var result OCRPageResult
-	tx := db.Where("document_id = ? AND page_index = ?", docID, pageIdx).First(&result)
-	if tx.Error == nil {
-		result.Text = text
-		result.OcrLimitHit = ocrLimitHit
-		result.GenerationInfo = generationInfoJSON
-		return db.Save(&result).Error
-	} else if tx.Error != nil {
-		log.Debugf("SaveSingleOcrPageResult: db.First error: %v (is gorm.ErrRecordNotFound: %v)", tx.Error, errors.Is(tx.Error, gorm.ErrRecordNotFound))
-		if errors.Is(tx.Error, gorm.ErrRecordNotFound) {
-			result = OCRPageResult{
-				DocumentID:     docID,
-				PageIndex:      pageIdx,
-				Text:           text,
-				OcrLimitHit:    ocrLimitHit,
-				GenerationInfo: generationInfoJSON,
-			}
-			return db.Create(&result).Error
-		} else {
-			log.Errorf("Unexpected DB error in SaveSingleOcrPageResult: %v", tx.Error)
-			return tx.Error
-		}
-	}
-	return nil
-}
+func SaveSingleOcrPageResult(db *gorm.DB, docID int, pageIdx int, text string, ocrLimitHit bool, generationInfoJSON string) error {
+	rec := OCRPageResult{
+		DocumentID:     docID,
+		PageIndex:      pageIdx,
+		Text:           text,
+		OcrLimitHit:    ocrLimitHit,
+		GenerationInfo: generationInfoJSON,
+	}
+	return db.
+		Clauses(clause.OnConflict{
+			Columns:   []clause.Column{{Name: "document_id"}, {Name: "page_index"}},
+			DoUpdates: clause.AssignmentColumns([]string{"text", "ocr_limit_hit", "generation_info", "updated_at"}),
+		}).
+		Create(&rec).Error
+}
```
Add the import needed for clause:
```go
import (
	"errors"
	...
	"gorm.io/gorm/clause"
)
```
types.go
Outdated
	AddTags []string `json:"add_tags,omitempty"`
}
💡 Verification agent
🧩 Analysis chain
AddTags field: LGTM; verify downstream handling
Adding add_tags aligns with the UI/UX of incremental tagging. Please verify that UpdateDocuments (client and handlers) actually interprets add_tags distinct from suggested_tags and remove_tags to avoid double-application or conflicts.
Run to find usages of add_tags in server handlers and client code:
🏁 Script executed:
#!/bin/bash
rg -n --json 'add_tags' -A 3 -B 3
Length of output: 1890
Implement add_tags handling in UpdateDocuments
The new add_tags field is defined in types.go but isn't referenced anywhere else (no matches beyond its declaration). As a result, calls to UpdateDocuments won't apply added tags. Please:
- In the server handler (e.g. the UpdateDocuments method in server/handlers/...), read add_tags from the incoming payload and merge those tags into the document, guarding against conflicts with suggested_tags and remove_tags.
- In the client (UpdateDocuments method in client/...), include the add_tags array when constructing the request.
- Add or update unit/integration tests to verify incremental tag additions work as expected.
🤖 Prompt for AI Agents
In types.go around lines 101-102 the AddTags []string
`json:"add_tags,omitempty"` field is declared but never used; update the server
handler (server/handlers/... UpdateDocuments) to read payload.add_tags, merge
those tags into the target document’s tag set (deduplicate), ensure tags present
in remove_tags are not re-added and that add_tags do not override/duplicate
suggested_tags handling (apply a clear precedence: remove_tags wins, then
add_tags, then suggested_tags), and persist the merged tag set; update the
client (client/... UpdateDocuments) to include the add_tags array when building
the request payload; and add/update unit/integration tests that call
UpdateDocuments with combinations of add_tags, remove_tags, and suggested_tags
to assert tags are correctly added, removed, and deduplicated.
	ProcessDocumentOCR(ctx context.Context, documentID int, options OCROptions, jobID string) (*ProcessedDocument, error)
}
💡 Verification agent
🧩 Analysis chain
Signature change to ProcessDocumentOCR: ensure all implementors/callers updated
This is a public interface change. Please confirm there are no remaining implementations or call sites using the old 3-arg signature.
Use this script to surface function declarations and invocations that may still use the old form:
🏁 Script executed:
#!/bin/bash
# Find all function declarations named ProcessDocumentOCR
rg -n 'func\s+\(.*\)\s+ProcessDocumentOCR\(' -A 2
# Show call sites and arity hints (look for 3 vs 4 args)
rg -n 'ProcessDocumentOCR\(' -A 1 -B 1
Length of output: 2551
Fix remaining TestApp mock to match new signature
The TestApp.ProcessDocumentOCR override in background_test.go is still using the old 3-arg form. Update it to include the new jobID string parameter:
• background_test.go:329
- func (app *TestApp) ProcessDocumentOCR(ctx context.Context, documentID int, options OCROptions) (*ProcessedDocument, error) {
+ func (app *TestApp) ProcessDocumentOCR(ctx context.Context, documentID int, options OCROptions, jobID string) (*ProcessedDocument, error) {
Adjust any test logic inside the method to accept or ignore the jobID as needed.
🤖 Prompt for AI Agents
In types.go around lines 148-149 the ProcessDocumentOCR signature was changed to
include a jobID string; update the TestApp.ProcessDocumentOCR mock in
background_test.go (around line 329) to match that new signature — add the final
jobID string parameter to the function declaration, and update the mock body to
accept or ignore jobID as appropriate (and adjust any references or expectations
in the test to use the new parameter); also ensure any test calls to
TestApp.ProcessDocumentOCR pass a jobID or use a placeholder.
const newJobStatus = mapJobStatus(response.data.status);
setJobStatus(newJobStatus);
Handle potential mapJobStatus errors gracefully.
Since mapJobStatus can throw an error for unknown statuses (as noted in the ocrStatus.ts review), this could crash the component. Add error handling here.
try {
const response = await axios.get(`./api/jobs/ocr/${jobId}`);
- const newJobStatus = mapJobStatus(response.data.status);
- setJobStatus(newJobStatus);
+ try {
+ const newJobStatus = mapJobStatus(response.data.status);
+ setJobStatus(newJobStatus);
+ } catch (mapError) {
+ console.error('Failed to map job status:', mapError);
+ setJobStatus('idle');
+ setError('Received unknown job status from server.');
+ return;
+ }
🤖 Prompt for AI Agents
In web-app/src/ExperimentalOCR.tsx around lines 99 to 100, the call to
mapJobStatus(response.data.status) can throw for unknown statuses and currently
can crash the component; wrap the mapping in a try/catch, fall back to a safe
status (e.g., 'UNKNOWN' or a defined error state) and call setJobStatus with
that fallback, and log/report the caught error (console.error or an
error-tracking helper) so the component continues rendering without crashing.
if (pageIdx + 1 > lastFetchedPagesDoneRef.current) {
  lastFetchedPagesDoneRef.current = pageIdx + 1;
}
Logic issue with page progress tracking.
The condition pageIdx + 1 > lastFetchedPagesDoneRef.current seems incorrect. Re-OCRing a single page shouldn't necessarily update the overall progress counter, which tracks bulk OCR job progress.
This logic appears to mix single-page re-OCR progress with bulk job progress tracking. Consider removing these lines or using a separate tracking mechanism for re-OCR operations:
-if (pageIdx + 1 > lastFetchedPagesDoneRef.current) {
- lastFetchedPagesDoneRef.current = pageIdx + 1;
-}
🤖 Prompt for AI Agents
In web-app/src/ExperimentalOCR.tsx around lines 188-190, the code updates
lastFetchedPagesDoneRef.current during a single-page re-OCR (if (pageIdx + 1 >
lastFetchedPagesDoneRef.current) { lastFetchedPagesDoneRef.current = pageIdx +
1; }), which incorrectly mixes single-page re-OCR progress with bulk OCR job
tracking; remove these two lines or change the logic so lastFetchedPagesDoneRef
is only advanced during bulk/batch OCR flows (e.g., guard the update with an
explicit isBulkOrInitialRun flag or use a separate ref/state for re-OCR
progress) so re-OCR of a single page does not alter the overall bulk progress
counter.
export const mapJobStatus = (raw: string | null | undefined): OCRJobStatus => {
  switch (raw) {
    case 'pending':
    case 'in_progress':
    case 'completed':
    case 'failed':
    case 'cancelled':
      return raw;
    default:
      throw new Error(`Unknown job status: ${raw}`);
  }
};
🛠️ Refactor suggestion
Consider returning 'idle' instead of throwing for unknown statuses.
The mapJobStatus function throws an error for unknown/null/undefined status values. This could cause the UI to crash if the backend returns an unexpected status. Consider returning 'idle' as a safe default instead.
export const mapJobStatus = (raw: string | null | undefined): OCRJobStatus => {
switch (raw) {
case 'pending':
case 'in_progress':
case 'completed':
case 'failed':
case 'cancelled':
return raw;
default:
- throw new Error(`Unknown job status: ${raw}`);
+ console.warn(`Unknown job status: ${raw}, defaulting to 'idle'`);
+ return 'idle';
}
};
🤖 Prompt for AI Agents
In web-app/src/ocrStatus.ts around lines 10 to 21, the function currently throws
on unknown/null/undefined statuses which can crash the UI; change the default
behavior so that instead of throwing an error you return the safe default value
'idle' (and explicitly handle null/undefined by returning 'idle' as well) so
unexpected backend values won't bubble up as exceptions.
Hey 👋
I've built an image from this and am using it on my Unraid Paperless install. I'm really only using the […]. The image is at […]. Thank you @dawidkulpa for your work.
Actionable comments posted: 2
♻️ Duplicate comments (2)
app_http_handlers.go (2)
355-358: Log GenerationInfo unmarshal failures
Swallowing the error hides corrupt entries and makes debugging impossible. Please log the failure with context so we know when a page payload is bad.
- if res.GenerationInfo != "" {
-   _ = json.Unmarshal([]byte(res.GenerationInfo), &genInfo)
+ if res.GenerationInfo != "" {
+   if err := json.Unmarshal([]byte(res.GenerationInfo), &genInfo); err != nil {
+     log.Warnf("Failed to unmarshal GenerationInfo for doc %d page %d: %v", parsedID, res.PageIndex, err)
+   }
  }
433-436: Handle GenerationInfo marshal errors
We still drop marshal failures silently, which loses diagnostics and stores empty JSON. Log the error and only persist the payload when marshaling succeeds.
- if result.GenerationInfo != nil {
-   if b, err := json.Marshal(result.GenerationInfo); err == nil {
-     genInfoJSON = string(b)
-   }
- }
+ if result.GenerationInfo != nil {
+   if b, err := json.Marshal(result.GenerationInfo); err != nil {
+     log.Errorf("Failed to marshal GenerationInfo for doc %d page %d: %v", parsedID, pageIdx, err)
+   } else {
+     genInfoJSON = string(b)
+   }
+ }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
go.sum is excluded by !**/*.sum
📒 Files selected for processing (5)
- README.md (4 hunks)
- app_http_handlers.go (4 hunks)
- background.go (1 hunks)
- main.go (5 hunks)
- types.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- README.md
🧰 Additional context used
📓 Path-based instructions (1)
**/*.go
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Ensure all Go source files are formatted with gofmt before committing
Files:
types.go
app_http_handlers.go
background.go
main.go
🧠 Learnings (1)
📚 Learning: 2025-09-28T18:55:46.584Z
Learnt from: CR
PR: icereed/paperless-gpt#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-09-28T18:55:46.584Z
Learning: Applies to ocr/**/*.go : Implement OCR providers under the ocr/ package
Applied to files:
app_http_handlers.go
background.go
main.go
🧬 Code graph analysis (3)
types.go (1)
- ocr.go (1): ProcessedDocument (20-26)

app_http_handlers.go (2)
- main.go (1): App (110-128)
- local_db.go (3): GetOcrPageResults (143-147), OCRPageResult (25-34), SaveSingleOcrPageResult (116-141)

background.go (2)
- types.go (1): DocumentSuggestion (114-128)
- web-app/src/DocumentProcessor.tsx (1): DocumentSuggestion (36-45)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: test
saveErr := SaveSingleOcrPageResult(app.Database, parsedID, pageIdx, result.Text, result.OcrLimitHit, genInfoJSON)
if saveErr != nil {
	log.Errorf("Failed to save re-OCR result for doc %d page %d: %v", parsedID, pageIdx, saveErr)
}

c.JSON(http.StatusOK, gin.H{
	"text": result.Text,
	"ocrLimitHit": result.OcrLimitHit,
	"generationInfo": result.GenerationInfo,
})
Surface DB write failures to the client
If persisting the page result fails we currently return 200, so the UI shows success even though nothing was saved. Please fail fast and tell the client so they can retry.
saveErr := SaveSingleOcrPageResult(app.Database, parsedID, pageIdx, result.Text, result.OcrLimitHit, genInfoJSON)
if saveErr != nil {
log.Errorf("Failed to save re-OCR result for doc %d page %d: %v", parsedID, pageIdx, saveErr)
- }
-
- c.JSON(http.StatusOK, gin.H{
+ c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to persist re-OCR result"})
+ return
+ }
+
+ c.JSON(http.StatusOK, gin.H{
🤖 Prompt for AI Agents
In app_http_handlers.go around lines 438-447 the handler logs
SaveSingleOcrPageResult failures but still returns HTTP 200; change the control
flow to return an error response when saveErr != nil by responding with a
non-200 status (e.g., http.StatusInternalServerError or http.StatusBadGateway)
and a concise JSON error body (message field), then immediately return so the
client sees the failure and can retry; do not remove the existing log, but
ensure the error response does not leak sensitive internals.
Thank you all for your patience!
Hey @icereed !
I'm happy to see this project still getting new features and overall being maintained! :)
I've been adding small improvements to the project for my own use, and after some time using it, I decided to open a PR with them as I've noticed people opening feature requests for some of them :)
Hopefully you'll find these changes in line with your vision for the project.
The changes I'm introducing:
If OCR is requested via the UI, you can now see its live progress, page by page.
You can also at any time press the "Re-OCR Page" button to request re-ocr of that page.
When the job finishes, the entire content is shown in the last box, ready to be saved to paperless.
You can now set VISION_LLM_MAX_TOKENS, and it'll be sent to the LLM provider to limit the output's length. It's also helpful if your LLM tends to get stuck repeating itself: if the MAX_TOKENS limit is hit, you'll get a message in the UI (so you don't have to check every single page on larger jobs) and can click "Re-OCR" straight away.

Context Length and Top K options for Ollama deployments
You can now set OLLAMA_CONTEXT_LENGTH and OLLAMA_OCR_TOP_K to tweak your local models. The context length option is especially useful, as it lets you control how much memory is reserved for your runs.

Temperature
The last of the new options is temperature. This is useful in many cases (like making your model less creative), but the most important use case is GPT-5 family model deployments: currently, if you try to use GPT-5, you always get an error that the only supported temperature is 1.0. This addresses 400: Unsupported value: 'temperature' does not support 0 with this model #448
Let me know what you think :)
Summary by CodeRabbit
New Features
Improvements
Documentation