Improves OCR UI and adds more options for LLMs #573
Conversation
- Introduces re-OCRing specific pages from the UI (partially answers icereed#223)
- Adds more options for LLM tweaking
- Allows setting temperature to work around a GPT-5 issue (fixes icereed#448)
- Adds OCR tweaks: configuring the Ollama model, showing per-page results, allowing re-OCR of a single page
- Fixes re-OCRing, plus assorted follow-up fixes
Walkthrough

Adds cancellable OCR jobs, per-page OCR result persistence and re-OCR with per-page cancellation, extends ProcessDocumentOCR to accept jobID and track total pages, introduces the OCRPageResult DB model, new HTTP endpoints for job/page control, and new env-driven LLM/Ollama configuration options. The UI is updated for per-page status and controls.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant UI
    participant API as HTTP Server
    participant Jobs as Job Worker
    participant OCR as OCR Processor
    participant DB
    UI->>API: POST /api/ocr/jobs (documentID)
    API-->>UI: 202 Accepted {job_id}
    Jobs->>Jobs: create cancellable context (jobCtx), store cancel
    Jobs->>OCR: ProcessDocumentOCR(jobCtx, docID, opts, jobID)
    OCR->>DB: Delete old per-page results (pre)
    loop per page
        OCR->>OCR: check ctx.Done()
        OCR->>OCR: call LLM/provider -> OCRResult{text, limitHit, genInfo}
        OCR->>DB: SaveSingleOcrPageResult(docID, pageIdx, text, ocrLimitHit, genInfo)
        OCR->>Jobs: update pages done/total
    end
    OCR-->>Jobs: final or partial text
    Jobs-->>API: update job status (completed/failed/cancelled)
    UI->>API: GET /api/jobs/:id/status
    API-->>UI: {status, pages_done, total_pages}
    UI->>API: GET /documents/:id/ocr_pages
    API-->>UI: per-page results
```
```mermaid
sequenceDiagram
    participant UI
    participant API as HTTP Server
    participant PageWorker as Per-Page Handler
    participant OCR as OCR Provider
    participant DB
    UI->>API: POST /documents/:id/ocr_pages/:pageIndex/reocr
    API->>PageWorker: create cancellable ctx for page, store cancel
    PageWorker->>OCR: ProcessImage(page)
    OCR-->>PageWorker: OCRResult{text, ocrLimitHit, genInfo}
    PageWorker->>DB: SaveSingleOcrPageResult(docID, pageIdx, ...)
    PageWorker-->>UI: 200 {text, ocrLimitHit, generationInfo}
    UI->>API: DELETE /documents/:id/ocr_pages/:pageIndex/reocr
    API-->>PageWorker: invoke cancel -> return 204 or 404
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Actionable comments posted: 14
🧹 Nitpick comments (15)
README.md (3)
539-543
: Clarify provider semantics for new Vision LLM knobs

Good additions. Two small clarifications will reduce user confusion:
- VISION_LLM_MAX_TOKENS: Different providers/models interpret “tokens” differently (output cap vs. total). Call this out explicitly for OpenAI vs. Ollama.
- OLLAMA_OCR_TOP_K: Note it only applies when the selected Ollama model supports top-k.
If you agree, consider amending the descriptions to mention provider-specific behavior.
875-879
: Nice troubleshooting guidance; add one guardrail

These tips are practical. Add a short note that increasing OLLAMA_CONTEXT_LENGTH increases KV cache memory usage on the Ollama server, which can cause OOM/restarts for large values on small hosts.
Also applies to: 886-888, 893-894
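To make the trade-off concrete, a conservative configuration for a small host might look like the following. The variable names are the ones this PR documents; the values are purely illustrative, not recommendations:

```shell
# Illustrative values; tune for your host.
# A larger context window grows the KV cache on the Ollama server.
OLLAMA_CONTEXT_LENGTH=8192   # context window for the Ollama OCR model
VISION_LLM_MAX_TOKENS=4096   # output token cap (provider-dependent semantics)
OLLAMA_OCR_TOP_K=40          # only honored by models that support top-k
VISION_LLM_TEMPERATURE=0.1   # low temperature favors deterministic OCR output
```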
161-166
: Fix inconsistencies and typos (Option numbering, provider table, Azure model note, LISTEN_INTERFACE)
- Duplicate numbering in LLM options (two “Option 3” entries).
- Provider Compatibility lists “Mistral OCR” as a provider, but it’s not described elsewhere; likely unintended. Consider removing that row unless you actually support a “mistral” OCR provider.
- Typo: “prebuild-layout” should be “prebuilt-layout”.
- LISTEN_INTERFACE description says “Network interface to listen on” but default is “8080” (a port). If the variable is indeed a port, clarify text (or rename if feasible in future).
Proposed edits:
```diff
-# Option 3: Ollama (Local)
+# Option 4: Ollama (Local)
```

```diff
-| **Mistral OCR** | ✅ | ✅ | ✅ |
+| (remove row or document provider) | | | |
```

```diff
-| `AZURE_DOCAI_OUTPUT_CONTENT_FORMAT` | Azure Document Intelligence output content format. Optional if using `azure` provider. Defaults to `text`. 'markdown' is the other option and it requires the 'prebuild-layout' model ID. | No | text |
+| `AZURE_DOCAI_OUTPUT_CONTENT_FORMAT` | Azure Document Intelligence output content format. Optional if using `azure` provider. Defaults to `text`. 'markdown' is the other option and it requires the 'prebuilt-layout' model ID. | No | text |
```

```diff
-| `LISTEN_INTERFACE` | Network interface to listen on. | No | 8080 |
+| `LISTEN_INTERFACE` | Address or port to listen on (e.g. `:8080` or `8080`). | No | 8080 |
```

Also applies to: 385-392, 547-547, 567-567
types.go (1)
128-129
: Update comment to include 'whole_pdf' modeThe codebase and README document an additional mode ("whole_pdf"). The comment here should reflect that to avoid drift.
Apply:
```diff
-	ProcessMode string // OCR processing mode: "image" (default) or "pdf"
+	ProcessMode string // OCR processing mode: "image" (default), "pdf", or "whole_pdf"
```

jobs.go (2)
134-144
: Bind jobCtx to an app/server root context

Using context.Background() breaks propagation of server shutdowns/timeouts into jobs. Prefer deriving jobCtx from a root context passed into the worker pool, so Stop/Shutdown can cascade properly.
Example change:
- Pass a root ctx into startWorkerPool and processJob
- Derive jobCtx from that root ctx
```diff
-func startWorkerPool(app *App, numWorkers int) {
+func startWorkerPool(ctx context.Context, app *App, numWorkers int) {
 	for i := 0; i < numWorkers; i++ {
 		go func(workerID int) {
 			logger.Infof("Worker %d started", workerID)
-			for job := range jobQueue {
+			for {
+				select {
+				case <-ctx.Done():
+					return
+				case job := <-jobQueue:
 					logger.Infof("Worker %d processing job: %s", workerID, job.ID)
-				processJob(app, job)
+					processJob(ctx, app, job)
+				}
 			}
 		}(i)
 	}
 }
```

```diff
-func processJob(app *App, job *Job) {
+func processJob(parentCtx context.Context, app *App, job *Job) {
 	jobStore.updateJobStatus(job.ID, "in_progress", "")
-	jobCtx, cancel := context.WithCancel(context.Background())
+	jobCtx, cancel := context.WithCancel(parentCtx)
 	...
 }
```
145-150
: Pre-run cleanup is helpful; consider conditional wipe for partial re-runs

Deleting all per-page results at job start is safe but can be wasteful for interrupted jobs or selective re-runs. Long term, consider keeping existing pages and only overwriting them as they're reprocessed.
local_db.go (1)
52-54
: AutoMigrate is fine; indexes are created from struct tags

With the composite unique tags in OCRPageResult, AutoMigrate will create the unique index. No extra DDL is needed.
If you want to be explicit and resilient to tag drift, you can assert the index exists post-migration and log a warning if not.
background_test.go (1)
43-49
: Mock signature update: LGTM

This aligns the mock with the new interface. Good.
Consider updating the unused TestApp.ProcessDocumentOCR helper below to the new signature as well to prevent future confusion:
```diff
-func (app *TestApp) ProcessDocumentOCR(ctx context.Context, documentID int, options OCROptions) (*ProcessedDocument, error) {
+func (app *TestApp) ProcessDocumentOCR(ctx context.Context, documentID int, options OCROptions, jobID string) (*ProcessedDocument, error) {
 	// Create a simple processed document with mock data
 	mockText := app.ocrProvider.(*mockOCRProvider).text
 	return &ProcessedDocument{
 		ID:   documentID,
 		Text: mockText,
 	}, nil
 }
```

ocr/llm_provider.go (1)
137-155
: Consider improving token counting accuracy for non-OpenAI models

The fallback token counting using `llms.CountTokens` may not be accurate for all models, particularly for Ollama and Mistral models, which may use different tokenization schemes. While the code acknowledges this with a comment, it could lead to incorrect `OcrLimitHit` flags.

Consider adding model-specific token counting or documenting this limitation more prominently:

```diff
 // Fallback: count tokens using langchaingo (might not be accurate for all models)
 if tokenCount < 0 {
+	// Note: This may not be accurate for Ollama/Mistral models
+	log.Debugf("Using fallback token counting for model %s", p.model)
 	tokenCount = llms.CountTokens(p.model, text)
 }
```

main.go (1)
238-273
: Consider validating temperature and top-k ranges

While the code handles parsing errors for the Vision LLM configuration values, it doesn't validate that the values are within acceptable ranges. For example, temperature should typically be between 0.0 and 2.0, and top-k should be positive.
Add range validation for better user feedback:
```diff
 if tempStr := os.Getenv("VISION_LLM_TEMPERATURE"); tempStr != "" {
 	if parsed, err := strconv.ParseFloat(tempStr, 64); err == nil {
+		if parsed < 0.0 || parsed > 2.0 {
+			log.Warnf("VISION_LLM_TEMPERATURE value %f is outside typical range [0.0, 2.0]", parsed)
+		}
 		visionLlmTemperature = &parsed
 	} else {
 		log.Warnf("Invalid VISION_LLM_TEMPERATURE value: %v, ignoring", err)
 	}
 }
 if topKStr := os.Getenv("OLLAMA_OCR_TOP_K"); topKStr != "" {
 	if parsed, err := strconv.Atoi(topKStr); err == nil {
+		if parsed <= 0 {
+			log.Warnf("OLLAMA_OCR_TOP_K must be positive, got %d, ignoring", parsed)
+		} else {
 			ollamaOcrTopK = &parsed
+		}
 	} else {
 		log.Warnf("Invalid OLLAMA_OCR_TOP_K value: %v, ignoring", err)
 	}
 }
```

background.go (1)
213-217
: Consider documenting the jobID parameter purpose

The empty string `""` passed as the fourth parameter to `ProcessDocumentOCR` appears to be a jobID. Consider adding a comment to clarify why an empty jobID is used in the background processing context.

Add a clarifying comment:

```diff
 if app.docProcessor != nil {
 	// Use injected processor if available
+	// Empty jobID since background processing doesn't track jobs
 	processedDoc, err = app.docProcessor.ProcessDocumentOCR(ctx, document.ID, options, "")
 } else {
 	// Use the app's own implementation if no processor is injected
+	// Empty jobID since background processing doesn't track jobs
 	processedDoc, err = app.ProcessDocumentOCR(ctx, document.ID, options, "")
 }
```

ocr.go (1)
310-321
: Consider handling JSON marshaling errors more gracefully.

While the error from JSON marshaling is silently ignored (which is acceptable), consider logging it for debugging purposes, especially since this is generation metadata that might be important for troubleshooting.

```diff
 var genInfoJSON string
 if result.GenerationInfo != nil {
-	if b, err := json.Marshal(result.GenerationInfo); err == nil {
+	if b, err := json.Marshal(result.GenerationInfo); err != nil {
+		pageLogger.WithError(err).Debug("Failed to marshal GenerationInfo")
+	} else {
 		genInfoJSON = string(b)
 	}
 }
```

web-app/src/ExperimentalOCR.tsx (3)
35-43
: Add error handling for network failures in stopOCRJob.

The function catches errors but only sets a generic error message. Consider providing more specific error feedback to help users understand what went wrong.

```diff
 const stopOCRJob = async () => {
   if (!jobId) return;
   try {
     await axios.post(`/api/ocr/jobs/${jobId}/stop`);
     setJobStatus('cancelled');
   } catch (err) {
-    setError('Failed to stop OCR job.');
+    const errorMessage = axios.isAxiosError(err)
+      ? `Failed to stop OCR job: ${err.response?.data?.message || err.message}`
+      : 'Failed to stop OCR job.';
+    setError(errorMessage);
   }
 };
```
111-122
: Improve error handling for JSON parsing.

The current implementation silently falls back to raw text if JSON parsing fails. Consider logging the parse error for debugging.

```diff
 let parsedResult: OCRCombinedResult | null = null;
 try {
   parsedResult = JSON.parse(response.data.result);
 } catch (e) {
+  console.warn('Failed to parse OCR result as JSON, using raw text:', e);
   setOcrResult(response.data.result);
   return;
 }
```
215-221
: DELETE request error handling could be improved.

The cancellation DELETE request failure is only logged to console. Consider showing this error to the user.

```diff
 try {
   await axios.delete(`/api/documents/${documentId}/ocr_pages/${pageIdx}/reocr`);
   console.log(`Cancellation request sent for page ${pageIdx}`);
 } catch (err) {
   console.error(`Failed to send cancellation request for page ${pageIdx}:`, err);
+  setReOcrErrors((prev) => ({
+    ...prev,
+    [pageIdx]: "Failed to cancel Re-OCR request.",
+  }));
 }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
- `go.sum` is excluded by `!**/*.sum`
📒 Files selected for processing (13)

- README.md (4 hunks)
- app_http_handlers.go (4 hunks)
- background.go (1 hunks)
- background_test.go (1 hunks)
- jobs.go (3 hunks)
- local_db.go (4 hunks)
- main.go (5 hunks)
- ocr.go (7 hunks)
- ocr/llm_provider.go (4 hunks)
- ocr/provider.go (2 hunks)
- types.go (2 hunks)
- web-app/src/ExperimentalOCR.tsx (8 hunks)
- web-app/src/ocrStatus.ts (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (8)
- types.go (1): ProcessedDocument (ocr.go, 20-26)
- background_test.go (2): OCROptions (types.go, 123-129); ProcessedDocument (ocr.go, 20-26)
- local_db.go (1): OCRPageResult (app_http_handlers.go, 280-284)
- jobs.go (1): DeleteOcrPageResults (local_db.go, 153-155)
- web-app/src/ExperimentalOCR.tsx (5): OCRPageResult (local_db.go, 25-34); OCRPageResult (app_http_handlers.go, 280-284); OCRJobStatus (1-1), ClientStatus (2-2), mapJobStatus (10-21), getStatusViewOptions (23-44) (web-app/src/ocrStatus.ts); Document (web-app/src/DocumentProcessor.tsx, 9-15); displayOCRComparison (web-app/e2e/mistral-ocr-processing.spec.ts, 17-88)
- app_http_handlers.go (2): App (main.go, 166-184); GetOcrPageResults (143-147), OCRPageResult (25-34), SaveSingleOcrPageResult (116-141) (local_db.go)
- ocr.go (4): App (main.go, 166-184); OCROptions (types.go, 123-129); OCRResult (ocr/provider.go, 14-29); SaveSingleOcrPageResult (local_db.go, 116-141)
- ocr/llm_provider.go (1): OCRResult (ocr/provider.go, 14-29)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: test
🔇 Additional comments (20)
README.md (1)
165-166
: Docs addition for OLLAMA_CONTEXT_LENGTH looks good

Clear and actionable. This will help users tune memory vs. context trade-offs.
jobs.go (3)
163-173
: Cancellation handling looks correct

Checking jobCtx.Err() to classify user-initiated cancellations is appropriate. Logging and status transitions are clear.
14-21
: Re-OCR cancellers properly registered and cleaned up

I've confirmed that `reOcrCancellers` is both registered and cleaned up to prevent leaks:

- In app_http_handlers.go:334-339, you lock, cancel any existing entry, and store the new `cancelReOcr` under `cancelKey`.
- The deferred cleanup at app_http_handlers.go:341-345 removes the entry once processing completes.
- In the cancellation endpoint (app_http_handlers.go:400-404), you lock, delete the entry, and then invoke the cancel func if it existed.

No further action needed.
22-33
: TotalPages is already populated in the OCR pipeline; no change needed

The `TotalPages` field is assigned in the OCR routines before any status update:

- ocr.go:198 and ocr.go:254 set `job.TotalPages = totalPdfPages`
- The HTTP handler in app_http_handlers.go:195 returns it as `"total_pages"`

Since the field is correctly initialized and exposed to the UI, you can ignore the original suggestion.
Likely an incorrect or invalid review comment.
local_db.go (1)
143-147
: LGTM: retrieval ordered by page_index

Ascending order matches the expected page display in the UI.
ocr/provider.go (1)
45-51
: LGTM!

The new configuration fields for Vision LLM settings and Ollama OCR-specific settings are well-structured and properly documented.
ocr/llm_provider.go (3)
22-30
: LGTM!

The LLMProvider struct fields are properly extended to include the new configuration options for max tokens, temperature, and the Ollama-specific top-k setting.
113-123
: LGTM!

The dynamic construction of call options based on configuration is well-implemented, properly checking for non-zero maxTokens and non-nil pointers before adding options.
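The "pointer means optional" pattern praised here is worth spelling out: a nil pointer distinguishes "not configured" from "explicitly set to zero", so only configured values are turned into call options. A minimal sketch of the idea; `CallOption`, `callConfig`, and the `with*` constructors are illustrative stand-ins, not the real langchaingo API:

```go
package main

import "fmt"

// callConfig collects the optional generation settings.
type callConfig struct {
	maxTokens   int
	temperature *float64
	topK        *int
}

// CallOption mutates a callConfig, mimicking functional-options APIs.
type CallOption func(*callConfig)

func withMaxTokens(n int) CallOption { return func(c *callConfig) { c.maxTokens = n } }
func withTemperature(t float64) CallOption {
	return func(c *callConfig) { c.temperature = &t }
}
func withTopK(k int) CallOption { return func(c *callConfig) { c.topK = &k } }

// buildOptions mirrors the reviewed logic: append an option only when the
// corresponding setting was actually configured.
func buildOptions(maxTokens int, temperature *float64, topK *int) []CallOption {
	var opts []CallOption
	if maxTokens > 0 {
		opts = append(opts, withMaxTokens(maxTokens))
	}
	if temperature != nil {
		opts = append(opts, withTemperature(*temperature))
	}
	if topK != nil {
		opts = append(opts, withTopK(*topK))
	}
	return opts
}

func main() {
	temp := 0.2
	opts := buildOptions(4096, &temp, nil) // topK left unset
	fmt.Println(len(opts))                 // two options configured
}
```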
189-196
: LGTM!

The Ollama client creation properly handles the optional context length configuration.
app_http_handlers.go (2)
233-244
: LGTM!

The stop OCR job handler is well-implemented with proper mutex protection and appropriate HTTP status codes.
334-345
: LGTM!

The cancellation mechanism for per-page re-OCR is properly implemented with mutex protection and cleanup in defer.
main.go (1)
899-910
: LGTM!

The Ollama LLM client creation properly shares the context length configuration logic with the Vision LLM path.
background.go (2)
224-227
: LGTM!

Good defensive programming: properly handling the nil case for `processedDoc`.
236-241
: LGTM!

The inline function for conditionally adding OCR complete tags is well-implemented and clear.
ocr.go (3)
44-54
: LGTM! Job tracking is well-integrated.

The addition of the `jobID` parameter to the function signature and the conditional logging with the job ID is properly implemented.
267-276
: Good implementation of cancellation handling.

The cancellation check with context.Done() is correctly placed before processing each page, and partial results are appropriately returned when cancelled.
195-201
: All jobStore accesses are properly synchronized
- jobStore is initialized at package level (`var jobStore = &JobStore{jobs: make(map[string]*Job)}`).
- JobStore embeds a `sync.RWMutex` and wraps every map access:
  - addJob, updateJobStatus, updatePagesDone use `Lock()`/`Unlock()`.
  - getJob, GetAllJobs use `RLock()`/`RUnlock()`.
- The direct map writes in ocr.go (lines 195-201) are already guarded by `jobStore.Lock()`/`Unlock()`.

No further changes needed.
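For readers unfamiliar with the embedded-mutex store described above, a minimal sketch of the shape (field and method names are illustrative, simplified from the project's actual JobStore):

```go
package main

import (
	"fmt"
	"sync"
)

type Job struct {
	ID         string
	Status     string
	PagesDone  int
	TotalPages int
}

// JobStore embeds sync.RWMutex so every map access can be guarded.
type JobStore struct {
	sync.RWMutex
	jobs map[string]*Job
}

func (s *JobStore) addJob(j *Job) {
	s.Lock()
	defer s.Unlock()
	s.jobs[j.ID] = j
}

func (s *JobStore) updatePagesDone(id string, done int) {
	s.Lock()
	defer s.Unlock()
	if j, ok := s.jobs[id]; ok {
		j.PagesDone = done
	}
}

func (s *JobStore) getJob(id string) (Job, bool) {
	s.RLock()
	defer s.RUnlock()
	j, ok := s.jobs[id]
	if !ok {
		return Job{}, false
	}
	return *j, true // return a copy so callers never race on the stored value
}

func main() {
	store := &JobStore{jobs: make(map[string]*Job)}
	store.addJob(&Job{ID: "a", Status: "in_progress", TotalPages: 3})
	store.updatePagesDone("a", 2)
	j, _ := store.getJob("a")
	fmt.Println(j.PagesDone, j.TotalPages) // 2 3
}
```

Returning a copy from `getJob` (rather than the `*Job` pointer) is the detail that keeps read-side callers safe once the lock is released.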
web-app/src/ocrStatus.ts (1)
1-44
: LGTM! Well-structured status management.

The type definitions and status view logic are clean and well-organized. The `getStatusViewOptions` function provides a clear mapping between statuses and UI states.

web-app/src/ExperimentalOCR.tsx (2)
8-13
: Excellent implementation of per-page OCR functionality!

The per-page OCR results display with re-OCR capabilities is well-structured. The UI properly handles loading states, errors, and provides clear visual feedback with token limit indicators and generation info tooltips.
Also applies to: 28-29, 157-387
105-108
: Race condition in per-page results fetching.

The comparison `newPagesDone > lastFetchedPagesDoneRef.current` could lead to missed updates if multiple status checks occur rapidly. Consider using a more robust synchronization mechanism.

```diff
-if (newPagesDone > lastFetchedPagesDoneRef.current) {
+// Always fetch if pages have been processed and we're not at the same count
+if (newPagesDone > 0 && newPagesDone !== lastFetchedPagesDoneRef.current) {
   await fetchPerPageResults();
   lastFetchedPagesDoneRef.current = newPagesDone;
 }
```

Likely an incorrect or invalid review comment.
```go
func (app *App) getOCRPagesHandler(c *gin.Context) {
	id := c.Param("id")
	parsedID, err := strconv.Atoi(id)
	if err != nil {
		c.JSON(http.StatusBadRequest, gin.H{"error": "Invalid document ID"})
		return
	}

	dbResults, err := GetOcrPageResults(app.Database, parsedID)
	if err != nil {
		c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to fetch OCR page results"})
		return
	}

	type OCRPageResult struct {
		Text           string                 `json:"text"`
		OcrLimitHit    bool                   `json:"ocrLimitHit"`
		GenerationInfo map[string]interface{} `json:"generationInfo,omitempty"`
	}

	var pages []OCRPageResult
	for _, res := range dbResults {
		var genInfo map[string]interface{}
		if res.GenerationInfo != "" {
			_ = json.Unmarshal([]byte(res.GenerationInfo), &genInfo)
		}
		pages = append(pages, OCRPageResult{
			Text:           res.Text,
			OcrLimitHit:    res.OcrLimitHit,
			GenerationInfo: genInfo,
		})
	}

	c.JSON(http.StatusOK, gin.H{
		"pages": pages,
	})
}
```
Add error handling for JSON unmarshaling
The JSON unmarshaling of `GenerationInfo` at line 290 silently ignores errors. While this may be intentional for backward compatibility, it could hide data corruption issues.
Add error logging for failed JSON unmarshaling:
```diff
 var genInfo map[string]interface{}
 if res.GenerationInfo != "" {
-	_ = json.Unmarshal([]byte(res.GenerationInfo), &genInfo)
+	if err := json.Unmarshal([]byte(res.GenerationInfo), &genInfo); err != nil {
+		log.Warnf("Failed to unmarshal GenerationInfo for doc %d page %d: %v", parsedID, i, err)
+	}
 }
```
Committable suggestion skipped: line range outside the PR's diff.
```go
imagePaths, _, err := app.Client.DownloadDocumentAsImages(c.Request.Context(), parsedID, limitOcrPages)
if err != nil || pageIdx < 0 || pageIdx >= len(imagePaths) {
	c.JSON(http.StatusBadRequest, gin.H{"error": "Invalid page index or failed to download images"})
	return
}
```
🛠️ Refactor suggestion
Verify image index bounds before file access

The code downloads images and then checks if `pageIdx` is within bounds, but it reads the file before performing adequate validation. While the bounds check exists, the error message combines two different error conditions.
Separate the error conditions for clarity:
```diff
 imagePaths, _, err := app.Client.DownloadDocumentAsImages(c.Request.Context(), parsedID, limitOcrPages)
-if err != nil || pageIdx < 0 || pageIdx >= len(imagePaths) {
-	c.JSON(http.StatusBadRequest, gin.H{"error": "Invalid page index or failed to download images"})
+if err != nil {
+	c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to download images"})
+	return
+}
+if pageIdx < 0 || pageIdx >= len(imagePaths) {
+	c.JSON(http.StatusBadRequest, gin.H{"error": fmt.Sprintf("Invalid page index %d, document has %d pages", pageIdx, len(imagePaths))})
 	return
 }
```
```go
var genInfoJSON string
if result.GenerationInfo != nil {
	if b, err := json.Marshal(result.GenerationInfo); err == nil {
		genInfoJSON = string(b)
	}
}
```
Add error handling for JSON marshaling
The JSON marshaling at line 367 could fail, but errors are silently ignored, which could result in an empty `genInfoJSON` being saved.
Handle JSON marshaling errors:
```diff
 var genInfoJSON string
 if result.GenerationInfo != nil {
-	if b, err := json.Marshal(result.GenerationInfo); err == nil {
+	if b, err := json.Marshal(result.GenerationInfo); err != nil {
+		log.Errorf("Failed to marshal GenerationInfo for doc %d page %d: %v", parsedID, pageIdx, err)
+	} else {
 		genInfoJSON = string(b)
 	}
 }
```
```go
type OCRPageResult struct {
	ID             uint   `gorm:"primaryKey"`
	DocumentID     int    `gorm:"index;not null"`
	PageIndex      int    `gorm:"not null"`
	Text           string `gorm:"size:1048576"`
	OcrLimitHit    bool
	GenerationInfo string `gorm:"type:TEXT"`
	CreatedAt      time.Time
	UpdatedAt      time.Time
}
```
🛠️ Refactor suggestion
Add a composite unique index on (document_id, page_index)
Without a uniqueness constraint, concurrent upserts can create duplicate rows. Add a composite unique index so updates are deterministic and performant.
```diff
 type OCRPageResult struct {
-	ID         uint `gorm:"primaryKey"`
-	DocumentID int  `gorm:"index;not null"`
-	PageIndex  int  `gorm:"not null"`
+	ID         uint `gorm:"primaryKey"`
+	DocumentID int  `gorm:"not null;uniqueIndex:idx_doc_page"`
+	PageIndex  int  `gorm:"not null;uniqueIndex:idx_doc_page"`
 	Text           string `gorm:"size:1048576"`
 	OcrLimitHit    bool
 	GenerationInfo string `gorm:"type:TEXT"`
 	CreatedAt      time.Time
 	UpdatedAt      time.Time
 }
```
```go
// SaveSingleOcrPageResult saves or updates the OCR result for a single page, including GenerationInfo as JSON
func SaveSingleOcrPageResult(db *gorm.DB, docID int, pageIdx int, text string, ocrLimitHit bool, generationInfoJSON string) error {
	var result OCRPageResult
	tx := db.Where("document_id = ? AND page_index = ?", docID, pageIdx).First(&result)
	if tx.Error == nil {
		result.Text = text
		result.OcrLimitHit = ocrLimitHit
		result.GenerationInfo = generationInfoJSON
		return db.Save(&result).Error
	} else if tx.Error != nil {
		log.Debugf("SaveSingleOcrPageResult: db.First error: %v (is gorm.ErrRecordNotFound: %v)", tx.Error, errors.Is(tx.Error, gorm.ErrRecordNotFound))
		if errors.Is(tx.Error, gorm.ErrRecordNotFound) {
			result = OCRPageResult{
				DocumentID:     docID,
				PageIndex:      pageIdx,
				Text:           text,
				OcrLimitHit:    ocrLimitHit,
				GenerationInfo: generationInfoJSON,
			}
			return db.Create(&result).Error
		} else {
			log.Errorf("Unexpected DB error in SaveSingleOcrPageResult: %v", tx.Error)
			return tx.Error
		}
	}
	return nil
}
```
🛠️ Refactor suggestion
Replace find-then-save with an atomic upsert
Current flow can race under concurrency and costs an extra query. Prefer a single INSERT ... ON CONFLICT DO UPDATE keyed by (document_id, page_index).
```diff
-func SaveSingleOcrPageResult(db *gorm.DB, docID int, pageIdx int, text string, ocrLimitHit bool, generationInfoJSON string) error {
-	var result OCRPageResult
-	tx := db.Where("document_id = ? AND page_index = ?", docID, pageIdx).First(&result)
-	if tx.Error == nil {
-		result.Text = text
-		result.OcrLimitHit = ocrLimitHit
-		result.GenerationInfo = generationInfoJSON
-		return db.Save(&result).Error
-	} else if tx.Error != nil {
-		log.Debugf("SaveSingleOcrPageResult: db.First error: %v (is gorm.ErrRecordNotFound: %v)", tx.Error, errors.Is(tx.Error, gorm.ErrRecordNotFound))
-		if errors.Is(tx.Error, gorm.ErrRecordNotFound) {
-			result = OCRPageResult{
-				DocumentID:     docID,
-				PageIndex:      pageIdx,
-				Text:           text,
-				OcrLimitHit:    ocrLimitHit,
-				GenerationInfo: generationInfoJSON,
-			}
-			return db.Create(&result).Error
-		} else {
-			log.Errorf("Unexpected DB error in SaveSingleOcrPageResult: %v", tx.Error)
-			return tx.Error
-		}
-	}
-	return nil
-}
+func SaveSingleOcrPageResult(db *gorm.DB, docID int, pageIdx int, text string, ocrLimitHit bool, generationInfoJSON string) error {
+	rec := OCRPageResult{
+		DocumentID:     docID,
+		PageIndex:      pageIdx,
+		Text:           text,
+		OcrLimitHit:    ocrLimitHit,
+		GenerationInfo: generationInfoJSON,
+	}
+	return db.
+		Clauses(clause.OnConflict{
+			Columns:   []clause.Column{{Name: "document_id"}, {Name: "page_index"}},
+			DoUpdates: clause.AssignmentColumns([]string{"text", "ocr_limit_hit", "generation_info", "updated_at"}),
+		}).
+		Create(&rec).Error
+}
```
Add the import needed for clause:
```go
import (
	"errors"
	...
	"gorm.io/gorm/clause"
)
```
types.go
Outdated
	AddTags []string `json:"add_tags,omitempty"`
}
💡 Verification agent
🧩 Analysis chain
AddTags field: LGTM; verify downstream handling
Adding add_tags aligns with the UI/UX of incremental tagging. Please verify that UpdateDocuments (client and handlers) actually interprets add_tags distinct from suggested_tags and remove_tags to avoid double-application or conflicts.
Run to find usages of add_tags in server handlers and client code:
🏁 Script executed:
#!/bin/bash
rg -n --json 'add_tags' -A 3 -B 3
Length of output: 1890
Implement add_tags handling in UpdateDocuments
The new add_tags field is defined in types.go but isn't referenced anywhere else (no matches beyond its declaration). As a result, calls to UpdateDocuments won't apply added tags. Please:
- In the server handler (e.g. the UpdateDocuments method in server/handlers/...), read add_tags from the incoming payload and merge those tags into the document, guarding against conflicts with suggested_tags and remove_tags.
- In the client (UpdateDocuments method in client/...), include the add_tags array when constructing the request.
- Add or update unit/integration tests to verify incremental tag additions work as expected.
🤖 Prompt for AI Agents
In types.go around lines 101-102 the AddTags []string
`json:"add_tags,omitempty"` field is declared but never used; update the server
handler (server/handlers/... UpdateDocuments) to read payload.add_tags, merge
those tags into the target document’s tag set (deduplicate), ensure tags present
in remove_tags are not re-added and that add_tags do not override/duplicate
suggested_tags handling (apply a clear precedence: remove_tags wins, then
add_tags, then suggested_tags), and persist the merged tag set; update the
client (client/... UpdateDocuments) to include the add_tags array when building
the request payload; and add/update unit/integration tests that call
UpdateDocuments with combinations of add_tags, remove_tags, and suggested_tags
to assert tags are correctly added, removed, and deduplicated.
	ProcessDocumentOCR(ctx context.Context, documentID int, options OCROptions, jobID string) (*ProcessedDocument, error)
}
💡 Verification agent
🧩 Analysis chain
Signature change to ProcessDocumentOCR: ensure all implementors/callers updated
This is a public interface change. Please confirm there are no remaining implementations or call sites using the old 3-arg signature.
Use this script to surface function declarations and invocations that may still use the old form:
🏁 Script executed:
#!/bin/bash
# Find all function declarations named ProcessDocumentOCR
rg -n 'func\s+\(.*\)\s+ProcessDocumentOCR\(' -A 2
# Show call sites and arity hints (look for 3 vs 4 args)
rg -n 'ProcessDocumentOCR\(' -A 1 -B 1
Length of output: 2551
Fix remaining TestApp mock to match new signature
The TestApp.ProcessDocumentOCR override in background_test.go is still using the old 3-arg form. Update it to include the new jobID string parameter:
• background_test.go:329
- func (app *TestApp) ProcessDocumentOCR(ctx context.Context, documentID int, options OCROptions) (*ProcessedDocument, error) {
+ func (app *TestApp) ProcessDocumentOCR(ctx context.Context, documentID int, options OCROptions, jobID string) (*ProcessedDocument, error) {
Adjust any test logic inside the method to accept or ignore the jobID as needed.
🤖 Prompt for AI Agents
In types.go around lines 148-149 the ProcessDocumentOCR signature was changed to
include a jobID string; update the TestApp.ProcessDocumentOCR mock in
background_test.go (around line 329) to match that new signature — add the final
jobID string parameter to the function declaration, and update the mock body to
accept or ignore jobID as appropriate (and adjust any references or expectations
in the test to use the new parameter); also ensure any test calls to
TestApp.ProcessDocumentOCR pass a jobID or use a placeholder.
const newJobStatus = mapJobStatus(response.data.status);
setJobStatus(newJobStatus);
Handle potential mapJobStatus errors gracefully.
Since mapJobStatus can throw an error for unknown statuses (as noted in the ocrStatus.ts review), this could crash the component. Add error handling here.
try {
const response = await axios.get(`./api/jobs/ocr/${jobId}`);
- const newJobStatus = mapJobStatus(response.data.status);
- setJobStatus(newJobStatus);
+ try {
+ const newJobStatus = mapJobStatus(response.data.status);
+ setJobStatus(newJobStatus);
+ } catch (mapError) {
+ console.error('Failed to map job status:', mapError);
+ setJobStatus('idle');
+ setError('Received unknown job status from server.');
+ return;
+ }
🤖 Prompt for AI Agents
In web-app/src/ExperimentalOCR.tsx around lines 99 to 100, the call to
mapJobStatus(response.data.status) can throw for unknown statuses and currently
can crash the component; wrap the mapping in a try/catch, fall back to a safe
status (e.g., 'UNKNOWN' or a defined error state) and call setJobStatus with
that fallback, and log/report the caught error (console.error or an
error-tracking helper) so the component continues rendering without crashing.
if (pageIdx + 1 > lastFetchedPagesDoneRef.current) {
  lastFetchedPagesDoneRef.current = pageIdx + 1;
}
Logic issue with page progress tracking.
The condition pageIdx + 1 > lastFetchedPagesDoneRef.current seems incorrect. Re-OCRing a single page shouldn't necessarily update the overall progress counter, which tracks bulk OCR job progress.
This logic appears to mix single-page re-OCR progress with bulk job progress tracking. Consider removing these lines or using a separate tracking mechanism for re-OCR operations:
-if (pageIdx + 1 > lastFetchedPagesDoneRef.current) {
- lastFetchedPagesDoneRef.current = pageIdx + 1;
-}
🤖 Prompt for AI Agents
In web-app/src/ExperimentalOCR.tsx around lines 188-190, the code updates
lastFetchedPagesDoneRef.current during a single-page re-OCR (if (pageIdx + 1 >
lastFetchedPagesDoneRef.current) { lastFetchedPagesDoneRef.current = pageIdx +
1; }), which incorrectly mixes single-page re-OCR progress with bulk OCR job
tracking; remove these two lines or change the logic so lastFetchedPagesDoneRef
is only advanced during bulk/batch OCR flows (e.g., guard the update with an
explicit isBulkOrInitialRun flag or use a separate ref/state for re-OCR
progress) so re-OCR of a single page does not alter the overall bulk progress
counter.
export const mapJobStatus = (raw: string | null | undefined): OCRJobStatus => {
  switch (raw) {
    case 'pending':
    case 'in_progress':
    case 'completed':
    case 'failed':
    case 'cancelled':
      return raw;
    default:
      throw new Error(`Unknown job status: ${raw}`);
  }
};
🛠️ Refactor suggestion
Consider returning 'idle' instead of throwing for unknown statuses.
The mapJobStatus function throws an error for unknown/null/undefined status values. This could cause the UI to crash if the backend returns an unexpected status. Consider returning 'idle' as a safe default instead.
export const mapJobStatus = (raw: string | null | undefined): OCRJobStatus => {
switch (raw) {
case 'pending':
case 'in_progress':
case 'completed':
case 'failed':
case 'cancelled':
return raw;
default:
- throw new Error(`Unknown job status: ${raw}`);
+ console.warn(`Unknown job status: ${raw}, defaulting to 'idle'`);
+ return 'idle';
}
};
🤖 Prompt for AI Agents
In web-app/src/ocrStatus.ts around lines 10 to 21, the function currently throws
on unknown/null/undefined statuses which can crash the UI; change the default
behavior so that instead of throwing an error you return the safe default value
'idle' (and explicitly handle null/undefined by returning 'idle' as well) so
unexpected backend values won't bubble up as exceptions.
Hey 👋
I've built an image from this and am using it on my Unraid Paperless install. I'm really only using the […]. The image is at […]. Thank you @dawidkulpa for your work.
Actionable comments posted: 2
♻️ Duplicate comments (2)
app_http_handlers.go (2)
355-358: Log GenerationInfo unmarshal failures
Swallowing the error hides corrupt entries and makes debugging impossible. Please log the failure with context so we know when a page payload is bad.
- if res.GenerationInfo != "" {
-   _ = json.Unmarshal([]byte(res.GenerationInfo), &genInfo)
+ if res.GenerationInfo != "" {
+   if err := json.Unmarshal([]byte(res.GenerationInfo), &genInfo); err != nil {
+     log.Warnf("Failed to unmarshal GenerationInfo for doc %d page %d: %v", parsedID, res.PageIndex, err)
+   }
  }
433-436: Handle GenerationInfo marshal errors
We still drop marshal failures silently, which loses diagnostics and stores empty JSON. Log the error and only persist the payload when marshaling succeeds.
- if result.GenerationInfo != nil {
-   if b, err := json.Marshal(result.GenerationInfo); err == nil {
-     genInfoJSON = string(b)
-   }
- }
+ if result.GenerationInfo != nil {
+   if b, err := json.Marshal(result.GenerationInfo); err != nil {
+     log.Errorf("Failed to marshal GenerationInfo for doc %d page %d: %v", parsedID, pageIdx, err)
+   } else {
+     genInfoJSON = string(b)
+   }
+ }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
go.sum is excluded by !**/*.sum
📒 Files selected for processing (5)
- README.md (4 hunks)
- app_http_handlers.go (4 hunks)
- background.go (1 hunks)
- main.go (5 hunks)
- types.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- README.md
🧰 Additional context used
📓 Path-based instructions (1)
**/*.go
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Ensure all Go source files are formatted with gofmt before committing
Files:
types.go
app_http_handlers.go
background.go
main.go
🧠 Learnings (1)
📚 Learning: 2025-09-28T18:55:46.584Z
Learnt from: CR
PR: icereed/paperless-gpt#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-09-28T18:55:46.584Z
Learning: Applies to ocr/**/*.go : Implement OCR providers under the ocr/ package
Applied to files:
app_http_handlers.go
background.go
main.go
🧬 Code graph analysis (3)
types.go (1)
- ocr.go (1): ProcessedDocument (20-26)

app_http_handlers.go (2)
- main.go (1): App (110-128)
- local_db.go (3): GetOcrPageResults (143-147), OCRPageResult (25-34), SaveSingleOcrPageResult (116-141)

background.go (2)
- types.go (1): DocumentSuggestion (114-128)
- web-app/src/DocumentProcessor.tsx (1): DocumentSuggestion (36-45)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: test
saveErr := SaveSingleOcrPageResult(app.Database, parsedID, pageIdx, result.Text, result.OcrLimitHit, genInfoJSON)
if saveErr != nil {
	log.Errorf("Failed to save re-OCR result for doc %d page %d: %v", parsedID, pageIdx, saveErr)
}

c.JSON(http.StatusOK, gin.H{
	"text": result.Text,
	"ocrLimitHit": result.OcrLimitHit,
	"generationInfo": result.GenerationInfo,
})
Surface DB write failures to the client
If persisting the page result fails we currently return 200, so the UI shows success even though nothing was saved. Please fail fast and tell the client so they can retry.
saveErr := SaveSingleOcrPageResult(app.Database, parsedID, pageIdx, result.Text, result.OcrLimitHit, genInfoJSON)
if saveErr != nil {
log.Errorf("Failed to save re-OCR result for doc %d page %d: %v", parsedID, pageIdx, saveErr)
- }
-
- c.JSON(http.StatusOK, gin.H{
+ c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to persist re-OCR result"})
+ return
+ }
+
+ c.JSON(http.StatusOK, gin.H{
🤖 Prompt for AI Agents
In app_http_handlers.go around lines 438-447 the handler logs
SaveSingleOcrPageResult failures but still returns HTTP 200; change the control
flow to return an error response when saveErr != nil by responding with a
non-200 status (e.g., http.StatusInternalServerError or http.StatusBadGateway)
and a concise JSON error body (message field), then immediately return so the
client sees the failure and can retry; do not remove the existing log, but
ensure the error response does not leak sensitive internals.
Thank you all for your patience!
Hey @icereed !
I'm happy to see this project still getting new features and overall being maintained! :)
I've been adding small improvements to the project for my own use, and after some time using it, I decided to open a PR with them as I've noticed people opening feature requests for some of them :)
Hopefully you'll find these changes in line with your vision for the project.
The changes I'm introducing:
If OCR is requested via the UI, you can now see its live progress, page by page.
You can also at any time press the "Re-OCR Page" button to request re-ocr of that page.
When the job finishes, the entire content is shown in the last box, ready to be saved to paperless.
You can now set VISION_LLM_MAX_TOKENS, and it'll be sent to the LLM provider to limit the output's length. It's also helpful if your LLM tends to get stuck repeating itself: if the MAX_TOKENS limit is hit, you'll get a message in the UI (so you don't have to check every single page on larger jobs) and can click "Re-OCR" straight away.

Context Length and Top K options for Ollama deployments
You can now set OLLAMA_CONTEXT_LENGTH and OLLAMA_OCR_TOP_K to tweak your local models. The context length option is especially useful, as it lets you control how much memory is reserved for your runs.

Temperature
The last of the new options is temperature. This is useful in many cases (like making your model less creative), but the most important use case is GPT-5 family model deployments: currently, if you try to use GPT-5, you always get an error that the only supported temperature is 1.0. This addresses 400: Unsupported value: 'temperature' does not support 0 with this model #448
Let me know what you think :)
Summary by CodeRabbit
New Features
Improvements
Documentation