@dawidkulpa dawidkulpa commented Aug 13, 2025

Hey @icereed !
I'm happy to see this project still getting new features and being actively maintained! :)
I've been adding small improvements for my own use, and after some time using them I decided to open a PR, since I've noticed people filing feature requests for some of them :)
Hopefully these changes align with your vision for the project.

The changes I'm introducing:

  1. Re-OCRing specific pages from the UI (partially addresses [Feature Request] Retry OCR for specific pages #223, but requires the OCR to be requested entirely via the UI)
    If the OCR is requested via the UI, you can now see its live progress, page by page.
    You can also press the "Re-OCR Page" button at any time to request a re-OCR of that page.
    When the job finishes, the entire content is shown in the last box, ready to be saved to paperless.
[screenshot]
  2. Token stats for each page (if available)
[screenshot]
  3. Max Tokens option for LLMs
    You can now set VISION_LLM_MAX_TOKENS, and it'll be sent to the LLM provider, limiting the output's length. It also helps if your LLM tends to get stuck in repetition loops: when the MAX_TOKENS limit is hit, you'll see a message in the UI (so you don't have to check every single page on larger jobs) and can click "Re-OCR" straight away.
[screenshot]
  4. Context Length and Top K options for Ollama deployments
    You can now set OLLAMA_CONTEXT_LENGTH and OLLAMA_OCR_TOP_K to tweak your local models. The context length option is especially useful, as it controls how much memory is reserved for your runs.

  5. Temperature
    The last of the new options is temperature. This is useful in many cases (like making your model less creative), but the most important use case is GPT-5 family deployments: currently, if you try to use GPT-5, you always get an error that the only supported temperature is 1.0. This addresses 400: Unsupported value: 'temperature' does not support 0 with this model #448

Let me know what you think :)

Summary by CodeRabbit

  • New Features

    • Cancel running OCR jobs; cancel per-page re-OCR.
    • View and persist per-page OCR results with token-limit badges and generation metadata.
    • Re-run OCR for individual pages and see live pages-processed/total progress.
  • Improvements

    • UI: per-page controls, tooltips for generation info, enhanced status views and save flow.
    • API: new endpoints for job control and per-page OCR; job status now reports total pages.
    • Config: env vars for Ollama context and Vision LLM tuning respected with graceful fallbacks.
  • Documentation

    • Expanded README with Ollama context management, Vision LLM tuning, examples, and troubleshooting.

- introduces re-OCRing specific pages from UI (partially answers icereed#223)
- adds more options for LLM tweaking
- allows setting temperature to work around GPT-5 issue (fixes icereed#448)

adds tweaks for OCR: configuring ollama model, showing per page result, allowing re-ocring single page

fixes re-ocring

additional fixes

some more fixes


coderabbitai bot commented Aug 13, 2025

Walkthrough

Adds cancellable OCR jobs, per-page OCR result persistence and re-OCR with per-page cancellation, extends ProcessDocumentOCR to accept jobID and track total pages, introduces OCRPageResult DB model, new HTTP endpoints for job/page control, and new env-driven LLM/Ollama configuration options. UI updated for per-page status and controls.

Changes

Cohort / File(s) Summary
Documentation
README.md
Documents Ollama context handling and Vision LLM tuning; adds env vars: OLLAMA_CONTEXT_LENGTH, VISION_LLM_MAX_TOKENS, VISION_LLM_TEMPERATURE, OLLAMA_OCR_TOP_K; updates examples and troubleshooting.
HTTP API: OCR control & per-page OCR
app_http_handlers.go, main.go
Adds endpoints: GET /documents/:id/ocr_pages, POST /documents/:id/ocr_pages/:pageIndex/reocr, DELETE /documents/:id/ocr_pages/:pageIndex/reocr, POST /ocr/jobs/:job_id/stop; handlers implement job and per-page cancellation, fetching per-page results, re-OCR flow and cancellation responses (including 204/404/499 handling).
Background processing & interfaces
background.go, types.go, background_test.go
Extends ProcessDocumentOCR signature to accept jobID string; adds DocumentSuggestion.AddTags; adds nil-safety guard for processedDoc; updates tests/mocks to new signature.
Job lifecycle & cancellation
jobs.go
Introduces per-job cancellers with mutex maps, creates cancellable job contexts, deletes old per-page results before OCR runs, propagates jobID to OCR, tracks TotalPages on Job, and handles cancelled/failed/completed states.
DB schema: per-page OCR
local_db.go
Adds OCRPageResult model and migrations; implements upsert/save single page result, retrieval, update alias, and delete-all per-document; stores GenerationInfo JSON and OcrLimitHit.
OCR core & provider changes
ocr.go, ocr/provider.go, ocr/llm_provider.go
ProcessDocumentOCR(ctx, documentID, options, jobID) tracks total pages, supports ctx cancellation, accumulates per-page OCRResults, marshals and saves GenerationInfo, persists per-page results, and returns partial results on cancel. Provider/LLM support added fields: VisionLLMMaxTokens, VisionLLMTemperature, OllamaOcrTopK, OllamaContextLength; OCRResult now includes OcrLimitHit and GenerationInfo.
Config and app bootstrap
main.go
Parses new env vars (with warnings on invalid values) into OCR config fields and applies Ollama context-length option when set; wires new OCR endpoints into routes.
Web UI: Experimental OCR
web-app/src/ExperimentalOCR.tsx, web-app/src/ocrStatus.ts
UI now tracks job/client statuses, fetches and displays per-page results, supports stop job and per-page re-OCR with cancel, shows token-limit badges and generation info tooltips; adds OCRJobStatus, ClientStatus, mapJobStatus, and getStatusViewOptions.

Sequence Diagram(s)

sequenceDiagram
  participant UI
  participant API as HTTP Server
  participant Jobs as Job Worker
  participant OCR as OCR Processor
  participant DB

  UI->>API: POST /api/ocr/jobs (documentID)
  API-->>UI: 202 Accepted {job_id}
  Jobs->>Jobs: create cancellable context (jobCtx), store cancel
  Jobs->>OCR: ProcessDocumentOCR(jobCtx, docID, opts, jobID)
  OCR->>DB: Delete old per-page results (pre)
  loop per page
    OCR->>OCR: check ctx.Done()
    OCR->>OCR: call LLM/provider -> OCRResult{text, limitHit, genInfo}
    OCR->>DB: SaveSingleOcrPageResult(docID, pageIdx, text, ocrLimitHit, genInfo)
    OCR->>Jobs: update pages done/total
  end
  OCR-->>Jobs: final or partial text
  Jobs-->>API: update job status (completed/failed/cancelled)
  UI->>API: GET /api/jobs/:id/status
  API-->>UI: {status, pages_done, total_pages}
  UI->>API: GET /documents/:id/ocr_pages
  API-->>UI: per-page results
sequenceDiagram
  participant UI
  participant API as HTTP Server
  participant PageWorker as Per-Page Handler
  participant OCR as OCR Provider
  participant DB

  UI->>API: POST /documents/:id/ocr_pages/:pageIndex/reocr
  API->>PageWorker: create cancellable ctx for page, store cancel
  PageWorker->>OCR: ProcessImage(page)
  OCR-->>PageWorker: OCRResult{text, ocrLimitHit, genInfo}
  PageWorker->>DB: SaveSingleOcrPageResult(docID, pageIdx, ...)
  PageWorker-->>UI: 200 {text, ocrLimitHit, generationInfo}
  UI->>API: DELETE /documents/:id/ocr_pages/:pageIndex/reocr
  API-->>PageWorker: invoke cancel -> return 204 or 404

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

A rabbit taps the job to go, hop-hop then pause,
Pages saved like crunchy leaves, one-by-one because.
Tokens nibble at the edge, Top-K carrot bright,
Ollama threads a cozy burrow, context snug at night.
Per-page hops and gentle stops — OCR all done, delight! 🐇

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: docstring coverage is 33.33%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: the title concisely summarizes the primary contributions of this pull request, namely enhancements to the OCR user interface and the addition of new environment variables to configure LLM behavior. It clearly reflects both the UI changes (re-OCR controls, per-page progress and result displays) and the expanded LLM options (max tokens, context length, temperature, top-k), giving readers a clear, high-level understanding without unnecessary detail.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 14

🧹 Nitpick comments (15)
README.md (3)

539-543: Clarify provider semantics for new Vision LLM knobs

Good additions. Two small clarifications will reduce user confusion:

  • VISION_LLM_MAX_TOKENS: Different providers/models interpret “tokens” differently (output cap vs. total). Call this out explicitly for OpenAI vs. Ollama.
  • OLLAMA_OCR_TOP_K: Note it only applies when the selected Ollama model supports top-k.

If you agree, consider amending the descriptions to mention provider-specific behavior.


875-879: Nice troubleshooting guidance; add one guardrail

These tips are practical. Add a short note that increasing OLLAMA_CONTEXT_LENGTH increases KV cache memory usage on the Ollama server, which can cause OOM/restarts for large values on small hosts.

Also applies to: 886-888, 893-894


161-166: Fix inconsistencies and typos (option numbering, provider table, Azure model note, and LISTEN_INTERFACE)

  • Duplicate numbering in LLM options (two “Option 3” entries).
  • Provider Compatibility lists “Mistral OCR” as a provider, but it’s not described elsewhere; likely unintended. Consider removing that row unless you actually support a “mistral” OCR provider.
  • Typo: “prebuild-layout” should be “prebuilt-layout”.
  • LISTEN_INTERFACE description says “Network interface to listen on” but default is “8080” (a port). If the variable is indeed a port, clarify text (or rename if feasible in future).

Proposed edits:

-      # Option 3: Ollama (Local)
+      # Option 4: Ollama (Local)
-| **Mistral OCR** | ✅ | ✅ | ✅ |
+| (remove row or document provider) |   |   |   |
-| `AZURE_DOCAI_OUTPUT_CONTENT_FORMAT` | Azure Document Intelligence output content format. Optional if using `azure` provider. Defaults to `text`. 'markdown' is the other option and it requires the 'prebuild-layout' model ID.     | No       | text                       |
+| `AZURE_DOCAI_OUTPUT_CONTENT_FORMAT` | Azure Document Intelligence output content format. Optional if using `azure` provider. Defaults to `text`. 'markdown' is the other option and it requires the 'prebuilt-layout' model ID.    | No       | text                       |
-| `LISTEN_INTERFACE`                  | Network interface to listen on.                                                                                                                                                               | No       | 8080                       |
+| `LISTEN_INTERFACE`                  | Address or port to listen on (e.g. `:8080` or `8080`).                                                                                                                                        | No       | 8080                       |

Also applies to: 385-392, 547-547, 567-567

types.go (1)

128-129: Update comment to include 'whole_pdf' mode

The codebase and README document an additional mode ("whole_pdf"). The comment here should reflect that to avoid drift.

Apply:

-	ProcessMode     string // OCR processing mode: "image" (default) or "pdf"
+	ProcessMode     string // OCR processing mode: "image" (default), "pdf", or "whole_pdf"
jobs.go (2)

134-144: Bind jobCtx to an app/server root context

Using context.Background() breaks propagation of server shutdowns/timeouts into jobs. Prefer deriving jobCtx from a root context passed into the worker pool, so Stop/Shutdown can cascade properly.

Example change:

  • Pass a root ctx into startWorkerPool and processJob
  • Derive jobCtx from that root ctx
-func startWorkerPool(app *App, numWorkers int) {
+func startWorkerPool(ctx context.Context, app *App, numWorkers int) {
   for i := 0; i < numWorkers; i++ {
     go func(workerID int) {
       logger.Infof("Worker %d started", workerID)
-      for job := range jobQueue {
+      for {
+        select {
+        case <-ctx.Done():
+          return
+        case job := <-jobQueue:
           logger.Infof("Worker %d processing job: %s", workerID, job.ID)
-          processJob(app, job)
+          processJob(ctx, app, job)
         }
       }
     }(i)
   }
}
-func processJob(app *App, job *Job) {
+func processJob(parentCtx context.Context, app *App, job *Job) {
   jobStore.updateJobStatus(job.ID, "in_progress", "")
-  jobCtx, cancel := context.WithCancel(context.Background())
+  jobCtx, cancel := context.WithCancel(parentCtx)
   ...
}

145-150: Pre-run cleanup is helpful; consider conditional wipe for partial re-runs

Deleting all per-page results at job start is safe but can be wasteful for interrupted jobs or selective re-runs. Long term, consider keeping existing pages and only overwriting as they’re reprocessed.

local_db.go (1)

52-54: AutoMigrate is fine; indexes are created from struct tags

With the composite unique tags in OCRPageResult, AutoMigrate will create the unique index. No extra DDL needed.

If you want to be explicit and resilient to tag drift, you can assert the index exists post-migration and log a warning if not.

background_test.go (1)

43-49: Mock signature update: LGTM

This aligns the mock with the new interface. Good.

Consider updating the unused TestApp.ProcessDocumentOCR helper below to the new signature as well to prevent future confusion:

-func (app *TestApp) ProcessDocumentOCR(ctx context.Context, documentID int, options OCROptions) (*ProcessedDocument, error) {
+func (app *TestApp) ProcessDocumentOCR(ctx context.Context, documentID int, options OCROptions, jobID string) (*ProcessedDocument, error) {
     // Create a simple processed document with mock data
     mockText := app.ocrProvider.(*mockOCRProvider).text
     return &ProcessedDocument{
         ID:   documentID,
         Text: mockText,
     }, nil
 }
ocr/llm_provider.go (1)

137-155: Consider improving token counting accuracy for non-OpenAI models

The fallback token counting using llms.CountTokens may not be accurate for all models, particularly for Ollama and Mistral models which may use different tokenization schemes. While the code acknowledges this with a comment, it could lead to incorrect OcrLimitHit flags.

Consider adding model-specific token counting or documenting this limitation more prominently:

 		// Fallback: count tokens using langchaingo (might not be accurate for all models)
 		if tokenCount < 0 {
+			// Note: This may not be accurate for Ollama/Mistral models
+			log.Debugf("Using fallback token counting for model %s", p.model)
 			tokenCount = llms.CountTokens(p.model, text)
 		}
main.go (1)

238-273: Consider validating temperature and top-k ranges

While the code handles parsing errors for the Vision LLM configuration values, it doesn't validate that the values are within acceptable ranges. For example, temperature should typically be between 0.0 and 2.0, and top-k should be positive.

Add range validation for better user feedback:

 	if tempStr := os.Getenv("VISION_LLM_TEMPERATURE"); tempStr != "" {
 		if parsed, err := strconv.ParseFloat(tempStr, 64); err == nil {
+			if parsed < 0.0 || parsed > 2.0 {
+				log.Warnf("VISION_LLM_TEMPERATURE value %f is outside typical range [0.0, 2.0]", parsed)
+			}
 			visionLlmTemperature = &parsed
 		} else {
 			log.Warnf("Invalid VISION_LLM_TEMPERATURE value: %v, ignoring", err)
 		}
 	}

 	if topKStr := os.Getenv("OLLAMA_OCR_TOP_K"); topKStr != "" {
 		if parsed, err := strconv.Atoi(topKStr); err == nil {
+			if parsed <= 0 {
+				log.Warnf("OLLAMA_OCR_TOP_K must be positive, got %d, ignoring", parsed)
+			} else {
 				ollamaOcrTopK = &parsed
+			}
 		} else {
 			log.Warnf("Invalid OLLAMA_OCR_TOP_K value: %v, ignoring", err)
 		}
 	}
background.go (1)

213-217: Consider documenting the jobID parameter purpose

The empty string "" passed as the fourth parameter to ProcessDocumentOCR appears to be a jobID. Consider adding a comment to clarify why an empty jobID is used in the background processing context.

Add a clarifying comment:

 		if app.docProcessor != nil {
 			// Use injected processor if available
+			// Empty jobID since background processing doesn't track jobs
 			processedDoc, err = app.docProcessor.ProcessDocumentOCR(ctx, document.ID, options, "")
 		} else {
 			// Use the app's own implementation if no processor is injected
+			// Empty jobID since background processing doesn't track jobs
 			processedDoc, err = app.ProcessDocumentOCR(ctx, document.ID, options, "")
 		}
ocr.go (1)

310-321: Consider handling JSON marshaling errors more gracefully.

While the error from JSON marshaling is silently ignored (which is acceptable), consider logging it for debugging purposes, especially since this is generation metadata that might be important for troubleshooting.

 var genInfoJSON string
 if result.GenerationInfo != nil {
-    if b, err := json.Marshal(result.GenerationInfo); err == nil {
+    if b, err := json.Marshal(result.GenerationInfo); err != nil {
+        pageLogger.WithError(err).Debug("Failed to marshal GenerationInfo")
+    } else {
         genInfoJSON = string(b)
     }
 }
web-app/src/ExperimentalOCR.tsx (3)

35-43: Add error handling for network failures in stopOCRJob.

The function catches errors but only sets a generic error message. Consider providing more specific error feedback to help users understand what went wrong.

 const stopOCRJob = async () => {
   if (!jobId) return;
   try {
     await axios.post(`/api/ocr/jobs/${jobId}/stop`);
     setJobStatus('cancelled');
   } catch (err) {
-    setError('Failed to stop OCR job.');
+    const errorMessage = axios.isAxiosError(err) 
+      ? `Failed to stop OCR job: ${err.response?.data?.message || err.message}`
+      : 'Failed to stop OCR job.';
+    setError(errorMessage);
   }
 };

111-122: Improve error handling for JSON parsing.

The current implementation silently falls back to raw text if JSON parsing fails. Consider logging the parse error for debugging.

 let parsedResult: OCRCombinedResult | null = null;
 try {
   parsedResult = JSON.parse(response.data.result);
 } catch (e) {
+  console.warn('Failed to parse OCR result as JSON, using raw text:', e);
   setOcrResult(response.data.result);
   return;
 }

215-221: DELETE request error handling could be improved.

The cancellation DELETE request failure is only logged to console. Consider showing this error to the user.

 try {
   await axios.delete(`/api/documents/${documentId}/ocr_pages/${pageIdx}/reocr`);
   console.log(`Cancellation request sent for page ${pageIdx}`);
 } catch (err) {
   console.error(`Failed to send cancellation request for page ${pageIdx}:`, err);
+  setReOcrErrors((prev) => ({
+    ...prev,
+    [pageIdx]: "Failed to cancel Re-OCR request.",
+  }));
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a9eeed1 and c14d4c4.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (13)
  • README.md (4 hunks)
  • app_http_handlers.go (4 hunks)
  • background.go (1 hunks)
  • background_test.go (1 hunks)
  • jobs.go (3 hunks)
  • local_db.go (4 hunks)
  • main.go (5 hunks)
  • ocr.go (7 hunks)
  • ocr/llm_provider.go (4 hunks)
  • ocr/provider.go (2 hunks)
  • types.go (2 hunks)
  • web-app/src/ExperimentalOCR.tsx (8 hunks)
  • web-app/src/ocrStatus.ts (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (8)
types.go (1)
ocr.go (1)
  • ProcessedDocument (20-26)
background_test.go (2)
types.go (1)
  • OCROptions (123-129)
ocr.go (1)
  • ProcessedDocument (20-26)
local_db.go (1)
app_http_handlers.go (1)
  • OCRPageResult (280-284)
jobs.go (1)
local_db.go (1)
  • DeleteOcrPageResults (153-155)
web-app/src/ExperimentalOCR.tsx (5)
local_db.go (1)
  • OCRPageResult (25-34)
app_http_handlers.go (1)
  • OCRPageResult (280-284)
web-app/src/ocrStatus.ts (4)
  • OCRJobStatus (1-1)
  • ClientStatus (2-2)
  • mapJobStatus (10-21)
  • getStatusViewOptions (23-44)
web-app/src/DocumentProcessor.tsx (1)
  • Document (9-15)
web-app/e2e/mistral-ocr-processing.spec.ts (1)
  • displayOCRComparison (17-88)
app_http_handlers.go (2)
main.go (1)
  • App (166-184)
local_db.go (3)
  • GetOcrPageResults (143-147)
  • OCRPageResult (25-34)
  • SaveSingleOcrPageResult (116-141)
ocr.go (4)
main.go (1)
  • App (166-184)
types.go (1)
  • OCROptions (123-129)
ocr/provider.go (1)
  • OCRResult (14-29)
local_db.go (1)
  • SaveSingleOcrPageResult (116-141)
ocr/llm_provider.go (1)
ocr/provider.go (1)
  • OCRResult (14-29)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: test
🔇 Additional comments (20)
README.md (1)

165-166: Docs addition for OLLAMA_CONTEXT_LENGTH looks good

Clear and actionable. This will help users tune memory vs. context trade-offs.

jobs.go (3)

163-173: Cancellation handling looks correct

Checking jobCtx.Err() to classify user-initiated cancellations is appropriate. Logging and status transitions are clear.


14-21: Re-OCR cancellers properly registered and cleaned up

I’ve confirmed that reOcrCancellers is both registered and cleaned up to prevent leaks:

  • In app_http_handlers.go:334–339, you lock, cancel any existing entry, and store the new cancelReOcr under cancelKey.
  • The deferred cleanup at app_http_handlers.go:341–345 removes the entry once processing completes.
  • In the cancellation endpoint (app_http_handlers.go:400–404), you lock, delete the entry, and then invoke the cancel func if it existed.

No further action needed.


22-33: TotalPages is already populated in the OCR pipeline – no change needed

The TotalPages field is assigned in the OCR routines before any status update:

  • ocr.go:198 and ocr.go:254 set job.TotalPages = totalPdfPages
  • The HTTP handler in app_http_handlers.go:195 returns it as "total_pages"

Since the field is correctly initialized and exposed to the UI, you can ignore the original suggestion.

Likely an incorrect or invalid review comment.

local_db.go (1)

143-147: LGTM: retrieval ordered by page_index

Ascending order matches expected page display in UI.

ocr/provider.go (1)

45-51: LGTM!

The new configuration fields for Vision LLM settings and Ollama OCR-specific settings are well-structured and properly documented.

ocr/llm_provider.go (3)

22-30: LGTM!

The LLMProvider struct fields are properly extended to include the new configuration options for max tokens, temperature, and Ollama-specific top-k setting.


113-123: LGTM!

The dynamic construction of call options based on configuration is well-implemented, properly checking for non-zero maxTokens and non-nil pointers before adding options.


189-196: LGTM!

The Ollama client creation properly handles the optional context length configuration.

app_http_handlers.go (2)

233-244: LGTM!

The stop OCR job handler is well-implemented with proper mutex protection and appropriate HTTP status codes.


334-345: LGTM!

The cancellation mechanism for per-page re-OCR is properly implemented with mutex protection and cleanup in defer.

main.go (1)

899-910: LGTM!

The Ollama LLM client creation properly shares the context length configuration logic with the Vision LLM path.

background.go (2)

224-227: LGTM!

Good defensive programming - properly handling the nil case for processedDoc.


236-241: LGTM!

The inline function for conditionally adding OCR complete tags is well-implemented and clear.

ocr.go (3)

44-54: LGTM! Job tracking is well-integrated.

The addition of the jobID parameter to the function signature and the conditional logging with the job ID is properly implemented.


267-276: Good implementation of cancellation handling.

The cancellation check with context.Done() is correctly placed before processing each page, and partial results are appropriately returned when cancelled.


195-201: All jobStore accesses are properly synchronized

  • jobStore is initialized at package level (var jobStore = &JobStore{jobs: make(map[string]*Job)}).
  • JobStore embeds a sync.RWMutex and wraps every map access:
    • addJob, updateJobStatus, updatePagesDone use Lock()/Unlock().
    • getJob, GetAllJobs use RLock()/RUnlock().
  • The direct map writes in ocr.go (lines 195–201) are already guarded by jobStore.Lock()/Unlock().

No further changes needed.

web-app/src/ocrStatus.ts (1)

1-44: LGTM! Well-structured status management.

The type definitions and status view logic are clean and well-organized. The getStatusViewOptions function provides a clear mapping between statuses and UI states.

web-app/src/ExperimentalOCR.tsx (2)

8-13: Excellent implementation of per-page OCR functionality!

The per-page OCR results display with re-OCR capabilities is well-structured. The UI properly handles loading states, errors, and provides clear visual feedback with token limit indicators and generation info tooltips.

Also applies to: 28-29, 157-387


105-108: Race condition in per-page results fetching.

The comparison newPagesDone > lastFetchedPagesDoneRef.current could lead to missed updates if multiple status checks occur rapidly. Consider using a more robust synchronization mechanism.

-if (newPagesDone > lastFetchedPagesDoneRef.current) {
+// Always fetch if pages have been processed and we're not at the same count
+if (newPagesDone > 0 && newPagesDone !== lastFetchedPagesDoneRef.current) {
   await fetchPerPageResults();
   lastFetchedPagesDoneRef.current = newPagesDone;
 }

Likely an incorrect or invalid review comment.

Comment on lines +266 to +302
func (app *App) getOCRPagesHandler(c *gin.Context) {
id := c.Param("id")
parsedID, err := strconv.Atoi(id)
if err != nil {
c.JSON(http.StatusBadRequest, gin.H{"error": "Invalid document ID"})
return
}

dbResults, err := GetOcrPageResults(app.Database, parsedID)
if err != nil {
c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to fetch OCR page results"})
return
}

type OCRPageResult struct {
Text string `json:"text"`
OcrLimitHit bool `json:"ocrLimitHit"`
GenerationInfo map[string]interface{} `json:"generationInfo,omitempty"`
}

var pages []OCRPageResult
for _, res := range dbResults {
var genInfo map[string]interface{}
if res.GenerationInfo != "" {
_ = json.Unmarshal([]byte(res.GenerationInfo), &genInfo)
}
pages = append(pages, OCRPageResult{
Text: res.Text,
OcrLimitHit: res.OcrLimitHit,
GenerationInfo: genInfo,
})
}

c.JSON(http.StatusOK, gin.H{
"pages": pages,
})
}
⚠️ Potential issue

Add error handling for JSON unmarshaling

The JSON unmarshaling of GenerationInfo at line 290 ignores errors silently. While this may be intentional for backward compatibility, it could hide data corruption issues.

Add error logging for failed JSON unmarshaling:

 		var genInfo map[string]interface{}
 		if res.GenerationInfo != "" {
-			_ = json.Unmarshal([]byte(res.GenerationInfo), &genInfo)
+			if err := json.Unmarshal([]byte(res.GenerationInfo), &genInfo); err != nil {
+				log.Warnf("Failed to unmarshal GenerationInfo for doc %d page %d: %v", parsedID, i, err)
+			}
 		}

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In app_http_handlers.go around lines 266 to 302, the json.Unmarshal call for
res.GenerationInfo ignores errors; update the loop to capture the unmarshaling
error and log it (including enough context such as parsedID and res identifiers)
instead of discarding it. Specifically, change the unmarshal to if err :=
json.Unmarshal([]byte(res.GenerationInfo), &genInfo); err != nil { /* log error
with context using app logger or the standard logger */ } so the handler records
bad/malformed GenerationInfo while continuing to return other page data.

Comment on lines +319 to +323
imagePaths, _, err := app.Client.DownloadDocumentAsImages(c.Request.Context(), parsedID, limitOcrPages)
if err != nil || pageIdx < 0 || pageIdx >= len(imagePaths) {
c.JSON(http.StatusBadRequest, gin.H{"error": "Invalid page index or failed to download images"})
return
}
🛠️ Refactor suggestion

Verify image index bounds before file access

The code downloads images and then checks if pageIdx is within bounds, but it reads the file before performing adequate validation. While the bounds check exists, the error message combines two different error conditions.

Separate the error conditions for clarity:

 	imagePaths, _, err := app.Client.DownloadDocumentAsImages(c.Request.Context(), parsedID, limitOcrPages)
-	if err != nil || pageIdx < 0 || pageIdx >= len(imagePaths) {
-		c.JSON(http.StatusBadRequest, gin.H{"error": "Invalid page index or failed to download images"})
+	if err != nil {
+		c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to download images"})
+		return
+	}
+	if pageIdx < 0 || pageIdx >= len(imagePaths) {
+		c.JSON(http.StatusBadRequest, gin.H{"error": fmt.Sprintf("Invalid page index %d, document has %d pages", pageIdx, len(imagePaths))})
 		return
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
imagePaths, _, err := app.Client.DownloadDocumentAsImages(c.Request.Context(), parsedID, limitOcrPages)
if err != nil || pageIdx < 0 || pageIdx >= len(imagePaths) {
c.JSON(http.StatusBadRequest, gin.H{"error": "Invalid page index or failed to download images"})
return
}
imagePaths, _, err := app.Client.DownloadDocumentAsImages(c.Request.Context(), parsedID, limitOcrPages)
if err != nil {
c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to download images"})
return
}
if pageIdx < 0 || pageIdx >= len(imagePaths) {
c.JSON(http.StatusBadRequest, gin.H{"error": fmt.Sprintf("Invalid page index %d, document has %d pages", pageIdx, len(imagePaths))})
return
}
🤖 Prompt for AI Agents
In app_http_handlers.go around lines 319 to 323, the code calls
DownloadDocumentAsImages and then treats a combined error condition (download
error or invalid page index) as a single BadRequest; instead, first check and
handle the download error (return 500 or appropriate server error with the
download error message) before inspecting imagePaths, then separately validate
pageIdx against len(imagePaths) and return a clear 400 BadRequest if the index
is out of bounds; ensure you do not reference imagePaths length when err != nil
and return distinct, descriptive error responses for the two failure modes.

Comment on lines +365 to +370
var genInfoJSON string
if result.GenerationInfo != nil {
if b, err := json.Marshal(result.GenerationInfo); err == nil {
genInfoJSON = string(b)
}
}

⚠️ Potential issue

Add error handling for JSON marshaling

The JSON marshaling at line 367 could fail, but errors are silently ignored, which could result in empty genInfoJSON being saved.

Handle JSON marshaling errors:

 	var genInfoJSON string
 	if result.GenerationInfo != nil {
-		if b, err := json.Marshal(result.GenerationInfo); err == nil {
+		if b, err := json.Marshal(result.GenerationInfo); err != nil {
+			log.Errorf("Failed to marshal GenerationInfo for doc %d page %d: %v", parsedID, pageIdx, err)
+		} else {
 			genInfoJSON = string(b)
 		}
 	}
🤖 Prompt for AI Agents
In app_http_handlers.go around lines 365 to 370, the json.Marshal call for
result.GenerationInfo currently ignores errors which can lead to empty
genInfoJSON; change the code to check the error from json.Marshal, and on error
log the marshaling error with context (including any identifiers like request ID
or generation ID) and return or propagate an appropriate error response (e.g.,
HTTP 500 / internal error) instead of proceeding with an empty string; if
returning is not appropriate in this handler, at minimum set genInfoJSON to a
safe default and log the error so the failure is visible.

Comment on lines +25 to +35
type OCRPageResult struct {
ID uint `gorm:"primaryKey"`
DocumentID int `gorm:"index;not null"`
PageIndex int `gorm:"not null"`
Text string `gorm:"size:1048576"`
OcrLimitHit bool
GenerationInfo string `gorm:"type:TEXT"`
CreatedAt time.Time
UpdatedAt time.Time
}


🛠️ Refactor suggestion

Add a composite unique index on (document_id, page_index)

Without a uniqueness constraint, concurrent upserts can create duplicate rows. Add a composite unique index so updates are deterministic and performant.

 type OCRPageResult struct {
-    ID             uint   `gorm:"primaryKey"`
-    DocumentID     int    `gorm:"index;not null"`
-    PageIndex      int    `gorm:"not null"`
+    ID             uint   `gorm:"primaryKey"`
+    DocumentID     int    `gorm:"not null;uniqueIndex:idx_doc_page"`
+    PageIndex      int    `gorm:"not null;uniqueIndex:idx_doc_page"`
     Text           string `gorm:"size:1048576"`
     OcrLimitHit    bool
     GenerationInfo string `gorm:"type:TEXT"`
     CreatedAt      time.Time
     UpdatedAt      time.Time
 }
🤖 Prompt for AI Agents
In local_db.go around lines 25 to 35, the OCRPageResult struct lacks a composite
unique index on (DocumentID, PageIndex); add a composite unique index by
updating the GORM tags for DocumentID and PageIndex to share the same index name
and mark it unique (e.g. index:idx_document_page,unique) so the pair is enforced
as unique, then run or include a DB migration/AutoMigrate to create the new
index to prevent duplicate rows during concurrent upserts.

Comment on lines +115 to +141
// SaveSingleOcrPageResult saves or updates the OCR result for a single page, including GenerationInfo as JSON
func SaveSingleOcrPageResult(db *gorm.DB, docID int, pageIdx int, text string, ocrLimitHit bool, generationInfoJSON string) error {
var result OCRPageResult
tx := db.Where("document_id = ? AND page_index = ?", docID, pageIdx).First(&result)
if tx.Error == nil {
result.Text = text
result.OcrLimitHit = ocrLimitHit
result.GenerationInfo = generationInfoJSON
return db.Save(&result).Error
} else if tx.Error != nil {
log.Debugf("SaveSingleOcrPageResult: db.First error: %v (is gorm.ErrRecordNotFound: %v)", tx.Error, errors.Is(tx.Error, gorm.ErrRecordNotFound))
if errors.Is(tx.Error, gorm.ErrRecordNotFound) {
result = OCRPageResult{
DocumentID: docID,
PageIndex: pageIdx,
Text: text,
OcrLimitHit: ocrLimitHit,
GenerationInfo: generationInfoJSON,
}
return db.Create(&result).Error
} else {
log.Errorf("Unexpected DB error in SaveSingleOcrPageResult: %v", tx.Error)
return tx.Error
}
}
return nil
}

🛠️ Refactor suggestion

Replace find-then-save with an atomic upsert

Current flow can race under concurrency and costs an extra query. Prefer a single INSERT ... ON CONFLICT DO UPDATE keyed by (document_id, page_index).

-func SaveSingleOcrPageResult(db *gorm.DB, docID int, pageIdx int, text string, ocrLimitHit bool, generationInfoJSON string) error {
-    var result OCRPageResult
-    tx := db.Where("document_id = ? AND page_index = ?", docID, pageIdx).First(&result)
-    if tx.Error == nil {
-        result.Text = text
-        result.OcrLimitHit = ocrLimitHit
-        result.GenerationInfo = generationInfoJSON
-        return db.Save(&result).Error
-    } else if tx.Error != nil {
-        log.Debugf("SaveSingleOcrPageResult: db.First error: %v (is gorm.ErrRecordNotFound: %v)", tx.Error, errors.Is(tx.Error, gorm.ErrRecordNotFound))
-        if errors.Is(tx.Error, gorm.ErrRecordNotFound) {
-            result = OCRPageResult{
-                DocumentID:     docID,
-                PageIndex:      pageIdx,
-                Text:           text,
-                OcrLimitHit:    ocrLimitHit,
-                GenerationInfo: generationInfoJSON,
-            }
-            return db.Create(&result).Error
-        } else {
-            log.Errorf("Unexpected DB error in SaveSingleOcrPageResult: %v", tx.Error)
-            return tx.Error
-        }
-    }
-    return nil
-}
+func SaveSingleOcrPageResult(db *gorm.DB, docID int, pageIdx int, text string, ocrLimitHit bool, generationInfoJSON string) error {
+    rec := OCRPageResult{
+        DocumentID:     docID,
+        PageIndex:      pageIdx,
+        Text:           text,
+        OcrLimitHit:    ocrLimitHit,
+        GenerationInfo: generationInfoJSON,
+    }
+    return db.
+        Clauses(clause.OnConflict{
+            Columns:   []clause.Column{{Name: "document_id"}, {Name: "page_index"}},
+            DoUpdates: clause.AssignmentColumns([]string{"text", "ocr_limit_hit", "generation_info", "updated_at"}),
+        }).
+        Create(&rec).Error
+}

Add the import needed for clause:

import (
    "errors"
    ...
    "gorm.io/gorm/clause"
)
🤖 Prompt for AI Agents
In local_db.go around lines 115 to 141, replace the find-then-save pattern with
an atomic upsert to avoid races and the extra query: add the
"gorm.io/gorm/clause" import, construct the OCRPageResult struct with the
provided fields, and use db.Clauses(clause.OnConflict{Columns:
[]clause.Column{{Name:"document_id"},{Name:"page_index"}}, UpdateAll:
true}).Create(&result) so the row is inserted or updated in a single statement
keyed on (document_id, page_index); remove the prior db.Where(...).First(...)
logic and related branching, returning the Create error if any.

types.go Outdated
Comment on lines 101 to 102
AddTags []string `json:"add_tags,omitempty"`
}

💡 Verification agent

🧩 Analysis chain

AddTags field: LGTM; verify downstream handling

Adding add_tags aligns with the UI/UX of incremental tagging. Please verify that UpdateDocuments (client and handlers) actually interprets add_tags as distinct from suggested_tags and remove_tags to avoid double-application or conflicts.

Run to find usages of add_tags in server handlers and client code:


🏁 Script executed:

#!/bin/bash
rg -n --json 'add_tags' -A 3 -B 3

Length of output: 1890


Implement add_tags handling in UpdateDocuments

The new add_tags field is defined in types.go but isn’t referenced anywhere else (no matches beyond its declaration). As a result, calls to UpdateDocuments won’t apply added tags. Please:

  • In the server handler (e.g. the UpdateDocuments method in server/handlers/...), read add_tags from the incoming payload and merge those tags into the document, guarding against conflicts with suggested_tags and remove_tags.
  • In the client (UpdateDocuments method in client/...), include the add_tags array when constructing the request.
  • Add or update unit/integration tests to verify incremental tag additions work as expected.
🤖 Prompt for AI Agents
In types.go around lines 101-102 the AddTags []string
`json:"add_tags,omitempty"` field is declared but never used; update the server
handler (server/handlers/... UpdateDocuments) to read payload.add_tags, merge
those tags into the target document’s tag set (deduplicate), ensure tags present
in remove_tags are not re-added and that add_tags do not override/duplicate
suggested_tags handling (apply a clear precedence: remove_tags wins, then
add_tags, then suggested_tags), and persist the merged tag set; update the
client (client/... UpdateDocuments) to include the add_tags array when building
the request payload; and add/update unit/integration tests that call
UpdateDocuments with combinations of add_tags, remove_tags, and suggested_tags
to assert tags are correctly added, removed, and deduplicated.

Comment on lines +148 to 149
ProcessDocumentOCR(ctx context.Context, documentID int, options OCROptions, jobID string) (*ProcessedDocument, error)
}

💡 Verification agent

🧩 Analysis chain

Signature change to ProcessDocumentOCR: ensure all implementors/callers updated

This is a public interface change. Please confirm there are no remaining implementations or call sites using the old 3-arg signature.

Use this script to surface function declarations and invocations that may still use the old form:


🏁 Script executed:

#!/bin/bash
# Find all function declarations named ProcessDocumentOCR
rg -n 'func\s+\(.*\)\s+ProcessDocumentOCR\(' -A 2

# Show call sites and arity hints (look for 3 vs 4 args)
rg -n 'ProcessDocumentOCR\(' -A 1 -B 1

Length of output: 2551


Fix remaining TestApp mock to match new signature

The TestApp.ProcessDocumentOCR override in background_test.go is still using the old 3-arg form. Update it to include the new jobID string parameter:

• background_test.go:329

- func (app *TestApp) ProcessDocumentOCR(ctx context.Context, documentID int, options OCROptions) (*ProcessedDocument, error) {
+ func (app *TestApp) ProcessDocumentOCR(ctx context.Context, documentID int, options OCROptions, jobID string) (*ProcessedDocument, error) {

Adjust any test logic inside the method to accept or ignore the jobID as needed.

🤖 Prompt for AI Agents
In types.go around lines 148-149 the ProcessDocumentOCR signature was changed to
include a jobID string; update the TestApp.ProcessDocumentOCR mock in
background_test.go (around line 329) to match that new signature — add the final
jobID string parameter to the function declaration, and update the mock body to
accept or ignore jobID as appropriate (and adjust any references or expectations
in the test to use the new parameter); also ensure any test calls to
TestApp.ProcessDocumentOCR pass a jobID or use a placeholder.

Comment on lines +99 to +100
const newJobStatus = mapJobStatus(response.data.status);
setJobStatus(newJobStatus);

⚠️ Potential issue

Handle potential mapJobStatus errors gracefully.

Since mapJobStatus can throw an error for unknown statuses (as noted in the ocrStatus.ts review), this could crash the component. Add error handling here.

 try {
   const response = await axios.get(`./api/jobs/ocr/${jobId}`);
-  const newJobStatus = mapJobStatus(response.data.status);
-  setJobStatus(newJobStatus);
+  try {
+    const newJobStatus = mapJobStatus(response.data.status);
+    setJobStatus(newJobStatus);
+  } catch (mapError) {
+    console.error('Failed to map job status:', mapError);
+    setJobStatus('idle');
+    setError('Received unknown job status from server.');
+    return;
+  }
🤖 Prompt for AI Agents
In web-app/src/ExperimentalOCR.tsx around lines 99 to 100, the call to
mapJobStatus(response.data.status) can throw for unknown statuses and currently
can crash the component; wrap the mapping in a try/catch, fall back to a safe
status (e.g., 'UNKNOWN' or a defined error state) and call setJobStatus with
that fallback, and log/report the caught error (console.error or an
error-tracking helper) so the component continues rendering without crashing.

Comment on lines +188 to +190
if (pageIdx + 1 > lastFetchedPagesDoneRef.current) {
lastFetchedPagesDoneRef.current = pageIdx + 1;
}

⚠️ Potential issue

Logic issue with page progress tracking.

The condition pageIdx + 1 > lastFetchedPagesDoneRef.current seems incorrect. Re-OCRing a single page shouldn't necessarily update the overall progress counter which tracks bulk OCR job progress.

This logic appears to mix single-page re-OCR progress with bulk job progress tracking. Consider removing these lines or using a separate tracking mechanism for re-OCR operations:

-if (pageIdx + 1 > lastFetchedPagesDoneRef.current) {
-  lastFetchedPagesDoneRef.current = pageIdx + 1;
-}
🤖 Prompt for AI Agents
In web-app/src/ExperimentalOCR.tsx around lines 188-190, the code updates
lastFetchedPagesDoneRef.current during a single-page re-OCR (if (pageIdx + 1 >
lastFetchedPagesDoneRef.current) { lastFetchedPagesDoneRef.current = pageIdx +
1; }), which incorrectly mixes single-page re-OCR progress with bulk OCR job
tracking; remove these two lines or change the logic so lastFetchedPagesDoneRef
is only advanced during bulk/batch OCR flows (e.g., guard the update with an
explicit isBulkOrInitialRun flag or use a separate ref/state for re-OCR
progress) so re-OCR of a single page does not alter the overall bulk progress
counter.

Comment on lines +10 to +21
export const mapJobStatus = (raw: string | null | undefined): OCRJobStatus => {
switch (raw) {
case 'pending':
case 'in_progress':
case 'completed':
case 'failed':
case 'cancelled':
return raw;
default:
throw new Error(`Unknown job status: ${raw}`);
}
};

🛠️ Refactor suggestion

Consider returning 'idle' instead of throwing for unknown statuses.

The mapJobStatus function throws an error for unknown/null/undefined status values. This could cause the UI to crash if the backend returns an unexpected status. Consider returning 'idle' as a safe default instead.

 export const mapJobStatus = (raw: string | null | undefined): OCRJobStatus => {
   switch (raw) {
     case 'pending':
     case 'in_progress':
     case 'completed':
     case 'failed':  
     case 'cancelled':
       return raw;
     default:
-      throw new Error(`Unknown job status: ${raw}`);
+      console.warn(`Unknown job status: ${raw}, defaulting to 'idle'`);
+      return 'idle';
   }
 };
🤖 Prompt for AI Agents
In web-app/src/ocrStatus.ts around lines 10 to 21, the function currently throws
on unknown/null/undefined statuses which can crash the UI; change the default
behavior so that instead of throwing an error you return the safe default value
'idle' (and explicitly handle null/undefined by returning 'idle' as well) so
unexpected backend values won't bubble up as exceptions.

@icereed
Owner

icereed commented Aug 17, 2025

Hey 👋
Thank you for this awesome feature! I will try to review this in the next days. It might take a bit longer since I’m quite busy with several “projects”. But still I’m very grateful for this contribution. 🙏

@adfaklsdjf

I've built an image from this and am using it on my Unraid Paperless install. I'm really only using the VISION_LLM_TEMPERATURE configuration option so I can use gpt-5-mini for OCR. I'm not really exercising any of the other features in the PR.

The image is at adfaklsdjf/paperless-gpt:pr-573 on dockerhub. gpt-5-mini is working for me. 👍

Thank you @dawidkulpa for your work.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (2)
app_http_handlers.go (2)

355-358: Log GenerationInfo unmarshal failures

Swallowing the error hides corrupt entries and makes debugging impossible. Please log the failure with context so we know when a page payload is bad.

-		if res.GenerationInfo != "" {
-			_ = json.Unmarshal([]byte(res.GenerationInfo), &genInfo)
+		if res.GenerationInfo != "" {
+			if err := json.Unmarshal([]byte(res.GenerationInfo), &genInfo); err != nil {
+				log.Warnf("Failed to unmarshal GenerationInfo for doc %d page %d: %v", parsedID, res.PageIndex, err)
+			}
 		}

433-436: Handle GenerationInfo marshal errors

We still drop marshal failures silently, which loses diagnostics and stores empty JSON. Log the error and only persist the payload when marshaling succeeds.

-	if result.GenerationInfo != nil {
-		if b, err := json.Marshal(result.GenerationInfo); err == nil {
-			genInfoJSON = string(b)
-		}
-	}
+	if result.GenerationInfo != nil {
+		if b, err := json.Marshal(result.GenerationInfo); err != nil {
+			log.Errorf("Failed to marshal GenerationInfo for doc %d page %d: %v", parsedID, pageIdx, err)
+		} else {
+			genInfoJSON = string(b)
+		}
+	}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c14d4c4 and f5d8302.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (5)
  • README.md (4 hunks)
  • app_http_handlers.go (4 hunks)
  • background.go (1 hunks)
  • main.go (5 hunks)
  • types.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • README.md
🧰 Additional context used
📓 Path-based instructions (1)
**/*.go

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Ensure all Go source files are formatted with gofmt before committing

Files:

  • types.go
  • app_http_handlers.go
  • background.go
  • main.go
🧠 Learnings (1)
📚 Learning: 2025-09-28T18:55:46.584Z
Learnt from: CR
PR: icereed/paperless-gpt#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-09-28T18:55:46.584Z
Learning: Applies to ocr/**/*.go : Implement OCR providers under the ocr/ package

Applied to files:

  • app_http_handlers.go
  • background.go
  • main.go
🧬 Code graph analysis (3)
types.go (1)
ocr.go (1)
  • ProcessedDocument (20-26)
app_http_handlers.go (2)
main.go (1)
  • App (110-128)
local_db.go (3)
  • GetOcrPageResults (143-147)
  • OCRPageResult (25-34)
  • SaveSingleOcrPageResult (116-141)
background.go (2)
types.go (1)
  • DocumentSuggestion (114-128)
web-app/src/DocumentProcessor.tsx (1)
  • DocumentSuggestion (36-45)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: test

Comment on lines +438 to +447
saveErr := SaveSingleOcrPageResult(app.Database, parsedID, pageIdx, result.Text, result.OcrLimitHit, genInfoJSON)
if saveErr != nil {
log.Errorf("Failed to save re-OCR result for doc %d page %d: %v", parsedID, pageIdx, saveErr)
}

c.JSON(http.StatusOK, gin.H{
"text": result.Text,
"ocrLimitHit": result.OcrLimitHit,
"generationInfo": result.GenerationInfo,
})

⚠️ Potential issue | 🟠 Major

Surface DB write failures to the client

If persisting the page result fails we currently return 200, so the UI shows success even though nothing was saved. Please fail fast and tell the client so they can retry.

	saveErr := SaveSingleOcrPageResult(app.Database, parsedID, pageIdx, result.Text, result.OcrLimitHit, genInfoJSON)
	if saveErr != nil {
		log.Errorf("Failed to save re-OCR result for doc %d page %d: %v", parsedID, pageIdx, saveErr)
-	}
-
-	c.JSON(http.StatusOK, gin.H{
+		c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to persist re-OCR result"})
+		return
+	}
+
+	c.JSON(http.StatusOK, gin.H{
🤖 Prompt for AI Agents
In app_http_handlers.go around lines 438-447 the handler logs
SaveSingleOcrPageResult failures but still returns HTTP 200; change the control
flow to return an error response when saveErr != nil by responding with a
non-200 status (e.g., http.StatusInternalServerError or http.StatusBadGateway)
and a concise JSON error body (message field), then immediately return so the
client sees the failure and can retry; do not remove the existing log, but
ensure the error response does not leak sensitive internals.

@icereed icereed merged commit fbc0ae8 into icereed:main Oct 13, 2025
5 of 6 checks passed
@icereed
Owner

icereed commented Oct 13, 2025

Thank you all for your patience!
This will be part of the next release. 🎉🥳
