OCRFlux API - FastAPI Wrapper for ChatDOC's OCRFlux

A production-ready FastAPI wrapper around ChatDOC's OCRFlux, providing REST API endpoints for PDF processing with state-of-the-art OCR capabilities. OCRFlux delivers superior document understanding by converting PDFs and images to high-quality Markdown format.

Features

  • Official OCRFlux Integration - Uses ChatDOC's OCRFlux-3B model
  • Automatic Model Download - Downloads OCRFlux-3B model automatically on first run
  • FastAPI REST API with automatic documentation and validation
  • Health check endpoint for monitoring service status
  • PDF processing from URLs with comprehensive OCR extraction
  • Docker containerization with GPU support
  • Persistent Model Storage - Models are cached in Docker volumes
  • Markdown output with advanced document structure understanding
  • Multi-language support for global document processing
  • Production-ready with proper error handling and logging

Requirements

Hardware Requirements

  • NVIDIA GPU with at least 12GB of GPU RAM (tested on RTX 3090, 4090, L40S, A100, H100)
  • 20GB free disk space for models and processing
  • 16GB system RAM recommended

Software Requirements

  • Docker and Docker Compose
  • NVIDIA driver with CUDA support
  • NVIDIA Container Toolkit (for GPU access inside containers)

Quick Start

1. Clone and Deploy

# Clone this repository
git clone <repository-url>
cd ocrflux

# Build and start the service
docker-compose up --build

# The service will automatically:
# 1. Download OCRFlux-3B model from HuggingFace (first run only)
# 2. Cache the model in a Docker volume
# 3. Start the FastAPI server

2. Access the API

  • API Base URL: http://localhost:8000
  • Interactive API Docs: http://localhost:8000/docs
  • Alternative Docs: http://localhost:8000/redoc

3. First Run

On the first run, the service will:

  1. Download the OCRFlux-3B model (~3-5GB) from HuggingFace
  2. Cache it in a Docker volume for future use

This may take 10-15 minutes depending on your internet connection.

API Endpoints

Health Check

Monitor service status, GPU availability, and model download status.

Endpoint: GET /health

Response:

{
  "status": "healthy",
  "timestamp": "2024-01-15T10:30:00.000Z",
  "service": "OCRFlux-API",
  "version": "1.0.0",
  "gpu_available": true,
  "model_ready": true
}
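The readiness fields above can drive a simple startup probe. A minimal sketch in Python (field names follow the sample response; the HTTP call itself is left to the client library of your choice, and `is_ready` is an illustrative name, not part of the API):

```python
def is_ready(health: dict) -> bool:
    """Interpret a /health response: ready only when the service is healthy,
    a GPU is visible, and the model has finished downloading."""
    return (
        health.get("status") == "healthy"
        and bool(health.get("gpu_available"))
        and bool(health.get("model_ready"))
    )
```

A deploy script can poll GET /health until this returns True before routing traffic to the service.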

Model Information

Get details about the loaded OCRFlux model and download status.

Endpoint: GET /model-info

Response:

{
  "model_name": "OCRFlux-3B",
  "model_path": "/OCRFlux-3B",
  "provider": "ChatDOC",
  "description": "OCRFlux is a state-of-the-art OCR model for document understanding",
  "supports_languages": "Multi-language support",
  "output_format": "Markdown",
  "gpu_required": true,
  "min_gpu_memory": "12GB",
  "model_downloaded": true,
  "huggingface_repo": "ChatDOC/OCRFlux-3B",
  "auto_download": true
}

Process PDF

Extract structured content from PDF documents as Markdown.

Endpoint: POST /process-pdf

Request Body:

{
  "pdf_url": "https://example.com/document.pdf",
  "options": {
    "max_page_retries": 3,
    "gpu_memory_utilization": 0.8,
    "max_model_len": 8192
  }
}

Response:

{
  "success": true,
  "message": "PDF processed successfully with OCRFlux",
  "processing_time": 45.67,
  "data": {
    "success": true,
    "document_text": "# Document Title\n\nThis is the extracted markdown content...",
    "metadata": {
      "original_path": "/tmp/input.pdf",
      "num_pages": 10,
      "fallback_pages": [],
      "processing_method": "OCRFlux-3B"
    },
    "pages": [
      {
        "page_number": 1,
        "text": "# Page 1 Content\n\nMarkdown formatted content...",
        "format": "markdown",
        "text_length": 1524
      }
    ],
    "summary": {
      "total_pages": 10,
      "processed_pages": 10,
      "fallback_pages": 0,
      "total_text_length": 15240,
      "processing_time": 45.67,
      "has_fallbacks": false
    }
  }
}
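On the client side, the nested response can be unpacked with a small helper; a sketch assuming the response shape shown above (`extract_markdown` is an illustrative name, not part of the API):

```python
def extract_markdown(resp: dict) -> tuple[str, int]:
    """Return (document_text, total_pages) from a /process-pdf response,
    raising if the service reported a failure."""
    if not resp.get("success"):
        raise RuntimeError(resp.get("message", "PDF processing failed"))
    data = resp["data"]
    return data["document_text"], data["summary"]["total_pages"]
```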

Configuration

Environment Variables

Configure the service using environment variables in docker-compose.yml:

Variable                  Default              Description
MODEL_PATH                /OCRFlux-3B          Path to the OCRFlux model inside the container
HUGGINGFACE_MODEL_NAME    ChatDOC/OCRFlux-3B   HuggingFace model repository
AUTO_DOWNLOAD_MODEL       true                 Enable automatic model downloading
MAX_PDF_SIZE              104857600            Maximum PDF size in bytes (100MB)
MAX_PAGES                 1000                 Maximum pages to process
MAX_PAGE_RETRIES          3                    Retries for failed pages
REQUEST_TIMEOUT           600                  Request timeout in seconds
GPU_MEMORY_UTILIZATION    0.8                  GPU memory utilization for vLLM
MAX_MODEL_LEN             8192                 Maximum model context length
LOG_LEVEL                 INFO                 Logging level
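These map onto the service's environment block in docker-compose.yml; an illustrative fragment (the values shown are examples, not recommendations):

```yaml
services:
  ocrflux:
    environment:
      - AUTO_DOWNLOAD_MODEL=true
      - MAX_PDF_SIZE=104857600   # 100MB
      - GPU_MEMORY_UTILIZATION=0.8
      - LOG_LEVEL=INFO
```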

Custom Model Storage

By default, the model is stored in a Docker-managed volume. For custom storage:

Option 1: Bind Mount (Custom Host Directory)

volumes:
  # Replace Docker volume with bind mount
  - /path/to/your/models:/OCRFlux-3B

Option 2: Keep Docker Volume (Recommended)

# Find the volume location
docker volume inspect ocrflux_model

# Access the volume data
docker run --rm -v ocrflux_model:/models alpine ls -la /models

OCRFlux Options

The options parameter in the API request supports:

  • max_page_retries: Number of retries for failed pages
  • gpu_memory_utilization: GPU memory usage (0.0-1.0)
  • max_model_len: Maximum context length
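Since a malformed options payload only fails after the request is sent, it can be worth validating client-side first. A sketch (only the 0.0-1.0 range for gpu_memory_utilization is documented above; the other bounds are illustrative assumptions):

```python
def validate_options(options: dict) -> dict:
    """Sanity-check OCRFlux request options before sending them."""
    gpu = options.get("gpu_memory_utilization", 0.8)
    if not 0.0 <= gpu <= 1.0:
        raise ValueError("gpu_memory_utilization must be in 0.0-1.0")
    if options.get("max_page_retries", 0) < 0:  # assumed: retries cannot be negative
        raise ValueError("max_page_retries must be >= 0")
    if options.get("max_model_len", 1) <= 0:  # assumed: context length must be positive
        raise ValueError("max_model_len must be positive")
    return options
```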

Usage Examples

Using curl

# Health check
curl -X GET http://localhost:8000/health

# Get model information
curl -X GET http://localhost:8000/model-info

# Process PDF
curl -X POST http://localhost:8000/process-pdf \
  -H "Content-Type: application/json" \
  -d '{
    "pdf_url": "https://example.com/sample.pdf",
    "options": {
      "max_page_retries": 3,
      "gpu_memory_utilization": 0.8
    }
  }'
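The same request can be issued from Python's standard library; a sketch mirroring the curl call above (the urlopen line is commented out so the snippet stays offline):

```python
import json
import urllib.request

payload = {
    "pdf_url": "https://example.com/sample.pdf",
    "options": {"max_page_retries": 3, "gpu_memory_utilization": 0.8},
}
req = urllib.request.Request(
    "http://localhost:8000/process-pdf",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# Uncomment to actually send the request (long timeout: OCR can take minutes):
# with urllib.request.urlopen(req, timeout=600) as resp:
#     result = json.load(resp)
```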

Model Management

Automatic Download

The service automatically downloads the OCRFlux-3B model on first run:

  1. First Run: Downloads model from HuggingFace (~3-5GB)
  2. Subsequent Runs: Uses cached model from Docker volume
  3. Model Location: Stored in Docker-managed volume
  4. Download Methods:
    • Primary: HuggingFace Hub library
    • Fallback: Git clone from HuggingFace

Manual Model Management

To manually manage the model:

# Check model status
curl http://localhost:8000/model-info

# View volume contents
docker run --rm -v ocrflux_model:/models alpine ls -la /models

# Remove model cache (stop the stack first; the model re-downloads on next start)
docker volume rm ocrflux_model

# Start fresh
docker-compose down
docker volume rm ocrflux_model
docker-compose up --build

Monitoring and Troubleshooting

GPU Monitoring

Check GPU usage and memory:

# Monitor GPU usage
nvidia-smi

# Watch GPU usage in real-time
watch -n 1 nvidia-smi

Service Logs

View service logs:

# View logs
docker-compose logs -f ocrflux

# Check for GPU-related issues
docker-compose logs ocrflux | grep -i gpu

# Check model download progress
docker-compose logs ocrflux | grep -i download
