A production-ready FastAPI wrapper around ChatDOC's OCRFlux, exposing REST API endpoints for PDF processing backed by a state-of-the-art OCR model. OCRFlux converts PDFs and images into high-quality Markdown while preserving document structure.
- Official OCRFlux Integration - Uses ChatDOC's OCRFlux-3B model
- Automatic Model Download - Downloads OCRFlux-3B model automatically on first run
- FastAPI REST API with automatic documentation and validation
- Health check endpoint for monitoring service status
- PDF processing from URLs with comprehensive OCR extraction
- Docker containerization with GPU support
- Persistent Model Storage - Models are cached in Docker volumes
- Markdown output with advanced document structure understanding
- Multi-language support for global document processing
- Production-ready with proper error handling and logging
- NVIDIA GPU with at least 12GB of GPU RAM (tested on RTX 3090, 4090, L40S, A100, H100)
- 20GB free disk space for models and processing
- 16GB system RAM recommended
- Docker with NVIDIA GPU support
- Docker Compose
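Before building, you can sanity-check the GPU requirement from the host. A minimal sketch, assuming a CUDA-enabled PyTorch build is available on the host (`pip install torch`); any tool that reports GPU memory, such as `nvidia-smi`, works equally well:

```python
import torch

# Verify a CUDA GPU is visible and meets the 12GB memory requirement.
assert torch.cuda.is_available(), "No CUDA-capable GPU detected"
total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
print(f"{torch.cuda.get_device_name(0)}: {total_gb:.1f} GB")
assert total_gb >= 12, "OCRFlux-3B needs at least 12 GB of GPU memory"
```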
```bash
# Clone this repository
git clone <repository-url>
cd ocrflux

# Build and start the service
docker-compose up --build

# The service will automatically:
# 1. Download the OCRFlux-3B model from HuggingFace (first run only)
# 2. Cache the model in a Docker volume
# 3. Start the FastAPI server
```
- API Base URL: http://localhost:8000
- Interactive API Docs: http://localhost:8000/docs
- Alternative Docs: http://localhost:8000/redoc
On the first run, the service will:
- Download the OCRFlux-3B model (~3-5GB) from HuggingFace
- Cache it in a Docker volume for future use

This may take 10-15 minutes depending on your internet connection.
Reports service status, GPU availability, and model download status.

Endpoint: `GET /health`

Response:

```json
{
  "status": "healthy",
  "timestamp": "2024-01-15T10:30:00.000Z",
  "service": "OCRFlux-API",
  "version": "1.0.0",
  "gpu_available": true,
  "model_ready": true
}
```
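Because the first start can spend 10-15 minutes downloading the model, clients may want to poll `/health` until `model_ready` is `true` before sending work. A minimal sketch in Python, assuming the `requests` package is installed and the service runs on the default port:

```python
import time

import requests

def wait_until_ready(base_url="http://localhost:8000", timeout=1800):
    """Poll /health until model_ready is true (covers the first-run download)."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            health = requests.get(f"{base_url}/health", timeout=10).json()
            if health.get("model_ready"):
                return health
        except requests.RequestException:
            pass  # service may still be starting up
        time.sleep(10)
    raise TimeoutError("OCRFlux service did not become ready in time")

print(wait_until_ready())
```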
Returns details about the loaded OCRFlux model and its download status.

Endpoint: `GET /model-info`

Response:

```json
{
  "model_name": "OCRFlux-3B",
  "model_path": "/OCRFlux-3B",
  "provider": "ChatDOC",
  "description": "OCRFlux is a state-of-the-art OCR model for document understanding",
  "supports_languages": "Multi-language support",
  "output_format": "Markdown",
  "gpu_required": true,
  "min_gpu_memory": "12GB",
  "model_downloaded": true,
  "huggingface_repo": "ChatDOC/OCRFlux-3B",
  "auto_download": true
}
```
Extracts structured content from a PDF document and returns it as Markdown.

Endpoint: `POST /process-pdf`

Request Body:

```json
{
  "pdf_url": "https://example.com/document.pdf",
  "options": {
    "max_page_retries": 3,
    "gpu_memory_utilization": 0.8,
    "max_model_len": 8192
  }
}
```
Response:

```json
{
  "success": true,
  "message": "PDF processed successfully with OCRFlux",
  "processing_time": 45.67,
  "data": {
    "success": true,
    "document_text": "# Document Title\n\nThis is the extracted markdown content...",
    "metadata": {
      "original_path": "/tmp/input.pdf",
      "num_pages": 10,
      "fallback_pages": [],
      "processing_method": "OCRFlux-3B"
    },
    "pages": [
      {
        "page_number": 1,
        "text": "# Page 1 Content\n\nMarkdown formatted content...",
        "format": "markdown",
        "text_length": 1524
      }
    ],
    "summary": {
      "total_pages": 10,
      "processed_pages": 10,
      "fallback_pages": 0,
      "total_text_length": 15240,
      "processing_time": 45.67,
      "has_fallbacks": false
    }
  }
}
```
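The same request can be issued programmatically. A minimal client sketch, assuming `requests` is installed and the service runs on the default port; the field names follow the response shown above:

```python
import requests

payload = {
    "pdf_url": "https://example.com/document.pdf",
    "options": {"max_page_retries": 3, "gpu_memory_utilization": 0.8},
}

# Processing is synchronous and can take minutes for large PDFs,
# so allow a generous client-side timeout.
resp = requests.post("http://localhost:8000/process-pdf", json=payload, timeout=600)
resp.raise_for_status()
result = resp.json()

if result["success"]:
    # Save the full document Markdown and report per-page statistics.
    with open("document.md", "w", encoding="utf-8") as f:
        f.write(result["data"]["document_text"])
    summary = result["data"]["summary"]
    print(f"{summary['processed_pages']}/{summary['total_pages']} pages "
          f"in {result['processing_time']:.1f}s")
```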
Configure the service using environment variables in `docker-compose.yml`:
| Variable | Default | Description |
|---|---|---|
| `MODEL_PATH` | `/OCRFlux-3B` | Path to the OCRFlux model in the container |
| `HUGGINGFACE_MODEL_NAME` | `ChatDOC/OCRFlux-3B` | HuggingFace model repository |
| `AUTO_DOWNLOAD_MODEL` | `true` | Enable automatic model downloading |
| `MAX_PDF_SIZE` | `104857600` | Maximum PDF size in bytes (100MB) |
| `MAX_PAGES` | `1000` | Maximum pages to process |
| `MAX_PAGE_RETRIES` | `3` | Number of retries for failed pages |
| `REQUEST_TIMEOUT` | `600` | Request timeout in seconds |
| `GPU_MEMORY_UTILIZATION` | `0.8` | GPU memory utilization for vLLM |
| `MAX_MODEL_LEN` | `8192` | Maximum model context length |
| `LOG_LEVEL` | `INFO` | Logging level |
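Inside the container, settings like these are typically read from the environment. A hypothetical sketch of that pattern (not the service's actual code), using the defaults from the table:

```python
import os

# Defaults mirror the table above; environment variables override them.
MAX_PDF_SIZE = int(os.environ.get("MAX_PDF_SIZE", "104857600"))  # 100MB
MAX_PAGES = int(os.environ.get("MAX_PAGES", "1000"))
GPU_MEMORY_UTILIZATION = float(os.environ.get("GPU_MEMORY_UTILIZATION", "0.8"))
REQUEST_TIMEOUT = int(os.environ.get("REQUEST_TIMEOUT", "600"))  # seconds
```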
By default, the model is stored in a Docker-managed volume. For custom storage:
Option 1: Bind Mount (Custom Host Directory)

```yaml
volumes:
  # Replace the Docker volume with a bind mount
  - /path/to/your/models:/OCRFlux-3B
```

Option 2: Keep Docker Volume (Recommended)

```bash
# Find the volume location
docker volume inspect ocrflux_model

# Access the volume data
docker run --rm -v ocrflux_model:/models alpine ls -la /models
```
The `options` parameter in the API request supports:

- `max_page_retries`: Number of retries for failed pages
- `gpu_memory_utilization`: GPU memory usage (0.0-1.0)
- `max_model_len`: Maximum context length
```bash
# Health check
curl -X GET http://localhost:8000/health

# Get model information
curl -X GET http://localhost:8000/model-info

# Process PDF
curl -X POST http://localhost:8000/process-pdf \
  -H "Content-Type: application/json" \
  -d '{
    "pdf_url": "https://example.com/sample.pdf",
    "options": {
      "max_page_retries": 3,
      "gpu_memory_utilization": 0.8
    }
  }'
```
The service automatically downloads the OCRFlux-3B model on first run:
- First Run: Downloads the model from HuggingFace (~3-5GB)
- Subsequent Runs: Uses the cached model from the Docker volume
- Model Location: Stored in a Docker-managed volume
- Download Methods:
  - Primary: HuggingFace Hub library
  - Fallback: Git clone from HuggingFace
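To pre-fetch the model outside the container (for example, to bind-mount it as in Option 1 above), the same HuggingFace Hub library can be used directly. A minimal sketch, assuming `huggingface_hub` is installed (`pip install huggingface_hub`):

```python
from huggingface_hub import snapshot_download

# Download the ChatDOC/OCRFlux-3B repository into a local directory;
# point your bind mount at this path (mounted as /OCRFlux-3B in the container).
snapshot_download(repo_id="ChatDOC/OCRFlux-3B", local_dir="./OCRFlux-3B")
```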
To manually manage the model:
```bash
# Check model status
curl http://localhost:8000/model-info

# View volume contents
docker run --rm -v ocrflux_model:/models alpine ls -la /models

# Remove model cache (will re-download on next start)
docker volume rm ocrflux_model

# Start fresh
docker-compose down
docker volume rm ocrflux_model
docker-compose up --build
```
Check GPU usage and memory:
```bash
# Monitor GPU usage
nvidia-smi

# Watch GPU usage in real-time
watch -n 1 nvidia-smi
```
View service logs:
```bash
# View logs
docker-compose logs -f ocrflux

# Check for GPU-related issues
docker-compose logs ocrflux | grep -i gpu

# Check model download progress
docker-compose logs ocrflux | grep -i download
```