This project automatically extracts key metadata from images of Vietnamese book covers using computer vision and OCR. The goal is to streamline book cataloging by replacing manual data entry with an automated pipeline.
- Input: Book cover images taken manually with mobile devices.
- Output: Extracted metadata including Title, Author, Publisher, and Other content.
- Key Tasks:
  - Text Detection – Detect text regions using YOLOv8.
  - Text Recognition – Extract text content using a fine-tuned VietOCR model.
  - Information Synthesis – Organize recognized text into structured fields.
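
The information synthesis step listed above could look roughly like the sketch below. It is a minimal illustration, not the project's actual code: it assumes each recognized line arrives as a `(label, text, box)` tuple, with the label coming from the detector and the box given as `(x1, y1, x2, y2)` in pixels. The function name `synthesize` and the sample values are hypothetical.

```python
from collections import defaultdict

FIELDS = ("title", "author", "publisher", "other")


def synthesize(regions):
    """Group recognized text lines into metadata fields.

    regions: iterable of (label, text, (x1, y1, x2, y2)) tuples.
    Unknown labels fall back to "other" (an assumption of this sketch).
    """
    grouped = defaultdict(list)
    for label, text, box in regions:
        text = text.strip()
        if not text:
            continue  # skip empty recognition results
        field = label if label in FIELDS else "other"
        # Store (top, left, text) so sorting approximates reading order.
        grouped[field].append((box[1], box[0], text))
    return {
        field: " ".join(text for _, _, text in sorted(grouped[field]))
        for field in FIELDS
    }


# Made-up sample input: a two-line title detected out of order.
regions = [
    ("author", "Nguyễn Văn A", (40, 300, 260, 330)),
    ("title", "PYTHON CƠ BẢN", (40, 210, 320, 250)),
    ("title", "LẬP TRÌNH", (40, 160, 220, 200)),
    ("publisher", "NXB Ví Dụ", (40, 560, 200, 585)),
    ("subtitle", "Tái bản", (40, 260, 120, 280)),
]
metadata = synthesize(regions)
print(metadata["title"])
```

Sorting by the top edge before joining keeps multi-line titles in reading order even when the detector returns boxes in arbitrary order.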
- Total images: 964 book covers (manually collected).
- Devices used: iPhone, Samsung, iPad (to ensure variation in the captured images).
- Labeling tool: PPOCRLabel
- Annotation types: Line-level bounding boxes with labels (title, author, publisher, other).
- Split: Train (700), Validation (132), Test (132)
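
A split with the sizes listed above (700 + 132 + 132 = 964) can be reproduced with a seeded shuffle. This is only a sketch: the source does not say how the images were assigned to each subset, so the random strategy, the seed, and the file names below are assumptions.

```python
import random


def split_dataset(items, n_train=700, n_val=132, n_test=132, seed=42):
    """Shuffle items reproducibly and cut them into train/val/test lists."""
    items = list(items)
    if n_train + n_val + n_test != len(items):
        raise ValueError("split sizes must add up to the number of items")
    random.Random(seed).shuffle(items)  # local RNG, global state untouched
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


# Hypothetical file names standing in for the 964 collected covers.
image_names = [f"cover_{i:04d}.jpg" for i in range(964)]
train, val, test = split_dataset(image_names)
print(len(train), len(val), len(test))
```

Fixing the seed means the same images land in the test set on every run, which keeps evaluation numbers comparable between experiments.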
- YOLOv8 – Object detection for text regions.
- VietOCR – OCR tailored for Vietnamese language.
- Python, OpenCV, PyTorch
- PPOCRLabel – For text region annotation
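
Because annotation is done in PPOCRLabel and detection in YOLOv8, the labeled boxes have to be converted into the YOLO text format (`class x_center y_center width height`, all normalized to the image size) before training. The sketch below shows that conversion for a single four-point box. The class ID mapping and the function name `polygon_to_yolo` are assumptions made for illustration; the project may use a different class order or a single text class.

```python
# Assumed mapping from annotation label to YOLO class ID.
CLASS_IDS = {"title": 0, "author": 1, "publisher": 2, "other": 3}


def polygon_to_yolo(points, label, img_w, img_h):
    """Convert a polygon (list of [x, y] pixel points) to one YOLO label line."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # YOLO expects the box center and size as fractions of the image size.
    cx = (x_min + x_max) / 2 / img_w
    cy = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{CLASS_IDS[label]} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


# Made-up example: a title line on a 1000 x 2000 pixel photo.
line = polygon_to_yolo(
    [[100, 200], [500, 200], [500, 300], [100, 300]], "title", 1000, 2000
)
print(line)  # 0 0.300000 0.125000 0.400000 0.050000
```

Taking the minimum and maximum of the polygon corners gives an axis-aligned box, so slightly rotated text lines are enclosed rather than clipped.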
Book Cover Image
↓
[YOLOv8]
Text Region Detection
↓
[VietOCR]
Text Recognition
↓
Information Synthesis
↓
Structured Metadata (Title, Author, Publisher, Other)
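
The flow in the diagram above can be expressed as a small piece of glue code. The sketch below is deliberately model-agnostic: `detect`, `recognize`, and `synthesize` are passed in as plain callables, and the demo uses hypothetical stubs in place of the real models. In practice `detect` would wrap the YOLOv8 detector and `recognize` the fine-tuned VietOCR model; how the project actually wires them together is not described here, so treat the signatures as assumptions.

```python
def crop(image, box):
    """Cut a region out of an image stored as a list of pixel rows."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


def run_pipeline(image, detect, recognize, synthesize):
    """Detect text regions, recognize each crop, then build the metadata.

    detect(image)       -> iterable of (label, (x1, y1, x2, y2))
    recognize(patch)    -> recognized text for one cropped region
    synthesize(regions) -> structured metadata from (label, text, box) tuples
    """
    regions = []
    for label, box in detect(image):
        text = recognize(crop(image, box))
        regions.append((label, text, box))
    return synthesize(regions)


# Stubs for demonstration only; they stand in for the real models.
def fake_detect(image):
    return [("title", (10, 20, 90, 60)), ("author", (10, 150, 70, 170))]


def fake_recognize(patch):
    # Look up a canned answer by crop size (width, height).
    canned = {(80, 40): "TÊN SÁCH", (60, 20): "Nguyễn Văn A"}
    return canned[(len(patch[0]), len(patch))]


def simple_synthesize(regions):
    return {label: text for label, text, _ in regions}


blank_cover = [[0] * 100 for _ in range(200)]  # 100 x 200 pixel placeholder
result = run_pipeline(blank_cover, fake_detect, fake_recognize, simple_synthesize)
print(result)
```

Passing the three stages in as functions keeps each one testable on its own and makes it easy to swap a model without touching the rest of the pipeline.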
- Dr. Lê Đình Duy
- MSc. Phạm Nguyễn Trường An
- Bùi Lê Trọng Đức
- Nguyễn Tiến Thịnh
- Nguyễn Chí Thi