Overview
At TabLogs, we needed a robust pipeline for extracting structured data from scanned logistics documents. The challenge: handling varied layouts, noisy scans, and multilingual text across thousands of daily shipments.
Architecture
The pipeline consists of three stages:
- Detection — CRAFT-based text detection locates text regions
- Recognition — A CRNN model reads the detected regions
- Structuring — Rule-based post-processing maps text to fields
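The structuring stage is the simplest to illustrate. Here is a minimal sketch of what a rule-based `structure_fields` might look like — the field names and patterns below are hypothetical examples, not TabLogs' production rules, and the sketch ignores region geometry, which real rules often use for positional disambiguation:

```python
import re

# Hypothetical field patterns; real logistics documents need far more rules.
FIELD_PATTERNS = {
    "tracking_number": re.compile(r"\b\d{10,14}\b"),
    "date": re.compile(r"\b\d{2}[./-]\d{2}[./-]\d{4}\b"),
    "weight_kg": re.compile(r"\b\d+(?:\.\d+)?\s*kg\b", re.IGNORECASE),
}

def structure_fields(texts: list[str], regions: list) -> dict:
    """Map recognized text snippets to named fields via first-match rules."""
    fields: dict = {}
    for text in texts:
        for name, pattern in FIELD_PATTERNS.items():
            if name in fields:
                continue  # keep the first match per field
            match = pattern.search(text)
            if match:
                fields[name] = match.group(0)
    return fields
```

For example, `structure_fields(["Weight: 12.5 kg", "Shipped 01/02/2024"], [])` picks out the weight and date fields and leaves `tracking_number` unset.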
```python
from torchwisdom.models import CRAFTDetector, CRNNRecognizer

def extract_document(image_path: str) -> dict:
    # Stages 1 and 2: detect text regions, then read each one
    detector = CRAFTDetector.from_pretrained("craft-v2")
    recognizer = CRNNRecognizer.from_pretrained("crnn-multilang")
    regions = detector.detect(image_path)
    texts = [recognizer.recognize(r) for r in regions]
    # Stage 3: rule-based mapping of recognized text to document fields
    return structure_fields(texts, regions)
```

ONNX Export for Production
For production deployment, we export models to ONNX format. This gives us ~3x inference speedup with ONNX Runtime compared to native PyTorch.
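The export script itself is thin: the core of such a script is a single `torch.onnx.export` call. A minimal sketch, assuming a CRAFT-style detector that takes one image tensor — the input shape, tensor names, and dynamic axes below are illustrative assumptions, not our exact export code:

```python
def export_to_onnx(model, output_path: str, input_shape=(1, 3, 768, 768)):
    """Trace the model with a dummy input and serialize it to ONNX."""
    import torch  # local import: torch is only needed when the export runs

    model.eval()
    dummy = torch.randn(*input_shape)  # illustrative input size for a detector
    torch.onnx.export(
        model,
        dummy,
        output_path,
        input_names=["image"],
        output_names=["score_map"],
        # Allow variable batch size and image dimensions at inference time
        dynamic_axes={"image": {0: "batch", 2: "height", 3: "width"}},
        opset_version=17,
    )
```

Dynamic axes let the exported graph accept variable image sizes, which matters for scans of mixed page formats.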
```bash
python export_to_onnx.py --model craft-v2 --output models/craft.onnx
python export_to_onnx.py --model crnn-multilang --output models/crnn.onnx
```

Results
The pipeline processes ~2,000 documents per hour on a single GPU instance, with 97.3% field extraction accuracy on our benchmark dataset. The ONNX deployment reduced inference costs by 60%.
Key Learnings
- Start with a strong detection model — recognition accuracy depends heavily on clean text region crops
- ONNX export is worth the effort for latency-sensitive production deployments; for us it paid for itself in inference cost alone
- Build evaluation metrics early — you can't improve what you don't measure
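On the last point: even a simple per-field exact-match accuracy, computed over a small labeled set, is enough to catch regressions early. A minimal sketch, assuming predictions and gold labels are parallel lists of field dicts (the format is an assumption, not our benchmark harness):

```python
def field_accuracy(predictions: list[dict], gold: list[dict]) -> dict[str, float]:
    """Per-field exact-match accuracy over a labeled document set."""
    correct: dict[str, int] = {}
    total: dict[str, int] = {}
    for pred, truth in zip(predictions, gold):
        for field, expected in truth.items():
            total[field] = total.get(field, 0) + 1
            if pred.get(field) == expected:
                correct[field] = correct.get(field, 0) + 1
    # Accuracy is computed over fields present in the gold labels
    return {f: correct.get(f, 0) / total[f] for f in total}
```

Breaking accuracy out per field, rather than reporting one aggregate number, is what tells you whether a model change helped dates at the expense of tracking numbers.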