Handwriting Recognition & OCR
Collection
Handwriting recognition systems: CNN-BiLSTM-CTC trained on IAM (CER 12.95%), Azerbaijani document OCR dataset.
A complete handwriting recognition system using a CNN-BiLSTM-CTC architecture trained on the IAM handwriting database. Achieves 12.95% CER and 42.47% WER on the IAM test set.
Hugging Face: IsmatS/handwriting_recognition
pip install torch torchvision pillow numpy huggingface_hub
import torch
import torch.nn as nn
import numpy as np
from PIL import Image
from huggingface_hub import hf_hub_download
# Download model checkpoint
ckpt_path = hf_hub_download(repo_id="IsmatS/handwriting-recognition-iam", filename="best_model.pth")
checkpoint = torch.load(ckpt_path, map_location="cpu")
# Character mapper from checkpoint
char_mapper = checkpoint['char_mapper']
def preprocess_image(image_path, target_height=64, target_width=256):
    img = Image.open(image_path).convert('L')  # grayscale
    img = img.resize((target_width, target_height), Image.LANCZOS)
    img = np.array(img, dtype=np.float32) / 255.0
    img = (img - 0.5) / 0.5  # normalize to [-1, 1]
    return torch.FloatTensor(img).unsqueeze(0).unsqueeze(0)  # (1, 1, H, W)
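As a quick sanity check of the normalization step above (a NumPy-only sketch), pixel values 0-255 should map onto [-1, 1]:

```python
import numpy as np

# Apply the same scaling as preprocess_image to three reference pixel values.
px = np.array([0, 128, 255], dtype=np.float32)
norm = (px / 255.0 - 0.5) / 0.5
print(norm)  # approximately [-1.0, 0.0039, 1.0]
```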
def ctc_decode(predictions, char_mapper):
    """Greedy CTC decoding: collapse repeats, then drop blanks."""
    pred_indices = predictions.argmax(dim=2).squeeze(1).tolist()  # (T, 1, C) -> list of T indices
    decoded = []
    prev = None
    for idx in pred_indices:
        if idx != prev and idx != 0:  # 0 = blank token
            decoded.append(char_mapper.idx_to_char[idx])
        prev = idx  # update every step so repeated characters separated by a blank survive
    return ''.join(decoded)
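To see the collapse rule in action without loading the model, here is a minimal, torch-free illustration; the mapper and index values are made up for the example:

```python
# Stand-in for the checkpoint's char_mapper; illustrative only.
idx_to_char = {1: 'c', 2: 'a', 3: 't'}

def collapse(indices, idx_to_char, blank=0):
    """Same greedy CTC rule as ctc_decode, on a plain list of indices."""
    decoded, prev = [], None
    for idx in indices:
        if idx != prev and idx != blank:
            decoded.append(idx_to_char[idx])
        prev = idx
    return ''.join(decoded)

# Repeats merge, blanks (0) are dropped but break up repeats:
print(collapse([1, 1, 0, 2, 0, 3, 3], idx_to_char))  # cat
```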
# Note: CRNN class must match the training definition in train_colab.ipynb
# See train_colab.ipynb for the full model class
# model = CRNN(num_chars=len(char_mapper.chars))
# model.load_state_dict(checkpoint['model_state_dict'])
# model.eval()
#
# img_tensor = preprocess_image("handwriting_sample.png")
# with torch.no_grad():
# output = model(img_tensor) # (T, 1, num_chars)
# text = ctc_decode(output, char_mapper)
# print("Recognized:", text)
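The CRNN class itself lives in train_colab.ipynb; the sketch below is only a plausible reconstruction of a CNN-BiLSTM model with the expected (T, 1, num_chars) output shape — the layer sizes and pooling schedule are assumptions, not the notebook's exact definition:

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    """Hypothetical CNN-BiLSTM reconstruction; the authoritative class is in train_colab.ipynb."""
    def __init__(self, num_chars, img_h=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),       # H/2, W/2
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),     # H/4, W/4
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),  # H/8, W/4
        )
        feat_h = img_h // 8
        self.rnn = nn.LSTM(256 * feat_h, 256, num_layers=2, bidirectional=True)
        self.fc = nn.Linear(512, num_chars)  # assumes num_chars already includes the blank at index 0

    def forward(self, x):                               # x: (B, 1, H, W)
        f = self.cnn(x)                                 # (B, C, H', W')
        b, c, h, w = f.size()
        f = f.permute(3, 0, 1, 2).reshape(w, b, c * h)  # (T=W', B, C*H')
        out, _ = self.rnn(f)
        return self.fc(out)                             # (T, B, num_chars)
```

For a 64x256 input this yields 64 time steps, one per remaining pixel column.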
Note: The trained model weights (best_model.pth) are generated during training. Run train_colab.ipynb on Google Colab to produce the checkpoint, then use the code above for inference.
Analysis charts are saved in the charts/ folder. To regenerate them, run: jupyter notebook analysis.ipynb
To train, upload train_colab.ipynb to Google Colab. Training time: ~1-2 hours for 20 epochs on a T4 GPU.
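For reference, a minimal sketch of the CTC training objective such a setup optimizes (the shapes and alphabet size here are illustrative, not taken from the notebook; blank index 0 matches ctc_decode above):

```python
import torch
import torch.nn as nn

ctc = nn.CTCLoss(blank=0, zero_infinity=True)

T, B, C = 64, 4, 80  # time steps, batch size, alphabet size (assumed)
log_probs = torch.randn(T, B, C, requires_grad=True).log_softmax(dim=2)  # model output
targets = torch.randint(1, C, (B, 12), dtype=torch.long)  # label indices; 0 is reserved for blank
input_lengths = torch.full((B,), T, dtype=torch.long)
target_lengths = torch.full((B,), 12, dtype=torch.long)

loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()  # would feed an optimizer step in the real loop
```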
From analysis.ipynb:
charts/01_sample_images.png - 10 sample handwritten texts
charts/02_text_length_distribution.png - Text length statistics
charts/03_image_dimensions.png - Image dimension analysis
charts/04_character_frequency.png - Character frequency distribution
charts/05_summary_statistics.png - Summary statistics table
Architecture: CNN-BiLSTM-CTC
Dataset: Teklia/IAM-line (Hugging Face)
Metrics: 12.95% CER, 42.47% WER on the IAM test set
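CER and WER are edit distances normalized by reference length, at character and word level respectively (jiwer computes them during evaluation; below is a dependency-free sketch):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance via two-row dynamic programming."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = cur
    return prev[-1]

def cer(ref, hyp):
    """Character error rate: char edits / reference length."""
    return edit_distance(ref, hyp) / max(len(ref), 1)

def wer(ref, hyp):
    """Word error rate: word edits / reference word count."""
    return edit_distance(ref.split(), hyp.split()) / max(len(ref.split()), 1)

print(cer("handwriting", "handwritng"))   # one deletion over 11 chars, ~0.091
print(wer("the cat sat", "the cat sad"))  # 1 of 3 words wrong, ~0.333
```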
After training in Colab:
best_model.pth - Trained model weights
training_history.png - Loss/CER/WER plots
predictions.png - Sample predictions
Requirements:
torch>=2.0.0
datasets>=2.14.0
pillow>=9.5.0
numpy>=1.24.0
matplotlib>=3.7.0
seaborn>=0.13.0
jupyter>=1.0.0
jiwer>=3.0.0
import torch
# Load checkpoint (map_location lets this run without a GPU; the checkpoint
# stores a char_mapper object, so full unpickling must be allowed)
checkpoint = torch.load('best_model.pth', map_location='cpu')
char_mapper = checkpoint['char_mapper']
# Create model
from train_colab import CRNN # Copy model class
model = CRNN(num_chars=len(char_mapper.chars))
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()
# Predict
# ... (preprocessing + inference)
Dataset: IAM Database (research use)