lemon-manga-translator Models

EN | English

Model weights for the lemon-manga-translator pipeline.

This repository provides the two local inference models used by the pipeline, which detects text in Japanese manga pages, erases the original text, and renders Korean translations. Both models are available in PyTorch (safetensors) and ONNX formats.

Structure

pytorch/
├── comic-text-detector/
│   ├── config.json           # YOLOv5 architecture config
│   └── model.safetensors     # blk_det + text_seg weights
└── aot-inpainting/
    ├── config.json           # AOT-GAN hyperparameters
    └── model.safetensors     # AOT generator weights

onnx/
├── comic-text-detector/
│   └── ctd.onnx              # FP32, fixed 1024×1024, opset 17
└── aot-inpainting/
    └── aot_folded.onnx       # FP32, dynamic H/W, opset 17, ScaledWS folded

Model Details

Comic Text Detector (CTD)

Text detection model that outputs YOLO bboxes and UNet pixel masks simultaneously.

Field	Value
Architecture	YOLOv5s backbone + UNet mask head
Input	1024×1024 RGB (letterbox)
Output	bbox predictions (N×6) + mask (1×1×1024×1024)
PyTorch	63 MB (safetensors)
ONNX	77 MB
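Because `ctd.onnx` has a fixed 1024×1024 input, pages of any other size must be letterboxed (resized with the aspect ratio preserved, then padded) before inference, and the boxes mapped back afterwards. Below is a minimal sketch of that geometry. It assumes YOLOv5-style letterboxing with the padding split evenly on both sides. The pipeline's real preprocessing (padding side, fill color, rounding) may differ, and `letterbox_geometry` and `box_to_original` are hypothetical helpers.

```python
from dataclasses import dataclass

TARGET = 1024  # fixed CTD input size (from the table above)


@dataclass
class Letterbox:
    scale: float   # resize factor applied to the original page
    new_w: int     # resized width before padding
    new_h: int     # resized height before padding
    pad_left: int  # padding added on the left (assumed evenly split)
    pad_top: int   # padding added on top (assumed evenly split)


def letterbox_geometry(width: int, height: int, target: int = TARGET) -> Letterbox:
    """Compute the resize + padding that fits (width, height) into target x target."""
    scale = min(target / width, target / height)
    new_w, new_h = round(width * scale), round(height * scale)
    # Assumption: leftover space is split between both sides, YOLOv5-style.
    pad_left = (target - new_w) // 2
    pad_top = (target - new_h) // 2
    return Letterbox(scale, new_w, new_h, pad_left, pad_top)


def box_to_original(box, lb: Letterbox):
    """Map an (x1, y1, x2, y2) box from letterboxed coordinates back to the page."""
    x1, y1, x2, y2 = box
    return (
        (x1 - lb.pad_left) / lb.scale,
        (y1 - lb.pad_top) / lb.scale,
        (x2 - lb.pad_left) / lb.scale,
        (y2 - lb.pad_top) / lb.scale,
    )
```

For a 1397×1969 page (read here as width × height), this yields a 727×1024 resize with 148 px of padding on the left.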

AOT Inpainting

Inpainting model that fills masked text regions with natural-looking background.

Field	Value
Architecture	AOT-GAN (Aggregated Contextual Transformations)
Input	image (1×3×H×W) + mask (1×1×H×W); H and W must be multiples of 8
Output	inpainted (1×3×H×W), value range [-1, 1]
PyTorch	22 MB (safetensors)
ONNX	23 MB
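Since the AOT graph accepts any H and W that are multiples of 8 and returns values in [-1, 1], a caller typically pads the page up to the next multiple of 8, runs the model, crops the result back, and rescales it to 8-bit. The sketch below covers only that size and range arithmetic. This card does not state how the padded border is filled or how the input image is normalized, and both helpers are hypothetical.

```python
MULTIPLE = 8  # AOT requires H and W to be multiples of 8 (from the table above)


def padded_size(h: int, w: int, multiple: int = MULTIPLE) -> tuple[int, int]:
    """Smallest (H, W) >= (h, w) where both are multiples of `multiple`."""
    # Integer ceiling division avoids float rounding issues.
    return (-(-h // multiple) * multiple, -(-w // multiple) * multiple)


def to_uint8(value: float) -> int:
    """Map one output value from [-1, 1] to [0, 255], clamping out-of-range values."""
    scaled = (value + 1.0) * 127.5
    return max(0, min(255, round(scaled)))
```

After inference, crop the output back to the original (h, w) before converting it with `to_uint8`.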

Recommended Configuration

Use Case	CTD	AOT	Notes
Deployment (pip install)	`onnx/comic-text-detector/ctd.onnx`	`onnx/aot-inpainting/aot_folded.onnx`	onnxruntime only, no torch required
Research / Fine-tuning	`pytorch/comic-text-detector/`	`pytorch/aot-inpainting/`	Requires PyTorch

License

All models in this repository are licensed under GPL-3.0.

Model	Source	License
Comic Text Detector	Original author dmMaze/comic-text-detector · Distributed via manga-image-translator beta-0.3	GPL-3.0
AOT Inpainting	Architecture AOT-GAN (Zeng et al., 2021) · Fine-tuned by manga-image-translator beta-0.3 · Refactored by mayocream/aot-inpainting	GPL-3.0

ONNX Conversion Notes

EN | English

ONNX conversion and optimization were performed in LemonDouble/lemon-manga-translator.

CTD (`ctd.onnx`)

Converted TextDetBase (YOLOv5 + UNet) via torch.onnx.export.
NMS is performed outside the model as post-processing (numpy implementation).
Compared to PyTorch on real images: bbox diff ≤ 1.9e-3 px, mask diff ≤ 1.5e-5.
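Because NMS runs outside the graph, the caller must decode and filter the raw N×6 predictions. Below is a plain-Python sketch of greedy NMS; the pipeline itself uses a numpy implementation. The sketch assumes each row is YOLOv5-style `(cx, cy, w, h, objectness, class_score)` in letterboxed pixels, with confidence = objectness × class_score. The row layout, the 0.4 confidence threshold, and the 0.35 IoU threshold are illustrative assumptions, not values taken from the pipeline.

```python
def decode(rows, conf_thresh=0.4):
    """Turn (cx, cy, w, h, obj, cls) rows into (x1, y1, x2, y2, score) boxes."""
    boxes = []
    for cx, cy, w, h, obj, cls in rows:
        score = obj * cls  # YOLOv5-style confidence (assumed row layout)
        if score < conf_thresh:
            continue
        boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, score))
    return boxes


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2, ...) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, iou_thresh=0.35):
    """Greedy NMS: keep the best-scoring box, drop boxes that overlap it too much."""
    remaining = sorted(boxes, key=lambda b: b[4], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [b for b in remaining if iou(best, b) <= iou_thresh]
    return kept
```

Byte-identical agreement with torch NMS also depends on tie-breaking and IoU edge cases, which this sketch does not try to match.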

AOT (`aot_folded.onnx`)

Folded the ScaledWSConv2d weight standardization: in eval mode, the result of get_weight() (var_mean → rsqrt → scale) is computed once and baked into the stored weights.
This removes hundreds of var_mean/rsqrt/mul/sub nodes from the ONNX graph. The file size is unchanged, but those ops no longer run at inference.
Max diff vs. the original (unfolded) model: 5.7e-4 (perfect parity).
Supports dynamic H/W axes (both must be multiples of 8).
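Folding is exact because the weight tensor is frozen in eval mode, so get_weight() returns the same standardized tensor on every call. Storing that tensor once lets a plain convolution replace the standardization subgraph. The sketch below works on one output channel at a time (its flattened kernel). It assumes the NFNet-style formula w_hat = gain * (w - mean) / sqrt(max(var * fan_in, eps)) with population variance. The exact eps and gain handling inside ScaledWSConv2d may differ, and the function names are hypothetical.

```python
import math
from statistics import fmean, pvariance


def standardized_weight(w, gain, eps=1e-6):
    """Scaled weight standardization for ONE output channel (assumed NFNet-style).

    `w` is the channel's flattened kernel (in_channels * kh * kw values),
    so fan_in == len(w).
    """
    fan_in = len(w)
    mean = fmean(w)
    var = pvariance(w, mu=mean)  # population variance (unbiased=False assumed)
    scale = gain / math.sqrt(max(var * fan_in, eps))  # the rsqrt * gain step
    return [(x - mean) * scale for x in w]


def fold(weights, gains, eps=1e-6):
    """Precompute standardized weights once; valid only while weights are frozen."""
    return [standardized_weight(w, g, eps) for w, g in zip(weights, gains)]
```

After folding, each channel has zero mean and var * fan_in equal to gain², which is what the original layer would otherwise recompute on every forward pass.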

INT8 Quantization Attempts

INT8 dynamic quantization was attempted on both models but not adopted:

CTD: the number of detections after NMS no longer matched the FP32 model (type inference was incomplete due to LeakyReLU's Mul pattern).
AOT: output quality collapsed to a PSNR of 16 dB; the multiplicative chain (GatedConv × sigmoid × 1.8, plus custom LayerNorm × 5) amplifies quantization error exponentially.
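For background on these failures, the toy sketch below shows per-tensor uint8 quantization following the range rules of ONNX's DynamicQuantizeLinear operator (range widened to include 0, scale = range / 255). It also computes the worst-case relative error of a product of k factors that each carry relative error e, namely (1 + e)^k - 1. It is illustrative only: it does not model the AOT graph, the numbers are not measurements, and the all-zero guard is an addition of this sketch.

```python
def dynamic_qparams(values):
    """Scale and zero point in the style of ONNX DynamicQuantizeLinear (uint8)."""
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / 255.0 or 1.0  # guard for an all-zero tensor (sketch only)
    zero_point = max(0, min(255, round(-lo / scale)))
    return scale, zero_point


def fake_quant(values):
    """Quantize to uint8 and dequantize again, returning the rounded floats."""
    scale, zp = dynamic_qparams(values)
    q = [max(0, min(255, round(v / scale) + zp)) for v in values]
    return [(x - zp) * scale for x in q]


def chain_error_bound(rel_err, k):
    """Worst-case relative error of a product of k factors, each off by rel_err."""
    return (1.0 + rel_err) ** k - 1.0
```

Each round trip costs at most half a quantization step per value, but every extra multiplicative stage compounds that error instead of just adding it.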

Parity Verification

Comparison	Result
PyTorch vs ONNX FP32 (CTD mask)	99.996% pixel match
PyTorch vs ONNX FP32 (AOT inpaint)	99.934% pixel match, max diff 47/255
PyTorch vs ONNX FP32 (final image)	99.2% match (residual differences are noise at bubble-fill boundaries)
torch NMS vs numpy NMS	100% byte-identical
AOT original vs AOT folded	max diff 5.7e-4 (perfect)
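The metrics in this table can be computed roughly as follows. This sketch assumes "pixel match" means exact equality of 8-bit values after flattening both images. The card does not state whether a tolerance or a per-pixel (all-channel) comparison was used. The sketch also includes PSNR, the metric quoted for the INT8 AOT attempt. All helper names are hypothetical.

```python
import math


def pixel_match_ratio(a, b):
    """Fraction of uint8 values that are exactly equal (flattened images)."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of values")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def max_abs_diff(a, b):
    """Largest absolute difference between corresponding values."""
    return max(abs(x - y) for x, y in zip(a, b))


def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    return math.inf if mse == 0 else 10.0 * math.log10(peak * peak / mse)
```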

Benchmark (AMD Ryzen 9 3950X, CPU, 1397×1969 image)

Metric	PyTorch	ONNX (final)	Change
CTD (detection)	784 ms	764 ms	−3%
AOT (inpainting)	2456 ms	2722 ms	+11%
Full pipeline	~4015 ms	3566 ms	−11%
Peak RSS	1481 MB	1065 MB	−28%
Module import RSS	~530 MB	~30 MB	−94%
Install size (runtime)	~200 MB	~20 MB	−90%

Measured as the median of 3 runs after 1 warmup run, with enable_cpu_mem_arena=False (the CLI deployment default).
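The timing protocol (one untimed warmup run, then the median of three timed runs) and the Change column can be reproduced with a sketch like the one below. `benchmark` and `pct_change` are hypothetical helpers, and RSS measurement is out of scope. In onnxruntime, the arena setting corresponds to `SessionOptions.enable_cpu_mem_arena`, which would be set to False before creating the session.

```python
import statistics
import time


def benchmark(fn, warmup=1, runs=3):
    """Median wall-clock time in ms: `warmup` untimed calls, then `runs` timed calls."""
    for _ in range(warmup):
        fn()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


def pct_change(before, after):
    """Relative change in percent, as reported in the Change column."""
    return (after - before) / before * 100.0
```

For example, `pct_change(4015, 3566)` rounds to -11, matching the full-pipeline row.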

References

Zeng et al., "Aggregated Contextual Transformations for High-Resolution Image Inpainting", arXiv:2104.01431, published April 3, 2021.
