---
license: gpl-3.0
language:
- en
tags:
- object-detection
- instance-segmentation
- yolov8
- coco
- real-time
- pytorch
- capsule-network
- interpretable-ai
- symbolic-ai
library_name: ultralytics
pipeline_tag: image-segmentation
datasets:
- coco
model-index:
- name: SCN
  results:
  - task:
      type: object-detection
      name: Object Detection
    dataset:
      name: COCO 2017
      type: coco
      split: val2017
    metrics:
    - type: mAP50
      value: 0.57100
      name: mAP50
    - type: mAP50-95
      value: 0.41600
      name: mAP50:95
  - task:
      type: instance-segmentation
      name: Instance Segmentation
    dataset:
      name: COCO 2017
      type: coco
      split: val2017
    metrics:
    - type: mAP50
      value: 0.53316
      name: Mask mAP50
    - type: mAP50-95
      value: 0.34080
      name: Mask mAP50:95
---
| |
# Symbolic Capsule Network (SCN)

> *What if a detector could tell you not just **what** it found, but **why** it is confident?*

**SCN** is a real-time object detection and instance segmentation model that replaces the conventional convolutional head with a **capsule-based neck and head**. By encoding visual entities as pose-aware vectors rather than scalar activations, SCN explicitly captures *part-whole relationships* — the structural agreements between object parts and the wholes they compose. Every detection is backed by a **symbolic routing path**: a traceable chain of capsule agreements that exposes *which parts* voted for *which object*, turning each prediction into an auditable reasoning trace.
|
|
## Live Demo

[Try the interactive demo ↗](https://huggingface.co/spaces/zpyuan/SymbolicCapsuleNetwork-demo)
|
|
## Example Results

| | |
|---|---|
|  |  |
|  |  |

---
|
|
## Key Ideas

Standard convolutional detectors reduce every visual entity to a scalar confidence score, discarding the compositional structure that makes objects recognisable. SCN addresses this with three tightly integrated contributions:

**1. Part-Whole Relation Modelling**
`CapsRoute` layers propagate evidence upward from low-level part capsules — encoding local features such as wheels, windows, and body panels — to high-level object capsules through dynamic routing-by-agreement. Agreement is reached only when the geometric votes from multiple parts are mutually consistent, giving the model an inductive bias toward spatially coherent detections.
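As a rough illustration of what routing-by-agreement computes, here is a minimal NumPy sketch of the generic algorithm (in the style of Sabour et al.) — not the repo's actual `CapsRoute` implementation; the `squash` non-linearity and vote shapes follow standard capsule conventions:

```python
import numpy as np

def squash(s, eps=1e-8):
    # Capsule non-linearity: preserves direction, maps the norm into [0, 1).
    n2 = np.sum(s ** 2, axis=-1, keepdims=True)
    return (n2 / (1.0 + n2)) * s / np.sqrt(n2 + eps)

def route_by_agreement(votes, n_iters=3):
    """votes: (P, W, D) — each of P part capsules casts a pose vote for each
    of W whole capsules. Returns whole capsules (W, D) and routing
    coefficients c (P, W), which double as the evidence trace."""
    P, W, _ = votes.shape
    b = np.zeros((P, W))                                 # routing logits
    for _ in range(n_iters):
        c = np.exp(b) / np.exp(b).sum(1, keepdims=True)  # softmax over wholes
        v = squash((c[..., None] * votes).sum(0))        # weighted vote sum
        b = b + (votes * v[None]).sum(-1)                # reward agreeing parts
    return v, c

# Three parts cast identical votes for whole 0 but conflicting votes for whole 1:
votes = np.zeros((3, 2, 4))
votes[:, 0] = [1.0, 0.0, 0.0, 0.0]
votes[0, 1] = [0.0, 1.0, 0.0, 0.0]
votes[1, 1] = [0.0, -1.0, 0.0, 0.0]
votes[2, 1] = [0.0, 0.0, 1.0, 0.0]
v, c = route_by_agreement(votes)
print(c[:, 0] > c[:, 1])   # routing mass concentrates on the agreed whole
```

The toy example shows the inductive bias at work: the whole capsule whose votes agree ends up with both the larger routing coefficients and the longer (more confident) output vector.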
|
|
**2. Symbolic Routing Paths**
The routing coefficients produced at each capsule layer form an explicit, directed evidence graph. Unlike Grad-CAM or SHAP, which reconstruct explanations after the fact, SCN's routing weights are native model outputs — first-class signals that describe the model's reasoning as it happens, without any additional computation.
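Because the coefficients are native outputs, reading an evidence trace amounts to plain array indexing — no backward pass required. A hypothetical sketch (the `evidence_trace` helper, the part names, and the coefficient values are illustrative, not part of the repo's API):

```python
import numpy as np

def evidence_trace(c, part_names, whole_idx, k=3):
    """Read the top-k contributing parts for one whole capsule directly from
    a (num_parts, num_wholes) routing-coefficient matrix `c`."""
    weights = c[:, whole_idx]
    top = np.argsort(weights)[::-1][:k]
    return [(part_names[i], float(weights[i])) for i in top]

# Illustrative coefficients (rows: part capsules, columns: whole capsules):
c = np.array([
    [0.70, 0.30],   # "wheel" routes mostly to whole 0
    [0.65, 0.35],   # "windshield" routes mostly to whole 0
    [0.20, 0.80],   # "fur" routes mostly elsewhere
])
parts = ["wheel", "windshield", "fur"]
trace = evidence_trace(c, parts, whole_idx=0, k=2)
print(trace)   # strongest part-level evidence for whole capsule 0
```

Contrast with post-hoc attribution: here the numbers already existed inside the forward pass; the helper only sorts them.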
|
|
**3. Concept-Based Detection Auditing**
Routing paths enable structured inspection that scalar networks cannot support:
- **Verify** that a predicted "car" is grounded in consistent wheel, body, and windshield part activations.
- **Diagnose** which part capsule collapsed when the model misses an object under occlusion or viewpoint change.
- **Detect bias** by aggregating routing statistics across a dataset to reveal which visual parts the model over-relies on.
|
|
## Architecture

![SCN architecture](framework.png)

The pipeline flows through four capsule-specific modules:

| Module | Role |
|---|---|
| `CapsProj` | Projects multi-scale CNN feature maps into capsule space |
| `CapsAlign` | Aligns capsule resolutions across FPN levels |
| `CapsRoute` / `CapsRouteV2-4` | Dynamic routing-by-agreement across part-to-whole levels |
| `CapsDecode` | Decodes final capsule activations into boxes and masks |
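The table above can be read as a data flow. The following shape-level sketch traces a single FPN level (so `CapsAlign` is a no-op and omitted); every dimension, transform, and the uniform one-step routing are illustrative assumptions, not the checkpoint's real configuration:

```python
import numpy as np

# Illustrative sizes only — not the repo's actual hyperparameters.
H = W = 20            # spatial resolution of one FPN level
C = 64                # CNN channels at that level
D = 8                 # capsule pose dimension
N_PART, N_WHOLE = 16, 10

rng = np.random.default_rng(0)
feats = rng.normal(size=(C, H, W))                    # backbone feature map

# CapsProj (sketch): linear map from channels to N_PART pose vectors,
# globally pooled over space for brevity.
proj = rng.normal(size=(N_PART * D, C)) * 0.05
part_caps = (proj @ feats.reshape(C, -1)).reshape(N_PART, D, H * W).mean(-1)

# CapsRoute (sketch): each part votes for each whole via a learned transform;
# votes are combined with uniform coefficients (a single routing step).
W_vote = rng.normal(size=(N_PART, N_WHOLE, D, D)) * 0.1
votes = np.einsum("pwij,pj->pwi", W_vote, part_caps)  # (N_PART, N_WHOLE, D)
whole_caps = votes.mean(0)                            # (N_WHOLE, D)

# CapsDecode (sketch): capsule norm serves as class confidence.
conf = np.linalg.norm(whole_caps, axis=-1)
print(conf.shape)   # one confidence per whole capsule
```

The real modules additionally carry box and mask heads and iterate the routing, but the tensor shapes above capture the part-to-whole contraction that defines the pipeline.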
|
|
---

## Performance

### Detection — COCO 2017 val
|
|
SCN-n sets a new state of the art in accuracy among nano-scale detectors, surpassing every YOLO variant at a comparable FLOPs budget.
|
|
| Model | mAP50 | mAP50:95 | mAP50 (E2E) | mAP50:95 (E2E) | Speed (ms) | Params (M) | FLOPs (B) |
|---|---:|---:|---:|---:|---:|---:|---:|
| YOLOv6n | 53.1% | 37.5% | 52.1% | 36.9% | 20.8 | 4.7 | 11.4 |
| YOLOv7-tiny | 56.7% | 38.7% | 55.7% | 38.1% | 20.9 | 6.2 | 13.8 |
| YOLOv8n | 52.5% | 37.3% | 51.5% | 36.6% | 18.3 | 3.2 | 8.7 |
| YOLOv9t | 53.1% | 38.3% | 52.1% | 37.6% | 20.1 | 2.0 | 7.7 |
| YOLOv10n | 53.8% | 38.5% | 52.8% | 37.8% | 16.7 | 2.3 | 6.7 |
| YOLOv11n | 55.1% | 39.5% | 54.1% | 38.8% | 19.3 | 2.6 | 6.5 |
| YOLOv12n | 56.7% | 40.4% | 55.7% | 39.7% | 19.4 | 2.5 | 6.0 |
| YOLO26n | 56.8% | 40.8% | 55.7% | 40.0% | 14.4 | 2.6 | 6.1 |
| **SCN-n (Ours)** | **57.1%** | **41.6%** | **56.1%** | **40.4%** | 29.6 | 3.3 | 6.5 |
|
|
SCN-n achieves **+0.3% mAP50 and +0.8% mAP50:95** over the previous best (YOLO26n) at a comparable FLOPs budget (6.5B vs. 6.1B): accuracy gains attributable to structural reasoning rather than a larger compute budget.
|
|
### Accuracy–Efficiency Frontier

![Accuracy–efficiency frontier](performance.png)

*SCN occupies the top of the accuracy–efficiency frontier across all model scales (n / s / m / l / x). At every FLOPs level, SCN variants outperform their YOLO counterparts, demonstrating that part-whole routing is a principled and scalable improvement.*
|
|
### Instance Segmentation — COCO 2017 val

| Model | Input | Mask mAP50 | Mask mAP50:95 |
|---|---:|---:|---:|
| SCN Segmentation | 640 | 53.3% | 34.1% |
|
|
---

## Quick Start

```bash
pip install ultralytics huggingface_hub
```
|
|
```python
from huggingface_hub import hf_hub_download
from ultralytics import YOLO

from models import register_ultralytics_modules

# Download the pretrained checkpoint from the Hub
weights = hf_hub_download(
    repo_id="zpyuan/SymbolicCapsuleNetwork",
    filename="weights/symbolic_capsule_network_segmentation.pt",
)

# Register SCN's capsule layers with Ultralytics before loading the model
register_ultralytics_modules()

model = YOLO(weights)
results = model.predict("image.jpg", imgsz=640, conf=0.25)
results[0].show()
```
|
|
Command-line:

```bash
python predict.py path/to/image.jpg
python predict.py path/to/image.jpg --conf 0.3 --imgsz 1280
```
|
|
---

## Repository Structure

| Path | Description |
|---|---|
| `weights/symbolic_capsule_network_segmentation.pt` | Pretrained segmentation checkpoint |
| `modules/` | Capsule modules: `CapsProj`, `CapsAlign`, `CapsRoute`, `CapsRouteV2-4`, `CapsDecode` |
| `models/custom_yolo.py` | Ultralytics hook that registers capsule layers before model load |
| `configs/seg_model/` | YAML defining the capsule neck and head architecture |
| `predict.py` | Minimal inference entry point |
|
|
|
|
|
|