---
license: gpl-3.0
language:
- en
tags:
- object-detection
- instance-segmentation
- yolov8
- coco
- real-time
- pytorch
- capsule-network
- interpretable-ai
- symbolic-ai
library_name: ultralytics
pipeline_tag: image-segmentation
datasets:
- coco
model-index:
- name: SCN
  results:
  - task:
      type: object-detection
      name: Object Detection
    dataset:
      name: COCO 2017
      type: coco
      split: val2017
    metrics:
    - type: mAP50
      value: 0.57100
      name: mAP50
    - type: mAP50-95
      value: 0.41600
      name: mAP50:95
  - task:
      type: instance-segmentation
      name: Instance Segmentation
    dataset:
      name: COCO 2017
      type: coco
      split: val2017
    metrics:
    - type: mAP50
      value: 0.53316
      name: Mask mAP50
    - type: mAP50-95
      value: 0.34080
      name: Mask mAP50:95
---
# Symbolic Capsule Network (SCN)
> *What if a detector could tell you not just **what** it found, but **why** it is confident?*
**SCN** is a real-time object detection and instance segmentation model that replaces the conventional convolutional head with a **capsule-based neck and head**. By encoding visual entities as pose-aware vectors rather than scalar activations, SCN explicitly captures *part-whole relationships* — the structural agreements between object parts and the wholes they compose. Every detection is backed by a **symbolic routing path**: a traceable chain of capsule agreements that exposes *which parts* voted for *which object*, turning each prediction into an auditable reasoning trace.
## Live Demo
[Try the interactive demo ↗](https://huggingface.co/spaces/zpyuan/SymbolicCapsuleNetwork-demo)
## Example Results
| | |
|---|---|
| ![bus](figures/demo/bus.jpg) | ![people](figures/demo/people.jpg) |
| ![scene](figures/demo/coco_000000104101.jpg) | ![scene2](figures/demo/coco_000000023912.jpg) |
---
## Key Ideas
Standard convolutional detectors reduce every visual entity to a scalar confidence score, discarding the compositional structure that makes objects recognisable. SCN addresses this with three tightly integrated contributions:
**1. Part-Whole Relation Modelling**
`CapsRoute` layers propagate evidence upward from low-level part capsules — encoding local features such as wheels, windows, and body panels — to high-level object capsules through dynamic routing-by-agreement. Agreement is only reached when the geometric votes from multiple parts are mutually consistent, giving the model an inductive bias toward spatially coherent detections.
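The repository does not spell out the exact `CapsRoute` update, but the routing-by-agreement idea it builds on can be sketched in a few lines. This is a minimal NumPy sketch of classic dynamic routing (in the style of Sabour et al.); the function names, shapes, and iteration count are illustrative assumptions, not SCN's actual implementation:

```python
import numpy as np

def squash(s, eps=1e-8):
    # Capsule non-linearity: keeps the vector's direction but maps its
    # length into (0, 1), so length can act as an existence probability.
    norm = np.linalg.norm(s, axis=-1, keepdims=True)
    return (norm ** 2 / (1.0 + norm ** 2)) * s / (norm + eps)

def route_by_agreement(votes, n_iters=3):
    """votes: (n_parts, n_wholes, dim) — each part's predicted pose for each whole."""
    n_parts, n_wholes, _ = votes.shape
    logits = np.zeros((n_parts, n_wholes))  # routing logits b_ij
    for _ in range(n_iters):
        # Coupling coefficients: each part distributes its evidence over wholes.
        c = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
        s = (c[..., None] * votes).sum(axis=0)  # aggregate weighted votes
        wholes = squash(s)                      # candidate whole-capsule poses
        # Parts whose votes agree with a whole's pose strengthen that route.
        logits = logits + np.einsum("pwd,wd->pw", votes, wholes)
    return wholes, c
```

The coupling matrix `c` is exactly the kind of per-part, per-whole evidence assignment that the symbolic routing paths below expose.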
**2. Symbolic Routing Paths**
The routing coefficients produced at each capsule layer form an explicit, directed evidence graph. Unlike Grad-CAM or SHAP, which reconstruct explanations after the fact, SCN's routing weights are native model outputs — first-class signals that describe the model's reasoning as it happens, without any additional computation.
**3. Concept-Based Detection Auditing**
Routing paths enable structured inspection that scalar networks cannot support:
- **Verify** that a predicted "car" is grounded in consistent wheel, body, and windshield part activations.
- **Diagnose** which part capsule collapsed when the model misses an object under occlusion or viewpoint change.
- **Detect bias** by aggregating routing statistics across a dataset to reveal which visual parts the model over-relies on.
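As a hypothetical illustration of such auditing (the coupling-matrix shape, part names, and helper function are assumptions for this sketch, not SCN's actual interface), the per-detection evidence can be read straight off a layer's routing coefficients:

```python
import numpy as np

def top_part_contributors(coupling, part_names, whole_idx, k=3):
    """coupling: (n_parts, n_wholes) routing coefficients from one capsule layer.
    Returns the k parts that routed the most evidence to the given whole."""
    col = coupling[:, whole_idx]
    order = np.argsort(col)[::-1][:k]
    return [(part_names[i], float(col[i])) for i in order]

# Toy coupling matrix: rows = part capsules, columns = candidate wholes.
parts = ["wheel", "window", "body_panel", "headlight"]
coupling = np.array([
    [0.70, 0.10],  # wheel routes strongly to whole 0
    [0.40, 0.30],
    [0.55, 0.20],
    [0.05, 0.80],  # headlight routes mostly to whole 1
])
evidence = top_part_contributors(coupling, parts, whole_idx=0)
```

Aggregating the same statistic across a validation set is what turns this per-image trace into the dataset-level bias report described above.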
## Architecture
![Architecture Overview](figures/arch.png)
The pipeline flows through four capsule-specific modules:
| Module | Role |
|---|---|
| `CapsProj` | Projects multi-scale CNN feature maps into capsule space |
| `CapsAlign` | Aligns capsule resolutions across FPN levels |
| `CapsRoute` / `CapsRouteV2-4` | Dynamic routing-by-agreement across part-to-whole levels |
| `CapsDecode` | Decodes final capsule activations into boxes and masks |
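The table can be read as a single left-to-right dataflow. The stand-ins below are illustrative stubs, not the real learned modules; they only mimic plausible tensor shapes (the capsule dimensionality and box decoding are assumptions) to show how the four stages compose:

```python
import numpy as np

CAPS_DIM = 8  # assumed capsule pose dimensionality

def caps_proj(feat):
    # (H, W, C) CNN feature map -> (H*W*(C//CAPS_DIM), CAPS_DIM) part capsules
    h, w, c = feat.shape
    return feat.reshape(h * w * (c // CAPS_DIM), CAPS_DIM)

def caps_align(caps_per_level):
    # Pool part capsules from all FPN levels into one set.
    return np.concatenate(caps_per_level, axis=0)

def caps_route(part_caps, n_wholes=4):
    # Placeholder for routing-by-agreement: evenly aggregates parts per whole.
    groups = np.array_split(part_caps, n_wholes, axis=0)
    return np.stack([g.mean(axis=0) for g in groups])

def caps_decode(whole_caps):
    # Placeholder decode: read one (x1, y1, x2, y2) box per whole capsule.
    return whole_caps[:, :4]

fpn = [np.random.default_rng(0).normal(size=(s, s, 64)) for s in (8, 4, 2)]
boxes = caps_decode(caps_route(caps_align([caps_proj(f) for f in fpn])))
```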
---
## Performance
### Detection — COCO 2017 val
SCN sets a new state of the art in accuracy among nano-scale detectors, surpassing every YOLO variant at a comparable FLOPs budget.
| Model | mAP50 | mAP50:95 | mAP50 (E2E) | mAP50:95 (E2E) | Speed (ms) | Params (M) | FLOPs (B) |
|---|---:|---:|---:|---:|---:|---:|---:|
| YOLOv6n | 53.1% | 37.5% | 52.1% | 36.9% | 20.8 | 4.7 | 11.4 |
| YOLOv7-tiny | 56.7% | 38.7% | 55.7% | 38.1% | 20.9 | 6.2 | 13.8 |
| YOLOv8n | 52.5% | 37.3% | 51.5% | 36.6% | 18.3 | 3.2 | 8.7 |
| YOLOv9t | 53.1% | 38.3% | 52.1% | 37.6% | 20.1 | 2.0 | 7.7 |
| YOLOv10n | 53.8% | 38.5% | 52.8% | 37.8% | 16.7 | 2.3 | 6.7 |
| YOLOv11n | 55.1% | 39.5% | 54.1% | 38.8% | 19.3 | 2.6 | 6.5 |
| YOLOv12n | 56.7% | 40.4% | 55.7% | 39.7% | 19.4 | 2.5 | 6.0 |
| YOLO26n | 56.8% | 40.8% | 55.7% | 40.0% | 14.4 | 2.6 | 6.1 |
| **SCN-n (Ours)** | **57.1%** | **41.6%** | **56.1%** | **40.4%** | 29.6 | 3.3 | 6.5 |
SCN-n achieves **+0.3% mAP50 and +0.8% mAP50:95** over the previous best (YOLO26n) at a comparable FLOPs budget (6.5B vs 6.1B); the accuracy gains come from structural reasoning rather than added capacity.
### Accuracy–Efficiency Frontier
![COCO mAP50:95 vs FLOPs](figures/image.png)
*SCN occupies the top of the accuracy–efficiency frontier across all model scales (n / s / m / l / x). At every FLOPs level, SCN variants outperform their YOLO counterparts, demonstrating that part-whole routing is a principled and scalable improvement.*
### Instance Segmentation — COCO 2017 val
| Model | Input | Mask mAP50 | Mask mAP50:95 |
|---|---:|---:|---:|
| SCN Segmentation | 640 | 53.3% | 34.1% |
---
## Quick Start
```bash
pip install ultralytics huggingface_hub
```
```python
from huggingface_hub import hf_hub_download
from ultralytics import YOLO

from models import register_ultralytics_modules

# Download the pretrained segmentation checkpoint from the Hub.
weights = hf_hub_download(
    repo_id="zpyuan/SymbolicCapsuleNetwork",
    filename="weights/symbolic_capsule_network_segmentation.pt",
)

# Register the capsule layers with Ultralytics before loading the model.
register_ultralytics_modules()
model = YOLO(weights)

results = model.predict("image.jpg", imgsz=640, conf=0.25)
results[0].show()
```
Command-line:
```bash
python predict.py path/to/image.jpg
python predict.py path/to/image.jpg --conf 0.3 --imgsz 1280
```
---
## Repository Structure
| Path | Description |
|---|---|
| `weights/symbolic_capsule_network_segmentation.pt` | Pretrained segmentation checkpoint |
| `modules/` | Capsule modules: `CapsProj`, `CapsAlign`, `CapsRoute`, `CapsRouteV2-4`, `CapsDecode` |
| `models/custom_yolo.py` | Ultralytics hook that registers capsule layers before model load |
| `configs/seg_model/` | YAML defining the capsule neck and head architecture |
| `predict.py` | Minimal inference entry point |
---