---
license: gpl-3.0
language:
- en
tags:
- object-detection
- instance-segmentation
- yolov8
- coco
- real-time
- pytorch
- capsule-network
- interpretable-ai
- symbolic-ai
library_name: ultralytics
pipeline_tag: image-segmentation
datasets:
- coco
model-index:
- name: SCN
  results:
  - task:
      type: object-detection
      name: Object Detection
    dataset:
      name: COCO 2017
      type: coco
      split: val2017
    metrics:
    - type: mAP50
      value: 0.57100
      name: mAP50
    - type: mAP50-95
      value: 0.41600
      name: mAP50:95
  - task:
      type: instance-segmentation
      name: Instance Segmentation
    dataset:
      name: COCO 2017
      type: coco
      split: val2017
    metrics:
    - type: mAP50
      value: 0.53316
      name: Mask mAP50
    - type: mAP50-95
      value: 0.34080
      name: Mask mAP50:95
---
| |
# Symbolic Capsule Network (SCN)

> *What if a detector could tell you not just **what** it found, but **why** it is confident?*

**SCN** is a real-time object detection and instance segmentation model that replaces the conventional convolutional head with a **capsule-based neck and head**. By encoding visual entities as pose-aware vectors rather than scalar activations, SCN explicitly captures *part-whole relationships* — the structural agreements between object parts and the wholes they compose. Every detection is backed by a **symbolic routing path**: a traceable chain of capsule agreements that exposes *which parts* voted for *which object*, turning each prediction into an auditable reasoning trace.
|
|
## Live Demo

[Try the interactive demo ↗](https://huggingface.co/spaces/zpyuan/SymbolicCapsuleNetwork-demo)
|
|
## Example Results

| | |
|---|---|
|  |  |
|  |  |

---
|
|
## Key Ideas

Standard convolutional detectors reduce every visual entity to a scalar confidence score, discarding the compositional structure that makes objects recognisable. SCN addresses this with three tightly integrated contributions:

**1. Part-Whole Relation Modelling**
`CapsRoute` layers propagate evidence upward from low-level part capsules — encoding local features such as wheels, windows, and body panels — to high-level object capsules through dynamic routing-by-agreement. Agreement is reached only when the geometric votes from multiple parts are mutually consistent, giving the model an inductive bias toward spatially coherent detections.
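As a rough illustration of what routing-by-agreement computes, here is a minimal NumPy sketch of the generic algorithm (in the style of Sabour et al.) — not the repo's actual `CapsRoute` implementation; the `squash` non-linearity and vote shapes follow standard capsule conventions:

```python
import numpy as np

def squash(s, eps=1e-8):
    # Capsule non-linearity: preserves direction, maps the norm into [0, 1).
    n2 = np.sum(s ** 2, axis=-1, keepdims=True)
    return (n2 / (1.0 + n2)) * s / np.sqrt(n2 + eps)

def route_by_agreement(votes, n_iters=3):
    """votes: (P, W, D) — each of P part capsules casts a pose vote for each
    of W whole capsules. Returns whole capsules (W, D) and routing
    coefficients c (P, W), which double as the evidence trace."""
    P, W, _ = votes.shape
    b = np.zeros((P, W))                                 # routing logits
    for _ in range(n_iters):
        c = np.exp(b) / np.exp(b).sum(1, keepdims=True)  # softmax over wholes
        v = squash((c[..., None] * votes).sum(0))        # weighted vote sum
        b = b + (votes * v[None]).sum(-1)                # reward agreeing parts
    return v, c

# Three parts cast identical votes for whole 0 but conflicting votes for whole 1:
votes = np.zeros((3, 2, 4))
votes[:, 0] = [1.0, 0.0, 0.0, 0.0]
votes[0, 1] = [0.0, 1.0, 0.0, 0.0]
votes[1, 1] = [0.0, -1.0, 0.0, 0.0]
votes[2, 1] = [0.0, 0.0, 1.0, 0.0]
v, c = route_by_agreement(votes)
print(c[:, 0] > c[:, 1])   # routing mass concentrates on the agreed whole
```

The toy example shows the inductive bias at work: the whole capsule whose votes agree ends up with both the larger routing coefficients and the longer (more confident) output vector.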
|
|
**2. Symbolic Routing Paths**
The routing coefficients produced at each capsule layer form an explicit, directed evidence graph. Unlike Grad-CAM or SHAP, which reconstruct explanations after the fact, SCN's routing weights are native model outputs — first-class signals that describe the model's reasoning as it happens, without any additional computation.
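Because the coefficients are native outputs, reading an evidence trace amounts to plain array indexing — no backward pass required. A hypothetical sketch (the `evidence_trace` helper, the part names, and the coefficient values are illustrative, not part of the repo's API):

```python
import numpy as np

def evidence_trace(c, part_names, whole_idx, k=3):
    """Read the top-k contributing parts for one whole capsule directly from
    a (num_parts, num_wholes) routing-coefficient matrix `c`."""
    weights = c[:, whole_idx]
    top = np.argsort(weights)[::-1][:k]
    return [(part_names[i], float(weights[i])) for i in top]

# Illustrative coefficients (rows: part capsules, columns: whole capsules):
c = np.array([
    [0.70, 0.30],   # "wheel" routes mostly to whole 0
    [0.65, 0.35],   # "windshield" routes mostly to whole 0
    [0.20, 0.80],   # "fur" routes mostly elsewhere
])
parts = ["wheel", "windshield", "fur"]
trace = evidence_trace(c, parts, whole_idx=0, k=2)
print(trace)   # strongest part-level evidence for whole capsule 0
```

Contrast with post-hoc attribution: here the numbers already existed inside the forward pass; the helper only sorts them.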
|
|
**3. Concept-Based Detection Auditing**
Routing paths enable structured inspection that scalar networks cannot support:
- **Verify** that a predicted "car" is grounded in consistent wheel, body, and windshield part activations.
- **Diagnose** which part capsule collapsed when the model misses an object under occlusion or viewpoint change.
- **Detect bias** by aggregating routing statistics across a dataset to reveal which visual parts the model over-relies on.
|
|
## Architecture

![SCN architecture](framework.png)

The pipeline flows through four capsule-specific modules:

| Module | Role |
|---|---|
| `CapsProj` | Projects multi-scale CNN feature maps into capsule space |
| `CapsAlign` | Aligns capsule resolutions across FPN levels |
| `CapsRoute` / `CapsRouteV2-4` | Dynamic routing-by-agreement across part-to-whole levels |
| `CapsDecode` | Decodes final capsule activations into boxes and masks |
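The table above can be read as a data flow. The following shape-level sketch traces a single FPN level (so `CapsAlign` is a no-op and omitted); every dimension, transform, and the uniform one-step routing are illustrative assumptions, not the checkpoint's real configuration:

```python
import numpy as np

# Illustrative sizes only — not the repo's actual hyperparameters.
H = W = 20            # spatial resolution of one FPN level
C = 64                # CNN channels at that level
D = 8                 # capsule pose dimension
N_PART, N_WHOLE = 16, 10

rng = np.random.default_rng(0)
feats = rng.normal(size=(C, H, W))                    # backbone feature map

# CapsProj (sketch): linear map from channels to N_PART pose vectors,
# globally pooled over space for brevity.
proj = rng.normal(size=(N_PART * D, C)) * 0.05
part_caps = (proj @ feats.reshape(C, -1)).reshape(N_PART, D, H * W).mean(-1)

# CapsRoute (sketch): each part votes for each whole via a learned transform;
# votes are combined with uniform coefficients (a single routing step).
W_vote = rng.normal(size=(N_PART, N_WHOLE, D, D)) * 0.1
votes = np.einsum("pwij,pj->pwi", W_vote, part_caps)  # (N_PART, N_WHOLE, D)
whole_caps = votes.mean(0)                            # (N_WHOLE, D)

# CapsDecode (sketch): capsule norm serves as class confidence.
conf = np.linalg.norm(whole_caps, axis=-1)
print(conf.shape)   # one confidence per whole capsule
```

The real modules additionally carry box and mask heads and iterate the routing, but the tensor shapes above capture the part-to-whole contraction that defines the pipeline.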
|
|
---

## Performance

### Detection — COCO 2017 val
|
|
SCN-n sets a new state of the art in accuracy among nano-scale detectors, surpassing every YOLO variant at a comparable FLOPs budget.
|
|
| Model | mAP50 | mAP50:95 | mAP50 (E2E) | mAP50:95 (E2E) | Speed (ms) | Params (M) | FLOPs (B) |
|---|---:|---:|---:|---:|---:|---:|---:|
| YOLOv6n | 53.1% | 37.5% | 52.1% | 36.9% | 20.8 | 4.7 | 11.4 |
| YOLOv7-tiny | 56.7% | 38.7% | 55.7% | 38.1% | 20.9 | 6.2 | 13.8 |
| YOLOv8n | 52.5% | 37.3% | 51.5% | 36.6% | 18.3 | 3.2 | 8.7 |
| YOLOv9t | 53.1% | 38.3% | 52.1% | 37.6% | 20.1 | 2.0 | 7.7 |
| YOLOv10n | 53.8% | 38.5% | 52.8% | 37.8% | 16.7 | 2.3 | 6.7 |
| YOLOv11n | 55.1% | 39.5% | 54.1% | 38.8% | 19.3 | 2.6 | 6.5 |
| YOLOv12n | 56.7% | 40.4% | 55.7% | 39.7% | 19.4 | 2.5 | 6.0 |
| YOLO26n | 56.8% | 40.8% | 55.7% | 40.0% | 14.4 | 2.6 | 6.1 |
| **SCN-n (Ours)** | **57.1%** | **41.6%** | **56.1%** | **40.4%** | 29.6 | 3.3 | 6.5 |
|
|
SCN-n achieves **+0.3% mAP50 and +0.8% mAP50:95** over the previous best (YOLO26n) at a comparable FLOPs budget (6.5B vs. 6.1B): accuracy gains attributable to structural reasoning rather than a larger compute budget.
|
|
### Accuracy–Efficiency Frontier

![Accuracy–efficiency frontier](performance.png)

*SCN occupies the top of the accuracy–efficiency frontier across all model scales (n / s / m / l / x). At every FLOPs level, SCN variants outperform their YOLO counterparts, demonstrating that part-whole routing is a principled and scalable improvement.*
|
|
### Instance Segmentation — COCO 2017 val

| Model | Input | Mask mAP50 | Mask mAP50:95 |
|---|---:|---:|---:|
| SCN Segmentation | 640 | 53.3% | 34.1% |
|
|
---

## Quick Start

```bash
pip install ultralytics huggingface_hub
```
|
|
```python
from huggingface_hub import hf_hub_download
from ultralytics import YOLO

from models import register_ultralytics_modules

# Download the pretrained checkpoint from the Hub
weights = hf_hub_download(
    repo_id="zpyuan/SymbolicCapsuleNetwork",
    filename="weights/symbolic_capsule_network_segmentation.pt",
)

# Register SCN's capsule layers with Ultralytics before loading the model
register_ultralytics_modules()

model = YOLO(weights)
results = model.predict("image.jpg", imgsz=640, conf=0.25)
results[0].show()
```
|
|
Command-line:

```bash
python predict.py path/to/image.jpg
python predict.py path/to/image.jpg --conf 0.3 --imgsz 1280
```
|
|
---

## Repository Structure

| Path | Description |
|---|---|
| `weights/symbolic_capsule_network_segmentation.pt` | Pretrained segmentation checkpoint |
| `modules/` | Capsule modules: `CapsProj`, `CapsAlign`, `CapsRoute`, `CapsRouteV2-4`, `CapsDecode` |
| `models/custom_yolo.py` | Ultralytics hook that registers capsule layers before model load |
| `configs/seg_model/` | YAML defining the capsule neck and head architecture |
| `predict.py` | Minimal inference entry point |
|
|
|
|
|
|