---
license: gpl-3.0
language:
- en
tags:
- object-detection
- instance-segmentation
- yolov8
- coco
- real-time
- pytorch
- capsule-network
- interpretable-ai
- symbolic-ai
library_name: ultralytics
pipeline_tag: image-segmentation
datasets:
- coco
model-index:
- name: SCN
  results:
  - task:
      type: object-detection
      name: Object Detection
    dataset:
      name: COCO 2017
      type: coco
      split: val2017
    metrics:
    - type: mAP50
      value: 0.57100
      name: mAP50
    - type: mAP50-95
      value: 0.41600
      name: mAP50:95
  - task:
      type: instance-segmentation
      name: Instance Segmentation
    dataset:
      name: COCO 2017
      type: coco
      split: val2017
    metrics:
    - type: mAP50
      value: 0.53316
      name: Mask mAP50
    - type: mAP50-95
      value: 0.34080
      name: Mask mAP50:95
---

# Symbolic Capsule Network (SCN)

> *What if a detector could tell you not just **what** it found, but **why** it is confident?*

**SCN** is a real-time object detection and instance segmentation model that replaces the conventional convolutional head with a **capsule-based neck and head**. By encoding visual entities as pose-aware vectors rather than scalar activations, SCN explicitly captures *part-whole relationships* — the structural agreements between object parts and the wholes they compose.

Every detection is backed by a **symbolic routing path**: a traceable chain of capsule agreements that exposes *which parts* voted for *which object*, turning each prediction into an auditable reasoning trace.

## Live Demo

[Try the interactive demo ↗](https://huggingface.co/spaces/zpyuan/SymbolicCapsuleNetwork-demo)

## Example Results

| | |
|---|---|
| ![bus](figures/demo/bus.jpg) | ![people](figures/demo/people.jpg) |
| ![scene](figures/demo/coco_000000104101.jpg) | ![scene2](figures/demo/coco_000000023912.jpg) |

---

## Key Ideas

Standard convolutional detectors reduce every visual entity to a scalar confidence score, discarding the compositional structure that makes objects recognisable. SCN addresses this with three tightly integrated contributions:

**1. Part-Whole Relation Modelling**

`CapsRoute` layers propagate evidence upward from low-level part capsules — encoding local features such as wheels, windows, and body panels — to high-level object capsules through dynamic routing-by-agreement. Agreement is reached only when the geometric votes from multiple parts are mutually consistent, giving the model an inductive bias toward spatially coherent detections.

**2. Symbolic Routing Paths**

The routing coefficients produced at each capsule layer form an explicit, directed evidence graph. Unlike Grad-CAM or SHAP, which reconstruct explanations after the fact, SCN's routing weights are native model outputs — first-class signals that describe the model's reasoning as it happens, without any additional computation.

**3. Concept-Based Detection Auditing**

Routing paths enable structured inspection that scalar networks cannot support:

- **Verify** that a predicted "car" is grounded in consistent wheel, body, and windshield part activations.
- **Diagnose** which part capsule collapses when the model misses an object under occlusion or viewpoint change.
- **Detect bias** by aggregating routing statistics across a dataset to reveal which visual parts the model over-relies on.

## Architecture

![Architecture Overview](figures/arch.png)

The pipeline flows through four capsule-specific modules:

| Module | Role |
|---|---|
| `CapsProj` | Projects multi-scale CNN feature maps into capsule space |
| `CapsAlign` | Aligns capsule resolutions across FPN levels |
| `CapsRoute` / `CapsRouteV2-4` | Dynamic routing-by-agreement across part-to-whole levels |
| `CapsDecode` | Decodes final capsule activations into boxes and masks |

---

## Performance

### Detection — COCO 2017 val

SCN sets a new state of the art among nano-scale detectors, surpassing every YOLO variant at comparable FLOPs.
| Model | mAP50 | mAP50:95 | mAP50 (E2E) | mAP50:95 (E2E) | Speed (ms) | Params (M) | FLOPs (B) |
|---|---:|---:|---:|---:|---:|---:|---:|
| YOLOv6n | 53.1% | 37.5% | 52.1% | 36.9% | 20.8 | 4.7 | 11.4 |
| YOLOv7-tiny | 56.7% | 38.7% | 55.7% | 38.1% | 20.9 | 6.2 | 13.8 |
| YOLOv8n | 52.5% | 37.3% | 51.5% | 36.6% | 18.3 | 3.2 | 8.7 |
| YOLOv9t | 53.1% | 38.3% | 52.1% | 37.6% | 20.1 | 2.0 | 7.7 |
| YOLOv10n | 53.8% | 38.5% | 52.8% | 37.8% | 16.7 | 2.3 | 6.7 |
| YOLOv11n | 55.1% | 39.5% | 54.1% | 38.8% | 19.3 | 2.6 | 6.5 |
| YOLOv12n | 56.7% | 40.4% | 55.7% | 39.7% | 19.4 | 2.5 | 6.0 |
| YOLO26n | 56.8% | 40.8% | 55.7% | 40.0% | 14.4 | 2.6 | 6.1 |
| **SCN-n (Ours)** | **57.1%** | **41.6%** | **56.1%** | **40.4%** | 29.6 | 3.3 | 6.5 |

SCN-n achieves **+0.3% mAP50 and +0.8% mAP50:95** over the previous best (YOLO26n) at a comparable FLOPs budget (6.5B vs 6.1B) — accuracy gains that come from structural reasoning rather than extra capacity.

### Accuracy–Efficiency Frontier

![COCO mAP50:95 vs FLOPs](figures/image.png)

*SCN occupies the top of the accuracy–efficiency frontier across all model scales (n / s / m / l / x). At every FLOPs level, SCN variants outperform their YOLO counterparts, demonstrating that part-whole routing is a principled and scalable improvement.*

### Instance Segmentation — COCO 2017 val

| Model | Input | Mask mAP50 | Mask mAP50:95 |
|---|---:|---:|---:|
| SCN Segmentation | 640 | 53.3% | 34.1% |

---

## Quick Start

```bash
pip install ultralytics huggingface_hub
```

```python
from huggingface_hub import hf_hub_download
from ultralytics import YOLO

from models import register_ultralytics_modules

# Download the pretrained segmentation checkpoint from the Hub.
weights = hf_hub_download(
    repo_id="zpyuan/SymbolicCapsuleNetwork",
    filename="weights/symbolic_capsule_network_segmentation.pt",
)

# Register the custom capsule layers before loading the checkpoint.
register_ultralytics_modules()
model = YOLO(weights)

results = model.predict("image.jpg", imgsz=640, conf=0.25)
results[0].show()
```

Command-line:

```bash
python predict.py path/to/image.jpg
python predict.py path/to/image.jpg --conf 0.3 --imgsz 1280
```

---

## Repository Structure

| Path | Description |
|---|---|
| `weights/symbolic_capsule_network_segmentation.pt` | Pretrained segmentation checkpoint |
| `modules/` | Capsule modules: `CapsProj`, `CapsAlign`, `CapsRoute`, `CapsRouteV2-4`, `CapsDecode` |
| `models/custom_yolo.py` | Ultralytics hook that registers capsule layers before model load |
| `configs/seg_model/` | YAML defining the capsule neck and head architecture |
| `predict.py` | Minimal inference entry point |

---
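## Appendix: Routing-by-Agreement in Miniature

The routing-by-agreement mechanism described under **Key Ideas** can be illustrated with a short NumPy toy. This is a minimal sketch of the classic dynamic-routing update, *not* the repository's `CapsRoute` implementation: the function names, shapes, and the 4-part/2-whole scene below are illustrative assumptions.

```python
import numpy as np

def squash(v, axis=-1, eps=1e-8):
    # Shrink short vectors toward 0 and long vectors toward unit length,
    # so a capsule's length can act as an existence probability.
    norm2 = np.sum(v * v, axis=axis, keepdims=True)
    return (norm2 / (1.0 + norm2)) * v / np.sqrt(norm2 + eps)

def route_by_agreement(votes, iterations=3):
    # votes[p, w]: part capsule p's predicted pose vector for whole capsule w.
    parts, wholes, _ = votes.shape
    logits = np.zeros((parts, wholes))
    for _ in range(iterations):
        # Each part distributes its evidence over the candidate wholes.
        coupling = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
        # A whole's pose is the squashed, coupling-weighted sum of its votes.
        poses = squash((coupling[..., None] * votes).sum(axis=0))
        # Reinforce routes whose votes agree with the resulting whole pose.
        logits += (votes * poses[None]).sum(axis=-1)
    return poses, coupling

# Toy scene: 4 part capsules vote on 2 candidate wholes with 8-dim poses.
votes = np.zeros((4, 2, 8))
votes[:, 0, 0] = 1.0                         # all four parts agree on whole 0
votes[0, 1, 1], votes[1, 1, 1] = 1.0, -1.0   # conflicting votes for whole 1
votes[2, 1, 2], votes[3, 1, 2] = 1.0, -1.0
poses, coupling = route_by_agreement(votes)
# Routing concentrates on the whole whose part votes are mutually consistent.
```

After a few iterations, `coupling[:, 0]` dominates `coupling[:, 1]` and whole 0 ends up with the longer pose vector: mutually consistent part votes win the routing. The `coupling` matrix is the kind of per-detection evidence trail this card calls a symbolic routing path, read off as which parts voted for which whole.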