
HearthNet: Architecture Reference

Local-first community AI mesh. Each participant runs a node on their own hardware. Nodes discover each other automatically and share AI capabilities, files, and community posts, with no central server required.


High-Level Concept

┌───────────────────────────────────────────────────────────────────────────┐
│                         Community Mesh (LAN / overlay)                    │
│                                                                           │
│   ┌─────────────┐    mDNS/UDP     ┌─────────────┐    mDNS/UDP             │
│   │  Node A     │◄───────────────►│  Node B     │◄──────────────          │
│   │  (anchor)   │                 │  (hearth)   │                         │
│   │             │   capability    │             │                         │
│   │  CapBus ◄───┼─────bus.call────┼──► CapBus   │                         │
│   │  LLM svc    │                 │  RAG svc    │                         │
│   │  RAG svc    │                 │  OCR svc    │                         │
│   │  Gradio UI  │                 │  Gradio UI  │                         │
│   └─────────────┘                 └─────────────┘                         │
└───────────────────────────────────────────────────────────────────────────┘

HearthNet is structured around three ideas:

  1. Node: a Python process on someone's hardware (Raspberry Pi, laptop, server).
  2. CapabilityBus: a message bus where services register capabilities (e.g. llm.chat@1.0). Any code, local or remote, calls a capability by name.
  3. Services: pure-Python objects that handle capability calls. A node installs whichever services its hardware supports.

Module Map

Phase 1: Foundation

| Module | Location | What it does |
|---|---|---|
| M01 Identity | hearthnet/identity/ | Ed25519 node keys, community manifests, invite tokens |
| M02 Discovery | hearthnet/discovery/ | mDNS + UDP multicast peer discovery |
| M03 Bus | hearthnet/bus/ | Capability router, health ring buffer, trust levels |
| M04 LLM | hearthnet/services/llm/ | Local model backends (Ollama, llama.cpp, LM Studio, HF, Anthropic) |
| M05 RAG | hearthnet/services/rag/ | Chunker → embedder → Chroma vector store + retrieval |
| M06 Marketplace | hearthnet/services/marketplace/ | Event-sourced community board (posts, offers, requests) |
| M07 Blobs | hearthnet/blobs/ | BLAKE3 content-addressed file store with chunked transfer |
| M08 UI | hearthnet/ui/ | Gradio 8-tab interface + themes + topology component |
| M09 Emergency | hearthnet/emergency/ | Async probe loop → emergency state machine |
| M10 Chat | hearthnet/services/chat/ | Event-backed direct messages between nodes |
| M11 Embedding | hearthnet/services/embedding/ | Sentence-transformer embeddings (BAAI/bge-small) |
| M12 CLI | hearthnet/cli.py | Click CLI: run, call, log, rag, invite, version, … |
| M13 Onboarding | hearthnet/ui/onboarding.py | Invite QR flow + first-run wizard |

Phase 2: Resilience & Rich Services

| Module | Location | What it does |
|---|---|---|
| M14 Federation | hearthnet/federation/ | Cross-community node manifests + signed bridges |
| M15 Relay | hearthnet/relay/ | Public-IP relay tier for NAT traversal |
| M16 Tokens | hearthnet/identity/tokens.py | AuthToken / CapabilityToken scoped access |
| M17 OCR | hearthnet/services/ocr/ | Tesseract / TrOCR text extraction |
| M18 Translation | hearthnet/services/translation/ | NLLB-200 local translation |
| M19 STT/TTS | hearthnet/services/stt_tts/ | Whisper STT + Coqui/pyttsx3 TTS |
| M20 Vision | hearthnet/services/vision/ | Florence-2 image captioning / VQA |
| M21 Tool Calls | hearthnet/services/tools/ | LLM tool-call executor (plant ID, search, …) |
| M22 Mobile | hearthnet/ui/mobile/ | PWA manifest + service worker for home-screen install |
| M23 E2E Encryption | hearthnet/crypto/ | X25519 ECDH + ChaCha20-Poly1305 channel encryption |
| M24 Rerank | hearthnet/services/rerank/ | Cross-encoder reranking for RAG results |
| M25 Group Chat | hearthnet/services/group_chat/ | Multi-party room-based chat |

Phase 3: Experimental (opt-in via config.toml)

| Module | Location | Flag | What it does |
|---|---|---|---|
| M26 Distributed Inference | hearthnet/distributed_inference/ | research.distributed_inference | Layer-shard a 7B model across LAN nodes (Petals-style) |
| M27 MoE Routing | hearthnet/moe/ | research.moe_routing | Route queries to best expert (model/service/human) via learned scorer |
| M28 FedLearn | hearthnet/fedlearn/ | research.fedlearn | FedAvg LoRA fine-tuning without sharing raw data |
| M29 LoRa Beacons | hearthnet/lora/ | research.lora_beacons | 868 MHz offline "I'm alive" heartbeats via USB LoRa stick |
| M30 Evidence Graph | hearthnet/evidence/ | research.evidence | Claim → attest → dispute provenance graph + EBKH bridge |
| M31 Civil Defense | hearthnet/civdef/ | research.civil_defense | THW/DRK/KatS alert pipeline with role certs + audit chain |
| M32 Protocol Standard | hearthnet/services/protocol/ | on by default | Protocol version list + conformance report |

Cross-Cutting

| ID | Location | What it does |
|---|---|---|
| X01 Transport | hearthnet/transport/ | HTTP/SSE client, backpressure, rate limiting, frame types |
| X02 Events | hearthnet/events/ | SQLite Lamport event log + gossip sync |
| X03 Observability | hearthnet/observability/ | Tracing, metrics, Doctor health checks, TrackioExporter |
| X04 Config | hearthnet/config.py | Typed TOML config + ResearchConfig feature flags |
| X05 DHT | hearthnet/dht/ | Kademlia-inspired DHT for cross-LAN peer lookup |
| X06 WebSocket | hearthnet/transport/ | WebSocket pubsub (StateBus → live UI push) |
| X07 Federated Metrics | hearthnet/observability/ | Opt-in aggregate mesh health metrics |
| X08 Tensor Transport | hearthnet/transport/tensor/ | Chunked tensor stream for M26 distributed inference |
| X09 Conformance Suite | hearthnet/conformance/ | 21-check black-box conformance runner |

Composition Root

HearthNode in hearthnet/node.py is the single composition root.

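import asyncio
from hearthnet.node import HearthNode

# `await` needs a running event loop, so the calls live in a coroutine.
async def main() -> None:
    node = HearthNode(
        node_id="my-node",
        display_name="Alice's Pi",
        community_id="ed25519:abc123",
    )
    node.install_services(corpus="general")
    await node.start()

asyncio.run(main())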

install_services() registers every service the local hardware supports on the bus. Heavy optional dependencies (torch, chromadb, etc.) are imported lazily and fail gracefully: a node with no GPU still works but cannot answer GPU-only capabilities.
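
The lazy-import behaviour described above might look roughly like the sketch below. It is an illustration only, not HearthNet's code: `optional_import` and `available_services` are hypothetical helper names, and the mapping from a service name to the module it needs is an assumed shape chosen for the example.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Import a module on demand; return None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def available_services(requirements: dict[str, str]) -> list[str]:
    """Return the service names whose required module can be imported.

    `requirements` maps a service name to the module it depends on
    (hypothetical shape, used only for this example).
    """
    return [
        service
        for service, module in requirements.items()
        if optional_import(module) is not None
    ]


# "json" ships with Python; the second module name is deliberately bogus.
print(available_services({"demo": "json", "gpu_only": "no_such_module_xyz"}))
```

Because the import happens inside a function, a missing dependency only disables the service that needs it rather than breaking node startup.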


Capability Bus

Caller ──── bus.call(name, version, body) ──────────┐
                                                    ▼
                                          ┌──────────────────┐
                                          │  CapabilityBus   │
                                          │                  │
                                          │  Registry        │
                                          │  ┌─────────────┐ │
                                          │  │ local route │─┼──► Service.handle()
                                          │  ├─────────────┤ │
                                          │  │ remote route│─┼──► HTTP POST /bus/v1/call
                                          │  └─────────────┘ │
                                          │  HealthMonitor   │
                                          │  TrustFilter     │
                                          └──────────────────┘
  • Local route: service is installed on this node → direct Python call.
  • Remote route: capability is advertised by a peer → HTTP POST to that peer's transport.
  • Version negotiation: capabilities are registered with a (major, minor) version; the bus picks the highest compatible version.
  • Health monitoring: each service's response times are tracked in a ring buffer; unhealthy services are quarantined for BUS_QUARANTINE_SECONDS.
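
Version negotiation and health quarantine can be sketched in a few lines. Everything here is an assumption made for illustration: "compatible" is taken to mean the same major version with a minor version at least as high as requested, "unhealthy" is taken to mean a full buffer whose mean latency exceeds a threshold, and `pick_version`, `HealthRing`, and their parameters are hypothetical names. The real bus may define all of these differently.

```python
from collections import deque
from typing import Optional

Version = tuple[int, int]  # (major, minor)


def pick_version(requested: Version, registered: list[Version]) -> Optional[Version]:
    """Pick the highest registered version compatible with the request.

    Assumption: compatible = same major, minor >= requested minor.
    """
    major, minor = requested
    compatible = [v for v in registered if v[0] == major and v[1] >= minor]
    return max(compatible, default=None)


class HealthRing:
    """Ring buffer of recent response times for one service."""

    def __init__(self, size: int, max_mean_ms: float, quarantine_seconds: float) -> None:
        self._latencies: deque[float] = deque(maxlen=size)
        self._max_mean_ms = max_mean_ms
        self._quarantine_seconds = quarantine_seconds
        self._quarantined_until = 0.0

    def record(self, latency_ms: float, now: float) -> None:
        """Store one sample; quarantine when a full buffer is too slow on average."""
        self._latencies.append(latency_ms)
        full = len(self._latencies) == self._latencies.maxlen
        if full and sum(self._latencies) / len(self._latencies) > self._max_mean_ms:
            self._quarantined_until = now + self._quarantine_seconds
            self._latencies.clear()  # start fresh once the quarantine ends

    def is_quarantined(self, now: float) -> bool:
        return now < self._quarantined_until
```

Passing `now` in explicitly keeps the sketch deterministic; a running node would more likely read a monotonic clock.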

Data Flow: LLM Chat Request

User types in Gradio UI
       │
       ▼
  app.py (Gradio event handler)
       │  bus.call("llm.chat@1.0", body)
       ▼
  CapabilityBus.call()
       │
       ├─ local LlmService found?
       │       │ yes → LlmService.handle() → backend.chat() → yield Token
       │       │
       └─ no local service
               │ peer has llm.chat?
               ├─ yes → HTTP POST /bus/v1/call → remote node → stream tokens back
               └─ no  → CapabilityError("not_found")
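
The decision order in the flow above (local first, then a peer, otherwise an error) can be mimicked with plain dictionaries. This is a toy model rather than the bus implementation: `route_call` and the `ocr.extract@1.0` capability name are hypothetical, and the remote handler is an ordinary function standing in for the HTTP POST.

```python
from typing import Any, Callable

Handler = Callable[[dict[str, Any]], Any]


class CapabilityError(Exception):
    """Raised when no route exists (the name mirrors the diagram)."""


def route_call(
    name: str,
    body: dict[str, Any],
    local: dict[str, Handler],
    remote: dict[str, Handler],
) -> Any:
    """Prefer a local handler, fall back to a peer, else raise not_found."""
    if name in local:
        return local[name](body)
    if name in remote:
        # Stand-in for the HTTP POST to the peer that advertises the capability.
        return remote[name](body)
    raise CapabilityError("not_found")


local_services = {"llm.chat@1.0": lambda body: f"local: {body['prompt']}"}
peer_services = {"ocr.extract@1.0": lambda body: "remote: text"}

print(route_call("llm.chat@1.0", {"prompt": "hi"}, local_services, peer_services))
```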

Discovery Flow

Node boots
    │
    ├── mDNS: register _hearthnet._tcp.local.  (LAN multicast DNS)
    ├── UDP: send announce to 224.0.0.251:7079 every 15s
    │
    ▼
PeerRegistry receives announcements from other nodes
    │
    ├── new peer → RegistryEvent(kind="added", entry=...)
    ├── peer gone (TTL expired) → RegistryEvent(kind="removed", ...)
    └── ManifestPublisher re-publishes every 300s
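
A minimal sketch of the added/removed bookkeeping follows. It simplifies the real types: `RegistryEvent` here carries a `node_id` instead of a full entry, the 45-second TTL (three missed 15-second announcements) is an assumed value, and the method names `on_announce` and `expire` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistryEvent:
    kind: str      # "added" or "removed"
    node_id: str   # simplified stand-in for the full registry entry


class PeerRegistry:
    """Tracks peers by last-seen time and expires them after a TTL."""

    def __init__(self, ttl_seconds: float = 45.0) -> None:
        self._ttl = ttl_seconds
        self._last_seen: dict[str, float] = {}

    def on_announce(self, node_id: str, now: float) -> list[RegistryEvent]:
        """Refresh a peer; emit "added" only the first time it is seen."""
        is_new = node_id not in self._last_seen
        self._last_seen[node_id] = now
        return [RegistryEvent("added", node_id)] if is_new else []

    def expire(self, now: float) -> list[RegistryEvent]:
        """Drop peers whose last announcement is older than the TTL."""
        gone = [n for n, seen in self._last_seen.items() if now - seen > self._ttl]
        for node_id in gone:
            del self._last_seen[node_id]
        return [RegistryEvent("removed", n) for n in gone]
```

A peer that reappears after expiry is reported as "added" again, which keeps consumers of the event stream consistent.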

Emergency Mode

EmergencyDetector (async loop, 30s probe)
    │
    ├── probe connectivity endpoints
    │
    ├── ONLINE  → EmergencyState.NORMAL
    │                │ UI shows normal theme
    │
    └── OFFLINE → EmergencyState.EMERGENCY
                     │ UI switches to emergency theme (red)
                     │ emergency.llm.chat capability activated
                     │ LoRa beacons sent if hardware available (M29)
                     │ Civil defense alerts published if role cert present (M31)
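
The two-state machine above can be expressed compactly. This sketch makes one assumption the document does not state: the node counts as online if at least one probed endpoint responds. The function names and the action strings returned by `on_transition` are illustrative, not HearthNet identifiers.

```python
from enum import Enum


class EmergencyState(Enum):
    NORMAL = "normal"
    EMERGENCY = "emergency"


def next_state(probe_results: list[bool]) -> EmergencyState:
    """Map one round of connectivity probes to a state.

    Assumption: online means at least one endpoint responded.
    """
    return EmergencyState.NORMAL if any(probe_results) else EmergencyState.EMERGENCY


def on_transition(old: EmergencyState, new: EmergencyState) -> list[str]:
    """Return the actions to run when the state changes (illustrative labels)."""
    if old is new:
        return []
    if new is EmergencyState.EMERGENCY:
        return ["switch to emergency theme", "activate emergency.llm.chat"]
    return ["switch to normal theme"]
```

Acting only on transitions, rather than on every probe, avoids re-triggering the theme switch every 30 seconds while the node stays offline.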

MoE Expert Routing (M27)

Query arrives at any node
       │
       ▼
  MoeRouter.route(query, top_k=3)
       │
       ├── score all registered ExpertDescriptors against query
       │   (tag overlap + cosine similarity + recency weighting)
       │
       └── return ranked RouteResult
              │
              ├── expert_type="model"    → bus.call("llm.chat@1.0", ...) on that node
              ├── expert_type="service"  → bus.call(expert_capability, ...)
              ├── expert_type="human"    → notify via chat + start handoff timer (M27 §4)
              └── expert_type="external" → HTTP call to opt-in external API

Enable it: in ~/.config/hearthnet/config.toml, set moe_routing = true under [policy.research], together with enable = true, the master switch for all experimental modules.
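
To make the three scoring signals concrete, here is a hand-weighted sketch. The module map describes the real scorer as learned, so the fixed weights (0.4, 0.4, 0.2), the exponential recency decay with a 300-second half-life, and the fields of this `ExpertDescriptor` are all assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class ExpertDescriptor:
    name: str
    tags: set[str]
    embedding: list[float]
    age_seconds: float  # time since the expert was last seen (assumed field)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score(expert: ExpertDescriptor, query_tags: set[str],
          query_embedding: list[float], half_life: float = 300.0) -> float:
    """Combine tag overlap, cosine similarity, and recency with fixed weights."""
    overlap = len(expert.tags & query_tags) / len(query_tags) if query_tags else 0.0
    similarity = cosine(expert.embedding, query_embedding)
    recency = 0.5 ** (expert.age_seconds / half_life)
    return 0.4 * overlap + 0.4 * similarity + 0.2 * recency


def route(experts: list[ExpertDescriptor], query_tags: set[str],
          query_embedding: list[float], top_k: int = 3) -> list[ExpertDescriptor]:
    """Return the top_k experts, best first."""
    ranked = sorted(experts, key=lambda e: score(e, query_tags, query_embedding),
                    reverse=True)
    return ranked[:top_k]
```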


Distributed Inference (M26: BitTorrent-style LLM sharing)

Node A: layers 0–15 of Llama-3.2-3B
Node B: layers 16–27 of Llama-3.2-3B
Node C: layers 28–35 (lm_head) of Llama-3.2-3B
                │
                ▼
PipelineOrchestrator.plan(model_id="llama3.2:3b")
    │  → discovers shards via experimental.distributed_llm.shard.list
    │  → checks layer coverage: 0..35 ✓
    │
PipelineOrchestrator.run(pipeline, input_tokens)
    │  → sends activations A→B via X08 TensorTransport (1 MiB chunks)
    │  → B sends activations B→C
    │  → C returns final logits
    │
    └── caller gets streamed tokens like any local model
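
The "checks layer coverage" step in the plan above amounts to verifying that the advertised shard ranges leave no gap. A possible sketch, assuming shards are reported as inclusive (first, last) layer ranges; `covers_all_layers` is a hypothetical helper, not the orchestrator's API:

```python
def covers_all_layers(shards: list[tuple[int, int]], num_layers: int) -> bool:
    """Return True if inclusive (first, last) ranges cover layers 0..num_layers-1.

    Overlapping shards are allowed; any gap makes the plan invalid.
    """
    next_needed = 0
    for first, last in sorted(shards):
        if first > next_needed:
            return False  # gap before this shard
        next_needed = max(next_needed, last + 1)
    return next_needed >= num_layers


# The three shards from the diagram, deliberately out of order.
shards = [(16, 27), (0, 15), (28, 35)]
print(covers_all_layers(shards, num_layers=36))  # True
```

Sorting first means the check does not depend on the order in which peers answer the shard listing.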

Model weights are shared chunk-by-chunk using BLAKE3 CID-addressed blob transfer, the same mechanism as file blobs (M07), but optimised for .gguf / .safetensors files.
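
Content-addressed chunking can be illustrated with the standard library alone. Note the substitution: this sketch uses SHA-256 from `hashlib` as a stand-in because BLAKE3 is not part of the Python standard library, so the IDs it produces are not HearthNet CIDs. The 1 MiB default borrows the chunk size quoted for X08 and is an assumption for blobs; `chunk_ids` and `verify_chunk` are hypothetical names.

```python
import hashlib

CHUNK_SIZE = 1024 * 1024  # 1 MiB (assumed; borrowed from the X08 figure)


def chunk_ids(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split data into fixed-size chunks and return one content ID per chunk."""
    return [
        hashlib.sha256(data[offset:offset + chunk_size]).hexdigest()
        for offset in range(0, len(data), chunk_size)
    ]


def verify_chunk(chunk: bytes, expected_id: str) -> bool:
    """Check a received chunk against the ID the sender advertised."""
    return hashlib.sha256(chunk).hexdigest() == expected_id


payload = b"a" * 10
ids = chunk_ids(payload, chunk_size=4)   # chunks of 4, 4, and 2 bytes
print(len(ids), verify_chunk(payload[:4], ids[0]))
```

Because each chunk is named by its hash, a receiver can fetch chunks from different peers and reject any that were corrupted or tampered with.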


File Tree

hearthnet/
├── node.py                    # HearthNode: composition root
├── types.py                   # Shared type aliases (NodeID, ShardID, AlertID, …)
├── constants.py               # All numeric defaults and limits
├── config.py                  # HearthnetConfig + ResearchConfig (TOML-backed)
├── cli.py                     # Click CLI entry point
├── facades.py                 # HearthFacade: thin high-level API for app.py
├── controller.py              # HearthController: legacy thin wrapper
│
├── bus/                       # M03 CapabilityBus
│   ├── router.py              # routing logic (local → remote)
│   ├── registry.py            # CapabilityEntry, RegistryEvent, Diff
│   ├── capability.py          # CapabilityEntry dataclass
│   └── health.py              # ring-buffer health monitor
│
├── identity/                  # M01
│   ├── keys.py                # Ed25519 key generation + signing
│   ├── manifest.py            # NodeManifest, CommunityManifest, CommunityPolicy, …
│   └── tokens.py              # AuthToken, CapabilityToken
│
├── discovery/                 # M02
│   └── peers.py               # mDNS + UDP multicast PeerRegistry
│
├── transport/                 # X01 / X06 / X08
│   ├── client.py              # HTTP + SSE client
│   ├── streams.py             # Frame, SseReader
│   ├── backpressure.py        # FlowControl, RateCheck, RateLimiter
│   └── tensor/                # X08 tensor chunked transport
│
├── events/                    # X02
│   ├── log.py                 # SQLite Lamport event log
│   └── sync.py                # Gossip SyncClient / SyncServer
│
├── observability/             # X03
│   ├── tracing.py             # attach/detach trace context
│   ├── metrics.py             # MetricsCollector, TrackioExporter
│   └── doctor.py              # DoctorResult, CheckResult, DoctorService
│
├── services/                  # M04 – M21 + M25 + M32
│   ├── llm/                   # M04 backends: ollama, llama_cpp, lmstudio, hf_api, anthropic
│   ├── rag/                   # M05
│   ├── marketplace/           # M06
│   ├── chat/                  # M10
│   ├── embedding/             # M11
│   ├── ocr/                   # M17
│   ├── translation/           # M18
│   ├── stt_tts/               # M19
│   ├── vision/                # M20
│   ├── tools/                 # M21
│   ├── group_chat/            # M25
│   └── protocol/              # M32
│
├── ui/                        # M08
│   ├── app.py                 # Gradio 8-tab entry point
│   ├── tabs/                  # one file per tab
│   ├── theme.py               # hearthnet_theme, emergency_theme
│   ├── topology.py            # TopologyComponent (mesh graph)
│   ├── onboarding.py          # first-run wizard + invite QR
│   └── mobile/                # M22 PWA manifest + service worker
│
├── emergency/                 # M09
│   ├── detector.py            # async probe loop
│   └── state.py               # EmergencyState enum
│
├── crypto/                    # M23
│   └── channel.py             # X25519 + ChaCha20-Poly1305
│
├── blobs/                     # M07
│   └── store.py               # BLAKE3 CID store + chunked reader
│
├── dht/                       # X05
├── federation/                # M14
├── relay/                     # M15
│
├── distributed_inference/     # M26 (experimental)
├── moe/                       # M27 (experimental)
├── fedlearn/                  # M28 (experimental)
├── lora/                      # M29 (experimental)
├── evidence/                  # M30 (experimental)
├── civdef/                    # M31 (experimental)
└── conformance/               # X09

Configuration

~/.config/hearthnet/config.toml (created on first run with defaults):

[node]
node_id      = ""          # auto-generated Ed25519 key ID
display_name = "My Node"
data_dir     = "~/.hearthnet"

[transport]
http_port    = 7080
ui_port      = 7860

[llm]
default_backend = "ollama"   # "ollama" | "llama_cpp" | "lmstudio" | "hf_api" | "smollm"

[rag]
corpus_dir      = "~/.hearthnet/corpus"
embedding_model = "BAAI/bge-small-en-v1.5"

[policy.research]
enable                  = false     # master switch for all experimental modules
moe_routing             = false     # M27
distributed_inference   = false     # M26
fedlearn                = false     # M28
lora_beacons            = false     # M29
evidence                = false     # M30
civil_defense           = false     # M31

Connecting a Local Node to the HF Space

The HF Space at https://huggingface.co/spaces/build-small-hackathon/HearthNet is a single-node anchor you can peer with from any local machine.

# 1. Clone and install
git clone https://huggingface.co/spaces/build-small-hackathon/HearthNet
cd HearthNet
pip install -e .

# 2. Run your local node (pick a free port if 7080 is taken)
python -m hearthnet.cli run --http-port 7080 --ui-port 7860

# 3. Manually add the HF Space anchor as a peer (different network = manual)
python -m hearthnet.cli call discovery.peer.add 1 0 \
  '{"endpoint":"https://build-small-hackathon-hearthnet.hf.space","node_id":"hf-space-anchor"}'

# 4. Verify peering
python -m hearthnet.cli call discovery.peers 1 0 '{}'

Or use the helper script:

python scripts/connect_to_hf.py

Once peered, your local node can:

  • Route LLM queries from the HF Space to your local (better) model
  • Push community posts that appear in the HF Space UI
  • Share blob files across the connection

Note: The HF Space runs on a public server without a static IP for inbound connections. Your local node initiates the connection; the HF Space cannot discover you via mDNS. Use discovery.peer.add or the invite flow to establish the bridge manually.


Security Model

  • Node identity: Ed25519 key pair generated locally, never leaves the device.
  • Trust levels: unknown → member → trusted → anchor. Capabilities can require a minimum trust level.
  • Capability scoping: AuthToken restricts which capabilities a caller may invoke.
  • Channel encryption: M23 X25519 ECDH + ChaCha20-Poly1305 for inter-node transport (opt-in, defaults off).
  • Experimental capabilities: Phase 3 modules are off by default and require explicit opt-in. The bus refuses to register them unless the feature flag is on.
  • No central authority: there is no HearthNet.com, no certificate authority, no registration server. Trust is established peer-to-peer via invite chains.
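
The trust-level and capability-scoping rules above combine naturally into a single check. In this sketch the ordering of the four levels comes from the list, but everything else is assumed: `TrustLevel`, `may_call`, the capability name `admin.reset@1.0`, and the idea of representing an AuthToken's scope as a plain set of capability names are hypothetical simplifications.

```python
from enum import IntEnum
from typing import Optional


class TrustLevel(IntEnum):
    UNKNOWN = 0
    MEMBER = 1
    TRUSTED = 2
    ANCHOR = 3


def may_call(
    capability: str,
    caller_trust: TrustLevel,
    required_trust: TrustLevel,
    token_scope: Optional[set[str]] = None,
) -> bool:
    """Allow a call if trust is sufficient and the token, if any, covers it.

    `token_scope` is a simplified stand-in for an AuthToken: None means
    no token restriction, otherwise the capability must be listed.
    """
    if caller_trust < required_trust:
        return False
    return token_scope is None or capability in token_scope


print(may_call("llm.chat@1.0", TrustLevel.MEMBER, TrustLevel.MEMBER))     # True
print(may_call("admin.reset@1.0", TrustLevel.MEMBER, TrustLevel.ANCHOR))  # False
```

Using an `IntEnum` lets "minimum trust level" be an ordinary comparison.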

Testing

# Full suite (133 unit + integration tests):
pytest tests/ -q

# Skip slow E2E browser tests:
pytest tests/ -q -k "not e2e"

# Phase 3 experimental module tests only:
pytest tests/test_phase3_experimental.py -v

# Conformance runner (X09):
python -m hearthnet.conformance.runner --output conformance-report/

This document is generated from the spec set in docs/. For per-module detail see:

  • Phase 1+2: 00-OVERVIEW.md, CAPABILITY_CONTRACT.md, modules/M01-*.md …
  • Phase 3: docs/p2_p3/IMPLEMENTATION_REFERENCE_p3.md, docs/p2_p3/M26-*.md …