The supercomputer you already own
cadencIA shards giant AI models across a P2P hive of phones, PCs and home NPUs. No data centers. No billion-dollar power bills. Just the heat your hardware already wastes.
✓ Sharding · 12 layers → 4 alpha · 8 beta nodes
✓ Routing · WebRTC mesh · 5ms local cluster
✓ KV cache · INT4 quantized · −80% bandwidth
✓ Consensus · Merkle hash OK · 0 byzantine
This is how the hive breathes
Each dot is a real device (phone, PC, NPU). The blue pulses are tensors traveling between shards in real time.
Four pillars holding the hive together
Not an academic paper: the physiology of a distributed nervous system.
Dynamic P2P mesh
Core · Elastic WebRTC network with a tweaked DHT. Distance is logical, not geographic: with few nodes, Spain and New Zealand are neighbors.
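To make "logical distance" concrete, here is a minimal sketch that assumes a Kademlia-style XOR metric over hashed node IDs. The page only says the DHT is "tweaked", so the metric, the SHA-1 IDs and the node names below are all illustrative assumptions, not cadencIA's actual scheme.

```python
import hashlib


def node_id(name: str) -> int:
    """Derive a 160-bit ID from a node name (assumption: SHA-1 IDs, as in Kademlia)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Logical distance between two IDs; it ignores geography entirely."""
    return a ^ b


def closest_nodes(target: int, nodes: dict[str, int], k: int = 3) -> list[str]:
    """Return the k node names whose IDs are logically closest to target."""
    return sorted(nodes, key=lambda name: xor_distance(nodes[name], target))[:k]


# Hypothetical node names, used only for illustration.
nodes = {name: node_id(name) for name in ("madrid-phone", "auckland-pc", "lisbon-npu")}
neighbors = closest_nodes(nodes["madrid-phone"], nodes, k=2)
```

Because the IDs come from a hash, which node ends up next to "madrid-phone" has nothing to do with where the devices physically sit.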
Hybrid inference sharding
Pipeline parallelism across layers + cross-node attention. MLP to beta (mobile) nodes, attention to alpha nodes with RAM and AC power. KV cache quantized to INT4.
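A rough sketch of what INT4 quantization of a KV-cache block could look like. It assumes symmetric quantization with one scale per block and two 4-bit codes packed per byte; the page does not specify the actual scheme, and the function names are hypothetical.

```python
def quantize_int4(values: list[float]) -> tuple[bytes, float]:
    """Quantize floats to signed 4-bit codes in [-7, 7], packed two per byte."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    # Shift each code by +8 so it fits an unsigned nibble (1..15).
    codes = [max(-7, min(7, round(v / scale))) + 8 for v in values]
    if len(codes) % 2:
        codes.append(8)  # pad with an encoded zero
    packed = bytes((codes[i] << 4) | codes[i + 1] for i in range(0, len(codes), 2))
    return packed, scale


def dequantize_int4(packed: bytes, scale: float, count: int) -> list[float]:
    """Unpack nibbles and rescale; count drops any padding nibble."""
    codes: list[int] = []
    for byte in packed:
        codes.extend((byte >> 4, byte & 0x0F))
    return [(code - 8) * scale for code in codes[:count]]


kv_block = [0.5, -1.0, 0.25, 0.0, 1.0]
packed, scale = quantize_int4(kv_block)
restored = dequantize_int4(packed, scale, len(kv_block))
```

Five values travel as three bytes plus one scale. Each restored value is off by at most half a quantization step, which is the accuracy traded for bandwidth.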
BFT + zero-trust
Dynamic Merkle trees sign every tensor. Cryptographic slashing for toxic nodes. Sharding = anonymization: no node ever sees the full prompt.
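As an illustration of the Merkle side of this pillar, the sketch below computes a Merkle root over tensor chunks using SHA-256. It only hashes; signing, slashing and the "dynamic" tree updates are out of scope, and the chunking and function names are assumptions.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Hash each leaf, then hash pairs upward until one root remains."""
    if not leaves:
        return sha256(b"")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last hash on odd levels
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Hypothetical serialized tensor chunks produced by one shard.
chunks = [b"tensor-chunk-0", b"tensor-chunk-1", b"tensor-chunk-2"]
expected_root = merkle_root(chunks)

# A node that alters any chunk produces a different root.
tampered_root = merkle_root([b"tensor-chunk-0", b"tensor-chunk-X", b"tensor-chunk-2"])
```

A verifier that already knows the expected root can reject a tampered result by comparing 32 bytes instead of the whole tensor.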
Universal runtime
Same Wasm kernel on iOS, Android, Windows and Linux. WebGPU talks directly to mobile NPUs or NVIDIA Tensor Cores — no rewrites.
The lifecycle of a prompt
From your thumb to 12 devices spread across the planet and back — without ever touching AWS.
Local tokenization
Your device turns the prompt into tokens and end-to-end encrypts it.
Segmentation
The local orchestrator splits the model into 12 shards using the nearby-nodes table.
P2P injection
Shards travel through WebRTC tunnels. Broadcast to 3 nodes: fastest wins.
Compute cascade
Node A layers 1-5 → Node B layers 6-10 → Node C final block. KV cache shared in INT4.
Merkle consensus
The assembler validates partial hashes. If a node lied, instant slashing.
De-tokenization
The answer flows back to the user. Target latency: < 1.5s in a local cluster.
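The "broadcast to 3 nodes: fastest wins" step of the lifecycle can be sketched with asyncio. Here `asyncio.sleep` stands in for the WebRTC round trip plus compute, and the node names and delays are made up for the example.

```python
import asyncio


async def run_shard(node: str, delay_s: float, payload: str) -> tuple[str, str]:
    """Simulate one replica processing a shard (sleep = network + compute)."""
    await asyncio.sleep(delay_s)
    return node, f"result({payload})"


async def fastest_wins(payload: str, replicas: dict[str, float]) -> tuple[str, str]:
    """Send the same shard to every replica and keep the first answer."""
    tasks = [
        asyncio.create_task(run_shard(node, delay, payload))
        for node, delay in replicas.items()
    ]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # slower replicas are no longer needed
    await asyncio.gather(*pending, return_exceptions=True)
    return done.pop().result()


# Hypothetical replicas with simulated delays in seconds.
winner = asyncio.run(
    fastest_wins("shard-7", {"node-a": 0.05, "node-b": 0.01, "node-c": 0.09})
)
```

Redundant dispatch costs extra compute, but it hides slow or dropped peers, which fits a network designed around packet loss.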
Every node emits a Health Vector
The hive does not assign tasks at random. A fitness function combines TFLOPS, battery, latency and historical stability to decide who processes what — and the weights shift per task.
Fitness = w₁ · TFLOPS
+ w₂ · Battery%
+ w₃ · (1 / Latency)
+ w₄ · Stability
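The weighted sum above can be sketched as follows. The field names, units and example weights are assumptions; the page does not say how the terms are normalized, so here the weights absorb the difference in scale.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HealthVector:
    tflops: float       # raw compute throughput
    battery_pct: float  # 0..100
    latency_ms: float   # round-trip time to the requester
    stability: float    # 0..1, historical reliability


def fitness(h: HealthVector, weights: tuple[float, float, float, float]) -> float:
    """w1*TFLOPS + w2*Battery% + w3*(1/Latency) + w4*Stability."""
    if h.latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    w1, w2, w3, w4 = weights
    return (
        w1 * h.tflops
        + w2 * h.battery_pct
        + w3 * (1.0 / h.latency_ms)
        + w4 * h.stability
    )


# Hypothetical devices and a hypothetical per-task weight profile.
desktop = HealthVector(tflops=20.0, battery_pct=100.0, latency_ms=5.0, stability=0.99)
phone = HealthVector(tflops=2.0, battery_pct=40.0, latency_ms=30.0, stability=0.90)
attention_weights = (1.0, 0.05, 10.0, 5.0)

best = max((desktop, phone), key=lambda h: fitness(h, attention_weights))
```

Swapping in a different weight tuple per task is how "the weights shift per task" would show up in code.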
The metaverse is the interface, not the product
Every tensor op projects as a game mechanic. Latency is visible. Consensus is played. Compute is built.
MDS projection
Inter-node latencies are not Euclidean. A force-directed algorithm collapses the latency matrix into a 2D/3D map where "close" means "fast".
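A minimal sketch of such a force layout: every pair of nodes is joined by a spring whose rest length is their measured latency, and the positions relax until the map roughly matches the matrix. This is a simple spring relaxation rather than classical eigenvector MDS, and the parameters and latency values are illustrative assumptions.

```python
import math
import random


def force_layout(
    dist: list[list[float]], iterations: int = 500, lr: float = 0.05, seed: int = 0
) -> list[list[float]]:
    """Place n nodes in 2D so pairwise distances approximate dist[i][j]."""
    rng = random.Random(seed)
    n = len(dist)
    pos = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    for _ in range(iterations):
        for i in range(n):
            for j in range(i + 1, n):
                dx = pos[j][0] - pos[i][0]
                dy = pos[j][1] - pos[i][1]
                d = math.hypot(dx, dy) or 1e-9
                # Positive when the pair is too far apart, negative when too close.
                pull = lr * (d - dist[i][j]) / d
                pos[i][0] += pull * dx
                pos[i][1] += pull * dy
                pos[j][0] -= pull * dx
                pos[j][1] -= pull * dy
    return pos


# Hypothetical latencies in ms: nodes 0 and 1 share a LAN, node 2 is far away.
latency_ms = [
    [0.0, 5.0, 100.0],
    [5.0, 0.0, 100.0],
    [100.0, 100.0, 0.0],
]
layout = force_layout(latency_ms)
```

Real latency matrices rarely embed perfectly in two dimensions, so the map is an approximation: good enough to show who is "near", not a measurement.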
CRDT synchronization
Without a central server, CRDTs (the family of techniques that inspired Figma's multiplayer sync) guarantee that your city and your neighbor's converge regardless of packet order.
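To show why packet order stops mattering, here is a sketch of one simple CRDT, a last-writer-wins map. The choice of CRDT, the tile keys and the entry layout are assumptions; the page does not say which CRDT the city state uses.

```python
# Each entry is (logical timestamp, node id, value). Tuples compare
# lexicographically, so the node id breaks ties between equal timestamps.
Entry = tuple[int, str, str]


def merge(a: dict[str, Entry], b: dict[str, Entry]) -> dict[str, Entry]:
    """Keep the greatest entry for each key, taken from either replica."""
    merged = dict(a)
    for key, entry in b.items():
        if key not in merged or entry > merged[key]:
            merged[key] = entry
    return merged


# Hypothetical city tiles edited concurrently on two devices.
mine = {"tile-1": (3, "node-a", "tower"), "tile-2": (1, "node-a", "park")}
theirs = {"tile-1": (3, "node-b", "bridge"), "tile-3": (2, "node-b", "plaza")}

city_ab = merge(mine, theirs)
city_ba = merge(theirs, mine)
```

Because the merge is commutative, associative and idempotent, both devices reach the same city whichever update arrives first, even if one arrives twice.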
Watt economy
Reward = ∫ (Task_Complexity · Uptime) · Efficiency_Factor. If your phone overheats, your factor drops. Incentivizes healthy hardware, not blind mining.
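A discretized reading of the reward integral, assuming the node reports fixed-interval samples and that the efficiency factor applies to each sample. The formula does not say whether the factor sits inside or outside the integral, so that choice, the field names and the units are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    task_complexity: float  # work units per second (arbitrary unit)
    uptime: float           # fraction of the interval online, 0..1
    efficiency: float       # 0..1, drops when the device overheats


def reward(samples: list[Sample], dt_s: float) -> float:
    """Riemann-sum approximation of the integral of complexity * uptime * efficiency."""
    return sum(s.task_complexity * s.uptime * s.efficiency for s in samples) * dt_s


# Same workload on a cool device and on an overheating one.
cool = [Sample(task_complexity=2.0, uptime=1.0, efficiency=1.0)] * 3
hot = [Sample(task_complexity=2.0, uptime=1.0, efficiency=0.5)] * 3

cool_reward = reward(cool, dt_s=10.0)
hot_reward = reward(hot, dt_s=10.0)
```

With these numbers the overheating device earns half as much for the same work.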
While OpenAI pays for cooling,
cadencIA uses the heat you already waste.
The energy is already paid for. The NPUs sit idle. Home networks are highways with potholes: we do not treat them as perfect pipes, we design for constant packet loss. That makes this the most sustainable and democratic processing model we know of.
From cloud seed to a 100% edge hive
Controlled bootstrap: our sentinel nodes spin the network up, the community inherits it.
Phase 0 · Sentinels
AWS/Azure seed nodes acting as "ghost players". They carry 90% of initial compute.
- Local orchestrator v0.1
- Wasm runtime POC
- 3-region mesh
Phase 1 · Public alpha
Open to 500 real devices. Pipeline parallelism with Llama-3 8B sharded across 4 nodes.
- iOS / Android client
- INT4 KV cache
- Trust score v1
Phase 2 · Accordion World
Gamified visual layer. CRDTs syncing state. MDS projection of latencies.
- City editor
- Verifiable credits
- Speculative decoding
Phase 3 · Sentinels off
Cloud nodes shut down. 100% community infrastructure. 70B models running on the hive.
- Cross-node attention
- Full BFT
- Hive federation