# Whalescale: Design Document

## 1. Introduction

Whalescale is a peer-to-peer (P2P) VPN architecture designed to provide secure, end-to-end encrypted connectivity without centralized gateway servers, proprietary relay services, or discovery infrastructure. The core mission is to restore peer-to-peer as a first-class networking mode on an internet that has increasingly moved away from it, using best-effort mechanisms that work with the existing CGNAT-ridden world while preferring IPv6 wherever available.

Whalescale uses an **Integrated Plane Architecture** — the control and data planes share the same transport, the same encryption session, and the same process. There is no separate WireGuard process to coordinate with, no second NAT traversal problem for a control channel, and no conflict over endpoint management.

```
┌──────────────────────────────────────────────┐
│           TUN Device (IP packets)            │
├──────────────────────────────────────────────┤
│              Reordering Buffer               │
├──────────────────────────────────────────────┤
│                Path Scheduler                │
├───────────┬───────────┬──────────────────────┤
│  Path 1   │  Path 2   │        Path N        │
│  5G/UDP   │ WiFi/UDP  │       IPv6/UDP       │
├───────────┴───────────┴──────────────────────┤
│           Noise_IK Session Manager           │
├──────────────────────────────────────────────┤
│      Whalescale Agent (unified process)      │
│   Control + Data in one encrypted session    │
└──────────────────────────────────────────────┘
```

## 2. Core Principles

* **Direct Connectivity First:** All data traffic is peer-to-peer whenever possible. No middleman handles user data.
* **Encrypted Relay as Fallback:** When direct P2P is physically impossible (symmetric NAT on both sides), traffic is relayed through an anchor node. The relay sees only encrypted ciphertext — it is a packet forwarder, not a MITM.
* **Trust-Based Initialization:** Connectivity begins through out-of-band, manual configuration of known, trusted endpoints.
* **IPv6 Preferred:** IPv6 with globally routable addresses is the preferred transport. It eliminates NAT entirely for nodes that have it. IPv4 with NAT traversal is the fallback.
* **Multipath Native:** A single peer session can utilize multiple network interfaces simultaneously (e.g., 5G + WiFi) for bandwidth aggregation and resilience.
* **Anchor-First Topology:** The network assumes at least one anchor node is present. Anchors are the load-bearing walls of the connectivity model.

## 3. Architecture

### 3.1 Identity & Security

* **Cryptographic Identity:** Every node is identified by a unique Ed25519 public key. The Whalescale Node ID **is** the public key. There is no separate WireGuard key.
* **Noise_IK Handshake:** Whalescale implements the Noise Protocol Framework's Noise_IK pattern natively — the same handshake pattern used by WireGuard — for mutual authentication and end-to-end encryption. This is implemented in userspace using an established Rust Noise library, not via the WireGuard kernel module.
* **Zero-Trust Transport:** Even when traffic is relayed through an anchor, the payload remains encrypted with the keys of the two end-nodes. The relay forwards opaque ciphertext.
* **Key Rotation:** Symmetric transport keys are rotated on the same schedule as WireGuard (after 2^60 messages on a key or after 120 seconds, whichever comes first).

### 3.2 The Data Plane: Custom Transport

Whalescale owns its data plane entirely. It does not use the WireGuard kernel module.

**Why not WireGuard:**

* WireGuard binds one UDP socket per interface and manages peer endpoints autonomously (roaming overrides programmatic `wg set` calls), creating an unresolvable conflict with the control plane.
* WireGuard has no multipath concept — one tunnel, one path, per peer.
* WireGuard has no extensibility mechanism — control messages cannot be embedded in its sessions.
* WireGuard is a kernel module — its state machine cannot be modified or coordinated with.
**What Whalescale implements instead:**

* **Noise_IK handshake** (same cryptographic security properties as WireGuard)
* **Userspace UDP transport** with full control over endpoint management
* **Multipath scheduling** — multiple UDP paths per peer session
* **Integrated control messages** — control and data share the same encrypted session
* **TUN device integration** — reads IP packets from the OS, encrypts and sends; receives, decrypts and writes back

**WireGuard compatibility:** The Noise_IK implementation uses WireGuard's exact wire format for handshake and transport messages in single-path mode. This allows a Whalescale node to interoperate with a vanilla WireGuard node for basic connectivity. Extended multipath features require Whalescale on both ends.

### 3.3 Multipath Transport

A single peer session can utilize multiple network paths simultaneously for **bandwidth aggregation and resilience**. A "path" is a 4-tuple: `(local_ip:port, remote_ip:port)`. All paths share a single Noise_IK session and a single global sequence space.

**Design choice: Packet-level scheduling.** Every outbound IP packet from the TUN device may be sent on any available path. The receiver reassembles packets in global sequence order via a reordering buffer before delivering to its TUN device. This enables bandwidth aggregation for single flows (the primary use case: 5G + WiFi on a mobile device).

**Key components:**

* **Reordering buffer** — per-session buffer that reassembles packets by global sequence number. Per-gap adaptive timeouts based on measured path latency spread. Late packets (arriving after gap skip) are dropped. Maximum depth of 128 packets.
* **Weighted round-robin scheduler** — distributes packets across paths proportional to estimated bandwidth. Credit-based system ensures fair proportional distribution. Sender-side reordering depth constraint prevents fast paths from getting too far ahead.
* **Path bandwidth estimation** — rolling 1-second window over ACKed bytes per path. New paths start with minimum weight and ramp up after probing.
* **Feedback loop** — periodic ACK messages carry per-path statistics (packets received, estimated latency, loss count) to inform scheduling decisions. ACKs are for scheduling only, not reliability.
* **Path health model** — three states (healthy, degraded, failed) with corresponding scheduler actions (full weight, reduced weight, removal).

**Inner TCP interaction:** Packet-level scheduling creates reordering that inner TCP may misinterpret as loss. The reordering buffer is the primary mitigation — if it delivers in order, inner TCP is unaware of multipath. With MAX_REORDERING_DEPTH = 128, Linux's auto-tuned `tcp_reordering` (up to 127) generally tolerates the reordering without false fast retransmits. Non-Linux platforms may see occasional false retransmits — this is an accepted tradeoff that the test bench will quantify.

**Full specification:** See `MULTIPATH.md` for detailed wire format, reordering buffer behavior, scheduler algorithm, bandwidth estimation, feedback loop, path failure detection, path lifecycle, and test bench framework.

### 3.4 The Control Plane: Integrated Messaging

Control messages ride inside the same encrypted Noise_IK session as data. There is no separate control channel, no second set of UDP sockets, and no second NAT traversal problem.

**Control Message Types:**

| Message | Purpose |
|---------|---------|
| `PATH_ANNOUNCE` | "I have these local interfaces / observed external addresses" |
| `PATH_PROBE` | NAT traversal probe + RTT measurement |
| `PATH_PROBE_REPLY` | Response with observed external address |
| `GOSSIP_UPDATE` | Peer location metadata (see §3.6) |
| `HEARTBEAT` | Keep NAT mappings alive; detect path failure |

**Bootstrapping a new session:** Before a Noise_IK session is established, control messages use a separate lightweight bootstrap protocol on a well-known UDP port.
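The bootstrap probes and the in-session control messages draw from the same message set. As a sketch, the set above might be modeled as a plain Rust enum — the variant names come from the table, but the field layouts shown here are assumptions, not the actual wire format (which `MULTIPATH.md` specifies):

```rust
use std::net::IpAddr;

/// Illustrative endpoint record; field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct Endpoint {
    pub ip: IpAddr,
    pub port: u16,
}

/// Sketch of the integrated control message set (§3.4).
#[derive(Debug, Clone, PartialEq)]
pub enum ControlMessage {
    /// "I have these local interfaces / observed external addresses."
    PathAnnounce { endpoints: Vec<Endpoint> },
    /// NAT traversal probe + RTT measurement (nonce echoed in the reply).
    PathProbe { nonce: u64 },
    /// Response carrying the sender's observed external address.
    PathProbeReply { nonce: u64, observed: Endpoint },
    /// Peer location metadata (§3.6); payload elided here.
    GossipUpdate { payload: Vec<u8> },
    /// Keeps NAT mappings alive and detects path failure.
    Heartbeat { path_id: u32, seq: u64 },
}
```

A probe/reply pair correlates on the nonce, which doubles as the RTT measurement handle; the same enum would be serialized unencrypted during bootstrap and inside the Noise_IK session afterwards.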
Once the Noise_IK handshake completes, all subsequent control messages move into the encrypted session.

### 3.5 Connection Lifecycle

1. **Manual Bootstrap:** The user provides the `IP:Port` (or hostname) and `Identity Fingerprint` of an anchor or known peer. The fingerprint is the full Ed25519 public key or a cryptographically secure shorthand (e.g., a base32-encoded hash with collision resistance).
2. **NAT Traversal:**
   * **IPv6:** If both nodes have IPv6 with globally routable addresses, connect directly — no NAT traversal needed.
   * **IPv4 UDP Hole Punching:** Both nodes simultaneously send UDP packets to each other's last known endpoints to create NAT mappings.
   * **Signaling via Anchor:** If direct punching fails, the node uses existing active connections (typically to an anchor) to exchange observed connection metadata with the target peer.
3. **Noise_IK Handshake:** Once a path is established, the Noise_IK handshake completes, authenticating both peers and establishing a symmetric encryption session.
4. **Path Expansion:** After the initial path is established, additional paths (other interfaces, IPv6) are discovered and added to the session via `PATH_ANNOUNCE` and `PATH_PROBE` messages.
5. **Established State:** The session is maintained via persistent `HEARTBEAT` messages (which double as NAT keepalives) and continuous path health monitoring.

### 3.6 Peer Discovery & State Management

There is no DHT. Discovery relies on **Anchors**, **LKG (last-known-good) caching**, and **Gossip**.

#### Anchor Nodes

An anchor is a node that:

* Is behind a **cone-type NAT** (full cone, restricted cone, or port-restricted cone) or has **no NAT at all** (public IP). Its external mapping is endpoint-independent, so every peer sees the same external port; for the restricted cone variants, a peer can reach that port once the anchor has sent it at least one outbound packet.
* Has an **IP address that is stable** on the timescale of days or weeks, not minutes.
* Is **reachable at its cached address** by nodes that have previously connected to it.
Examples: a cloud VPS, a home desktop on a typical ISP (not CGNAT), a home server behind a router with UPnP-enabled port mapping.

**Anchor liveness:** Anchors are a first-class concept. The network tracks anchor availability. If a node detects it is the only remaining anchor, it warns the user. The system recommends at least two anchors on different ISPs to avoid simultaneous IP change events.

**Anchor relay:** When two symmetric-NAT nodes need to communicate, an anchor forwards their encrypted packets. This is not a TURN server — it is an existing Whalescale peer performing packet forwarding on already-established tunnels. The anchor sees only Noise_IK ciphertext.

#### LKG Cache

Every node maintains a persistent local database of every peer it has successfully connected to, including:

* Peer ID (Ed25519 public key)
* All known endpoints (IP:Port) for each path, with sequence numbers
* Last successful connection timestamp
* Whether the peer is an anchor

#### Gossip Protocol

When a node's connection state changes, this information is gossiped to active neighbors.

**Gossip Payload:**

```
{
  PeerID:       Ed25519 public key of the peer being described
  Endpoints:    [{ IP, Port, PathType (IPv4/IPv6/LAN) }]
  SeqNo:        Monotonically increasing sequence number (per-peer)
  SelfAttested: bool — true if this is the peer describing itself
  Signature:    Ed25519 signature over the above fields
}
```

**Conflict Resolution:**

* **Self-attested endpoints always win.** A peer's own declaration of its address is authoritative over any third-party observation.
* **Sequence numbers, not timestamps.** Wall-clock timestamps are unreliable due to clock skew. Monotonically increasing sequence numbers (Lamport-style) provide unambiguous ordering.
* **All self-attested updates are signed.** A peer signs its own endpoint declarations with its private key. This prevents malicious nodes from injecting false self-attested addresses for other peers.
* **Observed endpoints (third-party) are advisory.** They are used as hints for connection attempts but are never treated as authoritative.

**Gossip Mechanics:**

* **Bounded fanout:** Each gossip message is sent to a random subset of active neighbors (not all of them), preventing broadcast storms.
* **Periodic anti-entropy:** On a slow timer (e.g., every 5 minutes), nodes exchange full state summaries with a random peer to converge on missed updates. This uses a lightweight Merkle-tree or hash-diff mechanism.
* **Throttled propagation:** Gossip messages are rate-limited per peer to avoid bandwidth waste on mobile connections.

### 3.7 NAT Traversal Strategy

**IPv6 (Preferred):** If both nodes have IPv6 with globally routable addresses, connect directly. No NAT, no hole punching. This is the primary path for mobile-to-mobile connectivity and should be attempted first.

**IPv4 NAT Traversal:**

| NAT Type Pair | Strategy | Outcome |
|---------------|----------|---------|
| Cone ↔ Any | Cone node's port is reachable; other node initiates | Direct P2P |
| Symmetric ↔ Cone | Symmetric node initiates to cone node's known endpoint | Direct P2P |
| Symmetric ↔ Symmetric | No direct path possible | Anchor relay |
| Any ↔ Public | Direct connection | Direct P2P |

**Proactive Port Mapping:** Implementation of **UPnP**, **NAT-PMP**, and **PCP** to request persistent port forwarding on supported routers. This converts a cone NAT into an effectively public endpoint and is the highest-value NAT traversal mechanism.

**NAT Type Detection:** Nodes detect their own NAT type by comparing their local endpoint against the externally observed endpoint reported by peers via `PATH_PROBE_REPLY`. If the external port changes depending on the destination, the NAT is symmetric.
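The detection rule above can be sketched as a pure function over probe observations. This is a minimal illustration, not the production state machine: each entry in `observed` is the external address that a *different* remote peer reported seeing for the same local socket, and all names here are assumptions:

```rust
use std::net::SocketAddr;

/// NAT classification derived from `PATH_PROBE_REPLY` observations (§3.7).
#[derive(Debug, PartialEq)]
pub enum NatType {
    Public,    // observed address equals the local address: no NAT
    Cone,      // same external mapping regardless of destination
    Symmetric, // external mapping changes per destination
    Unknown,   // not enough observations to decide
}

/// `observed[i]` is the external address reported by the i-th distinct
/// destination peer for our single local socket.
pub fn classify_nat(local: SocketAddr, observed: &[SocketAddr]) -> NatType {
    match observed {
        [] => NatType::Unknown,
        [first, rest @ ..] => {
            if rest.iter().any(|o| o != first) {
                // Mapping varies by destination: symmetric NAT.
                NatType::Symmetric
            } else if *first == local {
                // Peers see our local address unchanged: no NAT.
                NatType::Public
            } else if rest.is_empty() {
                // One observation cannot distinguish cone from symmetric.
                NatType::Unknown
            } else {
                NatType::Cone
            }
        }
    }
}
```

Note the single-observation case: a cone verdict requires at least two distinct destinations reporting the same mapping, which is why `PATH_PROBE` is sent to multiple peers before the node trusts its own classification.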
**LAN-Local Discovery:** When two Whalescale nodes are on the same LAN, they should discover each other directly via mDNS/broadcast or by observing matching public IP addresses, and communicate on the LAN without traversing NAT at all. This avoids hairpin NAT (which many routers implement incorrectly or not at all).

**What is NOT implemented:**

* **Port prediction / port sweeping:** Architecturally defeated by CGNAT. The external port assigned by a CGNAT depends on every other subscriber's concurrent activity, making prediction infeasible. Port sweeping wastes battery and bandwidth with near-zero success probability. Removed entirely.

### 3.8 Connection Recovery

To handle mobile IP changes and NAT timeouts:

**Recovery Mode (Disconnected Peer):**

1. Upon connection loss, the node enters Recovery Mode.
2. It attempts to re-establish contact with each cached endpoint in its LKG Cache, starting with anchors.
3. For each anchor, the node sends a UDP packet to the anchor's last known endpoint. Since the mobile node is initiating outward, this traverses any NAT type.
4. Once an anchor is reached, the anchor can signal the mobile node's new address to other peers.
5. Intelligent backoff: probe intervals increase exponentially (1s → 2s → 4s → ... → 60s cap) to conserve battery on mobile devices.

**Passive Acceptance (Connected Peer):**

1. Anchors and stable nodes continuously listen on their mapped ports.
2. When a valid, authenticated packet arrives from a known PeerID on a new endpoint, the node immediately accepts the new path and updates its LKG cache.
3. The new endpoint is gossiped to other neighbors.

**Dual-Anchor Mutual Keepalive:** Two anchor nodes on different ISPs maintain each other's current addresses via periodic probes. If one anchor's IP changes, the other detects the change on the next successful probe and gossips the new address. The network re-converges as long as at least one anchor remains reachable.
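The Recovery Mode backoff schedule (1s → 2s → 4s → ... → 60s cap) reduces to a one-line function. A minimal sketch, with the function name assumed:

```rust
use std::time::Duration;

/// Recovery Mode probe schedule (§3.8): exponential backoff starting at
/// 1 second, doubling per attempt, capped at 60 seconds to conserve
/// battery on mobile devices.
pub fn probe_interval(attempt: u32) -> Duration {
    let secs = 1u64
        .checked_shl(attempt)   // 2^attempt seconds
        .unwrap_or(u64::MAX)    // shift overflow (attempt >= 64) saturates
        .min(60);               // cap at the 60-second ceiling
    Duration::from_secs(secs)
}
```

Using `checked_shl` keeps the function total: once the attempt counter passes the point where `2^attempt` would overflow, the interval simply stays pinned at the cap rather than wrapping around.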
### 3.9 Failure Modes & Honest Limitations

**Permanent Partition:** If all cached endpoints for a peer become stale and no anchor can reach that peer, the connection is lost until out-of-band resynchronization occurs. Mobile devices are most susceptible to this.

**Symmetric ↔ Symmetric:** Two nodes both behind symmetric CGNAT cannot establish a direct P2P connection. They must communicate via anchor relay. This is not a limitation of the design — it is a fundamental property of symmetric NAT that no amount of coordination can overcome.

**Single Anchor Failure:** If the network's only anchor goes offline, mobile nodes lose their reconnection mechanism. The system must warn when only one anchor remains.

**Userspace Performance:** The custom data plane runs in userspace, topping out around 1–2 Gbps on modern hardware (vs. WireGuard kernel's ~4 Gbps). This is acceptable for the intended use case (mobile, home, small office networks).

## 4. Network Stack

### 4.1 Whalescale Agent (Unified Process)

* **Transport Layer:** UDP (one socket per local network interface).
* **Encryption:** Noise_IK (userspace, via an established Rust Noise library).
* **Multipath:** Multiple UDP paths per peer session, with reordering buffer.
* **Responsibilities:** Peer discovery, NAT traversal, signaling, encrypted transport, path scheduling, IP packet encapsulation, TUN device management.

### 4.2 Bootstrap Protocol (Pre-Session)

* **Transport Layer:** UDP on a well-known port.
* **Purpose:** Exchange initial endpoint metadata before the Noise_IK handshake is established.
* **Messages:** `PATH_PROBE`, `PATH_PROBE_REPLY`, and session initiation.
* **Security:** Bootstrap messages are unencrypted but carry no sensitive data (only IP:port observations). Full authentication occurs during the Noise_IK handshake.

## 5. Implementation Phases

| Phase | Scope | Goal |
|-------|-------|------|
| 1 | Noise_IK session, single path, single peer, TUN integration | Basic VPN tunnel between two nodes |
| 2 | Multipath transport — weighted round-robin scheduler, reordering buffer, path management, feedback loop, bandwidth estimation | Bandwidth aggregation across multiple interfaces |
| 3 | Multi-peer session management, LKG cache, gossip | Small mesh network |
| 4 | NAT traversal (hole punching, UPnP/PCP, anchor signaling) | Cross-NAT connectivity |
| 5 | LAN discovery, IPv6 preference, anchor relay | Production robustness |
| 6 | Adaptive path scheduling, test bench framework, scheduler comparison | Optimize multipath performance |

**Phase 2 is where the novel work lives.** The multipath transport has significant open questions (inner TCP interaction, reordering depth tuning, scheduler optimality) that require empirical validation. The test bench framework in Phase 6 will compare scheduler variants and parameter settings against real workloads. See `MULTIPATH.md` §11 for the full test bench specification.
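As a starting point for the Phase 2 scheduler, the credit-based weighted round-robin described in §3.3 can be sketched as follows. This is one candidate for the test bench to compare, not the chosen algorithm; the struct and method names are assumptions, and bandwidth estimation and the sender-side reordering constraint are elided:

```rust
/// One transmit path; `weight` is the estimated bandwidth share
/// produced by the (elided) bandwidth estimator.
pub struct Path {
    pub id: u32,
    pub weight: f64,
    credit: f64,
}

pub struct Scheduler {
    paths: Vec<Path>,
}

impl Scheduler {
    pub fn new(weights: &[(u32, f64)]) -> Self {
        Scheduler {
            paths: weights
                .iter()
                .map(|&(id, weight)| Path { id, weight, credit: 0.0 })
                .collect(),
        }
    }

    /// Pick the path for the next packet: every path earns credit in
    /// proportion to its weight, the highest-credit path sends, and it
    /// pays one packet's worth of credit back. Over time each path
    /// carries a packet share proportional to its weight.
    pub fn next_path(&mut self) -> Option<u32> {
        let total: f64 = self.paths.iter().map(|p| p.weight).sum();
        if total == 0.0 {
            return None; // no paths, or all weights zero
        }
        for p in &mut self.paths {
            p.credit += p.weight / total;
        }
        let best = self
            .paths
            .iter_mut()
            .max_by(|a, b| a.credit.partial_cmp(&b.credit).unwrap())?;
        best.credit -= 1.0;
        Some(best.id)
    }
}
```

The credit mechanism is what makes the distribution *smooth*: a 3:1 weight ratio yields interleaved picks rather than three consecutive packets on the fast path, which directly bounds the reordering depth the receiver's buffer has to absorb.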