# Whalescale: Multipath Transport Specification

## 1. Overview

Whalescale multipath allows a single peer session to utilize multiple network paths simultaneously for bandwidth aggregation and resilience. Each path is a 4-tuple `(local_ip:port, remote_ip:port)` carried over UDP. All paths share a single Noise_IK encryption session and a single global sequence space.

**Design choice:** Packet-level scheduling with reordering. Every packet from the TUN device is assigned a global sequence number and may be sent on any available path. The receiver reassembles packets in sequence order before delivering to its TUN device.

**Alternative: Flow-level scheduling.** All packets from the same 5-tuple go to the same path. No reordering buffer needed. Inner TCP behaves normally. But a single flow can never exceed one path's capacity — no bandwidth aggregation for the common case of a mobile device streaming video over 5G + WiFi.

**Alternative: Hybrid (flow-level default, packet-level for large flows).** Requires flow classification, state tracking, and both scheduling paths. Adds complexity without fully avoiding reordering problems. Deferring the hard problem doesn't eliminate it.

**Rationale for packet-level:** The primary use case — bandwidth aggregation for single flows on multi-interface mobile devices — requires packet-level scheduling. The reordering complexity is the price of this capability. Test benching will validate whether the inner TCP interaction cost is acceptable in practice.

## 2. Packet Format

### 2.1 Whalescale Transport Header (Inside Noise_IK Encryption)

```
Offset  Size  Field
0       4     Session ID
4       8     Global Sequence Number
12      1     Path ID
13      1     Type
14      2     Payload Length
16      var   Payload
```

| Field | Size | Description |
|-------|------|-------------|
| Session ID | 32 bits | Identifies the peer session. Allows demultiplexing when multiple sessions share a socket. |
| Global Sequence | 64 bits | Monotonically increasing per-session. Used for reordering. 64-bit avoids wraparound (2^64 packets at any realistic rate will never wrap). |
| Path ID | 8 bits | Identifies which path carried this packet. Receiver uses this for per-path statistics. Up to 256 paths per session (far more than needed). |
| Type | 8 bits | Packet type: `DATA (0x01)`, `CONTROL (0x02)`, `ACK (0x03)`, `PROBE (0x04)` |
| Payload Length | 16 bits | Length of payload in bytes. Max 65535 bytes. |
| Payload | variable | For DATA: an IP packet read from TUN. For CONTROL/ACK/PROBE: structured control message. |

**Design choice:** 64-bit global sequence number.

**Alternative: 32-bit sequence number.** Wraps at ~4 billion packets. At 100,000 packets/sec (reasonable for a 1 Gbps VPN tunnel with small packets), wraparound occurs in ~11 hours. Handling wraparound correctly in the reordering buffer is subtle and error-prone.

**Rationale:** 64-bit eliminates the wraparound problem entirely. The 4-byte overhead per packet is negligible.

### 2.2 ACK Packet Payload

```
Offset  Size  Field
0       8     Highest Contiguous Sequence (HCS)
8       2     ACK Path Count
10      var   Per-path ACK entries (see below)

Per-path entry:
Offset  Size  Field
0       1     Path ID
1       8     Packets received on this path (since last ACK)
9       4     Estimated one-way latency (microseconds)
13      2     Packets lost on this path (since last ACK)
15      1     Flags (0x01 = path degraded, 0x02 = path failed)
```

The receiver sends ACKs to the sender for scheduling feedback. These are NOT reliability ACKs — there is no retransmission. They inform the sender's scheduling decisions.

**Design choice:** Per-path statistics in ACKs.

**Alternative: Sequence-level ACKs (like TCP SACK).** Would enable retransmission-style reliability. Rejected because Whalescale is an unreliable transport — inner TCP handles its own retransmission.

**Alternative: Aggregate ACKs only (no per-path info).** Simpler but deprives the sender of the information needed to adjust per-path weights intelligently.
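As an illustrative sketch of the wire layouts above (the `struct` format strings and function names are assumptions, not part of the spec — only the field sizes and offsets come from this section):

```python
import struct

# 16-byte Whalescale transport header: session ID (u32), global
# sequence (u64), path ID (u8), type (u8), payload length (u16).
HEADER = struct.Struct("!IQBBH")

# 16-byte per-path ACK entry: path ID (u8), packets received (u64),
# one-way latency in microseconds (u32), packets lost (u16), flags (u8).
ACK_ENTRY = struct.Struct("!BQIHB")

def pack_header(session_id, seq, path_id, pkt_type, payload):
    """Serialize a transport header followed by its payload."""
    return HEADER.pack(session_id, seq, path_id, pkt_type, len(payload)) + payload

def parse_header(data):
    """Split a datagram back into header fields and payload."""
    session_id, seq, path_id, pkt_type, length = HEADER.unpack_from(data)
    return session_id, seq, path_id, pkt_type, data[HEADER.size:HEADER.size + length]
```

Network byte order (`!`) is assumed here; the spec does not state endianness explicitly.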
**Rationale:** Per-path statistics allow the sender to detect asymmetric performance and adjust scheduling weights. The overhead is small (16 bytes per path per ACK).

## 3. Reordering Buffer

### 3.1 Overview

The receiver maintains one reordering buffer per peer session. Packets are inserted at their global sequence position. Contiguous sequences starting from the next expected delivery position are released to the TUN device.

### 3.2 Timeout Mechanism

Each missing sequence number gets a deadline:

```
deadline[seq] = time_of_previous_delivery + REORDERING_TIMEOUT
```

Where `time_of_previous_delivery` is the timestamp when the packet preceding the gap was delivered to TUN, and `REORDERING_TIMEOUT` is calculated as:

```
REORDERING_TIMEOUT = max(
    slowest_path_one_way_latency - fastest_path_one_way_latency,
    MIN_REORDERING_TIMEOUT
)

MIN_REORDERING_TIMEOUT = 5ms
```

One-way latency is estimated as `path_RTT / 2` for each path.

**Design choice:** Per-gap deadlines with adaptive timeout.

**Alternative: Fixed global timeout.** One timeout value for the entire buffer (e.g., 50ms). Simple but wasteful — fast gaps wait too long, slow gaps don't wait long enough.

**Alternative: No timeout, wait indefinitely.** A missing packet blocks all subsequent delivery forever. Only acceptable with reliability (retransmission), which Whalescale doesn't provide.

**Rationale:** Per-gap deadlines are the most correct approach — each gap waits only as long as the slowest path could reasonably deliver the missing packet. The adaptive timeout based on measured path latency spread ensures the buffer doesn't wait longer than necessary.

### 3.3 Gap Skip Behavior

When a gap's deadline expires:

1. Mark the missing sequence number as skipped.
2. Deliver all contiguous packets after the gap to the TUN device.
3. If the missing packet arrives later, **drop it.** The inner protocol (TCP) will retransmit if the data was needed. Inner UDP never expected reliability.
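The gap-skip release steps can be sketched as follows (a minimal illustration with a single global timeout rather than per-gap deadlines; class and method names are hypothetical):

```python
import time

class ReorderBuffer:
    """Releases contiguous sequences; skips a gap once its deadline expires.

    Simplified sketch: one timeout, one gap skipped per timer tick, and the
    skipped-set is never pruned. A real implementation needs per-gap deadlines.
    """
    def __init__(self, timeout_s=0.005):          # MIN_REORDERING_TIMEOUT = 5ms
        self.timeout = timeout_s
        self.next_seq = 0                          # next sequence to deliver
        self.pending = {}                          # seq -> packet waiting on a gap
        self.skipped = set()                       # gaps we gave up on
        self.last_delivery = time.monotonic()      # time_of_previous_delivery

    def insert(self, seq, packet, deliver):
        if seq < self.next_seq or seq in self.skipped:
            self.skipped.discard(seq)
            return                                 # late arrival after skip: drop it
        self.pending[seq] = packet
        self._release(deliver)

    def on_timer(self, deliver):
        # Deadline expired while a gap blocks delivery: skip the missing seq.
        if self.pending and time.monotonic() - self.last_delivery > self.timeout:
            self.skipped.add(self.next_seq)
            self.next_seq += 1
            self._release(deliver)

    def _release(self, deliver):
        # Deliver every contiguous packet starting at next_seq.
        while self.next_seq in self.pending:
            deliver(self.pending.pop(self.next_seq))
            self.next_seq += 1
            self.last_delivery = time.monotonic()
```

Note how a late packet arriving after its gap was skipped is dropped rather than delivered out of order, matching step 3 above.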
**Design choice:** Drop late packets after gap skip.

**Alternative: Deliver late packets out-of-order.** Would create a second reordering event for the inner TCP. Worse than dropping — inner TCP handles loss (via retransmission) better than reordering (via false fast retransmits).

**Alternative: Never skip gaps, buffer grows unbounded.** A genuinely lost packet blocks all future delivery. Unacceptable.

### 3.4 Buffer Depth Limit

The reordering buffer has a maximum depth of `MAX_REORDERING_DEPTH = 128` packets. This limits memory usage and bounds the maximum reordering that the inner traffic can observe.

When the buffer is full (128 packets waiting for a gap):

1. The oldest gap is force-skipped.
2. All contiguous packets after it are delivered.
3. This may cause the inner TCP to see a loss event.

**Design choice:** Aggressive depth of 128 packets.

**Alternative: Conservative depth (8-16 packets).** Minimizes inner TCP disruption but severely limits bandwidth aggregation. A 16-packet buffer at 1500 bytes/packet is only 24KB — this fills in under 1ms at even moderate data rates, effectively preventing the scheduler from using slower paths.

**Alternative: Moderate depth (32-64 packets).** Better bandwidth utilization with moderate reordering. Likely sufficient for paths with small RTT differences (e.g., 5G and WiFi both under 50ms).

**Alternative: Unlimited depth.** No forced skips. Only gap deadlines cause skips. But a sustained high-rate flow with one dead path would cause the buffer to grow without bound.

**Rationale for 128:** Linux auto-tunes `tcp_reordering` up to 127 based on observed reordering. With MAX_REORDERING_DEPTH = 128, Linux inner TCP will generally tolerate the reordering without false fast retransmits. Non-Linux stacks (Windows, macOS) may see some false retransmits — this is a known tradeoff that test benching will quantify.

### 3.5 Buffer Memory

At 1500 bytes per packet and depth 128: ~192KB per peer session. For a network of 50 peers: ~9.6MB.
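As a quick check of the memory arithmetic:

```python
PACKET_SIZE = 1500            # bytes per buffered packet
MAX_REORDERING_DEPTH = 128    # packets per reordering buffer
PEERS = 50

per_session = PACKET_SIZE * MAX_REORDERING_DEPTH   # 192,000 bytes ≈ 192 KB
total = per_session * PEERS                        # 9,600,000 bytes ≈ 9.6 MB
```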
Acceptable.

## 4. Scheduler

### 4.1 Weighted Round-Robin

The scheduler assigns a weight to each path proportional to its estimated bandwidth. It uses a credit system to distribute packets across paths according to these weights.

**Algorithm:** Each path maintains:

- `weight`: integer, proportional to estimated bandwidth
- `credits`: integer, number of packets this path is owed

**On each scheduling decision (send one packet):**

```
1. Select the path with the highest credits among paths with credits > 0.
2. Tie-break: lowest estimated one-way latency.
3. Decrement selected path's credits by 1.
4. If no path has credits > 0, replenish all paths and retry.
```

**Credit replenishment (runs when all credits are exhausted, or periodically):**

```
For each path:
    credits += weight
```

**Weight calculation:**

```
For each path:
    estimated_bw = measured bytes/sec over rolling window (1 second)
    weight = max(1, round(estimated_bw / BASE_RATE))

BASE_RATE = estimated bandwidth of the slowest active path
```

This ensures the slowest path always has weight ≥ 1, and faster paths are weighted proportionally.

**Design choice:** Weighted round-robin with credits.

**Alternative: Strict round-robin.** Every path gets one packet in turn. Simple but ignores bandwidth differences — a 100 Mbps path gets the same packet rate as a 10 Mbps path, causing the fast path to be underutilized and the slow path to queue.

**Alternative: minRTT (always send on fastest path).** Minimal reordering, no buffer complexity. But a single flow can't exceed one path's capacity. Defeats the purpose of multipath.

**Alternative: Deadline-aware scheduling.** Assign each packet a "latest acceptable delivery time" based on reordering constraints, then pick the path that delivers soonest within the deadline. More optimal but requires latency prediction per path and is harder to reason about and debug.
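The credit-based selection loop from the algorithm above can be sketched as (a minimal illustration; class and field names are hypothetical):

```python
class Path:
    def __init__(self, path_id, weight, latency_us):
        self.path_id = path_id
        self.weight = weight          # proportional to estimated bandwidth
        self.credits = 0              # packets this path is owed
        self.latency_us = latency_us  # estimated one-way latency

def pick_path(paths):
    """One scheduling decision: highest credits wins, ties go to lowest latency."""
    eligible = [p for p in paths if p.credits > 0]
    if not eligible:
        for p in paths:               # replenish when all credits are exhausted
            p.credits += p.weight
        eligible = [p for p in paths if p.credits > 0]
    best = max(eligible, key=lambda p: (p.credits, -p.latency_us))
    best.credits -= 1
    return best
```

Over one replenishment round, a path with weight 3 carries three packets for every one packet on a weight-1 path, which is the proportionality property the credit system is meant to provide.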
**Alternative: BLEST (Blocked Estimation Scheduling from MPTCP).** Estimates how many packets can be sent on a slow path before they'd block the reordering buffer. Designed for reliable transport with retransmission — its estimates assume packets will eventually be delivered, which is true in MPTCP but not in Whalescale's unreliable transport.

**Rationale:** Weighted round-robin is simple, predictable, and adjustable. It's the right starting point for test benching — easy to implement, easy to reason about, and easy to replace with a more sophisticated scheduler later. The credit system ensures proportional bandwidth utilization without requiring per-packet latency prediction.

### 4.2 Reordering Depth Constraint

Before sending a packet on the selected path, the scheduler checks:

```
reordering_depth = number of packets in-flight on slower paths
                   that haven't been ACKed yet

if reordering_depth > MAX_REORDERING_DEPTH:
    // Don't send on this path — it would advance too far ahead
    // Try the next-best path, or wait
```

This prevents the fast path from getting so far ahead that the reordering buffer would exceed its depth limit.

**Design choice:** Global reordering depth limit applied at the sender.

**Alternative: No sender-side constraint, rely entirely on receiver buffer depth limit.** Simpler sender, but receiver must force-skip gaps more often, causing more inner TCP disruption.

**Alternative: Per-flow reordering depth limits.** Different limits for different inner flows based on their observed tolerance. Requires flow classification and per-flow state — complex.

**Rationale:** The sender-side constraint is a backpressure mechanism — it prevents the problem rather than reacting to it. Complementary to the receiver's depth limit.

### 4.3 Path Priority

When credit counts are equal (or on tie-break), the scheduler prefers:

1. IPv6 paths over IPv4 paths (no NAT, lower latency variance)
2. Lower one-way latency
3. Higher estimated bandwidth

This ensures IPv6 is preferred when available and performant.

## 5. Path Bandwidth Estimation

### 5.1 Rolling Window Measurement

Each path tracks bytes sent and acknowledged over a rolling 1-second window. Estimated bandwidth is:

```
estimated_bw = bytes_acked_in_last_second / 1.0
```

**Design choice:** Simple rolling window over ACKed bytes.

**Alternative: Packet-pair estimation.** Send two probe packets back-to-back, measure inter-arrival time at receiver. More responsive to sudden bandwidth changes but noisy and requires probe traffic.

**Alternative: BBR-style bandwidth estimation.** Model bandwidth and RTT, adapt sending rate. Designed for congestion control, not for scheduling weights. Overkill for this purpose.

**Alternative: Sender-side bytes-sent only.** No ACK feedback needed. But doesn't account for loss — a path may have packets sent but dropping, inflating the estimate.

**Rationale:** Rolling window over ACKed bytes accounts for loss (lost packets aren't ACKed), is simple to implement, and provides stable estimates for scheduling weights. The 1-second window smooths transient fluctuations. Test benching should compare responsiveness of different window sizes (250ms, 500ms, 1s, 2s).

### 5.2 Initial Bandwidth Estimate

When a path is first added, its bandwidth is unknown. Initial behavior:

1. Start with `weight = 1` (minimum, same as the slowest path).
2. Send probe traffic (`PROBE` packets) to measure initial RTT.
3. After the first ACK round-trip, set `weight` based on the initial throughput measurement.
4. Allow full scheduling after 2-3 ACK rounds (2-3 seconds) when estimates stabilize.

This prevents a new path from being overloaded before its capacity is known.

## 6. Feedback Loop

### 6.1 ACK Frequency

ACKs are sent:

- Every 100ms (periodic), OR
- Every 200 DATA packets received (volume-based), whichever comes first.

**Design choice:** Dual trigger (time and volume).
**Alternative: Pure periodic (fixed interval).** At high packet rates, 100ms between ACKs means the sender's view of the receiver's state is 100ms stale. At low packet rates, periodic ACKs waste bandwidth.

**Alternative: Pure volume-based (every N packets).** At low packet rates, ACKs may not be sent for seconds. At high rates, ACKs may be sent too frequently, consuming bandwidth.

**Rationale:** The dual trigger ensures ACKs are sent frequently enough for good scheduling decisions (time-based floor) and not too frequently at high rates (volume-based ceiling).

### 6.2 What the Sender Does with ACKs

On receiving an ACK:

1. **Update per-path stats:** packets received, estimated latency, loss count, flags.
2. **Recalculate path weights** based on updated bandwidth estimates.
3. **Check reordering depth:** If `Highest Contiguous Sequence` isn't advancing, the reordering buffer is backing up. Reduce credits on fast paths to allow slow paths to catch up.
4. **Detect path degradation:** If a path is flagged as degraded, reduce its weight. If flagged as failed, remove it from scheduling.

### 6.3 What the Sender Does WITHOUT ACKs

If no ACK is received from a peer for `3 × ACK_INTERVAL` (300ms), the sender:

1. Probes all paths with `PROBE` packets.
2. If some paths respond and others don't, degrade the non-responsive paths.
3. If no paths respond, the entire session may be down — trigger connection recovery.

## 7. Inner TCP Interaction

### 7.1 The Problem

When Whalescale reorders packets across paths, the inner TCP (running inside the VPN tunnel) may interpret out-of-order delivery as loss:

- **3 duplicate ACKs → fast retransmit:** Wastes bandwidth on redundant retransmission.
- **Retransmission timeout → congestion window collapse:** Severe throughput degradation lasting seconds.

### 7.2 Mitigation: Reordering Buffer

The primary mitigation. If the buffer delivers packets in order to the TUN device, the inner TCP never knows multipath is happening.
The buffer must release the missing packet before the inner TCP's loss detection triggers.

### 7.3 Known Tradeoff: Aggressive Reordering Depth

With `MAX_REORDERING_DEPTH = 128`:

- **Linux inner TCP:** Auto-tunes `tcp_reordering` up to 127. Will generally tolerate 128 out-of-order packets without false fast retransmits. Well-behaved.
- **Windows inner TCP:** `TcpReordering` is not well-documented but generally tolerates moderate reordering. May see some false retransmits with 128-packet reordering.
- **macOS inner TCP:** Similar to Windows — moderate tolerance, some false retransmits likely.
- **Inner UDP:** Not affected — UDP has no reordering detection. Application-level jitter buffers handle it.

**Design choice:** Accept some false retransmits on non-Linux platforms as a performance tradeoff.

**Alternative: Conservative depth (8-16).** Eliminates false retransmits on all platforms but severely limits bandwidth aggregation. Not worth the cost for the primary use case.

**Rationale:** Linux is the primary deployment target (servers, embedded, Android). Windows/macOS are secondary. The performance gain from aggressive multipath outweighs occasional false retransmits.

### 7.4 Monitoring

The reordering buffer tracks:

- **Gap skip rate:** How often gaps are force-skipped (indicates reordering exceeding the timeout)
- **Late packet drop rate:** How often late packets arrive after their gap was skipped (indicates timeout was too short)
- **Average buffer depth:** How many packets are typically in the buffer (indicates path latency spread)

These metrics are exposed via the agent's status interface and should be used during test benching to evaluate scheduler performance and tune parameters.

### 7.5 Open Research Question

Can Whalescale infer the inner TCP's RTT estimate from packet timing patterns (e.g., observing burst patterns that indicate TCP congestion window growth)?
If so, the reordering timeout could be set relative to the inner TCP's RTT, providing a tighter bound. This is uncharted territory — test benching with packet captures will determine if the signal is extractable.

## 8. MTU Handling

### 8.1 Strategy: Minimum Path MTU

The VPN interface MTU is set to the minimum MTU across all active paths for a given session:

```
vpn_mtu = min(path_mtu for all active paths in session)
vpn_mtu -= TRANSPORT_OVERHEAD   // Whalescale header + Noise_IK overhead
```

When paths are added or removed, the VPN MTU is recalculated. If a path with a smaller MTU is added, the VPN MTU decreases. If the smallest-MTU path is removed, the VPN MTU increases.

**Design choice:** Minimum path MTU across all paths.

**Alternative: Per-path MTU with fragmentation.** Send full-size packets on high-MTU paths, fragment on low-MTU paths. Maximizes throughput on high-MTU paths but fragmentation is expensive, increases loss sensitivity (any fragment loss loses the whole packet), and interacts badly with inner TCP MSS negotiation.

**Alternative: Per-path MTU with inner MSS clamping.** Advertise different MSS to different inner TCP connections based on which path they'd use. Requires flow-level scheduling (contradicts packet-level design) and deep packet inspection of inner TCP SYN packets.

**Rationale:** Minimum path MTU is simple, correct, and the performance cost is small (a few percent of capacity). The added complexity of per-path MTU is not justified for the target use case.

### 8.2 Path MTU Discovery

On path establishment:

1. Start with the local network interface MTU (e.g., 1500 for Ethernet, 1492 for PPPoE).
2. Subtract UDP/IP overhead (28 bytes for IPv4, 48 bytes for IPv6).
3. Subtract Whalescale transport header (16 bytes) and Noise_IK overhead.
4. Optionally send PMTU probes (increasingly large PROBE packets) to confirm the path can carry the estimated MTU. If a probe is not acknowledged, reduce the path MTU.
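Steps 1–3 above amount to a straightforward subtraction; a sketch, where the Noise_IK overhead is deployment-specific and therefore left as a parameter (function names are hypothetical):

```python
WHALESCALE_HEADER = 16  # transport header size from §2.1

def path_mtu(link_mtu, ipv6, noise_overhead):
    """Usable payload MTU for one path.

    noise_overhead is the per-packet Noise_IK cost (an input here,
    since the spec doesn't pin down a number).
    """
    udp_ip = 48 if ipv6 else 28  # IP header + 8-byte UDP header
    return link_mtu - udp_ip - WHALESCALE_HEADER - noise_overhead

def vpn_mtu(usable_mtus):
    """VPN interface MTU: minimum usable MTU across all active paths.

    Each entry already has all overheads subtracted, so the session-wide
    value is just the minimum.
    """
    return min(usable_mtus)
```

Because the per-path values already include the overhead subtraction, the session-wide MTU here is a plain minimum, matching the formula in §8.1.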
### 8.3 VPN MTU Changes

When the VPN MTU changes:

- Notify the OS via the TUN device's MTU setting.
- Inner TCP connections will adapt to the new MSS on new connections. Existing connections continue with their negotiated MSS — packets may need fragmentation at the Whalescale layer if they exceed the new path MTU. This is a rare event (only when a new, smaller-MTU path is added mid-session).

## 9. Path Failure Detection

### 9.1 Heartbeat-Based Detection

Each path sends a `HEARTBEAT` control message every 1 second. If 3 consecutive heartbeats on a path are not acknowledged (3 seconds without any traffic on that path), the path is marked as **failed**.

### 9.2 Data-Driven Detection

If no packets (DATA or CONTROL) have been received on a path for `2 × path_estimated_RTT`, and the heartbeat timer hasn't triggered yet, send an immediate `PROBE`. If no response within another `path_estimated_RTT`, mark the path as **degraded**.

### 9.3 Degradation vs. Failure

| State | Definition | Scheduler Action |
|-------|-----------|-----------------|
| Healthy | Normal RTT and loss | Full weight |
| Degraded | High RTT or elevated loss | Reduced weight (50% of estimated bandwidth) |
| Failed | No traffic for 3 heartbeats | Removed from scheduling entirely |

**Design choice:** Three-state path health model.

**Alternative: Binary (healthy/failed).** Simpler but doesn't handle the common case of a path that's slow (e.g., cellular handover) but still working. Prematurely removing a path wastes its remaining bandwidth and forces all traffic onto other paths.

**Alternative: Continuous weight based on loss rate.** No discrete states — weight is a continuous function of measured loss. More granular but harder to reason about and debug.

**Rationale:** The three-state model captures the important distinction between "slow but working" and "actually dead." Degraded paths still contribute bandwidth at reduced scheduling intensity.
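The three-state model and its scheduler actions can be sketched as a small mapping from health state to scheduling weight (a minimal illustration combining the §9.3 table with the weight formula from §4.1; names are hypothetical):

```python
from enum import Enum

class PathState(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    FAILED = "failed"

def scheduling_weight(state, estimated_bw, base_rate):
    """Map path health to a scheduling weight.

    base_rate is the estimated bandwidth of the slowest active path
    (BASE_RATE from §4.1); degraded paths count at 50% of their
    estimated bandwidth, failed paths are removed entirely.
    """
    if state is PathState.FAILED:
        return 0                                   # removed from scheduling
    bw = estimated_bw * (0.5 if state is PathState.DEGRADED else 1.0)
    return max(1, round(bw / base_rate))           # slowest path keeps weight >= 1
```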
### 9.4 Buffer Flush on Path Failure

When a path is marked as failed:

1. The receiver force-skips all gaps attributed to the failed path.
2. The sender removes the path from scheduling and redistributes weight to remaining paths.
3. Any packets in flight on the failed path at the time of failure detection are assumed lost.

This is critical — without flushing, the reordering buffer blocks indefinitely waiting for packets from a dead path.

### 9.5 Path Recovery

A failed path is not permanently removed. It is moved to a recovery queue:

1. The sender periodically (every 30 seconds) sends a `PROBE` to the failed path's last known endpoint.
2. If the probe is answered, the path is re-added as degraded, with weight = 1.
3. Normal bandwidth estimation resumes. If the path performs well, it transitions to healthy.

This handles the case where a path fails due to temporary network issues (e.g., WiFi reconnect) and recovers later.

## 10. Path Lifecycle

### 10.1 Discovery

Candidate paths are discovered through:

1. **Local interface enumeration:** The agent lists all local network interfaces and their addresses (IPv4 and IPv6).
2. **Anchor-observed addresses:** Anchor nodes report the external IP:port they observe for a peer.
3. **PATH_ANNOUNCE messages:** Connected peers advertise their available interfaces and observed external addresses to each other.
4. **LAN discovery:** mDNS or broadcast for same-network peers.

### 10.2 NAT Traversal

For each candidate path:

1. If the remote endpoint is known (from PATH_ANNOUNCE or LKG cache), send a `PATH_PROBE` to that endpoint.
2. If behind NAT, perform UDP hole punching as described in DESIGN.md §3.7.
3. If UPnP/NAT-PMP/PCP is available, request a port mapping proactively.

### 10.3 Probing

Before a path is used for data, it must be probed:

1. Send `PATH_PROBE` packets, measure RTT from the response.
2. Send a small burst of DATA packets at a conservative rate to estimate initial bandwidth.
3. After 2-3 probe rounds (2-3 seconds), the path transitions to active.

### 10.4 Monitoring

Active paths are continuously monitored:

- **RTT:** Measured from PROBE round-trips and from data packet timing.
- **Loss rate:** Tracked from ACK feedback (packets sent vs. packets acknowledged per path).
- **Throughput:** Rolling 1-second window of bytes ACKed.

### 10.5 Addition

When a new path is added to a session:

1. Assign it a Path ID.
2. Start with weight = 1, credits = 0.
3. Begin probing (RTT measurement, initial bandwidth estimate).
4. After probing completes, set weight based on estimated bandwidth and begin scheduling.
5. Recalculate VPN MTU if the new path has a smaller MTU than the current minimum.

### 10.6 Removal

When a path is removed from a session:

1. Stop scheduling new packets on the path.
2. Wait for in-flight packets to be acknowledged (up to `path_RTT` timeout).
3. Force-skip any gaps from unacknowledged packets.
4. Remove the path from the session.
5. Recalculate VPN MTU.
6. Redistribute weight to remaining paths.

### 10.7 Interface Events

The agent monitors OS-level network interface events:

- **Interface up:** Begin discovery and NAT traversal for new paths on this interface.
- **Interface down:** Remove all paths associated with this interface, flush their gaps.
- **Address change:** Treat as a new path candidate (new address) + removal of old path (old address).
- **IPv6 address added:** Immediately attempt IPv6 path — this is high-priority due to NAT elimination.

## 11. Test Bench Framework

The multipath subsystem has significant open questions that require empirical validation.
The following scenarios should be benchmarked:

### 11.1 Scheduler Comparison

Implement each scheduler as a swappable component and measure:

| Scheduler | Metrics |
|-----------|---------|
| Weighted round-robin (chosen) | Throughput, reordering depth, inner TCP goodput |
| Strict round-robin | Same metrics (expected: lower throughput on asymmetric paths) |
| minRTT | Same metrics (expected: no aggregation, minimal reordering) |
| BLEST-adapted | Same metrics (expected: better on asymmetric paths, but designed for reliable transport) |
| Random (baseline) | Same metrics (lower bound on performance) |

### 11.2 Reordering Depth Tuning

Vary `MAX_REORDERING_DEPTH` across {8, 16, 32, 64, 128, 256} and measure:

- Inner TCP goodput (iperf3 inside the tunnel)
- Inner TCP retransmit rate (captured from inner TCP stats)
- Gap skip rate (from Whalescale reordering buffer metrics)
- End-to-end latency for inner UDP traffic

### 11.3 Path Latency Spread

Test with paths of different RTT spreads:

- Same-RTT paths (e.g., two WiFi connections): minimal reordering expected
- Moderate spread (e.g., 20ms WiFi + 50ms 5G): common real-world case
- Large spread (e.g., 20ms WiFi + 200ms satellite): extreme case, high reordering
- One path with high jitter: tests timeout calculation robustness

### 11.4 Path Failure Scenarios

- Sudden path failure (disconnect WiFi during transfer)
- Gradual degradation (increasing loss rate on one path)
- Path recovery (WiFi reconnect after failure)
- All-but-one path failure (stress test for failover)

### 11.5 Inner Traffic Types

- Single long-lived TCP flow (iperf3): tests reordering impact on TCP congestion control
- Multiple concurrent TCP flows: tests scheduler fairness
- UDP traffic (VoIP, gaming): tests latency and jitter impact
- Mixed TCP + UDP: tests scheduler prioritization

### 11.6 Real-World Mobile Scenarios

- 5G + WiFi on a mobile device (the primary use case)
- WiFi + Ethernet on a laptop
- IPv4 + IPv6 dual-stack (IPv6 preferred, IPv4 fallback)
- Cellular handover between towers (path RTT changes mid-flow)