Follow-up — post-submission PCAP analysis of the filtered QUIC capture from the 2026-04-21 reproduction (attached to FB22476701, third update). Analysed with tshark 4.6.4 for QUIC long-header parsing; Initial + Handshake packets are decryptable per RFC 9001 §5.2 (keys derived from DCID). 1-RTT STREAM payloads remain opaque — nsurlsessiond does not export SSLKEYLOGFILE.
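Why the Initial and Handshake packets decrypt without a keylog file: RFC 9001 §5.2 derives the Initial secret purely from the client's Destination Connection ID and a fixed, version-specific salt, so any passive observer (including tshark) can compute it. A minimal stdlib sketch; the DCID below is the RFC 9001 Appendix A sample value, not one from this capture:

```python
import hmac, hashlib

# RFC 9001 §5.2: initial_secret = HKDF-Extract(initial_salt, client_dcid).
# The salt is a fixed constant for QUIC v1.
INITIAL_SALT_V1 = bytes.fromhex("38762cf7f55934b34d179ae6a4c80cadccbb7f0a")

def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract (RFC 5869) is a single HMAC-SHA256 call."""
    return hmac.new(salt, ikm, hashlib.sha256).digest()

# Sample DCID from RFC 9001 Appendix A (illustrative, not from this trace).
dcid = bytes.fromhex("8394c8f03e515708")
initial_secret = hkdf_extract(INITIAL_SALT_V1, dcid)
print(initial_secret.hex())
```

This is why CONNECTION_CLOSE frames in Initial/Handshake packets would be visible in the capture, while 1-RTT payloads stay opaque: the 1-RTT keys depend on the TLS handshake secrets, which only the endpoints hold.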
The 102 s deadlock window is quantifiable on the wire:
Last pre-gap packet: 20:09:51.802 (Client→Server, protected, port 51943)
First post-gap packet: 20:11:34.625 (Client→Server, fresh Initial on new port 50715)
102.823 s with zero packets in either direction — no client PING, no keepalive, no probe, no CONNECTION_CLOSE
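The gap duration follows directly from the two wall-clock timestamps; a quick stdlib check:

```python
from datetime import datetime

fmt = "%H:%M:%S.%f"
last_pre_gap = datetime.strptime("20:09:51.802", fmt)    # last packet before silence
first_post_gap = datetime.strptime("20:11:34.625", fmt)  # fresh Initial on new port

gap = (first_post_gap - last_pre_gap).total_seconds()
print(f"{gap:.3f} s")  # 102.823 s of complete silence
```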
The same missing-wakeup pattern was already documented earlier the same evening via sample on cloudd during a reactive-capture repro: the main thread, CFStream.LegacyThread, and NSURLConnectionLoader were all blocked in mach_msg2_trap for 8627/8627 samples. The user-space thread stacks and the UDP flow tell the same story from two angles: the daemon is alive, consuming no CPU, and simply not being notified that the QUIC session has died.
Three tshark-level observations worth recording:
1. All seven TLS handshakes in the capture completed cleanly (7 × ClientHello + ServerHello, ALPN=h3, RTT 7.6–12.2 ms, median ~9 ms). No CONNECTION_CLOSE frames are visible in any Initial or Handshake packet; these are decryptable, so a close frame would show if present. Session terminations happen in the 1-RTT data phase, not during TLS establishment.
2. The server is demonstrably healthy: after the 102 s gap, the fresh client Initial is answered by a full server Initial + Handshake in 7.9 ms. The endpoint is neither down nor rate-limited; the silence is purely client-side.
3. Per-session byte counts confirm the body transfer never progresses meaningfully: the largest pre-gap session carries 85 KB C→S, and all three pre-gap sessions combined total 110 KB, while the cloudd log shows CKAsset body attempts with requestBodyBytes of 2.5–3.3 MB each, all returning err=T, requestDuration=-1.000.
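The mismatch between wire bytes and attempted body sizes is stark. Even granting all 110 KB of combined pre-gap traffic to a single body attempt (ignoring QUIC/HTTP3 framing and header overhead, so this overstates progress), well under 5% of even the smallest 2.5 MB body ever left the device:

```python
wire_bytes = 110 * 1024        # all three pre-gap sessions combined, C→S
smallest_body = 2.5 * 1024**2  # smallest CKAsset body attempt per cloudd log

fraction = wire_bytes / smallest_body
print(f"{fraction:.1%}")  # 4.3%
```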
On the client-side gap specifically: RFC 9000 §10.1 permits silent idle close server-side ("the connection is silently closed and its state is discarded"), which a GCS load balancer may legitimately do. But §10.1.1 and §10.1.2 of the same RFC explicitly describe the client sending PING or other ack-eliciting frames for liveness testing, particularly when an endpoint is "expecting response data but does not have or is unable to send application data" (§10.1.2) — precisely cloudd's state during the deadlock. None of that fires in the 102 s window.
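What §10.1.1-style liveness testing would look like from the client side can be sketched as a toy state machine. Every name and threshold here is illustrative, not anything from NSURLSession or cloudd internals: while a response is outstanding and the connection sits idle, send an ack-eliciting PING well before the idle timeout; if the probe itself goes unanswered, declare the session dead and propagate the failure upward instead of waiting silently.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class QuicSessionProbe:
    """Toy model of RFC 9000 §10.1.1 liveness testing (illustrative only)."""
    idle_timeout: float = 30.0       # assumed negotiated max_idle_timeout, seconds
    probe_timeout: float = 3.0       # assumed PTO-like bound for the PING response
    last_rx: float = 0.0             # time of last packet received
    awaiting_response: bool = False  # request sent, response outstanding
    probe_sent_at: Optional[float] = None

    def tick(self, now: float) -> str:
        """Return the action the client should take at time `now`."""
        if self.probe_sent_at is not None:
            if now - self.probe_sent_at > self.probe_timeout:
                return "declare-dead"  # probe unanswered: fail fast, don't idle silently
            return "wait"
        idle = now - self.last_rx
        if self.awaiting_response and idle > self.idle_timeout / 2:
            self.probe_sent_at = now
            return "send-ping"         # ack-eliciting frame to test liveness
        return "wait"

probe = QuicSessionProbe(last_rx=0.0, awaiting_response=True)
print(probe.tick(10.0))  # wait
print(probe.tick(16.0))  # send-ping
print(probe.tick(20.0))  # declare-dead
```

In the captured trace, nothing resembling the "send-ping" or "declare-dead" transitions ever fires during the 102 s window; the session sits in "wait" until an unrelated event tears it down.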
The original report's fix recommendations #1 (aggressive stale-session invalidation when the QUIC pool detects an unresponsive session) and #2 (explicit pool-invalidation API surfaced to CloudKit / cloudd) remain precisely targeted. The client-side lack of any idle-phase probing or upper-layer failure propagation is the actionable engineering gap.
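A rough shape of recommendation #1, as a hypothetical pool-side check: none of these names exist in NSURLSession or cloudd, this is purely a sketch of the policy. Before reusing a pooled session, compare its last-receive time against a staleness bound and discard it, forcing a fresh handshake, rather than reuse a possibly dead connection:

```python
class SessionPool:
    """Hypothetical connection pool with stale-session eviction (sketch)."""

    def __init__(self, stale_after: float = 60.0):
        self.stale_after = stale_after
        self._sessions = {}  # host -> last_rx timestamp

    def record_rx(self, host: str, now: float) -> None:
        self._sessions[host] = now

    def checkout(self, host: str, now: float) -> str:
        """Reuse a live session, or force a fresh handshake if stale."""
        last_rx = self._sessions.get(host)
        if last_rx is None or now - last_rx > self.stale_after:
            self._sessions.pop(host, None)  # recommendation #1: invalidate, don't reuse
            return "new-session"
        return "reuse"

pool = SessionPool(stale_after=60.0)
pool.record_rx("gateway.icloud.com", now=0.0)
print(pool.checkout("gateway.icloud.com", now=30.0))   # reuse
print(pool.checkout("gateway.icloud.com", now=103.0))  # new-session
```

In the captured trace the eventual fresh Initial on port 50715 is exactly the "new-session" path, but it only happens after 102 s of silence rather than at a bounded staleness threshold.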
All of this is on FB22476701 now (fourth update, 2026-04-21 evening) — this post is a community-side summary for anyone hitting the same pattern.
Topic: App & System Services
SubTopic: iCloud & Data