Thank you for the detailed account of the iCloud Drive file upload deadlock on macOS 26.4.1. Your troubleshooting points to a clear root cause. Here is a summary of what you found, followed by some suggestions that may help refine your approach or assist others who hit the same issue:
Summary of Findings:
Root Cause: A stale HTTP/3 (QUIC) session in nsurlsessiond's BackgroundConnectionPool leads to a deadlock during file uploads.
Behavior:
The deadlock occurs only with HTTP/3; HTTP/1.1 works without issues after a restart.
It affects larger files (>100 KB); smaller files occasionally succeed.
Restarting both cloudd and nsurlsessiond resolves the issue temporarily by clearing the poisoned session.
Reproduction: Consistent behavior observed across multiple tests with varied file sizes.
Diagnosis: Log analysis can help identify occurrences using specific grep patterns.
Recovery: A targeted kill command for user-level instances of cloudd and nsurlsessiond provides a quick fix.
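For readers who want to script the diagnosis and recovery steps summarized above, here is a minimal Python sketch. It only builds the commands: a `log show` call filtered to the two daemons, and a `killall -u` call limited to the current user's instances. The helper names (`log_show_command`, `recovery_command`, `recover`) and the 30-minute default window are my own assumptions, not something taken from your report, and the exact messages to search for in the output are still the ones you identified.

```python
import getpass
import subprocess

# Assumption: these two daemons are the ones named in the summary above.
DAEMONS = ["cloudd", "nsurlsessiond"]


def log_show_command(minutes: int = 30) -> list[str]:
    """Build a `log show` call limited to the two daemons of interest."""
    predicate = " OR ".join(f'process == "{name}"' for name in DAEMONS)
    return ["log", "show", "--last", f"{minutes}m",
            "--style", "compact", "--predicate", predicate]


def recovery_command(user: str) -> list[str]:
    """Build a `killall` call restricted to processes owned by `user`."""
    return ["killall", "-u", user, *DAEMONS]


def recover(dry_run: bool = True) -> list[str]:
    """Print the recovery command, and run it only when dry_run is False."""
    cmd = recovery_command(getpass.getuser())
    print(" ".join(cmd))
    if not dry_run:
        # check=False: killall exits non-zero when no process matched.
        subprocess.run(cmd, check=False)
    return cmd
```

Calling `recover()` with the default `dry_run=True` only prints the command, which makes it easy to confirm that it targets the right processes before running it for real.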
Additional Thoughts and Suggestions:
Potential Enhancements for Apple:
Automatic Session Management: Implement automatic invalidation of QUIC sessions after a threshold of failures (as you suggested), potentially integrated into CFNetwork or NSURLSession directly.
Improved Logging: Enhance logging to surface errors like these to users in Finder or System Settings, perhaps with actionable suggestions or clearer error messages.
API for Pool Invalidation: Expose APIs that allow services like CloudKit to explicitly invalidate problematic session pools without needing a full daemon restart.
Diagnostic Tools: Consider adding built-in diagnostic tools or scripts that users can run to identify and potentially resolve such deadlocks without manual intervention.
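To make the automatic invalidation idea concrete, here is a rough Python sketch of a consecutive-failure counter that signals when a connection pool should be discarded. This is purely illustrative and is not how CFNetwork or NSURLSession work internally; the class name `SessionPoolGuard`, the `generation` counter, and the default threshold of 3 are all hypothetical.

```python
class SessionPoolGuard:
    """Tracks consecutive transfer failures for one connection pool.

    Hypothetical model: once `max_failures` transfers fail in a row,
    the pool is treated as poisoned and a new generation is started.
    """

    def __init__(self, max_failures: int = 3) -> None:
        if max_failures < 1:
            raise ValueError("max_failures must be at least 1")
        self.max_failures = max_failures
        self.failures = 0      # consecutive failures in this generation
        self.generation = 0    # bumped each time the pool is invalidated

    def record_success(self) -> None:
        """A completed transfer shows the pool is healthy; reset the count."""
        self.failures = 0

    def record_failure(self) -> bool:
        """Count a failed transfer; return True if the pool was invalidated."""
        self.failures += 1
        if self.failures >= self.max_failures:
            self.failures = 0
            self.generation += 1
            return True
        return False
```

A caller would drop its cached sessions and open fresh connections whenever `record_failure()` returns `True`, which is the same effect the daemon restart achieves today, but without killing the process.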
For Users and Administrators:
Script Automation: For frequent issues, consider setting up a monitoring script that automatically runs the recovery command when the specific deadlock pattern is detected in logs.
Alternative Protocols: Temporarily disabling QUIC in network settings (if feasible and supported) might be a workaround until a permanent fix is applied, though this may impact performance for other QUIC-enabled applications.
Feedback Loop: Encourage affected users to submit feedback through Apple's Feedback Assistant, including the collected logs, to ensure the issue is prioritized and tracked.
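As a starting point for the monitoring script idea, the decision logic could look something like the Python sketch below. It counts log lines that match a deadlock signature and applies a cooldown so the daemons are not restarted in a tight loop. The regular expression is a placeholder I made up, so replace it with the messages you actually see in your logs; the threshold and cooldown defaults are likewise assumptions.

```python
import re
from typing import Iterable, Optional

# Hypothetical signature: replace with the message you actually observe
# in your own logs when uploads stall.
DEADLOCK_SIGNATURE = re.compile(r"upload.*(stalled|timed out)", re.IGNORECASE)


def should_recover(
    log_lines: Iterable[str],
    now: float,
    last_recovery: Optional[float] = None,
    threshold: int = 3,
    cooldown_seconds: float = 600.0,
    signature: "re.Pattern[str]" = DEADLOCK_SIGNATURE,
) -> bool:
    """Decide whether the recovery command should run.

    Returns True only if at least `threshold` lines match `signature`
    and the previous recovery (if any) is older than the cooldown.
    """
    if last_recovery is not None and now - last_recovery < cooldown_seconds:
        return False  # still cooling down from the previous restart
    hits = sum(1 for line in log_lines if signature.search(line))
    return hits >= threshold
```

Keeping the decision in a pure function like this makes it easy to test against saved log excerpts before wiring it up to a scheduler such as a launchd agent.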
Further Debugging:
Network Packet Analysis: Capturing network packets during a deadlock might provide additional insights into what exactly fails mid-transfer.
System State Snapshots: Capturing the system state before and after the deadlock (for example, with sysdiagnose) could help Apple engineers diagnose what leaves the session in a stale state.
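For the packet capture, a small Python sketch like the following could build the `tcpdump` invocation. HTTP/3 runs over QUIC on UDP, normally port 443, so the filter is limited to that traffic. The interface name `en0` and the output filename are assumptions; adjust them for your machine. Because QUIC payloads are encrypted, the capture mainly shows timing and whether packets keep flowing during the stall.

```python
def capture_command(interface: str = "en0",
                    output: str = "icloud-quic.pcap") -> list[str]:
    """Build a tcpdump call that records UDP port 443 traffic to a file.

    The interface and filename defaults are placeholders. The capture
    runs until interrupted (Ctrl-C) and requires root privileges.
    """
    return ["sudo", "tcpdump", "-i", interface, "-w", output,
            "udp", "port", "443"]


if __name__ == "__main__":
    # Print the command so it can be reviewed before being run.
    print(" ".join(capture_command()))
```

Starting the capture before triggering an upload and stopping it once the stall is visible should keep the file small enough to attach to a feedback report.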
Your detailed documentation and methodical approach are invaluable for both addressing the current issue and helping Apple refine their systems. Keep monitoring for updates from Apple regarding this problem, as they may release patches or guidance based on feedback like yours.