Reproducible EXC_BAD_ACCESS in NEDNSProxyProvider when using async/await variants of NEAppProxyUDPFlow

Description

I am seeing a consistent crash in an NEDNSProxyProvider on iOS when migrating from completion handlers to the new Swift Concurrency async/await variants of readDatagrams() and writeDatagrams() on NEAppProxyUDPFlow.

The crash occurs inside the Swift Concurrency runtime during task resumption. Specifically, it seems the Task attempts to return to the flow’s internal serial executor (NEFlow queue) after a suspension point, but fails if the flow was invalidated or deallocated by the kernel while the task was suspended.

Error Signature

Thread 4: EXC_BAD_ACCESS (code=1, address=0x28) 
Thread 4 Queue : NEFlow queue (serial)
#0	0x000000018fe919cc in swift::AsyncTask::flagAsAndEnqueueOnExecutor ()
#9	0x00000001ee25c3b8 in _pthread_wqthread ()

Steps

The crash is highly timing-dependent. To reproduce it reliably:

  1. Use an iOS device with Developer Settings enabled.

  2. Go to Developer > Network Link Conditioner > High Latency DNS.

  3. Intercept a DNS query and perform a DoH (DNS-over-HTTPS) request using URLSession.

  4. The first few network requests should trigger the crash.

Minimum Working Example (MWE)

import Foundation
import Network
import NetworkExtension

class DNSProxyProvider: NEDNSProxyProvider {
    override func handleNewFlow(_ flow: NEAppProxyFlow) -> Bool {
        guard let udpFlow = flow as? NEAppProxyUDPFlow else { return false }
        
        Task(priority: .userInitiated) {
            await handleUDPFlow(udpFlow)
        }
        return true
    }
    
    func handleUDPFlow(_ flow: NEAppProxyUDPFlow) async {
        do {
            try await flow.open(withLocalFlowEndpoint: nil)
            
            while !Task.isCancelled {
                // Suspension point 1: Waiting for datagrams
                let (flowData, error) = await flow.readDatagrams()
                if let error { throw error }
                guard let flowData, !flowData.isEmpty else { return }
                
                var responses: [(Data, Network.NWEndpoint)] = []
                for (data, endpoint) in flowData {
                    // Suspension point 2: External DoH resolution
                    let response = try await resolveViaDoH(data)
                    responses.append((response, endpoint))
                }
                
                // Suspension point 3: Writing back to the flow
                // Extension will crash here on task resumption
                try await flow.writeDatagrams(responses)
            }
        } catch {
            flow.closeReadWithError(error)
            flow.closeWriteWithError(error)
        }
    }
    
    // Forwards the raw DNS query to a DoH endpoint and returns the raw response.
    private func resolveViaDoH(_ packet: Data) async throws -> Data {
        let url = URL(string: "https://dns.google/dns-query")!
        
        var request = URLRequest(url: url)
        request.httpMethod = "POST"
        request.httpBody = packet
        request.setValue("application/dns-message", forHTTPHeaderField: "Content-Type")
        
        let (data, _) = try await URLSession.shared.data(for: request)
        return data
    }
}

Crash Details & Analysis

The disassembly at the crash point indicates a null dereference of an internal executor pointer (Voucher context):

ldr x20, [TPIDRRO_EL0 + 0x340]
ldr x0, [x20, #0x28]   // x20 is NULL/0x0 here, resulting in address 0x28

It appears that NEAppProxyUDPFlow’s async methods bind the Task to a specific internal executor. When the kernel reclaims the flow memory, the pointer in x20 becomes invalid. Because the Swift runtime is unaware that the NEFlow queue executor has vanished, it attempts to resume the task on a flow that no longer exists and crashes.

Checking !Task.isCancelled does not prevent this, as the crash happens during the transition into the task body before the cancellation check can even run.
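
To illustrate why the check comes too late, here is a minimal sketch that uses no NetworkExtension APIs at all. The names simulatedSuspensionPoint, readLoop, and runDemo are hypothetical stand-ins, and the sleep only imitates a suspension point such as readDatagrams(). It shows that the loop can observe cancellation only after the runtime has already resumed the task, which is the step where the crash above occurs.

```swift
import Foundation

// Hypothetical stand-in for a suspension point such as flow.readDatagrams().
// It only sleeps; it does not touch NetworkExtension.
func simulatedSuspensionPoint() async {
    // Task.sleep throws when the task is cancelled; the error is ignored here
    // so the loop below has to notice cancellation by itself.
    try? await Task.sleep(nanoseconds: 200_000_000)
}

// Mirrors the shape of the read loop and records the order of events.
func readLoop() async -> [String] {
    var events: [String] = []
    while !Task.isCancelled {
        events.append("suspending")
        await simulatedSuspensionPoint()
        // This line runs only after the runtime has resumed the task.
        events.append("resumed")
    }
    events.append("observed cancellation")
    return events
}

// Starts the loop, cancels it while it is (most likely) suspended,
// and returns the recorded events.
func runDemo() async -> [String] {
    let task = Task { await readLoop() }
    try? await Task.sleep(nanoseconds: 20_000_000)
    task.cancel()
    return await task.value
}
```

Calling await runDemo() returns the event log. Whenever the loop body ran at all, "resumed" is recorded before "observed cancellation", so a failure during resumption itself would never reach the isCancelled check.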

Questions

  1. Is this a known issue with the NetworkExtension async bridge?

  2. Why does Task.isCancelled not reflect the deallocation of the underlying NEAppProxyFlow?

  3. Is the only safe workaround?

Please feel free to correct me if I misunderstood anything here. I'll be happy to hear any insights or suggestions :) Thank you!

Minimum Working Example

Hmmm, that’s a minimum failing example, right? That is, you wrote this code specifically to reproduce the crash, right?

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

Hmmm, that’s a minimum failing example, right? That is, you wrote this code specifically to reproduce the crash, right?

Yes, that's correct!

I used the wrong term earlier. The snippet is a minimum failing example, written specifically to reproduce the crash as simply as possible.

OK, cool. Well, not cool, but you know what I mean (-:

Given that you can reproduce this so easily, I recommend that you file a bug about it now. When doing that:

  • Make sure to enable additional VPN logging, per the VPN (Network Extension) instructions on our Bug Reporting > Profiles and Logs page.
  • After reproducing the problem, grab a sysdiagnose and attach it to your bug report.

Once you’re done, post your bug number here. My plan is then to grab the crash report from your sysdiagnose log and dig a bit deeper.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

Sounds good, thank you!

Filed report: FB21933607

In addition to the sysdiagnose with the VPN logging profile enabled, I also attached a Minimum Failing Example similar to the code sample in the first message.
