tstudent’s Profile | Apple Developer Forums

Loss of connectivity with DNS proxy and content filter extensions running at the same time

The environment:

- macOS Ventura 13.3.1 on a Mac mini (2018 model)
- A content filter extension (content-filter-provider), configured with filterSockets = YES, filterPackets = NO
- A DNS proxy extension (dns-proxy)
- An Ethernet network and two Wi-Fi access points
- OpenVPN

The exact conditions to reproduce the issue are obscure. So far, we have managed to narrow it down to:

1. Connect to the Ethernet network
2. Connect to Wi-Fi access point A, and wait a couple of minutes
3. Connect to Wi-Fi access point B (in our tests, this is a mobile hotspot; not sure if relevant)
4. Establish an OpenVPN tunnel
5. Put the machine to sleep, and wait 15 minutes
6. Disable Wi-Fi access point B, and wait 3 hours
7. Wake up the machine

The result is a complete loss of network connectivity: the machine can't even renew the DHCP lease.

The content filter logs new flows without blocking or pausing them. If a flow is missing some address information, it peeks 1 incoming byte or 1 outgoing byte, whichever comes first ([NEFilterNewFlowVerdict filterDataVerdictWithFilterInbound:YES peekInboundBytes:1 filterOutbound:YES peekOutboundBytes:1]), in the hope of getting more complete flow metadata once data is being exchanged. Most importantly, if a flow looks like a DNS request over UDP, the content filter sniffs all incoming data, asking for at most 10000 bytes at a time ([NEFilterNewFlowVerdict filterDataVerdictWithFilterInbound:YES peekInboundBytes:10000 filterOutbound:YES peekOutboundBytes:1]). The DNS answer sniffer has already caused a conflict with the DNS proxy in the past, but we fixed it.

I don't know the DNS proxy in as much detail, but I know that it has both blacklisting and proxying features (it proxies requests to a Cloudflare API, I believe), so it's a little more complex than a simple pass-through like the content filter.

The last event logged by the content filter is, in fact, a DNS query whose answer is being sniffed. Then it stops logging anything, coinciding with the system-wide loss of connectivity (verified from the logs of other applications and services).

Spindump shows that the content filter is idle, stuck in NetworkExtension internals, which in turn are stuck in an event wait (lck_mtx_sleep). Also of note, mDNSResponder and nesessionmanager are both stuck in a kernel-mode call to soflow_db_apply, as a result of closing a socket. In all cases, by "stuck", I mean that the thread was found to be executing that code in 1000 samples out of 1000, with no use of CPU time.

An anonymized subset of the log is attached: spindump-abridged.txt

Are we doing anything obviously wrong? Any gotchas that we may have tripped over in our extension implementations? Or is it possibly a system bug that we should open a radar for?

Thank you
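
The new-flow verdict policy described in the post can be modeled roughly as below. This is only a sketch in plain C (it also compiles as Objective-C), not the actual extension code: the `flow_info` and `verdict` structs and both function names are hypothetical stand-ins for the NetworkExtension flow and verdict objects, and the "UDP to port 53" test is an assumed heuristic, since the post does not say how a flow is recognized as a DNS request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the metadata the filter reads from a new flow. */
typedef struct {
    int      ip_protocol;     /* 6 = TCP, 17 = UDP */
    uint16_t remote_port;     /* 0 when unknown */
    bool     has_remote_addr; /* false when address information is missing */
    bool     has_local_addr;
} flow_info;

/* Hypothetical stand-in for the verdict. The last four fields mirror the
 * arguments of filterDataVerdictWithFilterInbound:peekInboundBytes:
 * filterOutbound:peekOutboundBytes: quoted in the post. */
typedef struct {
    bool   allow;            /* true: let the flow through unfiltered */
    bool   filter_inbound;
    size_t peek_inbound;
    bool   filter_outbound;
    size_t peek_outbound;
} verdict;

/* Assumed heuristic: a UDP flow to remote port 53 is treated as DNS. */
static bool looks_like_udp_dns(const flow_info *f) {
    return f->ip_protocol == 17 && f->remote_port == 53;
}

static verdict new_flow_verdict(const flow_info *f) {
    assert(f != NULL);

    if (looks_like_udp_dns(f)) {
        /* Sniff DNS answers: ask for up to 10000 inbound bytes at a time. */
        return (verdict){ .allow = false,
                          .filter_inbound = true,  .peek_inbound = 10000,
                          .filter_outbound = true, .peek_outbound = 1 };
    }
    if (!f->has_remote_addr || !f->has_local_addr) {
        /* Incomplete metadata: peek 1 byte in either direction and look
         * at the flow again once data is being exchanged. */
        return (verdict){ .allow = false,
                          .filter_inbound = true,  .peek_inbound = 1,
                          .filter_outbound = true, .peek_outbound = 1 };
    }
    /* Complete metadata: log only, never block or pause. */
    return (verdict){ .allow = true };
}
```

In this model only the DNS branch keeps the filter in the data path for a flow's whole lifetime, which matches the observation that the last logged event is a DNS query whose answer is being sniffed.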

App & System Services Core OS macOS Network Extension


Jul ’23

tstudent
