I'm mostly thinking of a Transparent Proxy Provider, as usual, but... how does one test it? I can't see how one would do it with unit tests (although you could break out code and test some of that code). Since it requires MDM or user approval, that makes automated tests a bit difficult. I have this monstrous vision of writing a program that loads the extension and invokes the appropriate methods on it but that just leads to other questions about subclasses.
I'm sure other people have thought about this and am curious what the thoughts are. 😄
I have
var idleScanTimer = DispatchSource.makeTimerSource()
as a class ivar. When the object is started, I have
self.idleScanTimer.schedule(deadline: .now(), repeating: Double(5.0*60))
(and it sets an event handler, that checks some times.)
When the object is stopped, it calls self.idleScanTimer.cancel().
At some point, the object containing it is deallocated, and ... sometimes, I think, not always, it crashes:
Crashed Thread: 61 Dispatch queue: NEFlow queue
[...]
Application Specific Information:
BUG IN CLIENT OF LIBDISPATCH: Release of an inactive object
[...]
Thread 61 Crashed:: Dispatch queue: NEFlow queue
0 libdispatch.dylib 0x7ff81c1232cd _dispatch_queue_xref_dispose.cold.2 + 24
1 libdispatch.dylib 0x7ff81c0f84f6 _dispatch_queue_xref_dispose + 55
2 libdispatch.dylib 0x7ff81c0f2dec -[OS_dispatch_source _xref_dispose] + 17
3 com.kithrup.simpleprovider 0x101df5fa7 MyClass.deinit + 87
4 com.kithrup.simpleprovider 0x101dfbdbb MyClass.__deallocating_deinit + 11
5 libswiftCore.dylib 0x7ff829a63460 _swift_release_dealloc + 16
6 com.kithrup.simpleprovider 0x101e122f4 0x101de7000 + 176884
7 libswiftCore.dylib 0x7ff829a63460 _swift_release_dealloc + 16
8 libsystem_blocks.dylib 0x7ff81bfdc654 _Block_release + 130
9 libsystem_blocks.dylib 0x7ff81bfdc654 _Block_release + 130
10 libdispatch.dylib 0x7ff81c0f3317 _dispatch_client_callout + 8
11 libdispatch.dylib 0x7ff81c0f9317 _dispatch_lane_serial_drain + 672
12 libdispatch.dylib 0x7ff81c0f9dfd _dispatch_lane_invoke + 366
13 libdispatch.dylib 0x7ff81c103eee _dispatch_workloop_worker_thread + 753
14 libsystem_pthread.dylib 0x7ff81c2a7fd0 _pthread_wqthread + 326
15 libsystem_pthread.dylib 0x7ff81c2a6f57 start_wqthread + 15
I tried changing it to an optional and having the deinit call .cancel() and set it to nil, but it still crashes.
I can't figure out how to get it deallocated in a small, standalone test program.
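For what it's worth, "Release of an inactive object" is the message libdispatch gives when the last reference to a source goes away before that source was ever activated, which would fit an object that gets deallocated without its start path having run. Below is a minimal sketch of one way to guard against that, assuming this is what is happening here. IdleScanner, start(handler:) and stop() are hypothetical names, and the sketch assumes start, stop and deallocation are not racing each other on different queues.

```swift
import Dispatch

// Hypothetical stand-in for the class that owns the idle-scan timer.
final class IdleScanner {
    private let timer: DispatchSourceTimer
    // Tracks whether the source has ever been activated.
    private var activated = false

    init(queue: DispatchQueue) {
        // A freshly created source is inactive until activate() or resume().
        timer = DispatchSource.makeTimerSource(queue: queue)
    }

    func start(handler: @escaping () -> Void) {
        guard !activated else { return }
        timer.setEventHandler(handler: handler)
        timer.schedule(deadline: .now(), repeating: 5.0 * 60)
        timer.activate()
        activated = true
    }

    func stop() {
        timer.cancel()
    }

    deinit {
        // Releasing a never-activated source is what trips the libdispatch
        // assertion, so activate it first, then cancel it.
        if !activated {
            timer.activate()
        }
        timer.cancel()
    }
}
```

With this shape, creating the object and dropping it without ever calling start(handler:) should no longer hit the assertion, which may also be a way to get a small standalone reproduction: create the original class, never start it, and release it.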
That's probably a bad title, let's try with specifics: we have a network extension, it has some classes / functions of its own, and they, when push comes to build, depend on (for example) NEAppProxyFlow and its subclasses. The code is written in Swift, since it is the language of the future.
If I want to do a unit test for my code, I need to provide something that at least looks like NEAppProxyFlow, since I can't otherwise create one. I thought I could provide my own NetworkExtension module for the test target, but that... did not work well, and I still don't understand why.
On the other hand, I'm really bad at making unit tests, so the odds that I'm missing something fairly obvious to most other people are pretty high.
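One common way around this, sketched here under assumptions rather than as the way to do it, is to have the code under test depend on a small protocol instead of on NEAppProxyFlow itself, and to hand it a fake in unit tests. ClosableFlow, FlowRejected, reject(_:) and FakeFlow are all hypothetical names. The two protocol methods mirror the Swift spellings of NEAppProxyFlow's close methods, so an empty conformance in the extension target should be enough.

```swift
// Hypothetical protocol: only the parts of a flow that the code under test uses.
protocol ClosableFlow: AnyObject {
    func closeReadWithError(_ error: Error?)
    func closeWriteWithError(_ error: Error?)
}

// In the extension target only (not compiled in this sketch):
// extension NEAppProxyFlow: ClosableFlow {}

// Hypothetical error type used by the code under test.
struct FlowRejected: Error {}

// Code under test: it sees only the protocol, never NetworkExtension types.
func reject(_ flow: ClosableFlow) {
    let error = FlowRejected()
    flow.closeReadWithError(error)
    flow.closeWriteWithError(error)
}

// Test double that records what was done to it.
final class FakeFlow: ClosableFlow {
    private(set) var readClosed = false
    private(set) var writeClosed = false

    func closeReadWithError(_ error: Error?) { readClosed = true }
    func closeWriteWithError(_ error: Error?) { writeClosed = true }
}
```

A unit test then creates a FakeFlow, passes it to reject(_:), and checks that both flags are set, without ever needing a real flow object.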
First, for the employees reading, I filed FB14844573 over the weekend, because this is a reproducible panic or hang. whee
I ran our stress tests for an entire long weekend, and my machine panicked, due to mbufs. Normally, I tell my coworkers that we can't really do anything to cause a panic -- but we're doing network things, so this is an exception. I started periodically checking the mbufs while the tests were running -- netstat -m | grep 'mbufs in use' -- and noticed that in fact they were going up, and never decreasing, even if I killed our code and uninstalled the extensions. (They're increasing at about 4 mbufs/sec.)
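For anyone who wants to track the same number from a test harness, here is a small sketch of the parsing half. It assumes the relevant netstat -m line looks like "in-use/total mbufs in use:", which is an assumption about the output format and not something shown above. mbufsInUse(fromNetstatOutput:) is a hypothetical helper, and running netstat itself is left to the caller.

```swift
import Foundation

// Hypothetical helper: pull the in-use count out of `netstat -m` output.
// Assumes a line shaped like "<in-use>/<total> mbufs in use:".
func mbufsInUse(fromNetstatOutput output: String) -> Int? {
    for line in output.split(separator: "\n") {
        guard line.contains("mbufs in use") else { continue }
        // Skip leading whitespace, then take the leading run of ASCII digits.
        let digits = line
            .drop(while: { $0 == " " || $0 == "\t" })
            .prefix(while: { $0.isASCII && $0.isNumber })
        return Int(digits)
    }
    return nil
}
```

Sampling this every few seconds and logging the difference between samples gives the leak rate directly.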
Today I confirmed that this only happens if we include UDP packets:
let udpRule = NENetworkRule(destinationNetwork: host, prefix: 0, protocol: .UDP)
let tcpRule = NENetworkRule(destinationNetwork: host, prefix: 0, protocol: .TCP)
...
settings.includedNetworkRules = [udpRule, tcpRule]
If I comment out the udpRule part, mbufs don't leak.
Our handleNewUDPFlow(_:initialRemoteEndpoint:) method checks to see if the application is a friendly one, and if so it returns false. If it isn't friendly, we want to block QUIC packets:
if let host = endpoint as? NWHostEndpoint {
    if host.port == "80" || host.port == "443" {
        // We need to open it and then close it
        flow.open(withLocalEndpoint: nil) { error in
            Self.workQueue.asyncAfter(deadline: .now() + 0.01) {
                let err = error ?? POSIXError(POSIXErrorCode.ECONNABORTED)
                flow.closeReadWithError(err)
                flow.closeWriteWithError(err)
            }
        }
        return true
    }
}
return false
Has anyone else run into this? I can't see that it's my problem at that point, since the only thing we do with UDP flows is to either say "we don't want it, you handle it" or "ok sure, we'll take it but then let's close it immediately".
The archive build part works, and uses the correct entitlements file:
[Key] com.apple.developer.networking.networkextension
[Value]
[Array]
[String] app-proxy-provider-systemextension
That's from codesign -dv --entitlements - ...../NetworkExtensionExperiment.app
However, the distribution log shows
"Error Domain=DVTPortalProfileErrorDomain Code=4 \"Cannot create a Developer ID provisioning profile for \"com.kithrup.NetworkExtensionExperiment\".\" UserInfo={NSLocalizedDescription=Cannot create a Developer ID provisioning profile for \"com.kithrup.NetworkExtensionExperiment\"., IDEDistributionIssueSeverity=3, NSLocalizedRecoverySuggestion=The Network Extensions capability is not available for Developer ID provisioning profiles. Disable this feature and try again., NSUnderlyingError=0x600013e719b0 {Error Domain=DVTPortalProfileTypeErrorDomain Code=0 \"Cannot create a Developer ID provisioning profile.\" UserInfo={UnsupportedFeatureNames=(\n \"Network Extensions\"\n), NSLocalizedDescription=Cannot create a Developer ID provisioning profile., NSLocalizedRecoverySuggestion=The Network Extensions capability is not available for Developer ID provisioning profiles. Disable this feature and try again.}}}",
"Error Domain=IDEProfileLocatorErrorDomain Code=1 \"No profiles for 'com.kithrup.NetworkExtensionExperiment' were found\" UserInfo={IDEDistributionIssueSeverity=3, NSLocalizedDescription=No profiles for 'com.kithrup.NetworkExtensionExperiment' were found, NSLocalizedRecoverySuggestion=Xcode couldn't find any Developer ID provisioning profiles matching 'com.kithrup.NetworkExtensionExperiment'.}"
which, given that I was able to build a signed version with the entitlement as shown first, seems to be a problem.
All my years of hating Xcode are coming back to haunt me, I can tell.
The proxy doesn't seem to have a way to tell if the application is trying to make an IPv4 or an IPv6 connection (unless the remote endpoint is an explicit IPv4 or IPv6 address). Am I missing something there, or is that in fact how it's intended to be?
Our TPP excludes our own processes from oversight, which makes some things very easy. Only I just found out that when our app uses a WKWebView... it's very securely shuffled off into its own process. With its own signing identifier. And a ppid of launchd.
How could I tell that a com.apple.WebKit.Networking process is related to our process? (I note that the Endpoint Security Framework has added a "responsible" audit token, presumably for this sort of situation.)
That's pretty much the question: we've got a tunnel provider, and I think the OS' ability to handle a captive portal situation is better than I could do, so is there a way to find out if we are in one, and if so wait for it to be handled by the user before we start doing things?
Our transparent proxy provider sends flows to a daemon which analyzes and then does proxying. Works fine.
Except that sometimes it stops working. As far as I can tell, it's due to DNS not working. Queries hang -- we've got some internal ones we log, that have timed out after 20 or 30 seconds. Now, clearly, we're doing something bad (because if we kill the daemon and it restarts, everything goes back to working).
Unfortunately, I have forgotten so much I can't figure out how to see where it's broken! Things like dig @8.8.8.8 com. any fail -- I am presuming because it's trying to do a lookup of "8.8.8.8" and that fails, but I could be wrong. Admittedly, that one doesn't time out, it simply says no servers could be reached. Meanwhile, pinging that address works. (And, also, the local DNS host -- the one provided via DHCP and listed in /etc/resolv.conf and ipconfig getstatus -- behaves the same way.)
I haven't been able to reproduce this myself, unfortunately. Although I have, somewhat interestingly, had a similar issue, which was clearly due to a Google Home WiFi access point (as resetting it fixed the problem, as does moving to another area of the house such that a different AP in the mesh takes over).
On my FreeBSD systems, I'd run tcpdump and truss/ktrace on named, but as I said, I've forgotten so much about how macOS does DNS I'm flailing.
Help?
consoleUser = SCDynamicStoreCopyConsoleUser(NULL, &uid, &gid);
the string is empty, but not NULL. uid and gid are set properly.
Any idea why this would happen? NB: it only happens from a LaunchAgent, for some reason; if I isolate the code in question, and run it via CLI, it works exactly as expected. And it only seems to happen for one person -- but for him, it happens on both Intel and Apple Silicon.
Coworkers are trying it and it's not working -- the Google response says there was a problem with it, and not much else.
I do not have a yubikey (at least not yet 😄), and I'm really not good at the GUI stuff so I don't know as much about it as I probably should. Searching the fora here found a question and comment that didn't make a lot of sense to me, but again I admit to a lot of ignorance here.
So any pointers to where I should be looking would be appreciated.
2024-06-04 15:17:59.618853+0100 ProxyAgent[20233:29237510] [xpc.exceptions] <NSXPCConnection: 0x60000331cb40> connection from pid 20227 on anonymousListener or serviceListener: Exception caught during decoding of received selector newFlowWithIdentifier:to:type:metadata:socket:, dropping incoming message.
Exception: Exception while decoding argument 4 (#6 of invocation):
<NSInvocation: 0x600001778780>
return value: {v} void
target: {@} 0x0
selector: {:} null
argument 2: {@} 0x6000017787c0
argument 3: {@} 0x60000002d170
argument 4: {q} 1
argument 5: {@} 0x600001746600
argument 6: {@} 0x0
Exception: decodeObjectForKey: Object of class "NSFileHandle" returned nil from -initWithCoder: while being decoded for key <no key>
The extension is in Swift; the recipient is in ObjC (wheeeeee).
Based on the extension's logging, the FileHandle is not nil.
I am trying to pass a FileHandle based on a socketpair up to the user-land code. The sockets are created happily.
Any ideas what's going wrong here?
I have this in my start code:
for p in [4500] + Array(3478...3497) + Array(16384...16387) + Array(16393...16402) {
    // According to the documentation, I *should* be able to
    // use "" for the hostname, and prefix:0, but it complained
    // about the prefix length, so we use the top bit for ipv4
    // and ipv6.
    let port = "\(p)"
    os_log(.debug, log: Self.log, "Setting up to exclude port %{public}s", port)
    let host_1 = NWHostEndpoint(hostname:"0.0.0.0", port: port)
    let host_2 = NWHostEndpoint(hostname:"255.0.0.0", port: port)
    let host_3 = NWHostEndpoint(hostname:"0::0", port: port)
    let host_4 = NWHostEndpoint(hostname:"ffff::0", port: port)
    for host in [host_1, host_3] {
        let udpPortRule = NENetworkRule(destinationNetwork: host, prefix:1, protocol: .UDP)
        excludeRules.append(udpPortRule)
    }
}
settings.excludedNetworkRules = excludeRules
This produces the log message
2024-07-23 11:16:38.335649+0100 0x901984 Debug 0x0 20686 0 com.kithrup.SimpleTPP.Provider: [com.kithrup:Provider] Setting up to exclude port 3483
Later on, when running, I log the new flows in handleNewUDPFlow(_:initialRemoteEndpoint:), and it produces
2024-07-23 11:17:05.712055+0100 0x901984 Debug 0x0 20686 0 com.kithrup.SimpleTPP.Provider: [com.kithrup:Provider] handleNewUDPFlow(_:initialRemoteEndpoint:): new UDP flow for host 17.252.13.7:3483 app com.apple.identityservicesd
So port 3483 is definitely in the excludedNetworkRules array, but it's not being excluded.
(All of this is because I still can't figure out why FaceTime isn't working with us.)
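As a sanity check on the prefix arithmetic only, here is a small sketch of plain IPv4 prefix matching. It makes no claim about how NENetworkRule matches internally. ipv4Value(_:) and matches(address:network:prefix:) are hypothetical helpers. Under ordinary CIDR rules, 17.252.13.7 does fall inside 0.0.0.0/1, so the host_1 rule alone would be expected to cover the flow in the log. Addresses with the top bit set (128.0.0.0 and up) are only covered by the unused host_2, and the same goes for host_4 on the IPv6 side.

```swift
// Hypothetical helper: parse a dotted-quad IPv4 address into a 32-bit value.
func ipv4Value(_ text: String) -> UInt32? {
    let parts = text.split(separator: ".", omittingEmptySubsequences: false)
    guard parts.count == 4 else { return nil }
    var value: UInt32 = 0
    for part in parts {
        guard let octet = UInt8(part) else { return nil }
        value = (value << 8) | UInt32(octet)
    }
    return value
}

// Hypothetical helper: does `address` fall inside `network`/`prefix`
// under ordinary CIDR rules?
func matches(address: String, network: String, prefix: Int) -> Bool {
    guard (0...32).contains(prefix),
          let addr = ipv4Value(address),
          let net = ipv4Value(network) else { return false }
    // A zero-length prefix matches everything; otherwise keep the top `prefix` bits.
    let mask: UInt32 = prefix == 0 ? 0 : ~UInt32(0) << UInt32(32 - prefix)
    return (addr & mask) == (net & mask)
}
```

For example, matches(address: "17.252.13.7", network: "0.0.0.0", prefix: 1) is true, while the same address against "255.0.0.0" with prefix 1 is false.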
I'd like to be able to do the equivalent of getrusage(3) for some of our other processes. These are daemons, so they're not connected in any way. Obviously, Activity Monitor and top can do the things I want, but I'm not Apple. 😄
I went down a maze of twisty APIs, all a-Mach, and have decided to ask.
(We're trying to keep track of the processes in the field. We also want to know what's going on if a process has stopped responding but hasn't died. I suppose I could, absolute worst case, periodically send getrusage(3) info to the monitoring process.)
My little network extension is running out of file descriptors. My suspicion is that something in the Security framework is not being deallocated, although even this doesn't make a great deal of sense:
The extension looks at each flow, and gets a SecStaticCodeRef for it, finds the pathname, makes a decision, and stores the result of that decision in an NSCache<NSData, NSNumber> where the key is flow.metaData.sourceAppUniqueIdentifier. This goes through a couple layers of abstractions (the cache is in one Swift class, and it calls another Swift class that gets the security info and then returns the pathname, or throws an error).
As an example, after running for a couple of days, it has 1074 open file descriptors for /System/Library/PrivateFrameworks/CloudKitDaemon.framework/Support/cloudd -- and only had 732 three hours ago.