Post | Replies | Boosts | Views | Activity

Swift, XPC, and... segmentation faults?
I thought Swift wasn't supposed to get them, which is part of the reason why I chose to use it for my network extension. But we're occasionally getting crashes that look like this:

Thread 4 Crashed::  Dispatch queue: com.apple.NSXPCConnection.user.endpoint
0   com.kithrup.MyApp.NExt                  0x102c4ffe2 MyExt.sendData(_:data:completion:) + 610
1   com.kithrup.MyApp.NExt                  0x102c5091f @objc MyExt.sendData(_:data:completion:) + 255
2   Foundation                              0x7ff81ef97490 __NSXPCCONNECTION_IS_CALLING_OUT_TO_EXPORTED_OBJECT_S3__ + 10
3   Foundation                              0x7ff81ef3fa1f -[NSXPCConnection _decodeAndInvokeMessageWithEvent:flags:] + 2322
4   Foundation                              0x7ff81eef641e message_handler + 206
5   libxpc.dylib                            0x7ff81de24b6c _xpc_connection_call_event_handler + 56
6   libxpc.dylib                            0x7ff81de23947 _xpc_connection_mach_event + 1382
7   libdispatch.dylib                       0x7ff81df2e3b1 _dispatch_client_callout4 + 9
8   libdispatch.dylib                       0x7ff81df47041 _dispatch_mach_msg_invoke + 445
9   libdispatch.dylib                       0x7ff81df341cd _dispatch_lane_serial_drain + 342
10  libdispatch.dylib                       0x7ff81df47b77 _dispatch_mach_invoke + 484
11  libdispatch.dylib                       0x7ff81df341cd _dispatch_lane_serial_drain + 342
12  libdispatch.dylib                       0x7ff81df34e30 _dispatch_lane_invoke + 417
13  libdispatch.dylib                       0x7ff81df3eeee _dispatch_workloop_worker_thread + 753
14  libsystem_pthread.dylib                 0x7ff81e0e1fd0 _pthread_wqthread + 326

The XPC method is

func sendData(_: UUID, data: Data?, completion: @escaping (_: Error?) -> Void)

It's crashing on address 0x10, so it's pretty clearly a NULL dereference.
Since this is happening in my extension, it's in Swift (as I said above), so I have no idea what could be NULL without the compiler yelling at me first.
Replies: 11 | Boosts: 0 | Views: 2.4k | Activity: Jun ’22
Why does spotlight hate me?
This query should find everything with a display name of "Safari", which should include, for example, /Applications/Safari.app.

[bigbook:/tmp] sef% mdfind 'kMDItemDisplayName == "Safari"c'
/Library/Application Support/Apple/Safari
/Library/Apple/System/Library/Assistant/Plugins/Safari.assistantBundle/Contents/MacOS/Safari
/Users/Shared/Previously Relocated Items 1/Security/System/Library/AssetsV2/com_apple_MobileAsset_MacSoftwareUpdate/f7b05c91052116c046919f72de2c03a86cabcf3e.asset/AssetData/payloadv2/ecc_data/System/Library/Templates/Data/Applications/Safari.app
/Users/Shared/Previously Relocated Items/Security/Developer/SDKs/MacOSX10.6.sdk/System/Library/PrivateFrameworks/Safari.framework/Versions/A/Safari
/Users/Shared/Previously Relocated Items/Security/Developer/SDKs/MacOSX10.7.sdk/System/Library/PrivateFrameworks/Safari.framework/Versions/A/Safari
/Users/sef/Applications/Microsoft Office 2004/Office/Themes/safari
/Users/sef/Library/Application Support/SyncService/LastSync Data/Safari

And yet /Applications/Safari.app is missing from the results. Why?

(This used to work. But then mds broke on my machine, so I bit the bullet and upgraded to Monterey. Multiple Monterey systems are showing this weird behaviour.)
Replies: 1 | Boosts: 0 | Views: 808 | Activity: Jun ’22
Very basic question: diagnosing DNS issues
Our transparent proxy provider sends flows to a daemon, which analyzes them and then does the proxying. That works fine, except that sometimes it stops working. As far as I can tell, that's because DNS stops working: queries hang, and some internal ones that we log have timed out after 20 or 30 seconds. Clearly we're doing something bad, because if we kill the daemon and it restarts, everything goes back to working.

Unfortunately, I have forgotten so much that I can't figure out how to see where it's broken! Things like dig @8.8.8.8 com. any fail. I am presuming that's because it's trying to do a lookup of "8.8.8.8" and that fails, but I could be wrong. Admittedly, that one doesn't time out; it simply says no servers could be reached. Meanwhile, pinging that address works. (The local DNS server, the one provided via DHCP and listed in /etc/resolv.conf and ipconfig getstatus, behaves the same way.)

I haven't been able to reproduce this myself, unfortunately. Interestingly, though, I have had a similar issue that was clearly due to a Google Home Wi-Fi access point: resetting it fixed the problem, as does moving to another part of the house so that a different AP in the mesh takes over.

On my FreeBSD systems, I'd run tcpdump, and truss/ktrace on named, but as I said, I've forgotten so much about how macOS does DNS that I'm flailing. Help?
Replies: 5 | Boosts: 0 | Views: 509 | Activity: Jul ’22
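For the DNS question above, one way to take the system resolver (mDNSResponder on macOS) out of the picture is to hand-build a single query and send it straight to a server over UDP with a short timeout. This is a minimal sketch, not a diagnosis: the function names build_dns_query and query_server are hypothetical, it follows the RFC 1035 message layout, it only handles IPv4 servers, and it does no reply parsing beyond returning the length. Because the server is given as an IPv4 literal and parsed with inet_pton, no name lookup is involved in reaching it. If the transparent proxy provider claims UDP flows, this query would presumably pass through it like any other flow, which may itself be informative.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Encode a DNS query (RFC 1035) for `name` with the given QTYPE, class IN.
   Returns the packet length, or -1 if a label is empty or longer than
   63 bytes, or the buffer is too small. (Hypothetical helper.) */
static int build_dns_query(const char *name, uint16_t id, uint16_t qtype,
                           uint8_t *buf, size_t cap)
{
    if (cap < 12) return -1;
    memset(buf, 0, 12);
    buf[0] = (uint8_t)(id >> 8);
    buf[1] = (uint8_t)(id & 0xff);
    buf[2] = 0x01;                      /* RD: recursion desired */
    buf[5] = 1;                         /* QDCOUNT = 1 */

    size_t pos = 12;
    const char *p = name;
    while (*p != '\0') {                /* one length-prefixed label per dot */
        const char *dot = strchr(p, '.');
        size_t len = dot ? (size_t)(dot - p) : strlen(p);
        if (len == 0 || len > 63 || pos + 1 + len > cap) return -1;
        buf[pos++] = (uint8_t)len;
        memcpy(buf + pos, p, len);
        pos += len;
        p += len;
        if (*p == '.') p++;             /* a trailing dot ("com.") is fine */
    }
    if (pos + 5 > cap) return -1;
    buf[pos++] = 0;                     /* root label ends the name */
    buf[pos++] = (uint8_t)(qtype >> 8);
    buf[pos++] = (uint8_t)(qtype & 0xff);
    buf[pos++] = 0;
    buf[pos++] = 1;                     /* QCLASS = IN */
    return (int)pos;
}

/* Send `q` to `server` (an IPv4 literal) on UDP port 53 and wait up to
   `timeout_sec` seconds for a reply. Returns the reply length, 0 on timeout,
   or -1 on any other error (including an unparseable address). */
static int query_server(const char *server, const uint8_t *q, int qlen,
                        uint8_t *reply, size_t cap, int timeout_sec)
{
    struct sockaddr_in sa;
    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_port = htons(53);
    if (inet_pton(AF_INET, server, &sa.sin_addr) != 1) return -1;

    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) return -1;
    struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };
    (void)setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);

    int result = -1;
    if (sendto(fd, q, (size_t)qlen, 0, (const struct sockaddr *)&sa, sizeof sa) == qlen) {
        ssize_t n = recv(fd, reply, cap, 0);
        if (n >= 0)
            result = (int)n;
        else if (errno == EAGAIN || errno == EWOULDBLOCK)
            result = 0;                 /* SO_RCVTIMEO expired */
    }
    close(fd);
    return result;
}
```

For example, build_dns_query("com.", 0x1234, 255 /* ANY */, buf, sizeof buf) followed by query_server("8.8.8.8", buf, n, reply, sizeof reply, 5). A reply whose first two bytes echo 0x12 0x34 means the server answered; 0 means nothing came back within the timeout. Running the same check against the DHCP-provided server, before and after restarting the daemon, would give a direct comparison.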
malloc_history never works for me: unable to read input graph: The data couldn’t be read because it isn’t in the correct format
root# malloc_history /tmp/stack-logs.60147.10f5f7000.agent-tests.0EDkOu.index -callTree
malloc_history[60193]: [fatal] unable to read input graph: The data couldn’t be read because it isn’t in the correct format.

I ran my program as

root# env MallocDebugReport=stderr MallocGuardEdges=1 MallocStackLogging=1 MallocStackLoggingNoCompact=1 MallocScribble=1 MallocErrorAbort=1 DYLD_INSERT_LIBRARIES=/usr/lib/libgmalloc.dylib ./test/agent-test

(The program then segfaults, which looks to be due to a memory stomper.)
Replies: 1 | Boosts: 0 | Views: 894 | Activity: Sep ’23
On reboot, two instances of faceless app
We have a containing app for our network extension; it's set up as a faceless app and runs as a LaunchAgent. It works rather well, and we're happy with it. Except that sometimes (possibly only on M1s), on reboot, it shows up twice. Our name in the plist is com.kithrup.appName, simple enough. On reboot, launchctl list shows two com.kithrup jobs, and the extra one is application.com.kithrup.appName.3238445.3238450. Does anyone have any idea what's going on here?
Replies: 8 | Boosts: 0 | Views: 982 | Activity: Sep ’22
Pointer Authentication and dispatch_queue_t
We got a crash in some code; somehow I had managed to miss this topic entirely. The documentation says:

"Pointer authentication can also expose latent bugs in existing code. In C++, it’s incorrect to call a virtual method using a declaration that differs from its definition. In practice, such calls typically succeed in arm64, but trigger a pointer authentication failure in arm64e. You might encounter this bug when using OS_OBJECT types like dispatch_queue_t and xpc_connection_t. You can’t pass instances of these types from C++ code to an Objective-C++ function (or vice versa) because they’re defined differently in Objective-C++ to support automatic reference counting (ARC)."

And yes, we have both C++ and ObjC++ code, a class does have a dispatch_queue_t member, and it does get passed around (although I don't think anything other than ObjC++ code touches the member). But the documentation says "you can't do this" and has absolutely no information on what you are supposed to do instead.

Again, I've managed to miss this completely, and my ability to search the web is pretty awful, so I assume I simply couldn't find documentation on it? (And I can't stream video very well where I am right now.)
Replies: 6 | Boosts: 0 | Views: 1.4k | Activity: Dec ’22
Transparent proxy provider and multiple users
This is somewhat related to my question at "On reboot, two instances of faceless app", but with a slightly different focus.

This is my understanding of how the system works; please correct me if I'm wrong:

- A network extension can only be loaded by an application.
- That application must contain the extension (in Contents/Library/SystemExtensions).
- Only the application instance that loads an extension can get VPN notifications (e.g., NEVPNStatusDidChangeNotification).
- There does not appear to be a way to get the version of installed network extensions programmatically?
- When a second user logs in, runs the containing app, and requests loading the extension, it does the normal replacement request.

Given that, how is it supposed to handle multiple users (via Fast User Switching)?
Replies: 3 | Boosts: 0 | Views: 719 | Activity: Sep ’22
SCDynamicStoreCopyConsoleUser returns an empty string
consoleUser = SCDynamicStoreCopyConsoleUser(NULL, &uid, &gid);

The string is empty, but not NULL, and uid and gid are set properly. Any idea why this would happen?

NB: for some reason, it only happens from a LaunchAgent; if I isolate the code in question and run it from the command line, it works exactly as expected. And it only seems to happen for one person, but for him it happens on both Intel and Apple Silicon.
Replies: 5 | Boosts: 0 | Views: 1.2k | Activity: Sep ’22
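For the empty-string question above: this doesn't explain why the name comes back empty, but since the post reports that uid is still correct, one possible workaround is to fall back to the password database for that uid. A minimal sketch; the helper names resolve_console_user and copy_console_user are hypothetical. It treats NULL, or "loginwindow" (what SCDynamicStoreCopyConsoleUser is commonly reported to return while the login window is up), as "no console user", and only uses the getpwuid() fallback for the empty-string case. The part that calls the real API is guarded by __APPLE__ and needs the SystemConfiguration framework.

```c
#include <pwd.h>
#include <string.h>
#include <sys/types.h>
#ifdef __APPLE__
#include <SystemConfiguration/SystemConfiguration.h>
#endif

/* Decide which user name to use. `sc_name` is the (possibly NULL) result of
   SCDynamicStoreCopyConsoleUser converted to UTF-8; `uid` is its uid output.
   Returns 0 and fills `out`, or -1 if there is no usable console user. */
static int resolve_console_user(const char *sc_name, uid_t uid, char *out, size_t cap)
{
    const char *chosen = sc_name;
    if (sc_name == NULL || strcmp(sc_name, "loginwindow") == 0)
        return -1;                       /* nobody logged in at the console */
    if (sc_name[0] == '\0') {            /* the odd case: empty name, valid uid */
        struct passwd *pw = getpwuid(uid);
        if (pw == NULL || pw->pw_name == NULL) return -1;
        chosen = pw->pw_name;
    }
    size_t len = strlen(chosen);
    if (len >= cap) return -1;
    memcpy(out, chosen, len + 1);
    return 0;
}

#ifdef __APPLE__
/* macOS only: query the console user, then resolve it as above. */
static int copy_console_user(char *out, size_t cap)
{
    uid_t uid = 0;
    gid_t gid = 0;
    char buf[256];
    CFStringRef name = SCDynamicStoreCopyConsoleUser(NULL, &uid, &gid);
    if (name == NULL)
        return resolve_console_user(NULL, uid, out, cap);
    Boolean ok = CFStringGetCString(name, buf, sizeof buf, kCFStringEncodingUTF8);
    CFRelease(name);
    return resolve_console_user(ok ? buf : NULL, uid, out, cap);
}
#endif
```

Logging both the raw string and the uid from the LaunchAgent whenever the fallback fires would also show how often the empty case actually happens for the affected user.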
Network Extension installation and multiple users
We have a network extension. It is bundled in an app that is launched as a launch agent for each user. During installation, the installer bootstraps the agent for each currently logged-in console user. When the agent runs, it checks whether it is the currently active console user, and if so, goes through the process of activating the extension. This part works fine. But if the installation is done while two users are logged in simultaneously (I haven't tried more than two, sorry), System Preferences gets launched for both users. Is this expected behaviour?
Replies: 4 | Boosts: 0 | Views: 792 | Activity: Oct ’22
Getting the pid of a network extension
Yes, the actual process ID. On upgrades, our network extension sometimes becomes completely incommunicado as far as XPC is concerned: any attempt to send it an XPC message results in "couldn't communicate with a helper application" or similar. The only workaround I've been able to come up with is unloading and reloading the extension. It was suggested that I try killing it. Great, but how would I get its pid? I am not at all comfortable launching pkill; I could get all the processes on the system and look for the name. But is there a way for the wrapping process to get the pid?
Replies: 4 | Boosts: 0 | Views: 709 | Activity: Oct ’22
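For the pid question above: while the XPC connection still works, NSXPCConnection's processIdentifier property gives the remote pid directly, but in the failure case described the connection is exactly what's broken. The fallback the post mentions, scanning all processes for the name, can be done in-process with libproc instead of shelling out to pkill. A minimal sketch; the function names are hypothetical, and it assumes the caller is allowed to read other processes' executable paths (a sandboxed or non-root caller may get fewer results for a root-owned system extension). It matches on the last component of the executable path.

```c
#include <string.h>
#include <sys/types.h>
#ifdef __APPLE__
#include <libproc.h>
#include <stdlib.h>
#endif

/* Nonzero if the last path component of `path` equals `name`. */
static int exe_name_matches(const char *path, const char *name)
{
    const char *slash = strrchr(path, '/');
    const char *base = slash ? slash + 1 : path;
    return strcmp(base, name) == 0;
}

#ifdef __APPLE__
/* macOS only: fill `out` with up to `max` pids whose executable file is
   named `name`. Returns the number found, or -1 on error. */
static int find_pids_by_exe_name(const char *name, pid_t *out, int max)
{
    int n = proc_listallpids(NULL, 0);          /* current process count */
    if (n <= 0) return -1;
    n += 32;                                    /* slack for new processes */
    pid_t *pids = calloc((size_t)n, sizeof *pids);
    if (pids == NULL) return -1;
    n = proc_listallpids(pids, n * (int)sizeof *pids);
    int found = 0;
    for (int i = 0; i < n && found < max; i++) {
        char path[PROC_PIDPATHINFO_MAXSIZE];
        if (pids[i] > 0 && proc_pidpath(pids[i], path, sizeof path) > 0 &&
            exe_name_matches(path, name))
            out[found++] = pids[i];
    }
    free(pids);
    return n < 0 ? -1 : found;
}
#endif
```

Checking the full path (system extensions run from under /Library/SystemExtensions) rather than just the name would make a false match less likely, and since a pid can be reused between lookup and kill, re-checking the path immediately before signalling narrows that window.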
Transparent network proxy ... stops?
I don't know how to go forward on this one: we have a test engineer who can reliably cause networking to simply stop working.

Our app has three major components: a proxy daemon, a containing UI app, and a network extension. Because I am lousy at using debuggers, the extension logs every single new flow it gets (at the .debug level), along with a lot more.

When our engineer hits this problem, the proxy may crash a couple of times but is still running; the extension is also still running, but no longer gets new flows. Networking to anything outside the machine no longer works, but echo foo | nc 127.0.0.1 88 succeeds (or at least doesn't print any error, and also doesn't produce any log messages from the extension).

I've got a sysdiagnose from it, as well as a bunch of logs, and all I can really see is that the proxy app restarted (and, when it came back, said there was no networking available) and that the extension stopped logging new flows at about the same time.

I have not been able to reproduce this, even though our engineer is using the same script I wrote to try to reproduce it, and he can reproduce it within an hour. (My own systems, an M1 and an Intel machine, have been running it for almost a day.) Any ideas of things I should look for in the sysdiagnose?
Replies: 2 | Boosts: 0 | Views: 1.1k | Activity: Nov ’22