Post

Replies

Boosts

Views

Activity

XCode 16 clang++ compiler generates unexpected results for conditional checks at -O2 and -O3 optimization levels
Around a month back, developers of the OpenJDK project, when using XCode 16 to build the JDK started noticing odd failures when executing code which was compiled using the clang++ compiler shipped in that XCode 16 release (details in https://bugs.openjdk.org/browse/JDK-8340341). Specifically, a trivial for loop in a c++ code of the form: int limit = ... // method local variable for (i=0; i<limit; i++) { ... } ends up iterating more times than the specified limit. The "i<limit" returns true even when it should have returned false. In fact, debug log messages within the for loop of the form: fprintf(stderr, "parsing %d of %d, %d < % d == %s", i, limit, i, limit, (i<limit) ? "true" : "false"); would show output of the form: parsing 0 of 2, 0 < 2 == true parsing 1 of 2, 1 < 2 == true parsing 2 of 2, 2 < 2 == true Notice, how it entered the for loop even when 2 < 2 should have prevented it from entering it. Furthermore, notice the message says 2 < 2 == true (which clearly isn't right). This happens when that code is compiled with optimization level -O2 or -O3. The issue doesn't happen with -O1. I had reported this as an issue to Apple through feedback assistance, more than a month back. The feedback id is FB15162411. There hasn't been any response to it nor any indication that the issue has been noticed and can be reproduced (the steps to reproduce have been provided in that issue). In the meantime, more and more users are now running into this failure in JDK when using XCode 16. We haven't put any workaround in place (the only workaround we know of is using -O1 for the compilation of this file) because it isn't clear what exactly is causing this issue (other than the fact that it shows up with specific optimization levels). It's also unknown if this bug has wider impact. Would it be possible to check if FB15162411 is being looked into and any technical details on what's causing this? That would help us decide if it's OK to put in place a temporary workaround in the OpenJDK build and how long to maintain that workaround. For reference, this was reproduced on: clang++ --version Apple clang version 16.0.0 (clang-1600.0.26.3) Target: arm64-apple-darwin23.6.0 Thread model: posix InstalledDir: /xcode-16/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
13
1
2.1k
Dec ’24
BSD socket APIs and macOS entitlements
I am looking for inputs to better understand MacOS entitlements. I ask this in context of OpenJDK project, which builds and ships the JDK. The build process makes uses of make tool and thus doesn't involving building through the XCode product. The JDK itself is a Java language platform providing applications a set of standard APIs. The implementation of these standard APIs internally involves calling platform specific native library functions. In this discussion, I would like to focus on the networking functions that the implementation uses. Almost all of these networking functions and syscalls that the internal implementation uses are BSD socket related. Imagine calls to socket(), connect(), getsockopt(), setsockopt(), getaddrinfo(), sendto(), listen(), accept() and several such. The JDK that's built through make is then packaged and made available for installation. The packaging itself varies, but for this discussion, I'll focus on the .tar.gz archived packaging. Within this archive there are several executables (for example: java, javac and others) and several libraries. My understanding, based on what I have read of MacOS entitlements is that, the entitlements are set on the executable and any libraries that would be loaded and used by that executable will be evaluated against the entitlements of the executable (please correct me if I misunderstand). Reading through the list of entitlements noted here https://developer.apple.com/documentation/bundleresources/entitlements, the relevant entitlements that an executable (like "java") which internally invokes BSD socket related syscalls and library functions, appear to be: com.apple.security.network.client - https://developer.apple.com/documentation/bundleresources/entitlements/com.apple.security.network.client com.apple.security.network.server - https://developer.apple.com/documentation/bundleresources/entitlements/com.apple.security.network.server com.apple.developer.networking.multicast - https://developer.apple.com/documentation/bundleresources/entitlements/com.apple.developer.networking.multicast Is my understanding correct that these are the relevant ones for MacOS? Are there any more entitlements that are of interest? Would it then mean that the executables (java for example) would have to enroll for these entitlements to be allowed to invoke those functions at runtime? Reading through https://developer.apple.com/documentation/bundleresources/entitlements, I believe that even when an executable is configured with these entitlements, when the application is running if that executable makes use of any operations for which it has an entitlement, the user is still prompted (through a UI notification) whether or not to allow the operation. Did I understand it right? The part that isn't clear from that documentation is, if the executable hasn't been configured with a relevant entitlement, what happens when the executable invokes on such operation. Will the user see a UI notification asking permission to allow the operation (just like if an entitlement was configured)? Or does that operation just fail in some behind the scenes way? Coming back to the networking specific entitlements, I found a couple of places in the MacOS documentation where it is claimed that the com.apple.developer.networking.multicast entitlement is only applicable on iOS. In fact, the entitlement definition page for it https://developer.apple.com/documentation/bundleresources/entitlements/com.apple.developer.networking.multicast says: "Your app must have this entitlement to send or receive IP multicast or broadcast on iOS. It also allows your app to browse and advertise arbitrary Bonjour service types." Yet, that same page, a few lines above, shows "macOS 10.0+". So, is com.apple.developer.networking.multicast entitlement necessary for an executable running on MacOS which deals with multicasting using BSD sockets? As a more general comment about the documentation, I see that the main entitlements page here https://developer.apple.com/documentation/bundleresources/entitlements categorizes some of these entitlements under specific categories, for example, notice how some entitlements are categorized under "App Clips". I think it would be useful if there was a category for "BSD sockets" and under that it would list all relevant entitlements that are applicable, even if it means repeating the entitlement names across different categories. I think that will make it easier to identify the relevant entitlements. Finally, more as a long term question, how does one watch or keep track of these required entitlements for these operations. What I mean is, is it expected that application developers keep visiting the macos documentation, like these pages, to know that a new entitlement is now required in a new macos (update) release? Or are there other ways to keep track of it? For example, if a newer macos requires a new entitlement, then when (an already built) executable is run on that version of macos, perhaps generate a notification or some kind of explicit error which makes it clear what entitlement is missing? I have read through https://developer.apple.com/documentation/bundleresources/diagnosing-issues-with-entitlements but that page focuses on identifying such issues when a executable is being built and doesn't explain the case where an executable has already been shipped with X entitlements and a new Y entitlement is now required to run on a newer version of macos.
13
0
670
Mar ’25
macos 15.3.x local network restrictions leading to EHOSTUNREACH "No route to host"
Continuing with my investigations of several issues that we have been noticing in our testing of the JDK with macosx 15.x, I have now narrowed down at least 2 separate problems for which I need help. For a quick background, starting with macosx 15.x several networking related tests within the JDK have started failing in very odd and hard to debug ways in our internal lab. Reading through the macos docs and with help from others in these forums, I have come to understand that a lot of these failures are to do with the new restrictions that have been placed for "Local Network" operations. I have read through https://developer.apple.com/documentation/technotes/tn3179-understanding-local-network-privacy and I think I understand the necessary background about these restrictions. There's more than one issue in this area that I will need help with, so I'll split them out into separate topics in this forum. That above doc states: macOS 15.1 fixed a number of local network privacy bugs. If you encounter local network privacy problems on macOS 15.0, retest on macOS 15.1 or later. We did have (and continue to have) 15.0 and 15.1 macos instances within our lab which are impacted by these changes. They too show several networking related failures. However, I have decided not to look into those systems and instead focus only on 15.3.1. People might see unexpected behavior in System Settings > Privacy & Security if they have multiple versions of the same app installed (FB15568200). This feedback assistant issue and several others linked in these documentations are inaccessible (even when I login with my existing account). I think it would be good to have some facility in the feedback assistant tool/site to make such issues visible (even if read-only) to be able to watch for updates to those issues. So now coming to the issue. Several of the networking tests in the JDK do mulicasting testing (through BSD sockets API) in order to test the Java SE multicasting socket API implementations. One repeated failure we have been seeing in our labs is an exception with the message "No route to host". It shows up as: Process id: 58700 ... java.net.NoRouteToHostException: No route to host at java.base/sun.nio.ch.DatagramChannelImpl.send0(Native Method) at java.base/sun.nio.ch.DatagramChannelImpl.sendFromNativeBuffer(DatagramChannelImpl.java:914) at java.base/sun.nio.ch.DatagramChannelImpl.send(DatagramChannelImpl.java:871) at java.base/sun.nio.ch.DatagramChannelImpl.send(DatagramChannelImpl.java:798) at java.base/sun.nio.ch.DatagramChannelImpl.blockingSend(DatagramChannelImpl.java:857) at java.base/sun.nio.ch.DatagramSocketAdaptor.send(DatagramSocketAdaptor.java:178) at java.base/java.net.DatagramSocket.send(DatagramSocket.java:593) (this is just one example stacktrace from java program) That "send0" is implemented by the JDK by invoking the sendto() system call. In this case, the sendto() is returning a EHOSTUNREACH error which is what is then propagated to the application. The forum text editor doesn't allow me to post long text, so I'm going to post the rest of this investigation and logs as a reply.
9
0
662
Mar ’25
Instructions for debugging recent macos kernel versions?
Is there any recent and a bit authoritative documentation which explains how to debug recent versions of macos kernel? I have found some blog posts from other users but those are either outdated or don't work for some other reason. I am guessing kernel debugging is pretty common for developers working on macos itself, so I'm hoping someone in this forum would have some working instructions for that.
9
1
311
Oct ’25
What is the command to list all socket filters/extensions in use?
I am in the middle of investigating an issue arising in the call to setsockopt syscall where it returns an undocumented and unexpected errno. As part of that, I'm looking for a way to list any socket content filters or any such extensions are in play on the system where this happens. To do that, I ran: systemextensionsctl list That retuns the following output: 0 extension(s) which seems to indicate there's no filters or extensions in play. However, when I do: netstat -s among other things, it shows: net_api: 2 interface filters currently attached 2 interface filters currently attached by OS 2 interface filters attached since boot 2 interface filters attached since boot by OS ... 4 socket filters currently attached 4 socket filters currently attached by OS 4 socket filters attached since boot 4 socket filters attached since boot by OS What would be the right command/tool/options that I could use to list all the socket filters/extensions (and their details) that are in use and applicable when a call to setsockopt is made from an application on that system? Edit: This is on a macosx-aarch64 with various different OS versions - 13.6.7, 14.3.1 and even 14.4.1.
8
0
816
Aug ’25
Out-of-band data returned by recv() and read() on socket bound to non-loopback address even when SO_OOBINLINE is disabled
I've been investigating an issue with the SO_OOBINLINE socket option. When that option is disabled, the expectation is that out-of-band data that is sent on the socket will not be available through the use of read() or recv() calls on that socket. What we have been noticing is that when the socket is bound to a non-loopback address (and the communication is happening over that non-loopback address), then even when SO_OOBINLINE is disabled for the socket, the read()/recv() calls both return the out-of-band data. The issue however isn't reproducible with loopback address, and read()/recv() both correctly exclude the out-of-band data. This issue is only noticed on macos. I have been able to reproduce on macos M1, following version, but the original report which prompted me to look into this was reported on macos x64. My M1 OS version is: sw_vers ProductName: macOS ProductVersion: 14.3.1 BuildVersion: 23D60 Attached is a reproducer (main.c.txt - rename it to main.c after downloading) that I have been able to develop which reproduces this issue on macos. When you compile and run that: ./a.out it binds to a non-loopback address by default and you should see the failure log, resembling: ... ERROR: expected: 1234512345 but received: 12345U12345 To run the same reproducer against loopback address, run it as: ./a.out loopback and that should succeed (i.e. no out-of-band data) with logs resembling: ... SUCCESS: completed successfully, expected: 1234512345, received: 1234512345 Is this a bug in the OS? I would have reported this directly through feedback assistant, but my past few open issues (since more than a year) have not even seen an acknowledgement or a reply, so I decided to check here first. main.c.txt
7
0
937
May ’24
UDP socket bind with ephemeral port on macos results in OS allocating a already bound/in-use port
We have been observing an issue where when binding a UDP socket to an ephemeral port (i.e. port 0), the OS ends up allocating a port which is already bound and in-use. We have been seeing this issue across all macos versions we have access to (10.x through recent released 13.x). Specifically, we (or some other process) create a udp4 socket bound to wildcard and ephemeral port. Then our program attempts a bind on a udp46 socket with ephemeral port. The OS binds this socket to an already in use port, for example you can see this netstat output when that happens: netstat -anv -p udp | grep 51630 udp46 0 0 *.51630 *.* 786896 9216 89318 0 00000 00000000 00000000001546eb 00000000 00000800 1 0 000001 udp4 0 0 *.51630 *.* 786896 9216 89318 0 00000 00000000 0000000000153d9d 00000000 00000800 1 0 000001 51630 is the (OS allocated) port here, which as you can see has been allocated to 2 sockets. The process id in this case is the same (because we ran an explicit reproducer to reproduce this), but it isn't always the case. We have a reproducer which consistenly shows this behaviour. Before filing a feedback assistant issue, I wanted to check if this indeed appears to be an issue or if we are missing something here, since this appears to be a very basic thing.
6
1
1.5k
Jul ’24
Process with equal instances but unequal identities
I am looking at some logs that I collected through sysdiagnose and I notice several messages of the form: ... fault 2025-03-05 01:12:04.034832 +0000 runningboardd Two equal instances have unequal identities. <anon<java>(502) pid=86764 AUID=502> and <anon<java>(502)(0) pid=86764> fault 2025-03-05 01:15:05.829696 +0000 runningboardd Two equal instances have unequal identities. <anon<java>(502) pid=88001 AUID=502> and <anon<java>(502)(0) pid=88001> fault 2025-03-05 01:15:06.047003 +0000 runningboardd Two equal instances have unequal identities. <anon<java>(502) pid=88010 AUID=502> and <anon<java>(502)(0) pid=88010> fault 2025-03-05 01:15:06.385648 +0000 runningboardd Two equal instances have unequal identities. <anon<java>(502) pid=88012 AUID=502> and <anon<java>(502)(0) pid=88012> fault 2025-03-05 01:15:07.135896 +0000 runningboardd Two equal instances have unequal identities. <anon<java>(502) pid=88019 AUID=502> and <anon<java>(502)(0) pid=88019> fault 2025-03-05 01:15:07.491316 +0000 runningboardd Two equal instances have unequal identities. <anon<java>(502) pid=88021 AUID=502> and <anon<java>(502)(0) pid=88021> fault 2025-03-05 01:15:07.542102 +0000 runningboardd Two equal instances have unequal identities. <anon<java>(502) pid=88022 AUID=502> and <anon<java>(502)(0) pid=88022> fault 2025-03-05 01:15:07.803126 +0000 runningboardd Two equal instances have unequal identities. <anon<java>(502) pid=88025 AUID=502> and <anon<java>(502)(0) pid=88025> fault 2025-03-05 01:15:59.774214 +0000 runningboardd Two equal instances have unequal identities. <anon<java>(502) pid=88568 AUID=502> and <anon<java>(502)(0) pid=88568> fault 2025-03-05 01:16:00.142288 +0000 runningboardd Two equal instances have unequal identities. <anon<java>(502) pid=88572 AUID=502> and <anon<java>(502)(0) pid=88572> fault 2025-03-05 01:16:00.224019 +0000 runningboardd Two equal instances have unequal identities. <anon<java>(502) pid=88573 AUID=502> and <anon<java>(502)(0) pid=88573> fault 2025-03-05 01:16:01.180670 +0000 runningboardd Two equal instances have unequal identities. <anon<java>(502) pid=88580 AUID=502> and <anon<java>(502)(0) pid=88580> fault 2025-03-05 01:16:01.879884 +0000 runningboardd Two equal instances have unequal identities. <anon<java>(502) pid=88588 AUID=502> and <anon<java>(502)(0) pid=88588> fault 2025-03-05 01:16:02.233165 +0000 runningboardd Two equal instances have unequal identities. <anon<java>(502) pid=88589 AUID=502> and <anon<java>(502)(0) pid=88589> ... What's strange is that each of the message seems to say that it has identified two instances with unequal identities and yet it prints the same process for each such message. Notice: fault 2025-03-05 01:16:02.233165 +0000 runningboardd Two equal instances have unequal identities. <anon<java>(502) pid=88589 AUID=502> and <anon<java>(502)(0) pid=88589> I suspect the identity it is talking about is the one explained as designated requirement here https://developer.apple.com/documentation/Technotes/tn3127-inside-code-signing-requirements#Designated-requirement. Yet the message isn't clear why the same process would have two different identities. The type of this message is "fault", so I'm guessing that this message is pointing to some genuine issue with the executable of the process. Is that right? Any inputs on what could be wrong here? This is from a 15.3.1 macosx aarch64 system. On that note, is runningboardd the process which is responsible for these identity checks?
6
0
411
Mar ’25
MacOS 12.4 Beta 2 - Change in behaviour of connect() system call for datagram sockets for IPv4-mapped IPv6 addresses
The man page of "connect" states (snippet below): ".... datagram sockets may use connect() multiple times to change their association. Datagram sockets may dissolve the association by calling disconnectx(2), or by connecting to an invalid address, such as a null address ..." Up until 12.3.1 (and even 12.4 Beta1), if a connect() call was used on a datagram socket to disconnect an IPv4-mapped IPv6 address, the call used to complete and return back "EADDRNOTAVAIL". Internally it would indeed dissolve the current association of the connected datagram socket (i.e. subsequent calls to "send" would return back a "EDESTADDRREQ" errno). However, starting 12.4 Beta2, if a connect() call is made on such a datagram socket which is connected to a IPv4-mapped IPv6 address, the call now returns a "EINVAL" errno. Do note that this change in behaviour is happening in this Beta release only with IPv4-mapped IPv6 addresses. Other addresses like plain IPv6 address, continue to return "EADDRNOTAVAIL" on such connect() calls which are used to dissolve the current association. We have been able to reproduce this issue with a trivial C program that is attached to this discussion. The attached program, creates a datagram socket which then connects to a IPv4-mapped IPv6 loopback address. This connect is expected to succeed and does succeed. Once the connect is done, a next connect() is issued on the connected datagram socket, this time passing it a null address (as noted in the man page) and this call is expected to dissolve the connection association. The program expects a EADDRNOTAVAIL to be returned by this call (since that's what was being returned in all MacOS releases so far) and if some other errno gets returned then the program fails. When the attached reproducer is run against 12.3.1 (or other previous MacOS versions or even 12.4 Beta1), the program gets the expected EADDRNOTAVAIL and when it is run against 12.4 Beta2, it gets the unexpected EINVAL errno. Here's a sample output from 12.3.1 (the "successful" one): $> sw_vers ProductName: macOS ProductVersion: 12.3.1 BuildVersion: 21E258 $> clang main.c $> ./a.out created a socket, descriptor=3 trying to bind to addr ::ffff:127.0.0.1:0, addr family=30 Bound socket to ::ffff:127.0.0.1:46787, addr family=30 created another socket, descriptor=4 connecting to ::ffff:127.0.0.1:46787, addr family=30 successfully connected Disconnecting using addr :::0, addr family=30 got errno 49, which is considered OK for a connect to null address (a.k.a disconnect) Successfully disconnected Output from 12.4 Beta2 (the case where the behaviour has changed): $> sw_vers ProductName: macOS ProductVersion: 12.4 BuildVersion: 21F5058e $> clang main.c $> ./a.out created a socket, descriptor=3 trying to bind to addr ::ffff:127.0.0.1:0, addr family=30 Bound socket to ::ffff:127.0.0.1:23015, addr family=30 created another socket, descriptor=4 connecting to ::ffff:127.0.0.1:23015, addr family=30 successfully connected Disconnecting using addr :::0, addr family=30 Got unexpected errno 22 disconnect failed: Invalid argument Notice how the call to connect() to dissolve the connected association now returns errno 22 == Invalid Argument. We had a look at the release notes here https://developer.apple.com/documentation/macos-release-notes/macos-12_4-release-notes but they appear to be very high level release notes and we couldn't find anything relevant related to this change. So: Is this an intentional change? Are applications expected to account for this errno to be returned when calling connect() to dissolve a connected association of a datagram socket? If this is intentional, is there a reason why this is specific only for IPv4-mapped IPv6 address? The man connect page also talks about using disconnectx as an alternative way to dissolve the association. This appears to be specific to MacOS systems (couldn't find it in BSD man pages for example). Is disconnectx the recommended way to do the dissociation? P.S: I created a issue for this through the bug/feedback reporting tool (issue id: FB9996296), but there hasn't been any response to it since almost a week. So creating this thread in the forums hoping to get some inputs. main.c
5
0
1.4k
May ’22
Need help changing the correct/accepted answer on a forum thread
In this thread https://developer.apple.com/forums/thread/705281 (which I am the original poster), I have marked my own post as the accepted answer. I did that before I got input from Quinn on how to achieve what I was after. I have posted that information in a subsequent post in that thread here https://developer.apple.com/forums/thread/705281?answerId=712746022#712746022. Can someone please mark this as the correct answer? I tried but I don't see any option to do it myself. P.S: In some forum threads, I see that things like these are sorted through direct messages sometimes. But I can't see any place in my profile where I can send a direct message from. Does that feature require some level of forum points?
5
0
1.2k
May ’22
NSProcessInfo operatingSystemVersion generates warning CFPropertyListCreateFromXMLData(): Old-style plist parser: missing semicolon in dictionary
Consider this very trivial code which accesses the operatingSystemVersion property of NSProcessInfo as documented at https://developer.apple.com/documentation/foundation/nsprocessinfo/1410906-operatingsystemversion osversion.c: #include <Foundation/Foundation.h> int main(int argc, char *argv[]) { NSOperatingSystemVersion osVersion = [[NSProcessInfo processInfo] operatingSystemVersion]; fprintf(stderr, "OS version: %ld.%ld.%ld\n", osVersion.majorVersion, osVersion.minorVersion, osVersion.patchVersion); } Compile it: /usr/bin/clang -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.1.sdk -iframework /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.1.sdk/System/Library/Frameworks -x objective-c -o a.out -framework Foundation osversion.c Then run it: ./a.out It works fine and prints the OS version: OS version: 14.6.1 Run it again and pass it some arbitrary program arguments: ./a.out foo bar Still continues to work fine and prints the output: OS version: 14.6.1 Now run it again and this time pass it two program arguments, the first one being - and the second one being something of the form {x=y} ./a.out - {x=y} This time notice how it prints a couple of warning logs from CFPropertyListCreateFromXMLData before printing the output: 2024-10-11 11:18:03.584 a.out[61327:32412190] CFPropertyListCreateFromXMLData(): Old-style plist parser: missing semicolon in dictionary on line 1. Parsing will be abandoned. Break on _CFPropertyListMissingSemicolon to debug. 2024-10-11 11:18:03.585 a.out[61327:32412190] CFPropertyListCreateFromXMLData(): Old-style plist parser: missing semicolon in dictionary on line 1. Parsing will be abandoned. Break on _CFPropertyListMissingSemicolon to debug. OS version: 14.6.1 As far as I can see there's nothing wrong in the code nor the user inputs to the program. Is this some issue in the internal implementation of NSProcessInfo? Should this be reported as an issue through feedback assistant (which category)? Although this example was run on 14.6.1 of macos, the issue is reproducible on older versions too.
5
0
753
Oct ’24
macOS 26 Beta - man page of sw_vers is not accurate
A few minutes back I filed a feedback assistant issue for this (FB18173706), but I am not sure I filed it in the correct category and I can't find a way to edit it either. So posting this message here just to have to assigned in the right category if appropriate. The issue is as follows. On macOS 26 Tahoe Beta, "man sw_vers" has this among other details: Previous versions of sw_vers respected the SYSTEM_VERSION_COMPAT environment variable to provide compatibility fallback versions for scripts which did not support the macOS 11.0+ version transition. This is no longer supported, versions returned by sw_vers will always reflect the real system version. It says that SYSTEM_VERSION_COMPAT is no longer supported. That doesn't look right, because running sw_vers as follows on macOS 26 Beta results in: SYSTEM_VERSION_COMPAT=1 sw_vers ProductName: macOS ProductVersion: 16.0 BuildVersion: 25A5279m i.e. setting the environment variable SYSTEM_VERSION_COMPAT=1 results in sw_vers reporting the version as 16.0. Now try with SYSTEM_VERSION_COMPAT=0, and the result is: SYSTEM_VERSION_COMPAT=0 sw_vers ProductName: macOS ProductVersion: 26.0 BuildVersion: 25A5279m notice the output says 26.0. So it appears that SYSTEM_VERSION_COMPAT is supported even on macOS 26. I think the man page requires an update to match this behaviour.
5
0
187
Aug ’25
Debug logs for UDP communication?
We have some extensive tests which exercise UDP communication. Some of these tests fail fairly often due to the UDP packet being dropped by the kernel (or related reasons). These tests use loopback interface for communication. I have been looking to see if there's a way to pinpoint or narrow down exactly why a particular packet was dropped by the kernel. Looking at the kernel code, like here https://github.com/apple-opensource/xnu/blob/master/bsd/netinet/udp_usrreq.c#L1463 it appears that there are log message that get written out during some of this communication. However, looking at what KERNEL_DEBUG stands for, it appears that it's: /* * Traced only on debug kernels. */ #define KDBG_DEBUG(x, ...) KDBG_(_DEBUG, x, ## __VA_ARGS__, 4, 3, 2, 1, 0) So I don't think these logs get generated in a regular release build of the OS. Are there any other ways we can generate similar logs or any other tools that will give a clearer picture of why the packet might be drop?
4
0
1.3k
Jun ’22
syslogd - out-of-box bsd_out module sends UDP packets to non-existent destination socket?
We have been noticing some mysterious port binds on our macos setups, where the syslogd process binds to a ephemeral port on UDP. This socket isn't bound from the time syslogd process starts, but something/ some event triggers this bind. So we investigated further. It appears that one of the macos specific modules in syslogd is the "bsd_out" module which reads the config rules from a file called "/etc/syslog.conf". The contents of that file on my setup are: cat /etc/syslog.conf # Note that flat file logs are now configured in /etc/asl.conf install.* @127.0.0.1:32376 These contents are the default ones shipped in macos and nothing has been edited/changed. So it appears that the bsd_out module has been configured with a rule to send logs/messages in the "install" facility to be forwarded to some process which has a socket listening on loopback's 32376 port. Whenever some software gets installed/uninstalled from the machine, it looks like a log message gets generated which falls under this "install" facility and then the bsd_out module binds a socket for UDP and uses that socket to send the data to 127.0.0.1:32376. You will notice that before installing/uninstalling any software the command: sudo lsof -p &lt;syslogd-pid&gt; will not list any UDP port. As soon as you install/uninstall something that socket gets bound and is visible in the output of the above command. The (bound) socket stays around. The curious part is there's still no one/nothing that listens on that 32376 port. So it appears that this module is sending some datagrams that are just lost and not delivered? Is there a reason why the /etc/syslog.conf has this rule if there's nothing that's receiving that data? The "man syslogd" page does state that bsd_out module is only there for backward compatibility, so perhaps this config rule in /etc/syslog.conf is just a left over that is no longer relevant? I'm on macos 13.2.1: sw_vers ProductName: macOS ProductVersion: 13.2.1 BuildVersion: 22D68 but this has been noticed on older version (even 10.15.x) too. To reproduce, here are the steps: Find the pid of syslogd (ps -aef | grep syslogd) Find the resources used by this process including ports (sudo lsof -p &lt;syslog-pid&gt;) At this point, ideally, you shouldn't see any UDP ports being used by this process Install/uninstall any software (for example: move to trash and delete any installed application) Run the lsof command again (sudo lsof -p &lt;syslog-pid&gt;), you will now see that it uses a UDP port bound to INADDR_ANY address and an ephemeral port: syslogd 12345 root 11u IPv4 0xf557ad678c99264b 0t0 UDP *:56972 netstat output too will show that port (for example: netstat -anv -p UDP)
4
0
1.4k
Mar ’23
sendto() system call - Nondeterministic "No route to host" due to local network restrictions
Please consider this trivial C code which deals with BSD sockets. This will illustrate an issue with sendto() which seems to be impacted by the recent "Local Network" restrictions on 15.3.1 macos. #include <stdio.h> #include <stdlib.h> #include <netinet/in.h> #include <arpa/inet.h> #include "sys/socket.h" #include <string.h> #include <unistd.h> #include <ifaddrs.h> #include <net/if.h> // prints out the sockaddr_in6 void print_addr(const char *msg_prefix, struct sockaddr_in6 sa6) { char addr_text[INET6_ADDRSTRLEN] = {0}; printf("%s%s:%d, addr family=%u\n", msg_prefix, inet_ntop(AF_INET6, &sa6.sin6_addr, (char *) &addr_text, INET6_ADDRSTRLEN), sa6.sin6_port, sa6.sin6_family); } // creates a datagram socket int create_dgram_socket() { const int fd = socket(AF_INET6, SOCK_DGRAM, 0); if (fd < 0) { perror("Socket creation failed"); return -1; } return fd; } // returns a string representing the current local time char *current_time() { time_t seconds_since_epoch; time(&seconds_since_epoch); char *res = ctime(&seconds_since_epoch); const size_t len = strlen(res); // strip off the newline character that's at the end of the ctime() output res[len - 1] = '\0'; return res; } // Creates a datagram socket and then sends a messages (through sendto()) to a valid // multicast address. This it does two times, to the exact same destination address from // the exact same socket. // // Between the first and the second attempt to sendto(), there is // a sleep of 1 second. // // The first time, the sendto() succeeds and claims to have sent the expected number of bytes. // However system logs (generated through "log collect") seem to indicate that the message isn't // actually sent (there's a "cfil_service_inject_queue:4466 CFIL: sosend() failed 65" in the logs). // // The second time the sendto() returns a EHOSTUNREACH ("No route to host") error. // // If the sleep between these two sendto() attempts is removed then both the attempts "succeed". // However, the system logs still suggest that the message isn't actually sent. int main() { printf("current process id:%ld parent process id: %ld\n", (long) getpid(), (long) getppid()); // valid multicast address as specified in // https://www.iana.org/assignments/ipv6-multicast-addresses/ipv6-multicast-addresses.xhtml const char *ip6_addr_str = "ff01::1"; struct in6_addr ip6_addr; int rv = inet_pton(AF_INET6, ip6_addr_str, &ip6_addr); if (rv != 1) { fprintf(stderr, "failed to parse ipv6 addr %s\n", ip6_addr_str); exit(EXIT_FAILURE); } // create a AF_INET6 SOCK_DGRAM socket const int sock_fd = create_dgram_socket(); if (sock_fd < 0) { exit(EXIT_FAILURE); } printf("created a socket, descriptor=%d\n", sock_fd); const int dest_port = 12345; // arbitrary port struct sockaddr_in6 dest_sock_addr; memset((char *) &dest_sock_addr, 0, sizeof(struct sockaddr_in6)); dest_sock_addr.sin6_addr = ip6_addr; // the target multicast address dest_sock_addr.sin6_port = htons(dest_port); dest_sock_addr.sin6_family = AF_INET6; print_addr("test will attempt to sendto() to destination host:port -> ", dest_sock_addr); const char *msg = "hello"; const size_t msg_len = strlen(msg) + 1; for (int i = 1; i <= 2; i++) { if (i != 1) { // if not the first attempt, then sleep a while before attempting to sendto() again int num_sleep_seconds = 1; printf("sleeping for %d second(s) before calling sendto()\n", num_sleep_seconds); sleep(num_sleep_seconds); } printf("%s attempt %d to sendto() %lu bytes\n", current_time(), i, msg_len); const size_t num_sent = sendto(sock_fd, msg, msg_len, 0, (struct sockaddr *) &dest_sock_addr, sizeof(dest_sock_addr)); if (num_sent == -1) { fprintf(stderr, "%s ", current_time()); perror("sendto() failed"); close(sock_fd); exit(EXIT_FAILURE); } printf("%s attempt %d of sendto() succeeded, sent %lu bytes\n", current_time(), i, num_sent); } return 0; } What this program does is, it uses the sendto() system call to send a message over a datagram socket to a (valid) multicast address. It does this twice, from the same socket to the same target address. There is a sleep() of 1 second between these two sendto() attempts. Copy that code into noroutetohost.c and compile: clang noroutetohost.c Then run: ./a.out This generates the following output: current process id:58597 parent process id: 21614 created a socket, descriptor=3 test will attempt to sendto() to destination host:port ->ff01::1:14640, addr family=30 Fri Mar 14 20:34:09 2025 attempt 1 to sendto() 6 bytes Fri Mar 14 20:34:09 2025 attempt 1 of sendto() succeeded, sent 6 bytes sleeping for 1 second(s) before calling sendto() Fri Mar 14 20:34:10 2025 attempt 2 to sendto() 6 bytes Fri Mar 14 20:34:10 2025 sendto() failed: No route to host Notice how the first call to sendto() "succeeds", even the return value (that represents the number of bytes sent) matches the number of bytes that were supposed to be sent. Then notice how the second attempt fails with a EHOSTUNREACH ("No route to host") error. Looking through the system logs, it appears that the first attempt itself has failed: 2025-03-14 20:34:09.474797 default kernel cfil_hash_entry_log:6082 <CFIL: Error: sosend_reinject() failed>: [58597 a.out] <UDP(17) out so 891be95f3a70c605 22558774573152560 22558774573152560 age 0> lport 0 fport 12345 laddr :: faddr ff01::1 hash 1003930 2025-03-14 20:34:09.474806 default kernel cfil_service_inject_queue:4466 CFIL: sosend() failed 65 (notice the time on that log messages, they match the first attempt from the program's output log) So even though the first attempt failed, it never got reported back to the application. Then after sleeping for (an arbitrary amount of) 1 second, the second call fails with the EHOSTUNREACH. The system logs don't show any error (at least not the one similar to that previous one) for the second call. If I remove that sleep() between those two attempts, then both the sendto() calls "succeed" (and return the expected value for the number of bytes sent). However, the system logs show that the first call (and very likely even the second) has failed with the exact same log message from the kernel like before. If I'm not wrong then this appears to be some kind of a bug in the "local network" restrictions. Should this be reported? I can share the captured logs but I would prefer to do it privately for this one. Another interesting thing in all this is that there's absolutely no notification to the end user (I ran this program from the Terminal) about any of the "Local Network" restrictions.
4
0
437
Mar ’25