Reply to Symbolicating kernel backtraces on Apple Silicon
Based on the symbolicated backtraces, there are no references to where the memory is being used after being freed, or where it's being freed. There's just this reference to where it's being allocated:

    ['atos', '-o', '/Users/user/panic/Kernel_Debug_Kit_15.6_build_24G84.dmg_extracted/KDK.pkg/Payload/System/Library/Kernels/kernel.release.t8103.dSYM/Contents/Resources/DWARF/kernel.release.t8103', '-arch', 'arm64e', '-textExecAddress', '0xfffffe0025c50000', '0xfffffe0025cae7c0', '0xfffffe0026558328', '0xfffffe002655e730', '0xfffffe0025d30f08', '0xfffffe0025cc21c0', '0xfffffe002620afe4', '0xfffffe00243f717c', '0xfffffe00243f7110', '0xfffffe00243df12c', '0xfffffe00243df610', '0xfffffe0024408bc0', '0xfffffe0025ee60cc', '0xfffffe0025ecdf08', '0xfffffe0025eceee4', '0xfffffe002635e9b8', '0xfffffe0025e0fff0', '0xfffffe0025c57b88']

    panic_trap_to_debugger (in kernel.release.t8103) (debug.c:1400)
    kalloc_type_distribute_budget (in kernel.release.t8103) (kalloc.c:1343)
    kmem_init (in kernel.release.t8103) (vm_kern.c:4584)
    zone_early_scramble_rr (in kernel.release.t8103) (zalloc.c:3220)
    kalloc_ext (in kernel.release.t8103) (kalloc.c:2520)
    mprotect (in kernel.release.t8103) (kern_mman.c:1311)
    0xfffffe00243f717c
    0xfffffe00243f7110
    0xfffffe00243df12c
    0xfffffe00243df610
    0xfffffe0024408bc0
    vn_pathconf (in kernel.release.t8103) (vfs_vnops.c:1875)
    openat_dprotected_internal (in kernel.release.t8103) (vfs_syscalls.c:5127)
    mknod (in kernel.release.t8103) (vfs_syscalls.c:5469)
    sendsig (in kernel.release.t8103) (unix_signal.c:626)
    sleh_synchronous (in kernel.release.t8103) (sleh.c:1267)
    fleh_synchronous (in kernel.release.t8103) + 24

The unsymbolicated memory addresses come from my KEXT, where memory is allocated via OSMalloc(). The object being allocated ends up being stored in an RB tree. I'd like to take you up on your offer to take a look at the full panic log for me, if I may. Please find the full panic log attached to this ticket: https://feedbackassistant.apple.com/feedback/21231833
1w
Reply to Symbolicating kernel backtraces on Apple Silicon
Hello Kevin, I put together a Python script to do a full symbolication based on your instructions. I'm having trouble computing the addresses of the functions given in the actual stack frames.

First off, the symbolication of the frame of the thread that panicked does succeed, as the function addresses come precomputed by the system:

    % ./symbolicate.py
    TEXT_EXEC 0xfffffe00072d4000
    TEXT 0xfffffe0007004000
    ktext_exec_base 0xfffffe002c900000
    load_address: 0xfffffe002c630000
    panicked_thread_faddrs: ['0xfffffe002c95d93c', '0xfffffe002cacd124', '0xfffffe002cacb31c', '0xfffffe002c903b88', '0xfffffe002c95dc08', '0xfffffe002d264898', '0xfffffe002d2700f8', '0xfffffe002caccf80', '0xfffffe002cacb490', '0xfffffe002c903b88', '0xfffffe002cb7f820', '0xfffffe002cba8538', '0xfffffe002cb8fd00', '0xfffffe002cb905c0', '0xfffffe002d05eb5c', '0xfffffe002cacb3a4', '0xfffffe002c903b88', '0x19d82a2b0']
    ['atos', '-o', 'Kernel_Debug_Kit_26_build_25A353.dmg_extracted/KDK.pkg/Payload/System/Library/Kernels/kernel.release.t8103.dSYM/Contents/Resources/DWARF/kernel.release.t8103', '-arch', 'arm64e', '-l', '0xfffffe002c630000', '0xfffffe002c95d93c', '0xfffffe002cacd124', '0xfffffe002cacb31c', '0xfffffe002c903b88', '0xfffffe002c95dc08', '0xfffffe002d264898', '0xfffffe002d2700f8', '0xfffffe002caccf80', '0xfffffe002cacb490', '0xfffffe002c903b88', '0xfffffe002cb7f820', '0xfffffe002cba8538', '0xfffffe002cb8fd00', '0xfffffe002cb905c0', '0xfffffe002d05eb5c', '0xfffffe002cacb3a4', '0xfffffe002c903b88', '0x19d82a2b0']
    handle_debugger_trap (in kernel.release.t8103) (debug.c:1863)
    handle_uncategorized (in kernel.release.t8103) (sleh.c:1818)
    sleh_synchronous (in kernel.release.t8103) (sleh.c:1698)
    fleh_synchronous (in kernel.release.t8103) + 24
    DebuggerTrapWithState (in kernel.release.t8103) (debug.c:830)
    Assert (in kernel.release.t8103) (debug.c:841)
    sleh_synchronous_sp1 (in kernel.release.t8103) (sleh.c:1191)
    handle_kernel_abort (in kernel.release.t8103) (sleh.c:3960)
    sleh_synchronous (in kernel.release.t8103) (sleh.c:1698)
    fleh_synchronous (in kernel.release.t8103) + 24
    vn_create (in kernel.release.t8103) (vfs_subr.c:8079)
    vn_open_auth (in kernel.release.t8103) (vfs_vnops.c:483)
    open1 (in kernel.release.t8103) (vfs_syscalls.c:0)
    open_extended (in kernel.release.t8103) (vfs_syscalls.c:5273)
    unix_syscall (in kernel.release.t8103) (systemcalls.c:181)
    sleh_synchronous (in kernel.release.t8103) (sleh.c:1484)
    fleh_synchronous (in kernel.release.t8103) + 24
    0x19d82a2b0

Here's the output generated for the very first kernel frame that fails to symbolicate:

    kernelFrames: tid:110 IOServiceTerminateThread UUID: 8502a040-9cf9-35f5-b8a2-84b0e48d379e
    [1, 622608] funcaddr: 0xfffffe0008774000+0x98010 -> 0xfffffe000880c010
    [1, 617372] funcaddr: 0xfffffe0008774000+0x96b9c -> 0xfffffe000880ab9c
    [1, 509108] funcaddr: 0xfffffe0008774000+0x7c4b4 -> 0xfffffe00087f04b4
    [1, 8496472] funcaddr: 0xfffffe0008774000+0x81a558 -> 0xfffffe0008f8e558
    [1, 53004] funcaddr: 0xfffffe0008774000+0xcf0c -> 0xfffffe0008780f0c
    ['atos', '-o', 'Kernel_Debug_Kit_26_build_25A353.dmg_extracted/KDK.pkg/Payload/System/Library/Kernels/kernel.release.t8103.dSYM/Contents/Resources/DWARF/kernel.release.t8103', '-arch', 'arm64e', '-l', '0xfffffe002c630000', '0xfffffe000880c010', '0xfffffe000880ab9c', '0xfffffe00087f04b4', '0xfffffe0008f8e558', '0xfffffe0008780f0c']
    0xfffffe000880c010
    0xfffffe000880ab9c
    0xfffffe00087f04b4
    0xfffffe0008f8e558
    0xfffffe0008780f0c

UUID 8502a040-9cf9-35f5-b8a2-84b0e48d379e is that of the kernel, as referenced in the panic log (Kernel UUID: 8502A040-9CF9-35F5-B8A2-84B0E48D379E) and given in the binaryImages lookup table at index 1:

    "binaryImages": [
      [
        "fbe15ad4-ea36-3c07-81be-460a8240c1d4",
        18446741874887413328,
        "T"
      ],
      [
        "8502a040-9cf9-35f5-b8a2-84b0e48d379e",
        18446741874828328960,
        "T"
      ],

Function addresses are computed as follows: faddr = binaryImages[loadaddr] + kernelFrames[offset]. Given the diagnostics above ([1, 622608] funcaddr: 0xfffffe0008774000+0x98010 -> 0xfffffe000880c010, where 0xfffffe0008774000 is the load address of the kernel, 18446741874828328960, and 0x98010 is the offset from the load address, 622608), atos(1) fails to perform the symbolication. Your second suggestion, to use the value of the offset plus a load address of 0, doesn't succeed either. If you could clarify how the function addresses given in kernel stack frames are to be computed so that they result in a successful symbolication, that would be greatly appreciated.
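For reference, the address arithmetic my script performs reduces to the following standalone sketch (values copied verbatim from the frames above; it compiles and runs in userspace):

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* Load address of the kernel, from binaryImages index 1
     * (18446741874828328960 == 0xfffffe0008774000). */
    uint64_t load_address = 0xfffffe0008774000ULL;
    /* Per-frame offsets from the kernelFrames entries above. */
    uint64_t offsets[] = { 622608, 617372, 509108, 8496472, 53004 };

    for (size_t i = 0; i < sizeof offsets / sizeof offsets[0]; i++) {
        uint64_t faddr = load_address + offsets[i];
        printf("0x%llx + 0x%llx -> 0x%llx\n",
               (unsigned long long)load_address,
               (unsigned long long)offsets[i],
               (unsigned long long)faddr);
    }
    return 0;
}
```

These are exactly the 0xfffffe000880c010-style values that atos then fails to resolve.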
Oct ’25
Reply to Pinpointing dandling pointers in 3rd party KEXTs
Thanks for the link. I was able to build a bootable Kext Collection as instructed in the post you referenced, and was then able to boot into a KASAN-instrumented kernel on my Apple Silicon machine. On reproducing a kernel panic via a UAF, I got the following symbolication, which I didn't find useful in identifying the source of the UAF in my KEXT:

    % symbolicateKernelPanicBacktrace.sh ~/2025-09-24-114045.kernel.core.kasan.myfs.uninstrumented.log /System/Volumes/Data/Library/Developer/KDKs/KDK_12.5.1_21G83.kdk/System/Library/Kernels/kernel.kasan.t8101
    ASCII text
    panic(cpu 2 caller 0xfffffe0024926790): KASan: UaF of quarantined object 0xfffffe167506f880
    handle_debugger_trap (in kernel.kasan.t8101) (debug.c:1431)
    kdp_trap (in kernel.kasan.t8101) (kdp_machdep.c:363)
    sleh_synchronous (in kernel.kasan.t8101) (sleh.c:854)
    fleh_synchronous (in kernel.kasan.t8101) + 40
    DebuggerTrapWithState (in kernel.kasan.t8101) (debug.c:662)
    panic_trap_to_debugger (in kernel.kasan.t8101) (debug.c:1074)
    Assert (in kernel.kasan.t8101) (debug.c:688)
    ubsan_json_init.cold.1 (in kernel.kasan.t8101) (ubsan.c:0)
    asan.module_ctor (in kernel.kasan.t8101) + 0
    kasan_crash_report (in kernel.kasan.t8101) (kasan-report.c:136)
    kasan_violation (in kernel.kasan.t8101) (kasan-report.c:0)
    kasan_free_internal (in kernel.kasan.t8101) (kasan-classic.c:815)
    kasan_free (in kernel.kasan.t8101) (kasan-classic.c:843)
    kfree_zone (in kernel.kasan.t8101) (kalloc.c:2416)
    kfree_ext (in kernel.kasan.t8101) (kalloc.c:0)
    IOFree_internal (in kernel.kasan.t8101) (IOLib.cpp:360)
    __asan_global_.str.102 (in kernel.kasan.t8101) + 8
    nx_netif_na_txsync (in kernel.kasan.t8101) (nx_netif.c:1682)
    netif_ring_tx_refill (in kernel.kasan.t8101) (nx_netif.c:4025)
    nx_netif_na_txsync (in kernel.kasan.t8101) (nx_netif.c:1708)
    netif_transmit (in kernel.kasan.t8101) (nx_netif.c:3770)
    nx_netif_host_output (in kernel.kasan.t8101) (nx_netif_host.c:0)
    dlil_output (in kernel.kasan.t8101) (dlil.c:6776)
    ip_output_list (in kernel.kasan.t8101) (ip_output.c:1626)
    tcp_ip_output (in kernel.kasan.t8101) (tcp_output.c:0)
    tcp_output (in kernel.kasan.t8101) (tcp_output.c:2713)
    tcp_input (in kernel.kasan.t8101) (tcp_input.c:0)
    ip_proto_dispatch_in (in kernel.kasan.t8101) (ip_input.c:0)
    ip_input (in kernel.kasan.t8101) (ip_input.c:0)
    proto_input (in kernel.kasan.t8101) (kpi_protocol.c:0)
    ether_inet_input (in kernel.kasan.t8101) (ether_inet_pr_module.c:221)
    dlil_ifproto_input (in kernel.kasan.t8101) (dlil.c:5696)
    dlil_input_packet_list_common (in kernel.kasan.t8101) (dlil.c:6121)
    dlil_input_thread_cont (in kernel.kasan.t8101) (dlil.c:3169)
    Call_continuation (in kernel.kasan.t8101) + 216

I was also able to instrument my KEXT as described in the Pishi project. But the instrument_kext.py Ghidra script ended up garbling the LR register in my KC by overwriting the two most significant bytes of the address:

    panic(cpu 7 caller 0xfffffe002d0984c0): Kernel data abort.
    at pc 0xfffffe002d46a9b0, lr 0xc8b2fe002d46a9ac (saved state: 0xfffffe3d227eed70)
    x0:  0x0000000000447ed0  x1:  0xfffffe0030f6d1c0  x2:  0x0000000000000000  x3:  0xfffffe1017381410
    x4:  0x00000000000000fd  x5:  0x0000000000000000  x6:  0xfffffe002b81606c  x7:  0xfffffe3d227ee980
    x8:  0x0000000000000000  x9:  0x64627135a6d70010  x10: 0x0000000000000005  x11: 0xfffffe1014d87e40
    x12: 0xfffffe1014d7c000  x13: 0x0000000000000000  x14: 0x0000000000000000  x15: 0x0000000000000008
    x16: 0x0000020077cba83c  x17: 0xfffffe0030259920  x18: 0x0000000000000000  x19: 0x0000000000000000
    x20: 0xfffffe167f157890  x21: 0xfffffe167f15617c  x22: 0x00000000e00002bd  x23: 0x0000000000000000
    x24: 0x000000000014b4dc  x25: 0x0000000000000000  x26: 0xfffffe167e137840  x27: 0x0000000000000000
    x28: 0x0000000000000000  fp:  0xfffffe3d227ef140  lr:  0xc8b2fe002d46a9ac  sp:  0xfffffe3d227ef0c0
    pc:  0xfffffe002d46a9b0  cpsr: 0x60401208  esr: 0x96000006  far: 0x0000000000000000
    Debugger message: panic
    Memory ID: 0x6
    OS release type: User
    OS version: 21G83
    Kernel version: Darwin Kernel Version 21.6.0: Wed Aug 10 14:28:18 PDT 2022; root:xnu_kasan-8020.141.5~2/KASAN_ARM64_T8101

This has just about exhausted my options for locating the source of UAFs in my KEXT. If you have any other practical advice to offer, it would be greatly appreciated. How do kernel devs at Apple debug UAFs?
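In case it helps anyone reading such logs: assuming, as described above, that only the two most significant bytes of lr were clobbered, the original pointer can be recovered by splicing the canonical kernel-pointer prefix back in. A minimal userspace sketch (the 0xffff prefix assumption comes from the untouched pc value):

```c
#include <stdint.h>
#include <stdio.h>

/* Assumption: only the top 16 bits of lr were overwritten by the
 * instrumentation script; the kernel text pointers in this log all start
 * with 0xffff (compare the intact pc: 0xfffffe002d46a9b0). */
static uint64_t unclobber_lr(uint64_t lr) {
    return (lr & 0x0000ffffffffffffULL) | 0xffff000000000000ULL;
}

int main(void) {
    printf("0x%llx\n", (unsigned long long)unclobber_lr(0xc8b2fe002d46a9acULL));
    /* Prints 0xfffffe002d46a9ac, i.e. pc - 4, which looks like the
     * plausible return address in this particular panic. */
    return 0;
}
```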
Sep ’25
Reply to Pinpointing dandling pointers in 3rd party KEXTs
> Have you tried testing on Intel, either on real hardware or running in a VM?

I haven't. I don't have access to an Intel-based Mac at the moment, and I don't have a VM readily available. The following write-up [1] on instrumenting KEXTs claims that being able to load a KASAN kernel doesn't mean that one's own KEXT gets instrumented as well. Is that an accurate statement?

[1] https://r00tkitsmm.github.io/fuzzing/2025/04/10/Pishi2.html

> How reproducible is the issue?

One issue is only reproducible when mounting my filesystem on an M4 Max CPU, but not on the M1 or M2 machines I tried this on. Another UAF issue is universally reproducible, though: it manifests when attempting to open multiple PDFs at the same time.
Sep ’25
Reply to Hardlinks reported as non-existing on macOS Sequoia for 3rd party FS
Thanks for your feedback and suggestions. I double-checked which vnode_t is returned by my vnop_lookup. It is indeed the one that references both the original file and the hardlink:

    vnop_lookup: cat-1235
    vnop_lookup: -> lookuprpc(/hlf.txt)
    lookuprpc: cat-1235
    lookuprpc: lookup successful entry exists
    lookuprpc: -> cache_lookup(/hlf.txt)
    lookuprpc: <- cache_lookup(/hlf.txt) -> -1 ;VFS_CACHE_HIT
    vnop_lookup: <- vp fffffe2a45cf6b60 /hlf.txt

The vnop_open call that comes immediately after the previous vnop_lookup call is for the parent directory, not for the file returned by the lookup:

    vnop_open: zsh-570
    vnop_open: vnode_isdir( root) -> 1

No further vnop_open calls are made afterwards. Here's a similar backtrace for the original file being looked up:

    vnop_lookup: cat-1236
    vnop_lookup: -> lookuprpc(/f.txt)
    lookuprpc: cat-1236
    lookuprpc: lookup successful entry exists
    lookuprpc: -> cache_lookup(/f.txt)
    lookuprpc: <- cache_lookup(/f.txt) -> -1 ;VFS_CACHE_HIT
    vnop_lookup: <- vp fffffe2a45cf6b60 /f.txt

If my vnop_lookup returns the correct vnode_t for the file being looked up, what causes ENOENT to be returned to open(2) in userspace? How do we track it down?
Jul ’25
Reply to Symbolicating kernel backtraces on Apple Silicon
Further diagnostics have shown that ubc_msync(UBC_PUSHDIRTY|UBC_SYNC|UBC_INVALIDATE) was being called for the file being written, through the VNOP_GETATTR path, on each change of the file size while a cluster_write() of the same file was in progress. The change I made now calls ubc_msync() only if there's no cluster_write() in progress when getting file attributes.

To test the patch, I extracted the zip file a few times in a row; the system no longer crashes, and the following metrics are reported by vm_stat(1):

    % vm_stat 30 | awk '/Statistics/; !/Statistics/{print $3,"|",$13}'
    Mach Virtual Memory Statistics: (page size of 16384 bytes)
    specul | file-backed
    165368 | 653350
    65896  | 888301
    66724  | 920760
    65872  | 907261
    63935  | 898648
    67795  | 885978
    63295  | 871778
    59696  | 863204
    3309   | 807539
    188881 | 859100
    60996  | 1018163
    63041  | 1015598
    59276  | 1013489
    57392  | 1013524
    60394  | 1013068
    56646  | 1009342
    634    | 953419
    333949 | 1003401
    59119  | 1132207
    60277  | 1131932
    56650  | 1128616
    59147  | 1124974
    56620  | 1124759
    57165  | 1123408
    55714  | 1122087

Thanks very much for all your help. Are you at liberty to disclose how you obtained the backtraces from the full panic log I'd supplied?
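For anyone hitting the same interaction, here's a minimal sketch of the guard I added. The names (myfs_node, cluster_write_active) are hypothetical stand-ins; the real code tracks the in-flight cluster_write() in the node's private data:

```c
#include <sys/ubc.h>
#include <sys/vnode.h>

/* Hypothetical per-node flag: set immediately before the cluster_write()
 * call in VNOP_WRITE and cleared once it returns. */
struct myfs_node {
    int cluster_write_active;
};

/* Called from the VNOP_GETATTR path on a file-size change. */
static void
myfs_getattr_msync(vnode_t vp, struct myfs_node *np)
{
    /* The fix: skip the msync while a cluster_write() is still in flight;
     * invalidating UBC pages mid-write is what led to the panics. */
    if (np->cluster_write_active)
        return;

    (void)ubc_msync(vp, 0, ubc_getsize(vp), NULL,
                    UBC_PUSHDIRTY | UBC_SYNC | UBC_INVALIDATE);
}
```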
May ’25
Reply to Symbolicating kernel backtraces on Apple Silicon
I was able to collect some diagnostics, which revealed an interesting pattern in what happens prior to the kernel panic. Depending on the amount of memory available, one or more files get extracted from the zip file before the panic occurs.

The sequence of calls being made: cluster_write() gets called repeatedly until all file data is stored in VM. Then VNOP_PAGEOUT gets called. I thought that VNOP_PAGEOUT was only called for mmapped files, but there are no VNOP_MMAP/VNOP_MNOMAP calls being made for the files being extracted. The current VNOP_PAGEOUT implementation ends up calling err_pageout() for files that didn't get tagged as memory-mapped by VNOP_MMAP (see the sketch below). After a number of VNOP_PAGEOUT calls, VNOP_STRATEGY is called, which commits file data to remote storage via a do_write() RPC. This pattern repeats until the system ultimately runs out of VM, with a kernel panic ensuing.

My intuition tells me that those VNOP_PAGEOUT calls are being made for a reason, possibly to free up the memory pages used by the cluster API. I tried calling cluster_pageout() from VNOP_PAGEOUT despite VNOP_MMAP never being called, but that resulted in VNOP_STRATEGY being called through two separate paths, VNOP_PAGEOUT and the cluster API, which resulted in I/O stalling. Any further pointers would be much appreciated.
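For clarity, this is roughly what my current VNOP_PAGEOUT looks like. It's a sketch: VTOMYFS, the mmapped flag, and the node layout are hypothetical stand-ins, and the cluster_pageout() branch is the failed experiment described above:

```c
#include <sys/errno.h>
#include <sys/ubc.h>
#include <sys/vnode.h>
#include <sys/vnode_if.h>

/* Hypothetical node state: 'mmapped' is set by VNOP_MMAP and cleared by
 * VNOP_MNOMAP; 'size' tracks the current file size. */
struct myfs_node {
    int   mmapped;
    off_t size;
};
#define VTOMYFS(vp) ((struct myfs_node *)vnode_fsnode(vp))

extern int err_pageout(struct vnop_pageout_args *ap);  /* vfs_support.h */

static int
myfs_vnop_pageout(struct vnop_pageout_args *ap)
{
    struct myfs_node *np = VTOMYFS(ap->a_vp);

    if (!np->mmapped) {
        /* Current behavior: reject pageouts for files that were never
         * mmapped. These are the calls that pile up before the panic. */
        return err_pageout(ap);
    }

    /* The experiment described above: hand the UPL to the cluster layer.
     * In my testing this drove VNOP_STRATEGY from two paths and stalled I/O. */
    return cluster_pageout(ap->a_vp, ap->a_pl, ap->a_pl_offset, ap->a_f_offset,
                           (int)ap->a_size, np->size, ap->a_flags);
}
```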
May ’25
Reply to Symbolicating kernel backtraces on Apple Silicon
Your description of the write-operation logic my filesystem implements is fairly close. I thought I'd fill in a few blanks though:

1. write() is called.
2. VNOP_WRITE gets called. Input data gets copied from user space into kernel space.
3. Input data equal to the size of uio_resid() gets split up into smaller chunks, which are iteratively transmitted over the network until there's no more input data left, with sock_send(MSG_DONTWAIT) returning in sentlen the number of bytes written, which is equal to the uio_resid() of the chunk passed into sock_send(). (A sketch of this step follows below.)
4. VNOP_WRITE returns once all input data has been transmitted.
5. write() returns.

I'm not sure about the "driver finishes uploading data at some point" part though. Or were you referring to the TCP stack queueing up the input data after sock_send(MSG_DONTWAIT) returns, for subsequent transmission?

> At some point you need to decide to NOT return from "VNOP_WRITE" and instead block

Without first understanding what's using up all that VM, I can't see how such a point may be determined. How would VNOP_WRITE block?

For some reason, IOLog() data doesn't wind up in the output of log show|stream on macOS 15.5 running on an M4 Max. I've added debug print statements to output buffer sizes and memory (de)allocations for the uio_t buffers used to store the chunks of input data. I'll run the zip file extraction while collecting the diagnostics via log stream on an M2 Mac, where logging is working, and see if that reveals anything of interest.
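To make step 3 concrete, here's a stripped-down sketch of transmitting one chunk. The myfs_ name is hypothetical, and in the real code the chunk size comes from the filesystem's wire protocol:

```c
#include <sys/errno.h>
#include <sys/kpi_socket.h>
#include <sys/socket.h>

/* Transmit one kernel-resident chunk with sock_send(MSG_DONTWAIT). */
static errno_t
myfs_send_chunk(socket_t so, void *chunk, size_t len)
{
    struct iovec  iov = { .iov_base = chunk, .iov_len = len };
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
    size_t        sentlen = 0;
    errno_t       err;

    err = sock_send(so, &msg, MSG_DONTWAIT, &sentlen);
    /* As described above, on success sentlen comes back equal to len,
     * i.e. the socket accepted the whole chunk. */
    if (err == 0 && sentlen != len)
        err = EAGAIN;  /* partial send: the remainder would need resending */
    return err;
}
```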
May ’25
Reply to Symbolicating kernel backtraces on Apple Silicon
> BTW, how much memory does this machine have? Is it 32g or 16g?

It's 36 GB.

    % expr `sysctl -n hw.memsize` / 1024 / 1024 / 1024
    36

> The bottom line here is that you ultimately need to constrain how quickly you complete write calls to be roughly in line with your underlying ability to transfer data.

My attempt at this was through modifying the socket polling algorithm. The decision on whether it's a read or a write event that's occurred is taken in the sock_upcall callback. The socket's receive buffer is queried for available data via a call to sock_getsockopt(SO_NREAD). If there is data, a thread waiting to read is sent a read-event wakeup() call. Otherwise, the algorithm considers this a write event, and a thread waiting to write data is sent a wakeup() call. This doesn't take into account whether or not there's room in the send buffer. It's a poor man's socket polling algorithm, owing to functions like sock_setupcalls(), soreadable(), and sowriteable() being part of the private API. (See the sketch below.)

To try to bring the rate at which data is written in line with the rate at which it's transferred, I tried modifying the write part of the algorithm to consider it a write event only if there's no more data left in the socket's send buffer, via a call to sock_getsockopt(SO_NWRITE). That didn't help remedy the problem. I can't think of other ways of achieving this at the moment.
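Here's a condensed sketch of that upcall logic. The conn structure and its wait channels are hypothetical; the real callback is registered through the sock_socket() callback parameter:

```c
#include <sys/kpi_socket.h>
#include <sys/socket.h>
#include <sys/systm.h>   /* wakeup() */

/* Hypothetical connection state; threads sleep on these channels. */
struct myfs_conn {
    int readers;   /* wait channel for threads blocked in receive */
    int writers;   /* wait channel for threads blocked in send */
};

/* Upcall registered at socket creation, e.g.:
 *   sock_socket(AF_INET, SOCK_STREAM, IPPROTO_TCP, myfs_upcall, conn, &so);
 */
static void
myfs_upcall(socket_t so, void *cookie, int waitf)
{
    struct myfs_conn *conn = cookie;
    int nread = 0;
    int len = sizeof(nread);

    if (sock_getsockopt(so, SOL_SOCKET, SO_NREAD, &nread, &len) == 0 &&
        nread > 0) {
        wakeup(&conn->readers);   /* read event: bytes waiting in rcv buffer */
    } else {
        wakeup(&conn->writers);   /* otherwise assume it's a write event */
    }
}
```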
May ’25
Reply to Symbolicating kernel backtraces on Apple Silicon
> Does the panic happen when you're just the source and/or destination or only when you're "both"? I suspect it will happen when you're just the destination and won't happen when you're just the source, but I'd like to confirm that.

You were right in assuming that the panic occurs when my filesystem is the destination. I was able to verify that.

> How fast is the I/O path back to the server? Are you saturating that connection?

The connection is not likely to be saturated, as this is a 100Gb link on a Thunderbolt 5 interface.

> Is the I/O pipeline here simply "read compressed file from network -> decompress data -> write data out to network"? Or is there any intermediary in that process?

The I/O pipeline is as you described it, with no intermediary involved.

> What actually makes it "back" to the server before the panic occurs? How much data were you actually able to write?

On two subsequent runs, around 41-42 GB out of 64 GB of data were written before the panic ensued:

    du -smx ./25116_CINEGRAPHER_MARCCAIN
    41303   ./25116_CINEGRAPHER_MARCCAIN
    du -smx ./25116_CINEGRAPHER_MARCCAIN
    42600   ./25116_CINEGRAPHER_MARCCAIN

> How does your write process actually "work"? Is there anything that would limit/constrain how much data you have pending (waiting to be sent over the network)?

The source uio_t buffer passed into vnop_write() is a userspace buffer. Before passing it down to sock_send(), which operates on kernel-resident memory buffers, we create a kernelspace copy of the userspace uio_t buffer of size equal to uio_resid(uspace_uio), with copying performed by uiomove() incrementally in chunks equal to either the amount of data left in the userspace buffer or the value of the kernel's copysize_limit_panic, whichever happens to be smaller. The kernelspace uio_t buffer is further split up into smaller chunks of data pertinent to the filesystem design, which end up being passed into sock_send(). (A sketch of the copy-in follows below.) Reading is done in a similar fashion, the only difference being the use of sock_receivembuf(), which yields an mbuf, instead of sock_receive(), which fills a uio_t buffer.

I'm onto the debugging strategies you suggested now, and I'll report back on my findings as they emerge. Thanks once again for all your help. Hopefully we'll be able to resolve this soon.
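A condensed sketch of that copy-in path. The myfs_ names are hypothetical, and MYFS_COPY_LIMIT stands in for the kernel's copysize_limit_panic, which isn't exposed as a KPI constant:

```c
#include <sys/errno.h>
#include <sys/param.h>   /* MIN() */
#include <sys/uio.h>

#define MYFS_COPY_LIMIT (64 * 1024)  /* stand-in for copysize_limit_panic */

/* Drain the userspace uio into a kernel buffer in bounded chunks, then
 * wrap the buffer in a kernel (UIO_SYSSPACE) uio for the send path. */
static int
myfs_copyin_uio(uio_t uspace_uio, char *kbuf, size_t kbuflen, uio_t *kuio_out)
{
    size_t copied = 0;
    int    err;

    while (uio_resid(uspace_uio) > 0 && copied < kbuflen) {
        size_t chunk = MIN((size_t)uio_resid(uspace_uio),
                           (size_t)MYFS_COPY_LIMIT);
        chunk = MIN(chunk, kbuflen - copied);

        /* uiomove() advances uspace_uio as it copies into kbuf. */
        if ((err = uiomove(kbuf + copied, (int)chunk, uspace_uio)) != 0)
            return err;
        copied += chunk;
    }

    uio_t kuio = uio_create(1, 0, UIO_SYSSPACE, UIO_WRITE);
    if (kuio == NULL)
        return ENOMEM;
    uio_addiov(kuio, (user_addr_t)(uintptr_t)kbuf, copied);
    *kuio_out = kuio;
    return 0;
}
```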
May ’25
Reply to Symbolicating kernel backtraces on Apple Silicon
Thanks very much for looking into this. Here are the points you wanted clarified.

> How much space is available on the boot volume and is there ANY possibility that it's becoming constrained?

That doesn't seem to be the case. See Boot volume size info.

> What's the content you're actually decompressing? One (or several) large files or a bunch of small ones?

It is a bunch of files of varying sizes. Here's a List of zip file contents.

> Is your file system the source or the target for the zip file?

It's actually both.

> What's actually providing the storage for your file system? Is it possible that some supporting component has failed?

It is a network filesystem with I/O happening via RPC calls. Internally, the filesystem implements a socket polling algorithm that relies on the sock_upcall callback passed into sock_socket() at socket creation. The backing store on the remote server has been verified to have ample disk space available.

Do let me know if there's anything else I can provide that would help identify the cause of the problem.
May ’25