I'm debugging the following kernel panic to do with my custom filesystem KEXT:
panic(cpu 0 caller 0xfffffe004cae3e24): [kalloc.type.var4.128]: element modified after free (off:96, val:0x00000000ffffffff, sz:128, ptr:0xfffffe2e7c639600)
My reading of this is that somewhere in my KEXT I'm holding a stale reference (0xfffffe2e7c639600) to a 128-byte zone element and writing 0x00000000ffffffff at offset 96 after that chunk had been freed and zeroed by the kernel.
The panic itself is emitted when my KEXT requests the memory chunk that's been tampered with, via the following set of calls:
zalloc_uaf_panic()
__abortlike
static void
zalloc_uaf_panic(zone_t z, uintptr_t elem, size_t size)
{
...
(panic)("[%s%s]: element modified after free "
"(off:%d, val:0x%016lx, sz:%d, ptr:%p)%s",
zone_heap_name(z), zone_name(z),
first_offs, first_bits, esize, (void *)elem, buf);
...
}
zalloc_validate_element()
static void
zalloc_validate_element(
zone_t zone,
vm_offset_t elem,
vm_size_t size,
zalloc_flags_t flags)
{
...
if (memcmp_zero_ptr_aligned((void *)elem, size)) {
zalloc_uaf_panic(zone, elem, size);
}
...
}
The panic is triggered if memcmp_zero_ptr_aligned(), which is implemented in assembly, detects that an n-sized chunk of memory has been written to after being freed.
/* memcmp_zero_ptr_aligned() checks string s of n bytes contains all zeros.
* Address and size of the string s must be pointer-aligned.
* Return 0 if true, 1 otherwise. Also return 0 if n is 0.
*/
extern int
memcmp_zero_ptr_aligned(const void *s, size_t n);
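To make the failure mode concrete, this is the kind of bug I suspect somewhere in my KEXT (a hypothetical sketch; the structure and free routine are stand-ins for my own code):
struct myfsnode {
    uint8_t     pad[96];
    uint32_t    n_flags;    /* would sit at offset 96 of the 128-byte element */
};

static void
myfs_node_teardown(struct myfsnode *np)
{
    myfs_free_node(np);     /* returns the 128-byte element to the zone, which zeroes it */
    /* BUG: stale write after the free; this plants 0xffffffff at offset 96,
     * and zalloc_validate_element() panics when the element is handed out again. */
    np->n_flags = 0xffffffff;
}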
Normally, KASAN would be the tool to reach for to track this down.
The KDK README states that KASAN kernels won't load on Apple Silicon.
Attempting to follow the instructions given in the README for Intel-based machines does result in a failure for me on Apple Silicon.
I then stumbled on the Pishi project, but the custom boot kernel collection it creates doesn't contain any of the KEXTs that were passed to kmutil(8) via the --explicit-only flag, so my KEXT can't be instrumented in Ghidra.
This is confirmed by running:
% kmutil inspect -B boot.kc.kasan
boot kernel collection at /Users/user/boot.kc.kasan
(AEB8F757-E770-8195-458D-B87CADCAB062):
Extension Information:
I'd appreciate any pointers on how to tackle UAFs in kernel space.
After perusing the sources of Apple's SMB and NFS clients' implementation of VNOP_MONITOR, my understanding of how VNOP_MONITOR+vnode_notify() operate is as follows:
A user-space process advertises an interest in monitoring a file or directory via kqueue(2)/kevent(2).
VFS calls the filesystem's implementation of VNOP_MONITOR.
VNOP_MONITOR forwards the request to commence or terminate monitoring to the filesystem server.
Network filesystem client nodes call vnode_notify() to notify the underlying VFS of a filesystem event, e.g. file/directory creation/removal, etc.
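In code, my mental model of the client side is roughly the following (a sketch only; the myfs_* helpers and VTOMYFS() are hypothetical, the rest comes from the VNOP/KPI headers):
static int
myfs_vnop_monitor(struct vnop_monitor_args *ap)
{
    struct myfsnode *np = VTOMYFS(ap->a_vp);

    /* a_flags indicates whether monitoring is commencing or terminating;
     * forward that to the filesystem server (hypothetical RPC). */
    if (ap->a_flags & VNODE_MONITOR_BEGIN)
        return myfs_monitor_rpc(np, 1, ap->a_events);
    if (ap->a_flags & VNODE_MONITOR_END)
        return myfs_monitor_rpc(np, 0, ap->a_events);
    return 0;
}

/* Later, when the client learns that something changed remotely, it tells VFS,
 * which in turn wakes up the kevent watchers: */
static void
myfs_remote_change(vnode_t vp, uint32_t events)
{
    struct vnode_attr va;

    VATTR_INIT(&va);                /* attributes are optional here */
    vnode_notify(vp, events, &va);  /* e.g. VNODE_EVENT_WRITE | VNODE_EVENT_ATTRIB */
}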
What I'm still vague about is how the server communicates back to client nodes that an event of interest has occurred.
I'd appreciate being enlightened on the operation of `VNOP_MONITOR+vnode_notify()' in a network filesystem setting.
After creating a hardlink on a distributed filesystem of my own via:
% ln f.txt hlf.txt
Neither the original file, f.txt, nor the hardlink, hlf.txt, is immediately accessible; cat(1), for example, fails with ENOENT. A short time later, both the original file and the hardlink become accessible. Both files can be stat(1)ed throughout, which confirms that vnop_getattr returns success for both.
Dtruss(1) indicates it's the open(2) syscall that fails:
% sudo dtruss -f cat hlf.txt
2038/0x4f68: open("hlf.txt\0", 0x0, 0x0) = -1 Err#2 ;ENOENT
2038/0x4f68: write_nocancel(0x2, "cat: \0", 0x5) = 5 0
2038/0x4f68: write_nocancel(0x2, "hlf.txt\0", 0x7) = 7 0
2038/0x4f68: write_nocancel(0x2, ": \0", 0x2) = 2 0
2038/0x4f68: write_nocancel(0x2, "No such file or directory\n\0", 0x1A) = 26 0
Dtrace(1)-ing my KEXT no longer works on macOS Sequoia, so, based on the diagnostic print statements I inserted into it, the following sequence of calls is observed:
vnop_lookup(hlf.txt) -> EJUSTRETURN ;ln(1)
vnop_link(hlf.txt) -> KERN_SUCCESS ;ln(1)
vnop_lookup(hlf.txt) -> KERN_SUCCESS ;cat(1)
vnop_open(/) ; I expected to see vnop_open(hlf.txt) here instead of the parent directory.
Internally, hardlinks are created in vnop_link via a call to vnode_setmultipath with cache_purge_negatives called on the destination directory.
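Concretely, my vnop_link looks roughly like this (a sketch; myfs_link_rpc() and VTOMYFS() are my own names):
static int
myfs_vnop_link(struct vnop_link_args *ap)
{
    int error;

    /* Create the link on the server (remote RPC). */
    error = myfs_link_rpc(VTOMYFS(ap->a_vp), ap->a_tdvp, ap->a_cnp);
    if (error)
        return error;

    /* Mark the node as a hardlink so VFS treats it as multipath... */
    vnode_setmultipath(ap->a_vp);
    /* ...and drop any negative cache entries in the destination directory. */
    cache_purge_negatives(ap->a_tdvp);

    return 0;
}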
On macOS Monterey for example, where the same code does result in hardlinks being accessible, the following calls are made:
vnop_lookup(hlf.txt) -> EJUSTRETURN ;ln(1)
vnop_link(hlf.txt) -> KERN_SUCCESS ;ln(1)
vnop_lookup(hlf.txt) -> KERN_SUCCESS ;cat(1)
vnop_open(hlf.txt) -> KERN_SUCCESS ;cat(1)
Not sure how else to debug this.
Perusing the kernel sources for VISHARDLINK, VNOP_LINK and vnode_setmultipath call sites did not clear things up for me.
Any pointers would be greatly appreciated.
A filesystem of my own making exhibits the following undesirable behaviour.
ClientA
% echo line1 >>echo.txt
% od -Ax -ctx1 echo.txt
0000000 l i n e 1 \n
6c 69 6e 65 31 0a
0000006
ClientB
% od -Ax -ctx1 echo.txt
0000000 l i n e 1 \n
6c 69 6e 65 31 0a
0000006
% echo line2 >>echo.txt
% od -Ax -ctx1 echo.txt
0000000 l i n e 1 \n l i n e 2 \n
6c 69 6e 65 31 0a 6c 69 6e 65 32 0a
000000c
ClientA
% od -Ax -ctx1 echo.txt
0000000 l i n e 1 \n l i n e 2 \n
6c 69 6e 65 31 0a 6c 69 6e 65 32 0a
000000c
% echo line3 >>echo.txt
ClientB
% echo line4 >>echo.txt
ClientA
% echo line5 >>echo.txt
ClientB
% od -Ax -ctx1 echo.txt
0000000 l i n e 1 \n l i n e 2 \n l i n e
6c 69 6e 65 31 0a 6c 69 6e 65 32 0a 6c 69 6e 65
0000010 3 \n l i n e 4 \n \0 \0 \0 \0 \0 \0
33 0a 6c 69 6e 65 34 0a 00 00 00 00 00 00
000001e
ClientA
% od -Ax -ctx1 echo.txt
0000000 l i n e 1 \n l i n e 2 \n l i n e
6c 69 6e 65 31 0a 6c 69 6e 65 32 0a 6c 69 6e 65
0000010 3 \n \0 \0 \0 \0 \0 \0 l i n e 5 \n
33 0a 00 00 00 00 00 00 6c 69 6e 65 35 0a
000001e
ClientB
% od -Ax -ctx1 echo.txt
0000000 l i n e 1 \n l i n e 2 \n l i n e
6c 69 6e 65 31 0a 6c 69 6e 65 32 0a 6c 69 6e 65
0000010 3 \n \0 \0 \0 \0 \0 \0 l i n e 5 \n
33 0a 00 00 00 00 00 00 6c 69 6e 65 35 0a
000001e
The first write on clientA is done via the following call chain:
vnop_write()->vnop_close()->cluster_push_err()->vnop_blockmap()->vnop_strategy()
The first write on clientB first does a read, which is expected:
vnop_write()->cluster_write()->vnop_blockmap()->vnop_strategy()->myfs_read()
Followed by a write:
vnop_write()->vnop_close()->cluster_push_err()->vnop_blockmap()->vnop_strategy()
The final write on clientA calls cluster_write(), which doesn't do that initial read before doing a write.
I believe it is this write that introduces the hole.
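For context, my vnop_write drives cluster_write() roughly as follows (a sketch; the myfsnode fields are my own). My suspicion is that when the locally cached EOF is stale, cluster_write() zero-fills what it believes is a hole instead of reading the remote data:
static int
myfs_vnop_write(struct vnop_write_args *ap)
{
    struct myfsnode *np = VTOMYFS(ap->a_vp);
    off_t old_eof = np->n_size;     /* EOF as this client last saw it */
    off_t new_eof = MAX(old_eof, uio_offset(ap->a_uio) + uio_resid(ap->a_uio));

    /* If old_eof is stale because another client extended the file, I believe
     * the gap between old_eof and the write offset is zero-filled rather than
     * fetched from the server. */
    return cluster_write(ap->a_vp, ap->a_uio, old_eof, new_eof,
                         0 /* headOff */, 0 /* tailOff */, ap->a_ioflag);
}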
What I don't understand is why this happens and how this may be prevented.
Any pointers on how to combat this would be much appreciated.
I am able to symbolicate kernel backtraces for addresses that belong to my kext.
Is it possible to symbolicate kernel backtraces for addresses that lie beyond my kext and reference kernel code?
Sample kernel panic log
Implementing ACL support in a distributed filesystem, with macOS and Linux clients talking to a remote file server, requires taking into account the compatibility of the ACL models supported by the Darwin/XNU and Linux kernels.
My filesystem does support EAs to facilitate ACL storage and retrieval.
So setting ACLs via chmod(1) and retrieving them via ls(1) does work.
However, the macOS and Linux ACL models are incompatible and would require some sort of conversion between them.
chmod(1) uses acl(3) to create ACL entries.
While acl(3) claims to implement the POSIX.1e ACL security API, which, to the best of my knowledge, Linux VFS implements as well, the two implementations of the standard clearly differ, as acl(3) itself states:
This implementation of the POSIX.1e library differs from the standard in a number of non-portable ways in order to support the MacOS/Darwin ACL semantic.
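For illustration, this is roughly the acl(3) sequence chmod(1) goes through when adding an entry on macOS (a user-space sketch with error handling omitted); note that entries are allow/deny rules with UUID qualifiers rather than POSIX-draft owner/group/other tags:
#include <sys/types.h>
#include <sys/acl.h>
#include <membership.h>

int
deny_read(const char *path, uid_t uid)
{
    acl_t acl = acl_init(1);
    acl_entry_t entry;
    acl_permset_t perms;
    uuid_t qualifier;

    mbr_uid_to_uuid(uid, qualifier);            /* macOS ACL qualifiers are UUIDs */
    acl_create_entry(&acl, &entry);
    acl_set_tag_type(entry, ACL_EXTENDED_DENY); /* allow/deny, NFSv4-style */
    acl_set_qualifier(entry, qualifier);
    acl_get_permset(entry, &perms);
    acl_add_perm(perms, ACL_READ_DATA);

    int ret = acl_set_file(path, ACL_TYPE_EXTENDED, acl);
    acl_free(acl);
    return ret;
}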
Then there's this NFSv4 to POSIX ACL mapping draft that describes the conversion algorithm.
What's the recommended way to bridge the compatibility gap there, so that macOS ACL rules are honoured in Linux and vice versa?
Thanks.
I'm having trouble deleting AppleDouble files residing on my custom filesystem through Finder.
This also affects files that use the AppleDouble naming convention, i.e. their names start with '._', but aren't AppleDoubles themselves.
dtrace output
In vnop_readdir, a struct dirent/direntry is set up for each dotbar file and written to the uio_t buffer.
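For what it's worth, this is roughly how an entry is emitted when VNODE_READDIR_EXTENDED is requested (a simplified sketch; real code computes a trimmed, padded d_reclen):
static int
myfs_readdir_emit(uio_t uio, uint64_t fileid, const char *name)
{
    struct direntry de;

    bzero(&de, sizeof(de));
    de.d_ino    = fileid;
    de.d_type   = DT_REG;                   /* dotbar files are regular files */
    de.d_namlen = (uint16_t)strlen(name);
    strlcpy(de.d_name, name, sizeof(de.d_name));
    de.d_reclen = sizeof(struct direntry);  /* simplified; should be trimmed */

    return uiomove((caddr_t)&de, (int)de.d_reclen, uio);
}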
It's just that my vnop_remove is never called for dotbar files, and I don't understand why not.
Dotbar files are removed successfully, when deleted through command line.
For SMBClients, vnop_readdir is followed by vnop_access, followed by vnop_lookup, followed by vnop_remove of dotbar files.
SMBClient rm dotbar files dtrace output
Implementing vnop_access for my filesystem did not result in the combination of vnop_lookup and vnop_remove being called for dotbar files.
Perusing the kernel sources, I observed the following functions that might be involved, but I have no way of verifying this, as none of the functions of interest are dtrace(1)-able, rmdir_remove_orphaned_appleDouble() in particular:
rmdir_remove_orphaned_appleDouble() -> VNOP_READDIR().
rmdirat_internal() -> rmdir_remove_orphaned_appleDouble()
unlinkat()-> rmdirat_internal()
rmdir()-> rmdirat_internal()
Any pointers on how dotbar files may be removed through Finder would be greatly appreciated.
Attempting to acquire the value of the 'kern.hostname' ctl from a kext by calling sysctlbyname() returns EPERM with no hostname returned.
sysctlbyname() is aliased to kernel_sysctlbyname():
config/Libkern.exports:839:_sysctlbyname:_kernel_sysctlbyname
Looking at the implementation of kernel_sysctlbyname(), EPERM is returned by sysctl_root(). Not sure how to correctly identify the point of failure.
Alternatively, calling sysctlbyname("hw.ncpu") does return the value set for the ctl.
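For reference, this is roughly how the two calls are made from the kext (a sketch; error handling trimmed):
static int
myfs_get_hostname(char *buf, size_t buflen)
{
    size_t len = buflen;

    /* Resolves to kernel_sysctlbyname() via the Libkern.exports alias;
     * this is the call that comes back with EPERM. */
    return sysctlbyname("kern.hostname", buf, &len, NULL, 0);
}

static int
myfs_get_ncpu(void)
{
    int ncpu = 0;
    size_t len = sizeof(ncpu);

    /* This one returns the expected value. */
    sysctlbyname("hw.ncpu", &ncpu, &len, NULL, 0);
    return ncpu;
}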
The kext was compiled with SYSCTL_DEF_ENABLED defined to have the relevant section of sys/sysctl.h enabled.
bsd_hostname() is a private symbol which is inaccessible to my kext.
% sysctl -n kern.hostname
does return the host name, so the ctl must be set.
Is it possible to get the name of a host from the context of my kext?
Thanks.
Accessing a directory on my custom distributed filesystem results in a kernel panic.
According to the backtrace, the last function called before the panic is triggered is mac_label_verify().
See the backtrace file attached.
mac_label_verify-panic.txt
The panic manifests itself given the following conditions:
Machine-a: make a directory in Finder.
Machine-b: remove the directory created on machine-a in Finder.
Machine-a: access the directory removed on machine-b in Finder. Kernel panic ensues.
The panic is reproducible on both Apple Silicon and x86-64.
The backtrace is for x86-64 as I wasn't able to symbolicate it on Apple Silicon.
Not sure how to tackle this one.
Any pointers would be much appreciated.
Extracting an archive into the same directory on my custom filesystem more than once fails with the following message:
Unable to finish expanding 'misc.tar.xz' into 'extractme'.
Could not move 'misc' into destination directory.
I.e. initial extraction succeeds with archive contents extracted into extractme/misc.
Subsequent extraction fails to rename extractme.sb-db71cd27-lFjN1f/misc to extractme/misc 2.
This behaviour is observed on macOS Monterey and Ventura.
It does work as expected on macOS Sonoma though.
Dtrace(1)-ing the archive being extracted over smbfs results in the following sequence of calls being made:
2 -> smbfs_vnop_lookup AUHelperService-2163 -> extractme/misc 2 nameiop:0
2 <- smbfs_vnop_lookup AUHelperService-2163 -> extractme/misc 2 -> 2 ;ENOENT
2 -> smbfs_vnop_lookup AUHelperService-2163 -> extractme.sb-db71cd27-lFjN1f/misc nameiop:0x2 ;DELETE
2 <- smbfs_vnop_lookup AUHelperService-2163 -> extractme.sb-db71cd27-lFjN1f/misc -> 0
2 -> smbfs_vnop_lookup AUHelperService-2163 -> extractme/misc 2 nameiop:0x3 ;RENAME
2 <- smbfs_vnop_lookup AUHelperService-2163 -> extractme/misc 2 -> EJUSTRETURN
1 -> smbfs_vnop_rename AUHelperService-2163 -> extractme.sb-db71cd27-lFjN1f/misc -> extractme/nil
2 <- smbfs_vnop_rename AUHelperService-2163 -> extractme.sb-db71cd27-lFjN1f/misc -> extractme/nil -> 0
2 -> smbfs_vnop_lookup AUHelperService-2163 -> TheRooT/extractme/misc 2 nameiop:0
3 <- smbfs_vnop_lookup AUHelperService-2163 -> TheRooT/extractme/misc 2 -> 0 ;Successful lookup
What I don't understand is what causes vnop_lookup to be called so that misc is removed from the temporary directory and then renamed to 'misc 2' in the destination directory, 'extractme', via vnop_rename.
I had a look at smbfs_vnop_lookup and smbfs_vnop_rename and didn't see anything that would cause 'misc 2' to come into being.
Based on the output of the dtrace(1) script running on my custom filesystem, there are no vnop_lookup and vnop_rename calls being made to remove the 'misc' directory from the temporary directory and to rename it to 'misc 2' and place it in the destination directory at extractme.
Archive extraction proceeds no further after extracting the archive contents into the temporary directory.
What am I missing?
I'm observing all sorts of race conditions occurring in various VNOPs my custom filesystem implements.
I'm inclined to attribute this to my implementation not following, as well as it should, the locking rules the system expects of a third-party filesystem.
I've looked at how locking is done in Apple's own SMB and NFS clients.
The SMB client uses read/write locks to protect its node from data races, while the NFS client uses mutexes for the same purpose.
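If I were to copy the SMB client's approach, the per-node locking would look roughly like this (a sketch; struct myfsnode and the helper names are mine):
#include <kern/locks.h>

static lck_grp_t *myfs_lck_grp;     /* allocated once at kext start */

struct myfsnode {
    lck_rw_t   *n_rwlock;           /* protects the mutable fields below */
    off_t       n_size;
    uint32_t    n_flag;
};

/* On node creation: np->n_rwlock = lck_rw_alloc_init(myfs_lck_grp, LCK_ATTR_NULL); */

static void
myfsnode_lock_shared(struct myfsnode *np)       /* readers: getattr, readdir, ... */
{
    lck_rw_lock_shared(np->n_rwlock);
}

static void
myfsnode_lock_exclusive(struct myfsnode *np)    /* writers: setattr, write, ... */
{
    lck_rw_lock_exclusive(np->n_rwlock);
}

static void
myfsnode_unlock(struct myfsnode *np)
{
    lck_rw_done(np->n_rwlock);      /* releases either flavour */
}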
I realised that I don't have a clear model in my head of how locking should be done properly.
Thus my question, what are the locking rules for VFS and VNOP operations?
Thanks.
When recursively removing a directory with a large number of entries that resides on my custom filesystem, rm(1) aborts with ENOENT.
% rm -Rv /Volumes/myfs/linux-kernel/linux-6.10.6
[...]
/Volumes/myfs/linux-kernel/linux-6.10.6/include/drm/bridge/aux-bridge.h
/Volumes/myfs/linux-kernel/linux-6.10.6/include/drm/bridge/dw_hdmi.h
rm: fts_read: No such file or directory
I'm observing the following sequence of calls being made.
2024-09-17 17:58:25 vnops.c:281 myfs_vnop_lookup: rm-936 -> dw_hdmi.h ;initial lookup call
2024-09-17 17:58:25 vnops.c:315 myfs_vnop_lookup: -> cache_lookup(dw_hdmi.h)
2024-09-17 17:58:25 vnops.c:317 myfs_vnop_lookup: <- cache_lookup(dw_hdmi.h) -> 0 ;cache miss
2024-09-17 17:58:25 rpc.c:431 myfsLookup: rm-936 -> dw_hdmi.h ;do remote lookup
2024-09-17 17:58:25 rpc.c:500 myfsLookup: -> myfs_lookup_rpc(dw_hdmi.h)
2024-09-17 17:58:25 rpc.c:502 myfsLookup: <- myfs_lookup_rpc(dw_hdmi.h) -> 0 ;file found and added to vfs cache
2024-09-17 17:58:25 vnops.c:281 myfs_vnop_lookup: rm-936 -> dw_hdmi.h ;subsequent lookup call
2024-09-17 17:58:25 vnops.c:315 myfs_vnop_lookup: -> cache_lookup(dw_hdmi.h)
2024-09-17 17:58:25 vnops.c:317 myfs_vnop_lookup: <- cache_lookup(dw_hdmi.h) -> -1 ;cache hit
2024-09-17 17:58:25 vnops.c:1478 myfs_vnop_remove: -> myfs_unlink(dw_hdmi.h) ;unlink sequence
2024-09-17 17:58:25 rpc.c:1992 myfs_unlink: -> myfs_unlink_rpc(dw_hdmi.h)
2024-09-17 17:58:25 rpc.c:1994 myfs_unlink: <- myfs_unlink_rpc(dw_hdmi.h) -> 0 ;remote unlink succeeded
2024-09-17 17:58:25 vnops.c:1480 myfs_vnop_remove: <- myfs_unlink(dw_hdmi.h) -> 0
2024-09-17 17:58:25 vnops.c:1487 myfs_vnop_remove: -> cache_purge(dw_hdmi.h)
2024-09-17 17:58:25 vnops.c:1489 myfs_vnop_remove: <- cache_purge(dw_hdmi.h)
2024-09-17 17:58:25 vnops.c:1499 myfs_vnop_remove: -> vnode_recycle(dw_hdmi.h)
2024-09-17 17:58:25 vnops.c:1501 myfs_vnop_remove: <- vnode_recycle(dw_hdmi.h)
2024-09-17 17:58:27 vnops.c:281 myfs_vnop_lookup: fseventsd-101 -> dw_hdmi.h ;another lookup; why?
2024-09-17 17:58:27 vnops.c:315 myfs_vnop_lookup: -> cache_lookup(dw_hdmi.h)
2024-09-17 17:58:27 vnops.c:317 myfs_vnop_lookup: <- cache_lookup(dw_hdmi.h) -> 0
2024-09-17 17:58:27 rpc.c:431 myfsLookup: fseventsd-101 -> dw_hdmi.h
2024-09-17 17:58:27 rpc.c:500 myfsLookup: -> myfs_lookup_rpc(dw_hdmi.h)
2024-09-17 17:58:27 rpc.c:502 myfsLookup: <- myfs_lookup_rpc(dw_hdmi.h) -> ENOENT(2)
2024-09-17 17:58:27 vnops.c:371 myfs_vnop_lookup: SET(NNEGNCENTRIES): dw_hdmi.h
2024-09-17 17:58:27 vnops.c:373 myfs_vnop_lookup: ENOENT(2) <- shouldAddToNegativeNameCache(dw_hdmi.h)
I checked the value of vnode's v_iocount when vnop_remove and vnop_reclaim are being called.
Each vnop_remove is followed by vnop_reclaim, with v_iocount set to 1 in both calls, as expected.
What I don't understand is why, after the file has been removed, another lookup is made; it returns ENOENT to rm(1), which causes it to abort.
Any pointers on what could be amiss there would be much appreciated.
When running AJA System Test for my custom filesystem, the write and read tests get stuck intermittently.
I didn't observe any error codes being returned by my vnop_read/write or sock_receive/send functions.
Dtrace(1)'ing the vnops being called by AJA System Test for smbfs revealed that amongst other things vnop_advlock is being called:
0 -> smbfs_vnop_advlock ajasystemtest -> smbfs_vnop_advlock(ajatest.dat, op: 0x2, fl->l_start: 0, fl->l_len: 0, fl->l_pid: 0, fl->l_type: 2, fl->l_whence: 0, flags: 0x40, timeout: 0)
0 <- smbfs_vnop_advlock ajasystemtest -> smbfs_vnop_advlock(ajatest.dat) -> -1934627947504
op: 0x2 ;#define F_UNLCK 2 /* unlock */
fl->l_len: 0 ;len = 0 means until end of file
fl->l_type: 2 ;#define F_UNLCK 2 /* unlock */
fl->l_whence: 0 ;#define SEEK_SET 0 /* set file offset to offset */
flags: 0x40 ;#define F_POSIX 0x040 /* Use POSIX semantics for lock */
As my filesystem didn't implement vnop_advlock, I thought I'd explore that avenue.
My vnop_advlock simply returns KERN_SUCCESS.
Both f_capabilities.valid and f_capabilities.capabilities of struct vfs_attr have VOL_CAP_INT_ADVLOCK and VOL_CAP_INT_FLOCK set.
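Concretely, this is what I have (a sketch; the myfs_* names are mine):
static int
myfs_vnop_advlock(struct vnop_advlock_args *ap)
{
    /* ap->a_op and ap->a_fl describe the lock request; accept everything. */
    return 0;
}

/* And in vfs_getattr the capability bits are advertised like so: */
static void
myfs_set_lock_caps(struct vfs_attr *vap)
{
    vap->f_capabilities.capabilities[VOL_CAPABILITIES_INTERFACES] |=
        VOL_CAP_INT_ADVLOCK | VOL_CAP_INT_FLOCK;
    vap->f_capabilities.valid[VOL_CAPABILITIES_INTERFACES] |=
        VOL_CAP_INT_ADVLOCK | VOL_CAP_INT_FLOCK;
    VFSATTR_SET_SUPPORTED(vap, f_capabilities);
}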
Yet, vnop_advlock doesn't get called for my filesystem when running AJA System Test.
Any tips on what could be amiss there would be much appreciated.
I've been using sock_upcall to do socket polling in my VFS kext.
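Roughly, the upcall is wired up like this (a sketch; struct myfs_session and myfs_wakeup_receiver() are my own):
#include <sys/socket.h>
#include <netinet/in.h>
#include <sys/kpi_socket.h>

static void
myfs_sock_upcall(socket_t so, void *cookie, int waitflag)
{
    /* Runs when the socket becomes readable; wake the receive thread
     * rather than calling sock_receive() from the upcall itself. */
    struct myfs_session *sp = cookie;
    myfs_wakeup_receiver(sp);
}

static errno_t
myfs_session_connect(struct myfs_session *sp)
{
    errno_t error;

    error = sock_socket(AF_INET, SOCK_STREAM, IPPROTO_TCP,
                        myfs_sock_upcall, sp, &sp->s_so);
    /* ... followed by sock_connect(), etc. */
    return error;
}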
Perusing the Darwin/XNU sources, I've stumbled upon select.
Which is the recommended way to do socket polling in the kernel, sock_upcall or select?
Thanks.
On implementing vnop_mmap, vnop_strategy, and other related VNOPs as suggested in https://developer.apple.com/forums/thread/756358, my vnop_strategy routine ends up zero-extending files.
I don't understand why my filesystem behaves as described above.
Perusing the source code of both the relevant parts of Darwin/XNU and SMBClient did not clarify things for me.
A nudge in the right direction would be greatly appreciated.
The technical details of the issue are given in the attached plain-text file, as some of the text was flagged as sensitive; I'm unsure what exactly it was.
apple-dts-issue-desc.txt