Reply to VFS deadlock (WebDAV on Apple Silicon only)
Thanks to everyone who has read my post and has been debugging the deadlock! A few days ago the source packages for Ventura 13.3 were published, and among them I found webdavfs-395. The following diff is one I'm very happy to see:
https://github.com/apple-oss-distributions/webdavfs/commit/b7756b02549929bb18062ebcd76f0bbb75a120cb

That change does target exactly the use case of calling ubc_msync() on mmap'ed files during clean-up. Unfortunately, real-world testing on a fresh Ventura 13.5b4, which includes webdavfs-395, still triggers the deadlock. In other words, there still seems to be a code path that goes through webdav_vnop_pageout, determines that is_open == TRUE and ends up calling webdav_vnop_close -> ... -> webdav_fsync -> ubc_msync, which means the newly introduced flag WEBDAV_PAGEOUT_CLOSE_IN_RECLAIM was not set in webdav_vnop_pageout.

So is vnode_isrecycled not really the relevant check? Is the page-out function called more than once (either recursively or in sequence, e.g. because webdav_unmount flushes twice)? Or is it something else?

My test case remains the same as above. For completeness, here is a fresh thread dump from the deadlocked unmount thread on Ventura 13.5b4:

Date/Time: 2023-07-03 10:35:06.796 +0200
End time: 2023-07-03 10:35:16.801 +0200
OS Version: macOS 13.5 (Build 22G5059d)
Architecture: arm64e
Report Version: 40
Data Source: Stackshots
Shared Cache: 725CB32F-D723-38F2-8952-4D21C1FD290B slid base address 0x198a8c000, slide 0x18a8c000 (System Primary)
Shared Cache: D6EB184C-4628-3C49-9D21-5D5A97D08FDC slid base address 0x1bd9e8000, slide 0x3d9e8000 (DriverKit)
Shared Cache: 4E3FAD7E-E5B0-35FD-BF81-F0E22E907F07 slid base address 0x7ff8084cc000, slide 0x84cc000 (Rosetta)
Duration: 10.00s
Steps: 1001 (10ms sampling interval)
Hardware model: Macmini9,1
Active cpus: 8
HW page size: 16384
VM page size: 16384

[...]

Process: diskarbitrationd [537]
UUID: 9DF766CA-6596-3311-8A02-20FC83FD3A24
Path: /usr/libexec/diskarbitrationd
Codesigning ID: com.apple.diskarbitrationd
Shared Cache: 725CB32F-D723-38F2-8952-4D21C1FD290B slid base address 0x198a8c000, slide 0x18a8c000 (System Primary)
Architecture: arm64e
Parent: launchd [1]
UID: 0
Sudden Term: Tracked
Footprint: 2993 KB
Time Since Fork: 1033s
Num samples: 1001 (1-1001)
Note: 1 idle work queue thread omitted

Thread 0xf69 1001 samples (1-1001) priority 31 (base 31)
1001 _dispatch_sig_thread + 60 (libdispatch.dylib + 97144) [0x198cefb78]
1001 __sigsuspend_nocancel + 8 (libsystem_kernel.dylib + 34280) [0x198e535e8]
*1001 ??? (kernel.release.t8103 + 5219752) [0xfffffe00088aa5a8]

Thread 0x55ad 1001 samples (1-1001) priority 46 (base 31)
1001 thread_start + 8 (libsystem_pthread.dylib + 7584) [0x198e86da0]
1001 _pthread_start + 148 (libsystem_pthread.dylib + 28584) [0x198e8bfa8]
1001 ??? (diskarbitrationd + 99436) [0x102a2c46c]
1001 unmount + 8 (libsystem_kernel.dylib + 54736) [0x198e585d0]
*1001 ??? (kernel.release.t8103 + 30712) [0xfffffe00083b77f8]
*1001 ??? (kernel.release.t8103 + 1599252) [0xfffffe0008536714]
*1001 ??? (kernel.release.t8103 + 6321364) [0xfffffe00089b74d4]
*1001 ??? (kernel.release.t8103 + 2282292) [0xfffffe00085dd334]
*1001 ??? (kernel.release.t8103 + 2283096) [0xfffffe00085dd658]
*1001 vnode_iterate + 528 (kernel.release.t8103 + 2192876) [0xfffffe00085c75ec]
*1001 ??? (kernel.release.t8103 + 5437556) [0xfffffe00088df874]
*1001 ??? (kernel.release.t8103 + 987212) [0xfffffe00084a104c]
*1001 ??? (kernel.release.t8103 + 990312) [0xfffffe00084a1c68]
*1001 ??? (kernel.release.t8103 + 930160) [0xfffffe0008493170]
*1001 ??? (kernel.release.t8103 + 930448) [0xfffffe0008493290]
*1001 ??? (kernel.release.t8103 + 5806728) [0xfffffe0008939a88]
*1001 webdav_vnop_pageout + 452 (com.apple.filesystems.webdav + 17112) [0xfffffe000b37d708]
*1001 webdav_vnop_close + 64 (com.apple.filesystems.webdav + 9628) [0xfffffe000b37b9cc]
*1001 webdav_vnop_close_locked + 96 (com.apple.filesystems.webdav + 19916) [0xfffffe000b37e1fc]
*1001 webdav_close_mnomap + 264 (com.apple.filesystems.webdav + 20212) [0xfffffe000b37e324]
*1001 webdav_fsync + 416 (com.apple.filesystems.webdav + 20704) [0xfffffe000b37e510]
*1001 ubc_msync + 184 (kernel.release.t8103 + 5438968) [0xfffffe00088dfdf8]
*1001 ??? (kernel.release.t8103 + 987212) [0xfffffe00084a104c]
*1001 ??? (kernel.release.t8103 + 989672) [0xfffffe00084a19e8]
*1001 lck_rw_sleep + 132 (kernel.release.t8103 + 461616) [0xfffffe0008420b30]
*1001 ??? (kernel.release.t8103 + 555992) [0xfffffe0008437bd8]
*1001 ??? (kernel.release.t8103 + 562548) [0xfffffe0008439574]

I'm incredibly thankful that someone is actively working on this problem, and please let me know if I can help in any way.
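To make the suspected re-entrancy easier to reason about, here is a minimal user-space sketch. It is a hypothetical model, not webdavfs or XNU code: the function names only mirror frames in the trace, a non-reentrant threading.Lock stands in for the kernel lock that lck_rw_sleep waits on, and the outer unmount flush is simply modelled as holding that lock while page-out runs. It also assumes, as the post describes, that WEBDAV_PAGEOUT_CLOSE_IN_RECLAIM is derived from vnode_isrecycled() and that its effect is to keep the close path from reaching ubc_msync(). In the trace, the same kernel frame (kernel.release.t8103 + 987212) appears both above webdav_vnop_pageout and below ubc_msync, which is at least consistent with re-entering the same path.

```python
import threading

# Hypothetical model only: names mirror stackshot frames, not real code.
# Non-reentrant lock standing in for the lock ubc_msync sleeps on.
ubc_lock = threading.Lock()


class Vnode:
    def __init__(self, is_open: bool, recycled: bool):
        self.is_open = is_open
        self.recycled = recycled  # stand-in for vnode_isrecycled(vp)


def ubc_msync(vp: Vnode) -> str:
    # The kernel would sleep here forever; the model uses a non-blocking
    # acquire so it reports the deadlock instead of hanging.
    if not ubc_lock.acquire(blocking=False):
        return "deadlock"
    try:
        return "synced"
    finally:
        ubc_lock.release()


def webdav_vnop_close(vp: Vnode, close_in_reclaim: bool) -> str:
    # Collapses close_locked -> close_mnomap -> fsync into one step.
    vp.is_open = False
    if close_in_reclaim:
        return "closed without msync"
    return ubc_msync(vp)


def webdav_vnop_pageout(vp: Vnode) -> str:
    # Assumption: the guard flag comes from vnode_isrecycled().
    close_in_reclaim = vp.recycled
    if vp.is_open:
        return webdav_vnop_close(vp, close_in_reclaim)
    return "pageout only"


def unmount_flush(vp: Vnode) -> str:
    # Outer flush (unmount -> vnode_iterate -> ...) modelled as already
    # holding the lock while page-out runs.
    with ubc_lock:
        return webdav_vnop_pageout(vp)


if __name__ == "__main__":
    print(unmount_flush(Vnode(is_open=True, recycled=True)))   # closed without msync
    print(unmount_flush(Vnode(is_open=True, recycled=False)))  # deadlock
```

In this model the second case corresponds to the first hypothesis in the post: if the vnode being paged out during the unmount flush is not (yet) marked as recycled, the guard is never set and the inner msync blocks on a lock its own caller already holds.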
Topic: App & System Services
SubTopic: Core OS
Jul ’23
Reply to Fail to submit content to this forums
Thank you for sharing this info. I just experienced the same thing but wouldn't have figured out a solution without this thread. I've submitted FB12508762 so this can be improved.
Jul ’23