System-wide deadlock in removexattr from revisiond / APFS

We're experiencing a deadlock on certain systems when our software is installed, which is causing side effects in our process (and likely others) such as blocked queues and increased memory usage.

According to the spindump, revisiond appears to be holding an exclusive lock within the kernel.

Process:          revisiond [426]
UUID:             5E9B9E04-984B-31AD-A4FF-A1A90B7D53A1
Path:             /System/Library/PrivateFrameworks/GenerationalStorage.framework/Versions/A/Support/revisiond
Codesigning ID:   com.apple.revisiond
Shared Cache:     25AE5A2A-FE2A-3998-8D4E-F3C5C6E6CEB6 slid base address 0x189834000, slide 0x9834000 (System Primary)
Architecture:     arm64e
Parent:           launchd [1]
UID:              0
Sudden Term:      Tracked
Memory Limit:     50MB
Jetsam Priority:  40
Footprint:        6225 KB
Time Since Fork:  1740319s
Num samples:      940 (1-940)
Num threads:      5
Note:             1 idle work queue thread omitted

[...]

  Thread 0xc0616d    940 samples (1-940)    priority 46 (base 4)    last ran 241692.754s ago
  940  start_wqthread + 8 (libsystem_pthread.dylib + 7068) [0x189d0ab9c]
    940  _pthread_wqthread + 292 (libsystem_pthread.dylib + 11852) [0x189d0be4c]
      940  _dispatch_workloop_worker_thread + 692 (libdispatch.dylib + 85356) [0x189b65d6c]
        940  _dispatch_root_queue_drain_deferred_wlh + 292 (libdispatch.dylib + 87156) [0x189b66474]
          940  _dispatch_lane_invoke + 440 (libdispatch.dylib + 45048) [0x189b5bff8]
            940  _dispatch_lane_serial_drain + 944 (libdispatch.dylib + 42420) [0x189b5b5b4]
              940  _dispatch_client_callout + 16 (libdispatch.dylib + 113364) [0x189b6cad4]
                940  _dispatch_call_block_and_release + 32 (libdispatch.dylib + 7004) [0x189b52b5c]
                  940  ??? (revisiond + 168768) [0x10494d340]
                    940  ??? (revisiond + 165940) [0x10494c834]
                      940  ??? (revisiond + 40264) [0x10492dd48]
                        940  ??? (revisiond + 56680) [0x104931d68]
                          940  <patched truncated backtrace>
                            940  removexattr + 8 (libsystem_kernel.dylib + 23768) [0x189cd1cd8]
                             *940  ??? (kernel.release.t6000 + 15240) [0xfffffe000886fb88]
                               *940  ??? (kernel.release.t6000 + 1886348) [0xfffffe0008a3888c]
                                 *940  ??? (kernel.release.t6000 + 7730436) [0xfffffe0008fcb504]
                                   *940  ??? (kernel.release.t6000 + 2759592) [0xfffffe0008b0dba8]
                                     *940  ??? (kernel.release.t6000 + 2808244) [0xfffffe0008b199b4]
                                       *940  apfs_vnop_removexattr + 1044 (apfs + 474512) [0xfffffe000be8d4d0]
                                         *940  decmpfs_cnode_set_vnode_state + 80 (kernel.release.t6000 + 2945816) [0xfffffe0008b3b318]
                                           *940  IORWLockWrite + 184 (kernel.release.t6000 + 496184) [0xfffffe00088e5238]
                                             *940  ??? (kernel.release.t6000 + 494624) [0xfffffe00088e4c20]
                                               *940  ??? (kernel.release.t6000 + 619452) [0xfffffe00089033bc]
                                                 *940  ??? (kernel.release.t6000 + 624472) [0xfffffe0008904758]

The bulk of the other processes are waiting for that lock.

(suspended, blocked by krwlock for reading owned by revisiond [426] thread 0xc0616d)

(blocked by krwlock for writing owned by revisiond [426] thread 0xc0616d)

Around the time of the event, these messages were logged by revisiond:

2026-03-06 18:49:37.781673-0500 0x16b7     Error       0x7f92f364           426    14   revisiond: [com.apple.revisiond:default] [ERROR] CSCopyChunkIDsForToken failed for 41639
2026-03-06 18:49:37.781716-0500 0x16b7     Error       0x7f92f365           426    14   revisiond: [com.apple.revisiond:default] [ERROR] updateEntry for new entry <private> failed
2026-03-06 18:49:37.781738-0500 0x16b7     Error       0x7f92f366           426    14   revisiond: [com.apple.revisiond:default] [ERROR] no entry for '<private>'
2026-03-06 18:49:37.781754-0500 0x16b7     Error       0x7f92f367           426    14   revisiond: [com.apple.revisiond:default] [ERROR] failed assembleInfoForOffset for fsid 16777234 fileid 359684022 offset 0 size 14334 (path <private>)

Our agent uses the Endpoint Security framework to monitor events and provide anti-tamper functionality for installed components and processes. While several EndpointSecurity calls appear in the spindump stack traces, we don't have any evidence that any calls from revisiond were blocked.

What we'd really like to understand is what that lock is (it appears to be involved in decompressing an object on an APFS volume), what revisiond and APFS are doing with it, and what might cause it to deadlock.

Of note, one of our processes is also waiting on that lock, one thread for reading and the other for writing.

This issue affects machines running several macOS versions (15.x, 26.x). The machine in the examples is running macOS 26.3 (25D125).

We're experiencing a deadlock on certain systems when our software is installed, which is causing side effects in our process (and likely others) such as blocked queues and increased memory usage.

According to the spindump, revisiond appears to be holding an exclusive lock within the kernel.

Can you file a bug on this and upload the full spindump there? That's the easiest way to transfer large data files and I may eventually want a sysdiagnose.

Having said that...

Our agent uses the Endpoint Security framework to monitor events and provide anti-tamper functionality for installed components and processes. While several EndpointSecurity calls appear in the spindump stack traces, we don't have any evidence that any calls from revisiond were blocked.

...this is almost CERTAINLY caused by your ES client.

What we'd really like to understand is what that lock is (it appears to be involved in decompressing an object on an APFS volume), what revisiond and APFS are doing with it, and what might cause it to deadlock.

You can actually find the code for decmpfs_cnode_set_vnode_state here. However, I think you're asking the wrong question. The question here isn't what revisiond is doing, it's who revisiond is waiting on. Some other thread locked that lock and is now stuck, which is the real issue you need to sort out.

Case in point, you're getting closer:

Of note, one of our processes is also waiting on that lock, one thread for reading and the other for writing.

However, the problem thread ISN'T going to be waiting on that lock; that thread is going to be stuck "somewhere else", unable to complete its work and release the lock.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Can you file a bug on this and upload the full spindump there?

I opened one last week. Case-ID: 18860388

...this is almost CERTAINLY caused by your ES client.

Yes.

The question here isn't what revisiond is doing, it's who revisiond is waiting on. Some other thread locked that lock and is now stuck, which is the real issue you need to sort out.

It's definitely revisiond (well, APFS) that's holding the lock. There are dozens of threads where the shared lock points at revisiond [426] thread 0xc0616d, both for reading and writing.

 *940  apfs_vnop_getattr + 312 (apfs + 604336) [0xfffffe000beacff0]
   *940  IORWLockRead + 144 (kernel.release.t6000 + 496568) [0xfffffe00088e53b8]
	 *940  ??? (kernel.release.t6000 + 497548) [0xfffffe00088e578c]
	   *940  ??? (kernel.release.t6000 + 619452) [0xfffffe00089033bc]
		 *940  ??? (kernel.release.t6000 + 624472) [0xfffffe0008904758] (suspended, blocked by krwlock for reading owned by revisiond [426] thread 0xc0616d)

 *940  apfs_vnop_read + 708 (apfs + 555972) [0xfffffe000bea1304]
   *940  IORWLockWrite + 184 (kernel.release.t6000 + 496184) [0xfffffe00088e5238]
	 *940  ??? (kernel.release.t6000 + 494304) [0xfffffe00088e4ae0]
	   *940  ??? (kernel.release.t6000 + 619452) [0xfffffe00089033bc]
		 *940  ??? (kernel.release.t6000 + 624472) [0xfffffe0008904758] (blocked by krwlock for writing owned by revisiond [426] thread 0xc0616d)

*940  icp_lock_inode + 72 (apfs + 757908) [0xfffffe000bed27d4]
 *940  IORWLockWrite + 184 (kernel.release.t6000 + 496184) [0xfffffe00088e5238]
   *940  ??? (kernel.release.t6000 + 494304) [0xfffffe00088e4ae0]
	 *940  ??? (kernel.release.t6000 + 619452) [0xfffffe00089033bc]
	   *940  ??? (kernel.release.t6000 + 624472) [0xfffffe0008904758] (blocked by krwlock for writing owned by revisiond [426] thread 0xc0616d)

That revisiond thread is the only one that doesn't blame another thread / lock.

The IORWLockWrite stack seems to point to machine_switch_context, i.e. the path taken when the lock is owned by another thread, so the current thread is suspended / yields to another one, waiting for the lock to become available again.

But then that's a bit incoherent with all the other threads reporting "blocked by krwlock for writing owned by revisiond [426] thread 0xc0616d" (it can't be both the owner and not the owner at the same time…).

Is it possible that machine_switch_context is called even if you were able to take ownership of the lock? In what kind of scenario? The stack doesn't seem to tell. And we don't have the source code of IORWLockWrite.

It's like something suspended the revisiond thread in the kernel when it executed IORWLockWrite, but then this "something" is unable to resume it because it is blocked itself (on this same lock?). But that doesn't align with the machine_switch_context symbol in the stack.

I opened one last week. Case-ID: 18860388

I don't think that’s a valid bug number. Details on the bug filing process are here, and the numbers are prefixed "FB". Again, please upload the full spindump to that bug and then post the bug number back here.

It's definitely revisiond (well, APFS) that's holding the lock.

Sure, but the question is "why", not "who". The causes of this kind of hang are the interactions between multiple locks and multiple processes. It's hard to pick out unless you're looking at the full log and know what you're looking for, but the basic form is that there are two locks:

  1. The "outer" lock, which the blocking thread (in this case, thread 0xc0616d) is inside and holding.

  2. The "inner" lock, which the blocking thread (thread 0xc0616d) is stuck waiting on.

You can actually see this dynamic in the traces you sent. This lock is an APFS-owned lock:

 *940  apfs_vnop_getattr + 312 (apfs + 604336) [0xfffffe000beacff0]
   *940  IORWLockRead + 144 (kernel.release.t6000 + 496568) [0xfffffe00088e53b8]

And, unsurprisingly, deleting an xattr requires holding that lock:

*940  apfs_vnop_removexattr + 1044 (apfs + 474512) [0xfffffe000be8d4d0]

However, this is NOT an APFS lock, but is actually a lock in the kernel vfs system:

*940  decmpfs_cnode_set_vnode_state + 80 (kernel.release.t6000 + 2945816) [0xfffffe0008b3b318]
  *940  IORWLockWrite + 184 (kernel.release.t6000 + 496184) [0xfffffe00088e5238]

Again, you can see that second lock here if you're curious.

There are dozens of threads where the shared lock points at revisiond [426] thread 0xc0616d, both for reading and writing.

All of the APFS locks tend to be held for very short periods of time, so it's not unusual for work to pile up very quickly. More to the point, all of those other threads are (mostly) irrelevant to the issue. I'd actually be looking for any other reference to compression/decompression or xattrs.

Only that revisiond thread that doesn't blame another thread / lock.

The blame logic is not infallible, particularly when pure kernel threads are involved.

Is it possible that machine_switch_context is called if you were able to get the ownership of the lock? In which kind of scenario? The stack doesn't seem to tell it.

No, not really.

And we don't have the source code of IORWLockWrite

Yes, you do. It's defined in IOLocks.h, which maps it to lck_rw_lock_exclusive. However, I wouldn't expect that to lead you anywhere useful.
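For reference, the relevant portion of IOLocks.h looks roughly like this (abridged from the xnu sources on opensource.apple.com; exact text varies by release):

```c
/* Abridged from xnu's iokit/IOKit/IOLocks.h; details vary by release. */
#ifdef IOLOCKS_INLINE
#define IORWLockRead(l)    lck_rw_lock_shared(l)
#define IORWLockWrite(l)   lck_rw_lock_exclusive(l)
#define IORWLockUnlock(l)  lck_rw_done(l)
#else
void    IORWLockRead(IORWLock * lock);
void    IORWLockWrite(IORWLock * lock);
void    IORWLockUnlock(IORWLock * lock);
#endif  /* !IOLOCKS_INLINE */
```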

The bigger point here is that I think you already know revisiond isn't actually blocking itself. In your first post, you said "deadlock", not "panic":

We're experiencing a deadlock on certain systems when our software is installed, which is causing side effects in our process (and likely others) such as blocked queues and increased memory usage.

...which means I suspect you meant "hang", meaning that the problem eventually cleared itself. If the problem clears on its own, that means something had to release revisiond so it could make forward progress.

That leads to here:

It's like something suspended the revisiond thread in the kernel when it executed IORWLockWrite, but then this "something" is unable to resume it because it is blocked itself (on this same lock?).

See above for the description of how this happens.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

All of the APFS locks tend to be held for very short periods of time, so it's not unusual for work to pile up very quickly. More to the point, all of those other threads are (mostly) irrelevant to the issue. I'd actually be looking for any other reference to compression/decompression or xattrs.

If they are held for a very short amount of time, shouldn't we rarely see other threads waiting on them? That's what I would expect, at least.

And here we can see that all the other threads are waiting on it for the whole spindump duration (Num samples: 940 (1-940) / IORWLockWrite & IORWLockRead → 940). I know this counts the number of times the sampler sees these symbols each time it samples the processes (i.e. it doesn't mean this code was running between samples), but I would be surprised if these exact same stacks kept re-occurring at precisely the sample times by chance: they were most likely running the whole time.


Yes, you do. It's defined in IOLocks.h, which maps it to lck_rw_lock_exclusive. However, I wouldn't expect that to lead you anywhere useful.

Yep, I noticed that, but since we see IORWLock... in the stacks, I concluded that IOLOCKS_INLINE wasn't set, and that it really calls the IORWLock... functions (#define is a preprocessor macro; there would be no reason for the function to appear in the stack if IOLOCKS_INLINE were set).
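That reasoning can be checked in miniature: a function-like macro is rewritten by the preprocessor before compilation, so no symbol with the macro's name is ever emitted, and nothing with that name can appear in a backtrace. A small hypothetical sketch (all names are ours):

```c
#include <assert.h>

static int exclusive_calls;  /* counts calls reaching the "real" routine */
static void lck_rw_lock_exclusive_demo(void) { exclusive_calls++; }

/* With DEMO_INLINE set, DemoLockWrite is a macro, mirroring the
 * IOLOCKS_INLINE case: the call site compiles directly into a call to
 * lck_rw_lock_exclusive_demo, and no DemoLockWrite symbol exists to
 * show up in a stack trace. */
#define DEMO_INLINE 1
#if DEMO_INLINE
#define DemoLockWrite() lck_rw_lock_exclusive_demo()
#else
/* The out-of-line case: this function *would* appear in a backtrace,
 * which is what the spindump shows for IORWLockWrite. */
static void DemoLockWrite(void) { lck_rw_lock_exclusive_demo(); }
#endif

int demo_macro(void)
{
    DemoLockWrite();         /* expands to lck_rw_lock_exclusive_demo() */
    return exclusive_calls;
}
```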


No, not really.

But then I don't understand why all the threads report that the revisiond thread owns the lock, while the revisiond thread's stack seems to say that it wasn't able to acquire it (and so is suspended)…

Or, as you said, the blame logic is just all wrong, and everyone is pointing at this revisiond thread by mistake, and revisiond is just blocked on someone else, like everyone else.


I'll let the OP answer the other points.

I don't think that’s a valid bug number.

Sorry, it was a TSI, not a bug report. I replied to the DTS email yesterday with the full spindump.

Let me know if you still want a bug report as well.

Again, you can see that second lock here if you're curious.

There are only two other threads in the logs that appear relevant, both from our helper process and both down in APFS. One of them is also stuck inside decmpfs_read_compressed:

  Thread 0xc06805    DispatchQueue "com.apple.root.default-qos"(13)    940 samples (1-940)    priority 46 (base 31)    last ran 241692.753s ago
  940  start_wqthread + 8 (libsystem_pthread.dylib + 7068) [0x189d0ab9c]
    940  _pthread_wqthread + 232 (libsystem_pthread.dylib + 11792) [0x189d0be10]
      940  _dispatch_worker_thread2 + 180 (libdispatch.dylib + 83844) [0x189b65784]
        940  _dispatch_root_queue_drain + 736 (libdispatch.dylib + 82236) [0x189b6513c]
          940  <deduplicated_symbol> + 32 (libdispatch.dylib + 231900) [0x189b899dc]
            940  _dispatch_client_callout + 16 (libdispatch.dylib + 113364) [0x189b6cad4]
              940  _dispatch_call_block_and_release + 32 (libdispatch.dylib + 7004) [0x189b52b5c]
                940  __34-[MyConcurrentQueue performBlock:]_block_invoke (in MyFramework) (MyConcurrentQueue.m:228) + 355392  [0x102a42c40]
                  940  __66-[MyPathHelper fetchFileTypeForFileAtPath:statData:queue:handler:]_block_invoke.125 (in MyFramework) (MyPathHelper.m:1056) + 43416  [0x1029f6998]
                    940  pread + 8 (libsystem_kernel.dylib + 9788) [0x189cce63c]
                     *940  ??? (kernel.release.t6000 + 15240) [0xfffffe000886fb88]
                       *940  ??? (kernel.release.t6000 + 1886348) [0xfffffe0008a3888c]
                         *940  ??? (kernel.release.t6000 + 7730436) [0xfffffe0008fcb504]
                           *940  ??? (kernel.release.t6000 + 6448868) [0xfffffe0008e926e4]
                             *940  ??? (kernel.release.t6000 + 6447764) [0xfffffe0008e92294]
                               *940  ??? (kernel.release.t6000 + 2790292) [0xfffffe0008b15394]
                                 *940  ??? (kernel.release.t6000 + 2791252) [0xfffffe0008b15754]
                                   *940  apfs_vnop_read + 508 (apfs + 555772) [0xfffffe000bea123c]
                                     *940  decmpfs_read_compressed + 300 (kernel.release.t6000 + 2955192) [0xfffffe0008b3d7b8]
                                       *940  ??? (kernel.release.t6000 + 2946852) [0xfffffe0008b3b724]
                                         *940  ??? (kernel.release.t6000 + 2800480) [0xfffffe0008b17b60]
                                           *940  apfs_inode_getxattr + 716 (apfs + 1541792) [0xfffffe000bf91de0]
                                             *940  IORWLockRead + 144 (kernel.release.t6000 + 496568) [0xfffffe00088e53b8]
                                               *940  ??? (kernel.release.t6000 + 497548) [0xfffffe00088e578c]
                                                 *940  ??? (kernel.release.t6000 + 619452) [0xfffffe00089033bc]
                                                   *940  ??? (kernel.release.t6000 + 624472) [0xfffffe0008904758] (blocked by krwlock for reading owned by revisiond [426] thread 0xc0616d)

  Thread 0xc06362    940 samples (1-940)    priority 46 (base 31)    last ran 241692.753s ago
  940  start_wqthread + 8 (libsystem_pthread.dylib + 7068) [0x189d0ab9c]
    940  _pthread_wqthread + 232 (libsystem_pthread.dylib + 11792) [0x189d0be10]
      940  _dispatch_worker_thread2 + 180 (libdispatch.dylib + 83844) [0x189b65784]
        940  _dispatch_root_queue_drain + 736 (libdispatch.dylib + 82236) [0x189b6513c]
          940  <deduplicated_symbol> + 32 (libdispatch.dylib + 231900) [0x189b899dc]
            940  _dispatch_client_callout + 16 (libdispatch.dylib + 113364) [0x189b6cad4]
              940  _dispatch_call_block_and_release + 32 (libdispatch.dylib + 7004) [0x189b52b5c]
                940  __34-[MyConcurrentQueue performBlock:]_block_invoke (in MyFramework) (MyConcurrentQueue.m:228) + 355392  [0x102a42c40]
                  940  __66-[MyPathHelper fetchFileTypeForFileAtPath:statData:queue:handler:]_block_invoke.125 (in MyFramework) (MyPathHelper.m:1056) + 43416  [0x1029f6998]
                    940  <patched truncated backtrace>
                      940  pread + 8 (libsystem_kernel.dylib + 9788) [0x189cce63c]
                       *940  ??? (kernel.release.t6000 + 15240) [0xfffffe000886fb88]
                         *940  ??? (kernel.release.t6000 + 1886348) [0xfffffe0008a3888c]
                           *940  ??? (kernel.release.t6000 + 7730436) [0xfffffe0008fcb504]
                             *940  ??? (kernel.release.t6000 + 6448868) [0xfffffe0008e926e4]
                               *940  ??? (kernel.release.t6000 + 6447764) [0xfffffe0008e92294]
                                 *940  ??? (kernel.release.t6000 + 2790292) [0xfffffe0008b15394]
                                   *940  ??? (kernel.release.t6000 + 2791252) [0xfffffe0008b15754]
                                     *940  apfs_vnop_read + 708 (apfs + 555972) [0xfffffe000bea1304]
                                       *940  IORWLockWrite + 184 (kernel.release.t6000 + 496184) [0xfffffe00088e5238]
                                         *940  ??? (kernel.release.t6000 + 494304) [0xfffffe00088e4ae0]
                                           *940  ??? (kernel.release.t6000 + 619452) [0xfffffe00089033bc]
                                             *940  ??? (kernel.release.t6000 + 624472) [0xfffffe0008904758] (blocked by krwlock for writing owned by revisiond [426] thread 0xc0616d)

I will review these calls and locks some more, but have limited insight into the APFS implementation here.

...which means I suspect you meant "hang", meaning that the problem eventually cleared itself. If the problem clears on its own, that means something had to release revisiond so it could make forward progress.

No, it was 3 days later and the problem hadn't resolved. At that time, we tried killing revisiond but it was left a zombie (due to the held lock?). A reboot was required to resolve the problem.
