Part 2....
At this point, APFS seems to start actually deleting the data because it detects that the container’s free space has reached a certain threshold. This might also explain why 100GB could be automatically reclaimed while 200GB couldn’t. I wonder if my guess is correct.
Sort of. What's actually going on here is that the container can also "ask" the volume to "give back" storage it isn't using. That isn't an entirely passive process, as returning storage to the container also means that the volume's own data structures need to be modified.
As an aside here, when you run "hdiutil compact":
- Is the volume mounted?
- Is the APFS volume encrypted?
Both of those issues could constrain what hdiutil is actually doing.
If it is correct, then the key to reclaiming space is to make APFS truly delete the data it has marked as unused.
Again, I don't like the term "delete" in these conversations, as it implies that the actual data on disk is being overwritten, when that's not actually happening.
That leads to here:
The disk does shrink, but it always retains around 20GB of occupied space. For example, I created a 1TB sparsebundle, filled it with 200GB of data, then deleted 100GB of it—the actual occupied space shrank to around 120GB. After deleting another 50GB, the disk shrank further to about 70GB. Finally, when I deleted everything, it shrank to around 20GB.
One thing to understand here is that in the context of physical media the behavior above is basically "right". A 1 TB drive with 200GB of data on it is 4/5ths empty and returning storage to the container that the container doesn't need doesn't provide any real benefit. The volume can return it if/when the container "needs" it back. This is also why there isn't any general "shrink APFS" command/tool. The ONLY contexts doing this makes any sense are when resizing containers/partitions and when manipulating disk images.
Then, I used the resize command to adjust the disk size to 100GB, and it almost fully reclaimed the space, leaving only about 200MB occupied.
...which is exactly what happens when a resize occurs.
The space reduction always happens some time after remounting. Moreover, the shrinking only occurs when the sparsebundle is opened by double-clicking in Finder. If I mount it using the hdiutil attach command, no shrinking happens.
Conceptually, "attach" and "mount" are best understood as entirely separate operations, though the hdiutil command muddles them together a bit. What they actually (should) mean are:
- "attach" -> Build out the physical device tree in IOKit for a particular device. The final result of this is the /dev nodes that provide block-level access to the media.
- "mount" -> Read those /dev nodes to find a corresponding file system driver, then use that driver to create a new directory hierarchy within the existing logical hierarchy.
This division is actually how ALL block storage devices are handled (not just disk images). What actually happens when you plug in a new device is:
- IOKit detects that a device has been attached and a long chain of drivers is loaded. The "top" of that driver chain reads the partition map of the device and creates a series of IOMediaBSDClients, each of which creates a corresponding /dev/disk node.
- diskarbitrationd monitors IOKit for new clients and starts the process of mounting them.
Note that step #2 here is basically just the "default" behavior. It's possible to stop diskarbitrationd from running, and if you do so what will happen is... volumes will stop showing up on the desktop when you plug devices in. Similarly, DiskArbitration is the public API for this, and one of its primary functions is allowing apps to block devices from mounting.
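To make the attach/mount split above concrete, here is a minimal Python sketch that treats them as two separate commands. It assumes the "-nomount" option of "hdiutil attach" (attach without mounting) and "diskutil mount" for the second step. The function names, the image path, and the device nodes are hypothetical, and the parser only assumes that each line of hdiutil's output begins with a /dev node.

```python
import re

def attach_command(image_path):
    """Step 1 ("attach"): create the /dev nodes only.
    -nomount asks hdiutil not to mount any volumes it finds."""
    return ["hdiutil", "attach", "-nomount", image_path]

def mount_command(device_node):
    """Step 2 ("mount"): mount a volume from an existing /dev node."""
    return ["diskutil", "mount", device_node]

def device_nodes(attach_output):
    """Collect the /dev/disk* entries that start each line of the
    text printed by the attach step (assumed layout, see lead-in)."""
    return re.findall(r"^(/dev/disk\S+)", attach_output, flags=re.MULTILINE)

# On a Mac the two steps could be driven like this (not run here):
#   import subprocess
#   out = subprocess.run(attach_command("Test.sparsebundle"),
#                        capture_output=True, text=True, check=True).stdout
#   nodes = device_nodes(out)   # pick the APFS volume node from this list
#   subprocess.run(mount_command(nodes[-1]), check=True)
```

Which node in the list is the APFS volume depends on the image's layout, so picking the last entry is only a guess for the simple one-volume case.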
With that background context:
Does Finder perform some additional operations?
No. As in many other cases, people tend to think of the Finder as "doing stuff" because it's the front end user interface that you actually "see". However, the reality is that its role is generally much more passive than that, basically acting as a "viewer" for what other parts of the system are "doing". In the case of volumes, what it's actually doing is monitoring for volume mounts (using DiskArb) and then displaying them based on what it was "told" to do.
So, what’s the difference between opening a sparsebundle in Finder and mounting it with the hdiutil attach command?
I'm not sure exactly what attach command you ran, but the answer is that the standard "attach" operation above doesn't actually interpret the volume or perform I/O. The partition map of the device is read by IOKit and part of the APFS implementation (the APFS container architecture is largely implemented in IOKit and happens at "attach"), but there is minimal if ANY writing occurring until the volume actually mounts.
Related to that, on this point:
The space reduction always happens some time after remounting
This is a complicated side effect of the evolution of file system design. The model in most people's heads is that you "do something" and "the file system changes", but file systems have been moving away from that for a long time. Simplifying greatly, the more "modern" architecture is that the file system records what it's "going" to do and then "completes" that work over time. That also means that it can (and does) defer "maintenance work", with "immediately after mount" being an obvious point where it avoids doing anything it doesn't "have" to do.
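As a rough illustration of that "record now, complete later" model, here is a toy Python sketch. It is NOT how APFS is implemented. The class, its methods, and the block counts are all invented for illustration, and it only shows why the space a volume holds can lag behind what was logically deleted.

```python
from collections import deque

class ToyVolume:
    """Toy model of deferred reclamation (not APFS's real design)."""

    def __init__(self):
        self.files = {}               # name -> blocks used by that file
        self.pending_free = deque()   # block counts recorded for later release
        self.blocks_held = 0          # blocks still held from the container

    def write(self, name, blocks):
        if name in self.files:        # overwriting frees the old blocks later
            self.delete(name)
        self.files[name] = blocks
        self.blocks_held += blocks

    def delete(self, name):
        # Only record the intent: the file is gone logically, but its
        # blocks are not handed back to the container yet.
        self.pending_free.append(self.files.pop(name))

    def logical_used(self):
        return sum(self.files.values())

    def do_maintenance(self, max_items=1):
        """Complete some deferred work; returns the blocks given back."""
        returned = 0
        for _ in range(min(max_items, len(self.pending_free))):
            returned += self.pending_free.popleft()
        self.blocks_held -= returned
        return returned

v = ToyVolume()
v.write("a", 100)
v.write("b", 50)
v.delete("a")
print(v.logical_used(), v.blocks_held)   # logical use drops first
v.do_maintenance()
print(v.logical_used(), v.blocks_held)   # held blocks catch up later
```

In this toy, nothing shrinks until do_maintenance runs, which mirrors the "some time after remounting" observation without claiming anything about when the real file system schedules that work.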
__
Kevin Elliott
DTS Engineer, CoreOS/Hardware