Are read-only filesystems currently supported by FSKit?

I'm writing a read-only filesystem extension.

I see that the documentation for loadResource(resource:options:replyHandler:) claims that the --rdonly option is supported, which suggests that this should be possible. However, I have never seen this option provided to my filesystem extension, even if I return usableButLimited as a probe result (where it doesn't mount at all - FB19241327) or pass the -r or -o rdonly options to the mount(8) command. Instead I see those options on the volume's activate call.

But other than saving that "readonly" state (which, in my case, is always the case) and then throwing on all write-related calls I'm not sure how to actually mark the filesystem as "read-only." Without such an indicator, the user is still offered the option to do things like trash items in Finder (although of course those operations do not succeed since I throw an EROFS error in the relevant calls).

It also seems like the FSKit extensions that come with the system handle read-only strangely as well. For example, for a FAT32 filesystem, if I mount it like

mount -r -F -t msdos /dev/disk15s1 /tmp/mnt

Then it acts... weirdly. For example, Finder doesn't know that the volume is read-only, and lets me do some operations like making new folders, although they never actually get written to disk. Writing may or may not produce errors, and/or the change may just disappear (immediately or later), which is pretty much what I'm seeing in my own filesystem extension. If I remove the -F option (thus using the kernel extension version of msdos), this doesn't happen.

Are read-only filesystems currently supported by FSKit? The fact that extensions like Apple's own msdos also seem to act weirdly makes me think this is just a current FSKit limitation, although maybe I'm missing something. It's not necessarily a hard blocker given that I can prevent writes from happening in my FSKit module code (or, in my case, just not implement such features at all), but it does make for a strange experience.

(I reported this as FB21068845, although I'm mostly asking here because I'm not 100% sure this is not just me missing something.)

Answered by DTS Engineer in 866824022

Are read-only filesystems currently supported by FSKit?

I think the tricky part is what "support" actually means here. Let me start with what this actually does:

pass the -r or -o rdonly options to the mount(8) command.

Passing that to mount should mean that the VFS layer itself is prevented from writing to the device. In FSKit terms, that means "FSBlockDeviceResource.isWritable" should be false and that all write methods should fail. If either of those behaves differently, then that's a HUGE bug that we'd need to fix ASAP.

However, the confusing point here is that mounting a volume "readonly" doesn't necessarily define/change how the file system "presents" itself to the higher-level system. That is, strictly speaking, nothing prevents a volume from being mounted "readonly"... while the file system itself still presents itself as fully modifiable.

That might sound a bit strange, but as a concrete example, you could implement a "resettable" file system by using the on-disk file system as the starting structure, routing all writes to secondary storage, and then discarding that storage on unmount. That's just one example, but the broader point is that the data "source" and lower-level VFS system aren't designed to control/constrain the file system that's actually presented to the system.
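The "resettable" idea above can be sketched in a few lines. All of the names below are hypothetical (this is not FSKit API): reads fall through to the on-disk blocks, writes land in an in-memory overlay, and discarding the overlay "resets" the volume on unmount.

```swift
// Sketch of a resettable block store (hypothetical names, not FSKit API).
struct OverlayBlockStore {
    let base: [Int: [UInt8]]           // immutable on-disk blocks
    var overlay: [Int: [UInt8]] = [:]  // write destination; discarded on unmount

    func read(block: Int) -> [UInt8]? {
        overlay[block] ?? base[block]  // overlay shadows the on-disk data
    }

    mutating func write(block: Int, data: [UInt8]) {
        overlay[block] = data          // the device itself is never touched
    }

    mutating func reset() {
        overlay.removeAll()            // back to the pristine on-disk state
    }
}
```

The point is just that the device was mounted read-only, yet the file system presented to the system is fully writable.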

That leads to here:

Then it acts... weirdly. For example, Finder doesn't know that the volume is read-only, and lets me do some operations like making new folders, although they never actually get written to disk.

So, FYI, this is what happens when the high-level system presents operations as "possible" when it can't actually perform those operations. You can actually get exactly the same thing to happen to any of our kernel drivers, though it takes a KEXT*.

*More specifically, having written a KEXT that did this earlier in my career, if the IOMedia driver (the top level of the IOKit storage stack) "flips" itself to read-only AFTER the volume has mounted, you basically get exactly the behavior you're describing.

In terms of what your extension should do (and why msdosfs is failing), I think the best approach here is to support permissions (FSItemAttributeMode), and then ALWAYS return a configuration for all objects that prevents writing and fail any attempt to modify mode. You could also implement FSVolumeAccessCheckOperations; however, I'm not sure that will actually behave the way you want, and the mode operation is the more important check. The mode configuration means that the system (particularly the Finder) "knows" that the object is read-only before it attempts any operation, which means it never bothers trying.
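A minimal sketch of that approach (helper names are mine, not FSKit API): clear every write bit in the mode you report, and refuse any attempt to change it.

```swift
// Owner, group, and other write permission bits (S_IWUSR | S_IWGRP | S_IWOTH).
let writeBits: UInt32 = 0o222

// Always present items with all write bits cleared.
func presentedMode(_ onDiskMode: UInt32) -> UInt32 {
    onDiskMode & ~writeBits    // e.g. 0o755 becomes 0o555
}

enum ReadOnlyVolumeError: Error {
    case readOnly              // stand-in for POSIX EROFS
}

// Reject every mode change on a read-only volume.
func setMode(_ newMode: UInt32) throws {
    throw ReadOnlyVolumeError.readOnly
}
```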

That leads to here:

It also seems like the FSKit extensions that come with the system handle read-only strangely as well. For example, for a FAT32 filesystem,

Please file a bug on this (specifically, msdosfs's behavior) and post the bug number once it's filed.

What's going on here is that we're returning FSVolumeSupportedCapabilities.doesNotSupportSettingFilePermissions because, in fact, FAT does not support file permissions. However, that also means that it can't "tell" the higher-level system that it's read-only, because it's disabled file permissions. In any case, I think msdosfs could solve all this by removing "doesNotSupportSettingFilePermissions" on read-only volumes and using exactly the same approach I'm suggesting above... and if it can't, then this is an edge case we should probably solve. Either way, it's worth a bug.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

I think the tricky part here is what "support" here actually means.

Yeah, I was actually having a bit of trouble trying to think of a title for this post (and for the aforementioned FB21068845).

If either of those behave differently, then that's a HUGE bug that we'd need to fix ASAP.

I didn't actually check those two (write isn't implemented at all in my filesystem). While it would take a little longer to verify actual writes, I did just check that FSBlockDeviceResource.isWritable returns the correct value (false) when mounted as read-only, so I think that part is okay, at least!

I think the best approach here is to support permissions (FSItemAttributeMode), and then ALWAYS return a configuration for all objects that prevents writing and fail any attempt to modify mode

I actually have tried this. There's this code in my real project:

if request.isAttributeWanted(.mode) {
    // FIXME: not correct way to enforce read-only file system but does FSKit currently have a better way?
    let useMode = readOnlySystem ? mode.subtracting([.ownerWrite, .groupWrite, .otherWrite]) : mode
    attributes.mode = UInt32(useMode.rawValue)
}

I suppose I can remove that FIXME if that's the case! However, that doesn't remove the relevant options from Finder, and users can still attempt to do these actions. It'll ask me to first authenticate with my admin password, then fail if I do.

Edit: actually, I think the admin prompt is because of the access check. When I modify the sample project I sent into FB21068845 to just return a mode of 0o555 then it still just presents me the option to use those options and doesn't ask for admin, but still returns errors.

The other annoyance I see with this solution is that if I use Finder (or something else like cp) to copy files out of this read-only disk onto some other disk that is read-write (like the internal SSD), then the destination file also has the read-only mode.

Please file a bug on this (specifically, msdosfs's behavior)

Here you go: FB21094219

First of all, on the msdosfs issues:

Here you go: FB21094219

I discussed this with the team and, yes, none of this is intentional and it will be fixed in the future.

Moving over to the more general issue, yes, there is a bug here (r.150720858). The way this works in the VFS layer is that the vfs driver is told that the mount request is read-only and the VFS driver can then mark the mount point as read-only (which is what most do). That then propagates back to user space as the "MNT_RDONLY" flag returned through statfs (among other points). Much of the infrastructure for that is already in place; however, there should be an API to allow your driver to directly set it and there appears to be a breakdown in the logic which means that flag isn't getting set automatically. I can't comment on our release schedule, but the issue is well understood and fixing it is a high priority.
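Once that fix lands, the read-only state should surface to user space as the MNT_RDONLY bit in the f_flags field of statfs(2). Checking for it is just a bit test; the flag value below is what <sys/mount.h> defines on macOS.

```swift
// MNT_RDONLY from <sys/mount.h> on macOS: "read only filesystem".
let MNT_RDONLY: UInt32 = 0x0000_0001

// True when the statfs f_flags value marks the mount read-only.
func isReadOnlyMount(flags: UInt32) -> Bool {
    flags & MNT_RDONLY != 0
}
```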

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Thanks for the update! That's useful to know, I'll look out for any changes in future updates.

there appears to be a breakdown in the logic which means that flag isn't getting set automatically

I've also been seeing some other issues related to certain things I set in the FSKit API not showing up in statfs (mainly the volumeStatistics's blockSize and ioSize, FB21106711). I wonder if that's related, or maybe that's a completely separate issue.

I've also been seeing some other issues related to certain things I set in the FSKit API not showing up in statfs (mainly the volumeStatistics's blockSize and ioSize, FB21106711).

Actually, I think the size problem might be on your side. So, first off, your test implementation is incomplete. This is the last message logged by your extension:

2025-11-20 12:56:06.891883-0800 FSKitSampleFilesystem: (FSKit) [com.apple.FSKit:default] -[FSModuleVolume(Project) getMaxFileSizeInBits]: Volume does not implement both maxFileSizeInBits and maxFileSize, while one of them must be implemented.

Shortly after which, lifs (the kernel support KEXT for FSKit) logs:

2025-11-20 12:56:06.891917-0800 kernel: (lifs) lmp <private> max_filesize 0x7fffffffffffffff
2025-11-20 12:56:06.891927-0800 kernel: (lifs) lifs_io_strategy_thread: thread <private> starting for mount <private>
2025-11-20 12:56:06.891936-0800 kernel: (lifs) Failed to open block device /dev/disk7, err: 16
2025-11-20 12:56:06.891937-0800 kernel: (lifs) unknown: isssd 0 devblksize 512 devreadsize 0 devwritesize 0 maxreadcnt 8388608 segreadcnt 512 maxwritecnt 8388608 segwritecnt 512 mnt_flags 0x10201000

I'm not sure of what exactly led to that, but I think both of these values:

statfs.f_bsize == 512
statfs.f_ioSize == 1048576

...are the default sizes lifs is falling back to.

However, there are other issues. You're testing with a disk image, but the disk image block size is 0x200/512, NOT 0x1000/4096. If you’re looking at those values from the volume side, you may get 4096, but that's because the volume changed what stat returned, not because it's actually true.

Have you looked at what the actual block size of that device is and what FSBlockDeviceResource.blockSize is?

FYI, the "actual" physical block size is the value of the IOMedia key "Physical Block Size", which is the value returned by the ioctl() "DKIOCGETPHYSICALBLOCKSIZE". However, adding to the complexity here, APFS and some other volume formats/configurations "split" their implementations across IOKit and the vfs layer. In the case of APFS, that means the top end of its IOMedia implementation actually returns 4096, even if the underlying hardware is 512.

In your particular case, this also means that the stat value you're returning is 4x larger than the underlying FSResource size. That isn't necessarily "wrong", but that's because the VFS layer will basically believe "anything" you tell it, not because that's what you wanted to do.

Finally, on "f_ioSize", it's also possible that we're intentionally capping that size to 1MB. Your volume configuration isn't using kernel offloaded I/O*, which means we'll be copying all data in and out of the kernel. I believe that transfer buffer is capped at 1 MB (which is fairly reasonable), and we may be propagating that limitation "out" through f_ioSize to discourage large writes which would force us to hold additional memory.

*To be clear, I'm not sure that switching to kernel offloaded I/O would remove the limitation, but that’s the case where we COULD remove the restriction.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Volume does not implement both maxFileSizeInBits and maxFileSize, while one of them must be implemented.

Implementing this didn't seem to change the result. But I definitely do need to fix the sample FSKit project I've been copying as a base for all of these bug reports, then!

Have you looked at what the actual block size of that device is and what FSBlockDeviceResource.blockSize is?

Both physicalBlockSize and FSBlockDeviceResource.blockSize are 512 on the disk image.

However, there are other issues. You're testing with a disk image, but the disk image block size is 0x200/512, NOT 0x1000/4096. If you’re looking at those values from the volume side, you may get 4096, but that's because the volume changed what stat returned, not because it's actually true.

...

In your particular case, this also means that the stat value you're returning is 4x larger than the underlying FSResource size. That isn't necessarily "wrong", but that's because the VFS layer will basically believe "anything" you tell it, not because that's what you wanted to do.

So, as context for where 4096 comes from, in my real extension I read the logical block size that the filesystem uses from a value stored on the superblock on disk. Most of the time that value is 4096 bytes, which is why I chose that value for the sample project.

My assumption is that this value is supposed to be set to the logical block size that my file system uses? In which case that would be this larger value, not the physical block size on the device.

I suppose a lot of my confusion lies in how FSBlockDeviceResource.blockSize is described as "The logical block size, the size of data blocks used by the file system," which I understand to mean that in my case, I should be able to set it to the value stored in the superblock (let's say that's 4096). But since I can't set that directly, I figured that setting the volume statistics' blockSize would cause statfs to report that higher value at the volume level, but that didn't seem to happen since it remained at 512.

Finally, on "f_ioSize", it's also possible that we're intentionally capping that size to 1MB.

Hmm, but that would be a strange reason in this case, wouldn't it? In the sample project I set

stats.ioSize = 524_288

which is half of the default value. So it should be within that limit, still, unless you mean that it's just always kept at 1 MB. Although I have seen the same behavior (ioSize always at 1MB even if set to a lower value) in my real project that does implement kernel offloaded IO, so it seems a bit confusing if that value can be "set" but doesn't actually do anything.

I suppose a lot of my confusion lies in how FSBlockDeviceResource.blockSize is described as "The logical block size, the size of data blocks used by the file system," which I understand to mean that in my case, I should be able to set it to the value stored in the superblock (let's say that's 4096).

So, the more useful description is what's in the discussion:

FSBlockDeviceResource.physicalBlockSize:

"This is equivalent to the DKIOCGETPHYSICALBLOCKSIZE device parameter."

Conceptually, this means the smallest transfer the "device" is physically capable of transferring. In high-level API terms, all I/O done to the rdev ("raw dev node") MUST be an even multiple of this value, otherwise the I/O will immediately fail without ever reaching the hardware driver.

FSBlockDeviceResource.blockSize:

"This is equivalent to the DKIOCGETBLOCKSIZE device parameter."

This value is the device’s "preferred" I/O size, meaning the size it would "like" to have even if it COULD handle smaller requests.

Two points to understand here:

  1. Much of the time these two values will be equal, but that's not guaranteed. For example, some RAID drivers increase DKIOCGETBLOCKSIZE as a way of encouraging the system to send larger I/O requests so that they can "spread" more I/O across devices.

  2. Above I said "conceptually" because, strictly speaking, this only controls what I/O a given "layer" of the storage stack accepts. So a higher-level driver can have a higher value than a lower-level one or, theoretically, even a lower value (though that would be a pain to implement).
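The raw-device constraint described earlier can be expressed as a simple check (a sketch, with hypothetical names): I/O against the rdev must be an even multiple of the physical block size, or it fails before ever reaching the hardware driver.

```swift
// Sketch: validate a raw-device I/O request against the physical block size.
// Both the starting offset and the transfer length must be even multiples.
func isValidRawIO(offset: Int, length: Int, physicalBlockSize: Int) -> Bool {
    offset % physicalBlockSize == 0 && length % physicalBlockSize == 0
}
```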

Returning to here:

I suppose a lot of my confusion lies in how FSBlockDeviceResource.blockSize is described as "The logical block size, the size of data blocks used by the file system,"...

That section of the documentation is basically just confusing and wrong (r.165643691). There isn't any "file system" at this level of the system and, as it happens, "Logical Block Size" is also defined by the storage stack but is a totally different value and NOT what's being returned.

...which I understand to mean that in my case, I should be able to set it to the value stored in the superblock (let's say that's 4096).

You won't be setting it at all. FSBlockDeviceResource is simply telling you the reality of the device you're talking to. Strictly speaking, that doesn't have to have ANY connection with what you tell the "rest" of the system.

My assumption is that this value is supposed to be set to the logical block size that my file system uses? In which case that would be this larger value, not the physical block size on the device.

Yes, that's correct. More specifically, it's common for file systems to have a larger allocation block size (meaning, how they track their own block usage) than the physical media they’re on. That means, in the standard case, that FSStatFSResult.blockSize will be larger than FSBlockDeviceResource.physicalBlockSize and that FSStatFSResult.totalBlocks will be smaller than FSBlockDeviceResource.blockCount.

However, it isn't necessarily the case that:

blockSize x totalBlocks == physicalBlockSize x blockCount

It often WILL be at least close to it, but nothing requires it. As the simplest example, if your block size is 4096 and your partition has an "extra" block, then the math just won't work. You can either "hide" that extra block by incrementing usedBlocks and totalBlocks or you can "lose" that extra block by returning the correct values, but either way you'll either be slightly too small or slightly too large. All you can do is pick one.
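The arithmetic above can be sketched directly: with 4096-byte allocation blocks on a 512-byte-sector device, integer division simply "loses" any partial tail block, which is why the two products won't always multiply out exactly.

```swift
// Sketch: derive a statfs-style total block count from device geometry.
// Any partial tail block is "lost" to integer division.
func totalAllocationBlocks(deviceBlockCount: Int, deviceBlockSize: Int,
                           fsBlockSize: Int) -> Int {
    (deviceBlockCount * deviceBlockSize) / fsBlockSize
}
```

For example, 8192 sectors of 512 bytes divide evenly into 1024 blocks of 4096 bytes, but an 8193-sector partition still yields 1024 blocks, leaving one 512-byte sector unaccounted for either way.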

More fundamentally, I think it's critical to understand that these values aren't necessarily tied to fundamental "truth". As a real-world example, copy-on-write (COW) file systems like APFS work by pushing "all" changes out as writes which are then applied, but that also means that it's possible to put the file system into a state where it can no longer be modified. Modification would require free space, so if all the space is gone modification becomes impossible... and that INCLUDES deleting files.

The simplest solution to this is exactly what APFS does: set aside enough space that you'll "always" be able to delete files, then increase your used count so that the volume is "full" before you ACTUALLY run out of "all" space.
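That reserve idea reduces to a one-line policy (a sketch; the reserve size is a made-up policy value, not anything APFS documents): report free space minus a reserve, so the volume looks "full" while enough headroom remains for deletes to succeed.

```swift
// Sketch: under-report free space by a reserved headroom amount, clamping
// at zero so the volume reads as "full" before storage is truly exhausted.
func reportedFreeBlocks(actualFree: Int, reserved: Int) -> Int {
    max(0, actualFree - reserved)
}
```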

Moving to here:

Finally, on "f_ioSize", it's also possible that we're intentionally capping that size to 1MB.

Having taken another look at this after the Thanksgiving break, let me rephrase that as "We are intentionally capping f_ioSize at 1MB" (r.158671366).

Hmm, but that would be a strange reason in this case, wouldn't it?

Yes and no. From what I can tell, we did significant testing on this and basically came to the conclusion that 1MB just worked "better" than any other values, primarily due to mach IPC details.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Thanks! That clears my confusion up. Right now I'll just be focusing on the layers relevant to an FSKit module. I wasn't even really thinking of the case where reported total bytes != physical total bytes, but that example makes sense.

As a real-world example, copy-on-write (COW) file systems like APFS work by pushing "all" changes out as writes which are then applied, but that also means that it's possible to put the file system into a state where it can no longer be modified. Modification would require free space, so if all the space is gone modification becomes impossible... and that INCLUDES deleting files.

The simplest solution to this is exactly what APFS does— set aside enough space that you'll "always" be able to delete files, then increase your used count so that the volume is "full" before you ACTUALLY run out of "all" space.

Huh. Out of curiosity, did APFS always do that to prevent that issue? I vaguely remember seeing an issue like that at a user level years ago (although, I don't really have any evidence to really prove that that was the specific issue that happened, so maybe not).

In any case, going back to the original issue,

I wonder if that's related, or maybe that's a completely separate issue.

I think we can file this under "completely separate."

Huh. Out of curiosity, did APFS always do that to prevent that issue?

No, at least not entirely. I don't remember if it was an issue on macOS (where booting support wasn't added until later), but there was a short interval in iOS 8 where it was possible to "completely" fill an iOS device.

However, the end result of that wasn't quite as bad as I made it sound (or it theoretically could be). The nature of COW also means that in a "normal" I/O flow, the I/O request that uses the "last" available space is effectively "guaranteed" to be the writes for the next/new data that's going to be written. So what actually ended up happening was:

  • The kernel panicked, since APFS couldn't really "do" anything else.

  • The file system was left in a "dirty" state (since it didn't unmount).

  • The fsck prior to remount "found" that pending data and cleared it (since that's all it could really do).

...which then freed up enough space for the file system to be functional again, at least enough that you could delete files and clear the issue.

Theoretically, it might be possible to "line up" volume state and I/O pattern in JUST the right way that the volume would unrecoverably deadlock, but I don't see it ever happening by "accident", particularly on a "live" system volume*. Also, keep in mind that that system had repeatedly warned the user that it was short of storage and that serious problems would result, so the problem didn't really "sneak up" on the user.

*The system volume has enough simultaneous activity that you can't really control what the file system is doing at any given instant.

Also, one follow-up comment for the "haters" out there:

The simplest solution to this is exactly what APFS does—set aside enough space that you'll "always" be able to delete files, then increase your used count so that the volume is "full" before you ACTUALLY run out of "all" space.

The common reaction to something like this is that this is a "waste" of space and that file systems should let you use "all" of your drive’s space. That reaction comes from a very naive and limited understanding of how file systems actually work. ALL file system design involves making tradeoffs between performance, storage efficiency, safety, and functionality. As the most obvious example, if your goal was to ACTUALLY store as much data as possible, the last thing you'd do is divide your storage into fixed-sized chunks and then "waste" all of the unused space in those chunks... but that is in fact how basically "all" file systems work.
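The "wasted space" point above is just round-up arithmetic: storing a file in fixed-size blocks rounds its on-disk footprint up to a whole number of blocks.

```swift
// Sketch: on-disk footprint of a file in a fixed-block-size file system.
// The last, partially-filled block still consumes a full block.
func onDiskBytes(fileSize: Int, blockSize: Int) -> Int {
    ((fileSize + blockSize - 1) / blockSize) * blockSize
}
```

A 1-byte file in a 4096-byte-block file system still occupies 4096 bytes on disk.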

More to the point, older safety features like journalling are basically doing exactly the same thing: setting aside storage capacity the file system uses to protect its own integrity.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Makes sense, so it seems like what I saw back then [1] almost certainly wasn't at the APFS layer.

one follow-up comment for the "haters" out there:

Ha, well I would hope that someone looking at an FSKit-related question on the forums would know enough about file systems to understand why that would make sense, though I suppose you never know ;) COW is definitely saving me more space on my machine than whatever is reserved for that purpose.

[1] If you're curious it was an old Apple Watch I had that got stuck in a boot loop after restarting it to try to fix some "crashy app" behavior, and IIRC there were low storage alerts shortly before that happened. Definitely way after the iOS 8 days.

[1] If you're curious it was an old Apple Watch I had that got stuck in a boot loop after restarting it to try to fix some "crashy app" behavior, and IIRC there were low storage alerts shortly before that happened. Definitely way after the iOS 8 days.

Hmm... Maybe? Turns out my memory was wrong and APFS was actually adopted in iOS 10.3, not 8.3. watchOS did adopt it before that, so it's possible you could have hit something then.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware
