Are you attempting to materialize the dataless file of a snapshot? That's a weird edge case where I'm honestly not sure what would happen.
True. I was sure I saw the same ETIMEDOUT behavior whether or not I was using an APFS snapshot, but I can't seem to reproduce that anymore. We'll stop attempting to materialize dataless files in APFS snapshots.
Yeah... There's a WHOLE lot of "weird" here that I'm not sure how the system would unwind. The most obvious issue is that the snapshot doesn't have any writable storage it can materialize "to", but the other issue is that this entire mechanism relies on FileProviders registering the folders they manage... but the snapshot isn't going to mount at a "target" the file provider would have recognized as "theirs".
In terms of blaming snapshots, I think the key point is here:
> 1-2 milliseconds when calling read() on a file in the APFS snapshot. Another indicator that using the APFS snapshot here is a bad idea.
...which is WAY too fast for any real "processing" to have occurred. I'm not going to try and track down the true failure point, but this is almost certainly caused by the VFS driver falling through to this error, not a specific "timeout" event.
Note that while snapshots were involved here, I think you'd get similar failures if:
- The source volume was a read-only volume (snapshots are actually a special case of this more general case).
- The source volume was mounted as a "secondary" volume on an otherwise unrelated device, for example, a system volume mounted via TDM (Target Disk Mode).
- As a more general version of #2, the owning FileProvider "doesn't work". WHY it's failing isn't something you'll have great visibility into, but it could be anything from the network not being available to the original provider being completely deleted.
- (Possibly) The source was mounted through SMB or some other "intermediary".
All of these are cases where the system either can't materialize (because it doesn't have writable storage) or can't associate to the original provider.
The low-level calls are ultimately a compatibility layer, not a full solution.
OK. We'll look into using `NSFileCoordinator coordinateReadingItemAtURL:options:error:byAccessor:` to materialize the dataless file. We'll still detect SF_DATALESS in st_flags returned from stat(), since TN3150 ("Getting ready for dataless files") suggests using it, unless that's now an outdated approach?
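Roughly along these lines, in Swift (a rough sketch we haven't tested yet; `materialize` is our own helper name, and a real implementation would run this off the main thread since the accessor block can stall on network I/O):

```swift
import Foundation

/// Rough sketch: force materialization of a dataless file by taking a
/// coordinated read. `materialize` is a hypothetical helper, not an Apple API.
func materialize(_ url: URL) throws {
    var coordinationError: NSError?
    var readError: Error?
    let coordinator = NSFileCoordinator()
    coordinator.coordinate(readingItemAt: url, options: [],
                           error: &coordinationError) { actualURL in
        do {
            // Touching the contents forces the file provider to download them.
            let handle = try FileHandle(forReadingFrom: actualURL)
            _ = try handle.read(upToCount: 1)
        } catch {
            readError = error
        }
    }
    if let error = coordinationError { throw error }
    if let error = readError { throw error }
}
```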
So, first off, if you're going to handle "backing up" file provider-based storage, then I think you need to do one or both of the following:
- "Accept" whatever data you got as "the data" and back it up normally, "ignoring" any dataless files. If you can copy the dataless object "as is," then that's great, but do NOT attempt to materialize any dataless files you encounter (see the sketch below for one way to enforce that).
- Treat cloud storage as a special case that's individually backed up and managed as an independent app feature, separate from your "normal" snapshot-based backup.
The problem here isn't just the immediate materialization issue, it's that the nature of cloud storage means that you can't create an atomic backup in the same way you can back up a local storage device. You're going to end up downloading individual files over a period of time, which means your backup isn't a specific point in time, but is instead a collection of individual files retrieved at unrelated times. That isn't necessarily "wrong", but "mixing" it with the user's normal backup may not be a great idea.
Also, keep in mind that the storage/bandwidth cost of #2 is potentially VERY large, often larger than the user's storage can fully accommodate. That doesn't mean a feature like #2 isn't useful, but it probably needs to be set up and managed through its own interface so you can account for things like excluding directories.
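On enforcing #1: TN3150 also covers opting your process out of automatic materialization, which makes it impossible to trip a download by accident during bulk iteration. A minimal sketch, assuming the <sys/resource.h> constants import into Swift (the equivalent C call is identical):

```swift
import Darwin

// Opt this process out of automatic materialization so that open()/read()
// on a dataless file fails (with EDEADLK) instead of silently triggering
// a download. Constants come from <sys/resource.h>; a per-thread scope
// (IOPOL_SCOPE_THREAD) is also available.
let status = setiopolicy_np(IOPOL_TYPE_VFS_MATERIALIZE_DATALESS_FILES,
                            IOPOL_SCOPE_PROCESS,
                            IOPOL_MATERIALIZE_DATALESS_FILES_OFF)
precondition(status == 0, "setiopolicy_np failed, errno \(errno)")
```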
Finally, on this point:
> We'll still detect SF_DATALESS in st_flags returned from stat(), since TN3150 ("Getting ready for dataless files") suggests using it, unless that's now an outdated approach?
I think this is a case where it depends on what you want to do. If your goal is simply to detect and avoid/special-case these files (case #1 above) during "bulk" iteration, then I think it works great. However, if you're actually trying to download/manage these files, then I think you need to move to the higher-level API, where you have better visibility into what's actually happening and the access model is better suited to long network delays.
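For the detection case, this is about all it takes; a minimal sketch (SF_DATALESS comes from <sys/stat.h>, and checking st_flags doesn't materialize anything on its own):

```swift
import Darwin

/// Minimal sketch: returns true if the file at `path` is dataless.
/// lstat() only reads metadata, so it won't trigger materialization.
func isDataless(atPath path: String) -> Bool {
    var info = stat()
    guard lstat(path, &info) == 0 else { return false }
    return (info.st_flags & UInt32(SF_DATALESS)) != 0
}
```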
__
Kevin Elliott
DTS Engineer, CoreOS/Hardware