Backing up dataless files

Our backup app runs as a LaunchDaemon, has the APFS snapshot entitlement, and attempts to read and back up dataless files and directories by calling setiopolicy_np(IOPOL_TYPE_VFS_MATERIALIZE_DATALESS_FILES, IOPOL_SCOPE_THREAD, IOPOL_MATERIALIZE_DATALESS_FILES_ON) before reading them. This worked for a while, but lately it has been producing many ETIMEDOUT errors for our users, especially for iCloud files but sometimes for FileProvider files (e.g., Dropbox). We've tried waiting up to 5 minutes and retrying, but the ETIMEDOUT error persists. In many cases, the errors stop hours or days later. Some users report that clicking the item in the Finder and telling the Finder to "download" the file or directory (for iCloud files) successfully downloads the data, after which backups proceed without error.
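For reference, the approach described above might look roughly like this minimal C sketch. It assumes the macOS SDK, where `<sys/resource.h>` declares `setiopolicy_np()` and the `IOPOL_*` constants; `read_materializing` is a hypothetical name, and the `#if` guard only exists so the sketch also compiles on non-Apple systems.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#if defined(__APPLE__)
#include <sys/resource.h>   /* setiopolicy_np() and the IOPOL_* constants */
#endif

/*
 * Hypothetical helper: opt the calling thread in to materializing dataless
 * files, then read up to `len` bytes from `path`. Returns the byte count,
 * or -errno on failure (an ETIMEDOUT from read() comes back as -ETIMEDOUT).
 */
ssize_t read_materializing(const char *path, void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);
#if defined(__APPLE__)
    /* Thread scope: only I/O issued from this thread is affected. */
    if (setiopolicy_np(IOPOL_TYPE_VFS_MATERIALIZE_DATALESS_FILES,
                       IOPOL_SCOPE_THREAD,
                       IOPOL_MATERIALIZE_DATALESS_FILES_ON) != 0)
        return -errno;
#endif
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -errno;
    ssize_t n = read(fd, buf, len);   /* a dataless file may fail here */
    int saved = errno;                /* close() may overwrite errno */
    close(fd);
    return n < 0 ? -saved : n;
}
```

Because the policy is thread-scoped, a backup tool using this pattern has to set it on every worker thread that reads files.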

Why is the user able to easily materialize dataless files but our app isn't?

What is the correct approach for backing up these files/directories?

What APIs can we use to determine the actual issue with a given dataless file/directory that's giving us a read error?

Our backup app runs as a LaunchDaemon, has the APFS snapshot entitlement, and attempts to read and back up dataless files and directories by calling setiopolicy_np(IOPOL_TYPE_VFS_MATERIALIZE_DATALESS_FILES, IOPOL_SCOPE_THREAD, IOPOL_MATERIALIZE_DATALESS_FILES_ON) before reading them. This worked for a while, but lately it has been producing many ETIMEDOUT errors for our users, especially for iCloud files but sometimes for FileProvider files (e.g., Dropbox). We've tried waiting up to 5 minutes and retrying, but the ETIMEDOUT error persists. In many cases, the errors stop hours or days later.

Are you attempting to materialize the dataless file of a snapshot? That's a weird edge case where I'm honestly not sure what would happen. Do you have any more information about the files this occurred on? Size, contents, etc.? Also, how long does it take for you to receive ETIMEDOUT?

Having said that...

Why is the user able to easily materialize dataless files but our app isn't?

What is the correct approach for backing up these files/directories?

...I think the answer to both of these questions is to use file coordination (NSFileCoordinator) to initiate and manage the download. I'm not sure what the exact circumstances would be, but I'd expect there to be cases where the low-level calls fail and FileProvider succeeds. The low-level calls are ultimately a compatibility layer, not a full solution, so I'm sure there are cases where they fail the open simply because the alternative (blocking for the full download) wasn't really practical or viable.

What APIs can we use to determine the actual issue with a given dataless file/directory that's giving us a read error?

I'm not sure there's an API that will give you great visibility into exactly what the error is/was, but there are new NSFileManager APIs that will let you more directly manage the download process.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Are you attempting to materialize the dataless file of a snapshot? That's a weird edge case where I'm honestly not sure what would happen.

True. I was sure I saw the same ETIMEDOUT behavior whether or not I was using an APFS snapshot, but I can't seem to reproduce that anymore. We'll stop attempting to materialize dataless files in APFS snapshots.

Do you have any more information about the files this occurred on? Size, contents, etc.? Also, how long does it take for you to receive ETIMEDOUT?

The one case I can reliably reproduce at the moment is a backup of WhatsApp data to "~/Library/Mobile Documents/57T9237FN3~net~whatsapp~WhatsApp/Accounts/16173068743/backup". 19 files, 189MB total.

Also, how long does it take for you to receive ETIMEDOUT?

1-2 milliseconds when calling read() on a file in the APFS snapshot. Another indicator that using the APFS snapshot here is a bad idea.

The low-level calls are ultimately a compatibility layer, not a full solution

OK. We'll look into using

NSFileCoordinator coordinateReadingItemAtURL:options:error:byAccessor:

to materialize the dataless file. We'll still detect dataless items by checking SF_DATALESS in the st_flags returned by stat(), since TN3150: Getting ready for dataless files suggests doing so. Or is that now an outdated approach?
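The check described in TN3150 might look like the following minimal sketch. `is_dataless` is a hypothetical helper name; `st_flags` and `SF_DATALESS` are Darwin-specific, so the guard makes the function report 0 elsewhere. `lstat()` only reads metadata, so it should not by itself trigger a download.

```c
#include <errno.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: returns 1 if `path` is a dataless file or directory,
 * 0 if not, or -errno if it cannot be examined. lstat() inspects a symlink
 * itself rather than following it.
 */
int is_dataless(const char *path)
{
    struct stat st;
    if (lstat(path, &st) != 0)
        return -errno;
#if defined(__APPLE__) && defined(SF_DATALESS)
    return (st.st_flags & SF_DATALESS) != 0;
#else
    return 0;   /* st_flags and SF_DATALESS are Darwin-specific */
#endif
}
```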

Are you attempting to materialize the dataless file of a snapshot? That's a weird edge case where I'm honestly not sure what would happen.

True. I was sure I saw the same ETIMEDOUT behavior whether or not I was using an APFS snapshot, but I can't seem to reproduce that anymore. We'll stop attempting to materialize dataless files in APFS snapshots.

Yeah... There's a WHOLE lot of "weird" here that I'm not sure how the system would unwind. The most obvious issue is that the snapshot doesn't have any writable storage it can materialize "to", but the other issue is that this entire mechanism relies on FileProviders registering the folders they manage... and the snapshot isn't going to mount at a "target" the file provider would recognize as "theirs".

In terms of blaming snapshots, I think the key point is here:

1-2 milliseconds when calling read() on a file in the APFS snapshot. Another indicator that using the APFS snapshot here is a bad idea.

...which is WAY too fast for any real "processing" to have occurred. I'm not going to try to track down the true failure point, but this is almost certainly the VFS driver falling through to this error, not a specific "timeout" event.

Note that while snapshots were involved here, I think you'd get similar failures if:

  1. The source volume was a read-only volume (snapshots are actually a special case of this more general case).

  2. The source volume was mounted as a "secondary" volume on an otherwise unrelated device, for example, a system volume mounted via TDM (Target Disk Mode).

  3. As a general version of #2, the owning FileProvider "doesn't work". WHY it's failing isn't something you'll have great visibility into, but it could be anything from the network not being available to the original provider being completely deleted.

  4. (possibly) The source was mounted through SMB or some other "intermediary".

All of these are cases where the system either can't materialize the file (because there's no writable storage) or can't associate it with the original provider.
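A backup tool can cheaply screen for cases 1 and 4 before trying to read anything. A minimal sketch, assuming POSIX `statvfs()` for the read-only check and, on macOS only, `statfs()` with `MNT_LOCAL` as a rough signal for network mounts; `volume_traits` and the `VOL_*` bits are hypothetical names. Cases 2 and 3 likely don't show up in mount flags at all.

```c
#include <errno.h>
#include <sys/statvfs.h>
#if defined(__APPLE__)
#include <sys/mount.h>   /* statfs() and MNT_LOCAL */
#endif

/* Hypothetical trait bits. */
#define VOL_READONLY 0x1   /* no writable storage to materialize into (case 1) */
#define VOL_NONLOCAL 0x2   /* network or other "intermediary" mount (case 4); Darwin-only */

/* Hypothetical helper: VOL_* mask for the volume containing `path`, or -errno. */
int volume_traits(const char *path)
{
    struct statvfs vfs;
    if (statvfs(path, &vfs) != 0)
        return -errno;

    int traits = 0;
    if (vfs.f_flag & ST_RDONLY)
        traits |= VOL_READONLY;   /* APFS snapshots mount read-only */
#if defined(__APPLE__)
    struct statfs fs;
    if (statfs(path, &fs) == 0 && !(fs.f_flags & MNT_LOCAL))
        traits |= VOL_NONLOCAL;
#endif
    return traits;
}
```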

The low-level calls are ultimately a compatibility layer, not a full solution.

OK. We'll look into using

NSFileCoordinator coordinateReadingItemAtURL:options:error:byAccessor: to materialize the dataless file. We'll still detect dataless items by checking SF_DATALESS in the st_flags returned by stat(), since TN3150: Getting ready for dataless files suggests doing so. Or is that now an outdated approach?

So, first off, if you're going to handle "backing up" file provider-based storage, then I think you need to do one or both of two things:

  1. "Accept" whatever data you got as "the data" and back it up normally, "ignoring" any dataless files. If you can copy the dataless object "as is," then that's great, but do NOT attempt to materialize any dataless files you encounter.

  2. Treat cloud storage data as a special case: back it up individually and manage it as an independent app feature, separate from your "normal" snapshot-based backup.
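Option 1 might look something like the following minimal sketch: a tree walk that counts what an ordinary backup pass would copy and records dataless items without reading them. It assumes `fts(3)` for traversal and the `SF_DATALESS` flag from TN3150, and it sets the thread's policy to `IOPOL_MATERIALIZE_DATALESS_FILES_OFF` so that an accidental read should fail instead of starting a download. `walk_skipping_dataless` and `struct walk_counts` are hypothetical names; a real tool would copy or archive each entry where this sketch only counts.

```c
#include <errno.h>
#include <fts.h>
#include <stdlib.h>         /* NULL */
#include <sys/stat.h>
#if defined(__APPLE__)
#include <sys/resource.h>   /* setiopolicy_np() and the IOPOL_* constants */
#endif

/* Hypothetical result type. */
struct walk_counts {
    long regular;    /* files that can be copied with ordinary reads */
    long dataless;   /* dataless items recorded but never read */
};

static int entry_is_dataless(const struct stat *st)
{
#if defined(__APPLE__) && defined(SF_DATALESS)
    return (st->st_flags & SF_DATALESS) != 0;
#else
    (void)st;        /* st_flags and SF_DATALESS are Darwin-specific */
    return 0;
#endif
}

/* Hypothetical helper: walk `root` (following only a root symlink). Returns 0 or -errno. */
int walk_skipping_dataless(const char *root, struct walk_counts *out)
{
#if defined(__APPLE__)
    /* Make accidental reads on this thread fail instead of starting downloads. */
    if (setiopolicy_np(IOPOL_TYPE_VFS_MATERIALIZE_DATALESS_FILES,
                       IOPOL_SCOPE_THREAD,
                       IOPOL_MATERIALIZE_DATALESS_FILES_OFF) != 0)
        return -errno;
#endif
    char *paths[] = { (char *)root, NULL };
    FTS *fts = fts_open(paths, FTS_PHYSICAL | FTS_COMFOLLOW | FTS_NOCHDIR, NULL);
    if (fts == NULL)
        return -errno;

    out->regular = 0;
    out->dataless = 0;
    FTSENT *e;
    while ((e = fts_read(fts)) != NULL) {
        if (e->fts_info == FTS_DP || e->fts_info == FTS_NS || e->fts_info == FTS_ERR)
            continue;                        /* post-order visit, or no usable stat data */
        if (entry_is_dataless(e->fts_statp)) {
            out->dataless++;                 /* back up metadata only */
            if (e->fts_info == FTS_D)
                fts_set(fts, e, FTS_SKIP);   /* don't list a dataless directory */
            continue;
        }
        if (e->fts_info == FTS_F)
            out->regular++;                  /* copy normally */
    }
    int err = errno;   /* fts_read() sets errno to 0 at a normal end */
    fts_close(fts);
    return err ? -err : 0;
}
```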

The problem here isn't just the immediate materialization issue; it's that the nature of cloud storage means you can't create an atomic backup the same way you can back up a local storage device. You're going to end up downloading individual files over a period of time, which means your backup isn't a specific point in time but a collection of individual files retrieved at unrelated points in time. That isn't necessarily "wrong", but "mixing" it with the user's normal backup may not be a great idea.

Also, keep in mind that the storage/bandwidth cost of #2 is potentially VERY large, often larger than the user's storage can fully accommodate. That doesn't mean a feature like #2 isn't useful, but it probably needs to be set up and managed through its own interface so you can account for things like excluding directories.
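Before offering something like #2, it may be worth comparing the logical size of the dataless items against the free space on the volume they would be downloaded to (and, similarly, the backup destination). A minimal sketch, assuming the caller has already summed the `st_size` values of the dataless items (assumed to report the logical size even though the data isn't local; worth verifying) and that `statvfs()` free-space figures are good enough for a rough check; `fits_on_volume` is a hypothetical name.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/statvfs.h>

/*
 * Hypothetical helper: returns 1 if `needed_bytes` fits in the space available
 * to unprivileged processes on the volume containing `path`, 0 if not, or -errno.
 */
int fits_on_volume(const char *path, uint64_t needed_bytes)
{
    struct statvfs vfs;
    if (statvfs(path, &vfs) != 0)
        return -errno;
    /* f_bavail counts blocks of f_frsize bytes available to non-root users. */
    uint64_t available = (uint64_t)vfs.f_bavail * (uint64_t)vfs.f_frsize;
    return needed_bytes <= available;
}
```

Using `f_bavail` rather than `f_bfree` keeps the estimate conservative, even though a LaunchDaemon running as root could use the reserved blocks.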

Finally, on this point:

We'll still detect dataless items by checking SF_DATALESS in the st_flags returned by stat(), since TN3150: Getting ready for dataless files suggests doing so. Or is that now an outdated approach?

I think this is a case where it depends on what you want to do. If your goal is simply to detect and avoid/special-case these files (case #1 above) during "bulk" iteration, then I think it works great. However, if you're actually trying to download/manage these files, then I think you need to move to the higher-level API, where you have better visibility into what's actually happening and the access model is better suited to long networking delays.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware
