FileManager.replaceItemAt(_:withItemAt:) fails sporadically on ubiquitous items

I’m encountering a strange, sporadic error in FileManager.replaceItemAt(_:withItemAt:) when trying to update files that happen to be stored in cloud containers such as iCloud Drive or Dropbox. Here’s my setup:

  • I have an NSDocument-based app which uses a zip file format (although the error can be reproduced using any kind of file).

  • In my NSDocument.writeToURL: implementation, I do the following:

  1. Create a temp folder using FileManager.url(for: .itemReplacementDirectory, in: .userDomainMask, appropriateFor: fileURL, create: true).

  2. Copy the original zip file into the temp directory.

  3. Update the zip file in the temp directory.

  4. Move the updated zip file into place by moving it from the temp directory to the original location using FileManager.replaceItemAt(_:withItemAt:).

This all works perfectly - most of the time. However, very occasionally I receive a save error caused by replaceItemAt(_:withItemAt:) failing. Saving can work fine hundreds of times, but then, once in a while, I’ll receive an “operation not permitted” error in replaceItemAt.

I have narrowed the issue down and found that it only occurs when the original file is in a cloud container, that is, when FileManager.isUbiquitousItem(at:) returns true for the original fileURL I am trying to replace (e.g. because the user has placed the file in iCloud Drive). Strangely, though, the permissions issue seems to be with the temp file rather than with the original: if I try copying or deleting the temp file after this error occurs, I’m not allowed, whereas I am allowed to delete the original (not that I’d want to, of course).

Here’s an example of the error thrown by replaceItemAt:

Error Domain=NSCocoaErrorDomain Code=513 "You don’t have permission to save the file “test-file.txt” in the folder “Dropbox”." UserInfo={NSFileBackupItemLeftBehindLocationKey=file:///var/folders/mt/0snrr8fx7270rm0b14ll5k500000gn/T/TemporaryItems/NSIRD_TempFolderBug_y3UvzP/test-file.txt, NSFileOriginalItemLocationKey=file:///var/folders/mt/0snrr8fx7270rm0b14ll5k500000gn/T/TemporaryItems/NSIRD_TempFolderBug_y3UvzP/test-file.txt, NSURL=file:///Users/username/Library/CloudStorage/Dropbox/test-file.txt, NSFileNewItemLocationKey=file:///Users/username/Library/CloudStorage/Dropbox/test-file.txt, NSUnderlyingError=0xb1e22ff90 {Error Domain=NSCocoaErrorDomain Code=513 "You don’t have permission to save the file “test-file.txt” in the folder “NSIRD_TempFolderBug_y3UvzP”." UserInfo={NSURL=file:///var/folders/mt/0snrr8fx7270rm0b14ll5k500000gn/T/TemporaryItems/NSIRD_TempFolderBug_y3UvzP/test-file.txt, NSFilePath=/var/folders/mt/0snrr8fx7270rm0b14ll5k500000gn/T/TemporaryItems/NSIRD_TempFolderBug_y3UvzP/test-file.txt, NSUnderlyingError=0xb1e22ffc0 {Error Domain=NSPOSIXErrorDomain Code=1 "Operation not permitted"}}}}

And here’s some very simple sample code that reproduces the issue in a test app:

    // Ask user to choose this via a save panel.
    var savingURL: URL? {
        didSet {
            setUpSpamSave()
        }
    }
    
    var spamSaveTimer: Timer?
    
    // Set up a timer to save the file every 0.2 seconds so that we can see the sporadic save problem quickly.
    func setUpSpamSave() {
        spamSaveTimer?.invalidate()
        let timer = Timer(fire: Date(), interval: 0.2, repeats: true) { [weak self] _ in
            self?.spamSave()
        }
        spamSaveTimer = timer
        RunLoop.main.add(timer, forMode: .default)
    }
    
    func spamSave() {
        guard let savingURL else { return }
        
        let fileManager = FileManager.default
        
        // Create a new file in a temp folder.
        guard let replacementDirURL = try? fileManager.url(for: .itemReplacementDirectory, in: .userDomainMask, appropriateFor: savingURL, create: true) else {
            return
        }
        let tempURL = replacementDirURL.appendingPathComponent(savingURL.lastPathComponent)
        guard (try? "Dummy text".write(to: tempURL, atomically: false, encoding: .utf8)) != nil else {
            return
        }
        
        do {
            // Use replaceItemAt to safely move the new file into place.
            _ = try fileManager.replaceItemAt(savingURL, withItemAt: tempURL)
            print("save succeeded!")
            
            try? fileManager.removeItem(at: replacementDirURL) // Clean up.
            
        } catch {
            print("save failed with error: \(error)")
            // Note: if we try to remove replacementDirURL here or do anything with tempURL we will be refused permission.
            NSAlert(error: error).runModal()
        }
    }

If you run this code and set savingURL to a location in a non-cloud container such as your ~/Documents directory, it will run forever, resaving the file over and over again without any problems.

But if you run the code and set savingURL to a location in a cloud container, such as in an iCloud Drive folder, it will work fine for a while, but after a few minutes - after maybe 100 saves, maybe 500 - it will throw a permissions error in replaceItemAt.

(Note that my real app has all the save code wrapped in file coordination via NSDocument methods, so I don’t believe file coordination to be the problem.)

What am I doing wrong here? How do I avoid this error? Thanks in advance for any suggestions.

An update on this weird behaviour:

I have discovered that when replaceItem fails in this circumstance, the temp file has in fact been moved into place correctly and has replaced the original file. But when I get the error, the old original file has taken the place of the old temp file and it's that which cannot be removed.

I have tested this by checking both the content and the fileResourceIdentifier of the original file and the temp file, and logging them before and after the error. After the error they are swapped.

I’m encountering a strange, sporadic error in FileManager.replaceItemAt(_:withItemAt:) when trying to update files that happen to be stored in cloud containers such as iCloud Drive or Dropbox.

Have you filed a bug on this and, if so, what's the bug number? As part of that bug, I'd suggest installing the "iCloud Drive" profile, reproducing the issue a few times, then uploading a sysdiagnose of the failure. See the profile installation instructions for the full details of that process.

And here’s some very simple sample code that reproduces the issue in a test app:

Thank you for that. I got your code up and running in a test app and was able to replicate the problem fairly easily. As to WHY it's happening, that's unclear. From the console log, it appears that the entire replace sequence worked fine but the sandbox then rejected access to the temporary file as the kernel was trying to clean up post-swap. Weirdly, it doesn't appear to be blocking actual access to the file (continuing after the failure worked fine), so I think the issue is at least partially tied to the very specific circumstances the swap creates.

What am I doing wrong here?

I'm not sure you're doing anything wrong, as I think this is a bug.

How do I avoid this error?

Have you tried retrying the save? That appears to work in my testing, though it may not be a workable solution in your case. Beyond that, I'd need a better understanding of exactly how you're interacting with the files and what your full requirements are.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Many thanks for the reply and information, Kevin, much appreciated.

Have you filed a bug on this and, if so, what's the bug number? As part of that bug, I'd suggest installing the "iCloud Drive" profile, reproducing the issue a few times, then uploading a sysdiagnose of the failure. See the profile installation instructions for the full details of that process.

I hadn’t filed a bug report yet because I had assumed it was something I was doing wrong given that using replaceItem and a temporary folder is presumably a common pattern. I’ll file a report tomorrow - I’m following the iCloud Drive profile instructions you linked to and am now waiting the 24 hours they say I need to wait before I can get the sysdiagnose. Once I have that I’ll file the report along with a sample project.

Have you tried retrying the save? That appears to work in my testing, though it may not be a workable solution in your case. Beyond that, I'd need a better understanding of exactly how you're interacting with the files and what your full requirements are.

With a bit of refactoring I probably could retry the save. In my app this is all done inside my NSDocument’s writeToURL method. I use my own drop-in replacement for FileWrapper (you helped me with some of the finer points of FileWrapper a few years ago) that incrementally writes changes to a zip file using Libzip, which supports incremental saves on copy-on-write systems such as APFS.

A potential problem with the re-save approach is that my save usually works by copying the zip file at the original location to a temporary location, updating it there, and then moving it into place using replaceItemAt. After this particular replaceItemAt error, however, the original file has in fact been updated despite the error (the error being on the old version of the file which is now in the temporary directory). So if I re-save by making a copy of that and try updating again, I could potentially mess up the file by trying to save into it stuff that has actually already been done. (However, I do keep a snapshot of the older archive around in case of problems, so I might be able to work around this problem using that.)

I wonder, though - given that the original file has in fact been replaced by the temp file despite the error, can I not just check for this and ignore the error if the file seems to have been replaced after all? E.g.:

  1. Before replacement, record the file resource ID of the temp file.

  2. Use replaceItemAt(originalURL, withItemAt: tempURL).

  3. If there’s an error, get the file resource ID for the file at the intended saving location and compare it against the ID I recorded in (1). If they are the same, I know the replacement has succeeded despite the error. In this case, I can just try to delete the temporary folder and move on.

  4. If the file IDs of the current user file and the temp file from before the replace don’t match, or couldn’t be retrieved, attempt a re-save.

Is there something wrong with this approach? (I’ve attached some sample code below demonstrating how this might work.)

Many thanks, Keith

// Get a temporary folder appropriate for creating the new file in.
let replacementDirURL = try fileManager.url(for: .itemReplacementDirectory, in: .userDomainMask, appropriateFor: savingURL, create: true)

// Create the new file at the temporary location.
let tempURL = replacementDirURL.appendingPathComponent(savingURL.lastPathComponent)
try createNewContentAt(url: tempURL)

// Record the file resource ID of the temp file we created.
let tempFileID = (try? tempURL.resourceValues(forKeys: [.fileResourceIdentifierKey]))?.fileResourceIdentifier

// Now try to move the file into place.
do {
    // Use replaceItemAt to safely replace the original file with the updated file we created at the temp location.
    _ = try fileManager.replaceItemAt(savingURL, withItemAt: tempURL)

    // Clean up.
    try? fileManager.removeItem(at: replacementDirURL)
    
} catch {
    // Check to see if the original file was in fact replaced despite the error.
    if let tempFileID,
       let savingFileID = (try? savingURL.resourceValues(forKeys: [.fileResourceIdentifierKey]))?.fileResourceIdentifier,
       tempFileID.isEqual(savingFileID) {
        
        // If so, just try to remove the temp dir and move on.
        try? fileManager.removeItem(at: replacementDirURL)
                
    } else {
        // If we got here, replace really did fail and we need to handle it.
                
        // We should do some more work and try to resave here before throwing an error.
                
        throw error
    }
}

Just to add that I have now filed the bug as #FB22107069.

I hadn’t filed a bug report yet because I had assumed it was something I was doing wrong, given that using replaceItem and a temporary folder is presumably a common pattern. I’ll file a report tomorrow - I’m following the iCloud Drive profile instructions you linked to and am now waiting the 24 hours they say I need to wait before I can get the sysdiagnose. Once I have that, I’ll file the report along with a sample project.

Perfect, thank you.

...using replaceItem and a temporary folder is presumably a common pattern

I didn't get into it above, but there very likely are some nuances/details involved that are a contributing factor. For example, I suspect this doesn't happen if you start with a security-scoped bookmark, which you resolve to a URL before each save. I think NSDocument's default implementation also writes out a new file before each save, which means it's always starting with a "new" URL. That doesn't mean there's anything "wrong" with what you're doing, but that's probably why this isn't more widespread.

With a bit of refactoring, I probably could retry the save. In my app, this is all done inside my NSDocument’s writeToURL method. I use my own drop-in replacement for FileWrapper (you helped me with some of the finer points of FileWrapper a few years ago).

Fabulous! Always good to hear when my ideas have worked out!

A potential problem with the re-save approach is that my save usually works by copying the zip file at the original location to a temporary location, updating it there, and then moving it into place using replaceItemAt.

Just to clarify, are you:

a) Copying the file once, modifying it over time, then copying that file back for each save.

b) Copying the file prior to each save operation.

I suspect you're doing "a" (and it's probably what I would do), but if you're doing "b", then that changes things a bit.

Assuming you're starting with "a", then my intuition would be to:

  1. Commit your change to your temp file.

  2. Clone that file into a new temp file.

  3. Use that new temp file as the source for your save.

There are a few advantages to this:

  • It may avoid the immediate issue here, since you'll always be replacing with a "new" file object.

  • If anything goes wrong, you can retry the save by restarting with a clean clone.

  • It can be a useful architecture to build on for other edge cases.
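A minimal sketch of steps 1-3 above. The function name saveByCloning and its parameters are hypothetical, and it assumes workingURL is the long-lived temp file that already holds the committed changes. The copy uses FileManager.copyItem(at:to:), which to my understanding produces a clone on APFS when both URLs are on the same volume; treat that as an assumption to verify rather than a guarantee.

```swift
import Foundation

/// Hypothetical helper: copies the long-lived working file into a fresh
/// item-replacement directory and uses that copy as the replacement source,
/// so every save hands replaceItemAt a "new" file object.
func saveByCloning(workingURL: URL,
                   to destinationURL: URL,
                   fileManager: FileManager = .default) throws {
    // A new staging directory, appropriate for the destination, per attempt.
    let stagingDir = try fileManager.url(for: .itemReplacementDirectory,
                                         in: .userDomainMask,
                                         appropriateFor: destinationURL,
                                         create: true)
    // Best-effort cleanup; this may itself fail in the failure case
    // described in this thread, hence try?.
    defer { try? fileManager.removeItem(at: stagingDir) }

    let stagedURL = stagingDir.appendingPathComponent(destinationURL.lastPathComponent)

    // Step 2: copy (ideally clone) the committed working file.
    try fileManager.copyItem(at: workingURL, to: stagedURL)

    // Step 3: use the fresh copy as the source for the replace.
    _ = try fileManager.replaceItemAt(destinationURL, withItemAt: stagedURL)
}
```

Because the working file is never moved, a failed save can be retried by calling the function again, which starts from a clean copy. If the working file lives on a different volume from the destination, the copy is a full copy and not a clone.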

Expanding on that last point, one of the issues you can run into is cases where the files involved are large and the final save destination is VERY slow (like an SMB drive). Putting that in concrete terms, let’s say you want to autosave every 1s, but the save destination is going to take 5s-10s to complete the save. Here is one way to handle that:

  • Your app copies from the destination to your local storage. This becomes your "working" copy that you modify.

  • Your app autosaves to local storage every ~1s.

  • Your app pushes that initial save data to the final target.

  • Every time the final save finishes, it starts a new save using the most recent save.

In other words, your app can rely on its "standard" set of 1s autosaves, but you're actually only saving to the final target every 5-10s (as the previous save finishes).
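The scheme above can be sketched as a small "latest wins" queue. CoalescingSaver, submit, and the push closure are hypothetical names, and the sketch assumes all calls happen on a single queue (it is not thread-safe). The push closure stands in for whatever performs the slow save to the final target.

```swift
import Foundation

/// Hypothetical coalescing saver: every local autosave calls `submit`, but at
/// most one slow push to the final target is in flight at any time.
final class CoalescingSaver<Snapshot> {
    private let push: (Snapshot, @escaping () -> Void) -> Void
    private var pending: Snapshot?
    private var isPushing = false

    /// `push` performs the slow save and calls its completion handler when done.
    init(push: @escaping (Snapshot, @escaping () -> Void) -> Void) {
        self.push = push
    }

    /// Call after each local autosave. Not thread-safe; use one queue.
    func submit(_ snapshot: Snapshot) {
        pending = snapshot      // A newer snapshot replaces any unsent older one.
        startNextIfIdle()
    }

    private func startNextIfIdle() {
        guard !isPushing, let snapshot = pending else { return }
        pending = nil
        isPushing = true
        push(snapshot) { [weak self] in
            guard let self else { return }
            self.isPushing = false
            self.startNextIfIdle()  // Push whatever arrived in the meantime.
        }
    }
}
```

With this shape, three autosaves that arrive while one push is running result in a single follow-up push of the newest snapshot; the intermediate ones are skipped.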

One final point here: if you're working with packages and large file counts, directory cloning may provide a significant performance benefit. The man pages warn against cloning directories, but this forum post explains what the actual risks are and when it's a reasonable option.

I wonder, though— given that the original file has in fact been replaced by the temp file despite the error, can I not just check for this and ignore the error if the file seems to have been replaced after all? E.g.: ... Is there something wrong with this approach?

That is a REALLY tough call. The problem here is that your visibility into the exact cause of the error is limited, so while it's certainly safe in the particular case, it's hard to be sure that you're ACTUALLY dealing with this exact case. Even worse, my concern here would be the proliferation of edge cases, both in terms of what's out there "today" and in terms of future configurations/changes.

My own instincts would be to redo the entire save, but if you want to do this, I would do two things:

  1. Look at the NSError object you're getting back "in detail" so you can identify as "specific" a failure as possible. Notably, I think you can use NSUnderlyingErrorKey to pull an NSError object for the lower-level error, so I'd look at that object (and possibly any underlying NSError), not just catching the "fail".

  2. ...then an ID check to confirm you're "right" about the failure. I'd even check things like file size and possibly times so that you're "sure" everything looks the way you'd expect. Most of that metadata is collected with a single syscall, so it gets you a little extra safety without actually making things slower.

The goal is to fingerprint a particular failure you consider "safe", not just trusting the ID change. Having said that, I'd also be tempted to expand the check in #2 and run it against all files, not just the cases where you got an error. Done properly, there’s minimal performance cost, and there are worse things an app can do than double-checking its saves.
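As a rough illustration of point 1, the sketch below walks NSUnderlyingErrorKey from the top-level error down to the root cause and compares the result with the chain visible in the error dump from the original post (Cocoa 513, Cocoa 513, POSIX 1). ErrorLink, errorChain, and knownSwapFailure are hypothetical names, and matching on domain and code alone is deliberately narrow; it is meant to be combined with the file checks in point 2.

```swift
import Foundation

/// One link in an NSError's underlying-error chain.
struct ErrorLink: Equatable {
    let domain: String
    let code: Int
}

/// Follows NSUnderlyingErrorKey from the top-level error to the root cause.
func errorChain(_ error: Error) -> [ErrorLink] {
    var links: [ErrorLink] = []
    var current: NSError? = error as NSError
    while let nsError = current {
        links.append(ErrorLink(domain: nsError.domain, code: nsError.code))
        current = nsError.userInfo[NSUnderlyingErrorKey] as? NSError
    }
    return links
}

/// The chain from the error dump in the original post:
/// Cocoa 513 (no write permission), twice, then POSIX 1 (EPERM).
let knownSwapFailure = [
    ErrorLink(domain: NSCocoaErrorDomain, code: CocoaError.Code.fileWriteNoPermission.rawValue),
    ErrorLink(domain: NSCocoaErrorDomain, code: CocoaError.Code.fileWriteNoPermission.rawValue),
    ErrorLink(domain: NSPOSIXErrorDomain, code: Int(EPERM)),
]
```

A caller could then treat `errorChain(error) == knownSwapFailure` as one necessary condition, never as sufficient on its own.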

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Thanks again for the reply, and especially for such a thorough and helpful one!

For example, I suspect this doesn't happen if you start with a security-scoped bookmark, which you resolve to a URL before each save.

Out of curiosity I just tested this, and I still see the bug. To see it yourself, just use the code from my first post but change the savingURL accessor to use a security-scoped bookmark, as follows:

private var bookmarkData: Data?
    
// Ask user to choose this via a save panel.
var savingURL: URL? {
    get {
        var isStale = false
        if let bookmarkData,
           let url = try? URL(resolvingBookmarkData: bookmarkData, options: .withSecurityScope, relativeTo: nil, bookmarkDataIsStale: &isStale) {
            if isStale {
                // Should really update the bookmark data here...
            }
            return url
        } else {
            return nil
        }
    }
    set(newURL) {
        bookmarkData = try? newURL?.bookmarkData(options: .withSecurityScope)
        setUpSpamSave()
    }
}

Then add the following to the top of spamSave() after checking savingURL is non-nil:

let didAccess = savingURL.startAccessingSecurityScopedResource()
if didAccess == false {
    print("Failed to start accessing scoped URL.")
}
defer {
    if didAccess {
        savingURL.stopAccessingSecurityScopedResource()
    }
}

The save will fail with the same permissions error every now and then.

Just to clarify, are you:

a) Copying the file once, modifying it over time, then copying that file back for each save.

b) Copying the file prior to each save operation.

I suspect you're doing "a" (and it's probably what I would do), but if you're doing "b", then that changes things a bit.

I’m actually doing (b), since this is very fast on copy-on-write volumes such as APFS even for large files. (Copy-to-temp file is almost instant; updating the zip file is super-fast too thanks to Libzip’s support for copy-on-write, meaning it doesn’t recreate the entire zip file; then it's just a matter of moving the updated file back into place using replaceItemAt(_:withItemAt:).)

For slower volumes, much like Pages.app, we offer a second, package-based version of our file format which supports in-place saving. (Zip-based is the default since it works everywhere and package-based files don’t work with cloud-based services other than iCloud Drive on iOS. But if saves get particularly slow, we prompt users to consider the package-based option.)

As I mentioned, I keep a snapshot of the zip file from the previous save around (for cases where the user has accidentally deleted the underlying file between saves), and I have done some testing and I can indeed re-save successfully using that. That is slow, though, since it has to recreate the entire archive. On APFS it’s faster to make a backup copy of the temp file before using replaceItemAt and then to try with the copy if it fails - that seems to work well.

Based on your suggestions, though, I’m going to do a bit of refactoring. Keeping the zip file around in the temp folder and making copies from it for replaceItemAt sounds like a great solution with multiple advantages, and since my custom file wrapper already keeps a reference to the previous snapshot, it wouldn’t be difficult to have it keep a reference to a temp file URL too.

Your app copies from the destination to your local storage…. Your app pushes that initial save data to the final target.

This is a little off-topic, but in this case - where you are doing the intensive work on your local storage and then pushing back to the slower volume when done - what is the safest way of replacing the original file? The point of FileManager.url(for: .itemReplacementDirectory…) is to return a temp folder on the same volume as the passed-in URL, since replaceItemAt(_:withItemAt:) won’t work if the original and new URLs are on different volumes. The only other way I can think of risks data loss:

  1. Delete the original file from the destination.
  2. Move the updated file from the local storage to the destination.

We could make a temp copy of the original file before (1), but if the volume is slow, that adds back in some of the slowness we’re avoiding by doing work on another volume.

My own instincts would be to redo the entire save, but if you want to do this, I would do two things:

I’m going to focus on retrying the save. I’m curious though as to whether the bug could occur twice in immediate succession, so that the resave also triggers the error. Although I can’t get this to happen in testing, given that the error seems random, I wonder if it is possible if I ran the test for long enough - a day, say. In that case, I wonder if this approach would work:

  1. Save - encounter some sort of save error.
  2. Retry the save no matter what the error was.
  3. If we get another error, examine the error and if it was caused by the bug, just try deleting the temp file and move on.

I’ve attached some code at the end of this post that scrutinises the error to check it matches the one triggered by this bug. (Although I wonder if I should use .fileContentIdentifier instead of .fileResourceIdentifier.)

Anyway, thanks again, as I’m very close to a solution now.

Error scrutiny:

struct FileInfo: Equatable {

    init?(url: URL) {
        guard
            let resourceVals = try? url.resourceValues(forKeys: [.fileResourceIdentifierKey, .fileSizeKey]),
            let fileID = resourceVals.fileResourceIdentifier,
            let fileSize = resourceVals.fileSize
        else {
            return nil
        }
        self.fileID = fileID
        self.fileSize = fileSize
    }
        
    private let fileID: (any NSCopying & NSSecureCoding & NSObjectProtocol)
    private let fileSize: Int
        
    static func == (lhs: FileInfo, rhs: FileInfo) -> Bool {
        return lhs.fileSize == rhs.fileSize && lhs.fileID.isEqual(rhs.fileID)
    }
}

func isSafeReplaceError(_ error: Error, fileURL: URL, tempURL: URL, oldFileInfo: FileInfo?, oldTempFileInfo: FileInfo?) -> Bool {
    // Using the file resource IDs and file size, ensure that the temp file and original file have been swapped.
    guard
        let oldFileInfo,
        let oldTempFileInfo,
        let fileInfo = FileInfo(url: fileURL),
        let tempFileInfo = FileInfo(url: tempURL),
        oldFileInfo == tempFileInfo,
        oldTempFileInfo == fileInfo,
        tempFileInfo != fileInfo
    else {
        return false
    }
        
    let nsError = error as NSError
        
    guard
        // Check this is a permissions error in the Cocoa error domain.
        nsError.domain == NSCocoaErrorDomain,
        nsError.code == NSFileWriteNoPermissionError,
        // Check "NSURL" and "NSFileNewItemLocationKey" keys both point to the file we tried to replace.
        let errorURL = nsError.userInfo[NSURLErrorKey] as? URL,
        let newItemURL = nsError.userInfo["NSFileNewItemLocationKey"] as? URL,
        errorURL.path(percentEncoded: false) == newItemURL.path(percentEncoded: false),
        newItemURL.path(percentEncoded: false) == fileURL.path(percentEncoded: false),
        // Check "NSFileOriginalItemLocationKey" and "NSFileBackupItemLeftBehindLocationKey" both point to the temp file.
        let originalURL = nsError.userInfo["NSFileOriginalItemLocationKey"] as? URL,
        let leftBehindURL = nsError.userInfo["NSFileBackupItemLeftBehindLocationKey"] as? URL,
        originalURL.path(percentEncoded: false) == leftBehindURL.path(percentEncoded: false),
        originalURL.path(percentEncoded: false) == tempURL.path(percentEncoded: false),
        // Ensure there is only a single underlying error.
        nsError.underlyingErrors.count == 1
    else {
        return false
    }
        
    // Now get the underlying error.
    let underlyingError = nsError.underlyingErrors[0] as NSError
    guard
        // Check the underlying error is also a permissions error in the Cocoa domain.
        underlyingError.domain == NSCocoaErrorDomain,
        underlyingError.code == NSFileWriteNoPermissionError,
        // And ensure the error is with the temp file.
        let underlyingErrorURL = underlyingError.userInfo[NSURLErrorKey] as? URL,
        underlyingErrorURL.path(percentEncoded: false) == tempURL.path(percentEncoded: false),
        // Ensure the underlying error also has a single underlying error.
        underlyingError.underlyingErrors.count == 1
    else {
        return false
    }
        
    // Now get the underlying error for the underlying error. This should be a POSIX error with error code 1 ("Operation not permitted").
    let rootError = underlyingError.underlyingErrors[0] as NSError
    return rootError.domain == NSPOSIXErrorDomain && rootError.code == 1
}

Out of curiosity, I just tested this, and I still see the bug. To see it yourself, just use the code from my first post but change the savingURL accessor to use a security-scoped bookmark, as follows:

To be honest, that was basically a blind (well, slightly educated...) guess. The whole combination of factors is fairly odd (timing is random, failure self-corrects, etc.).

One thing to pass along— I just did a bit of testing with retrying the copy, and "clearing" the error seems to be tied to TIME, not retry count. If you decide to go the retry route, you may want to delay the save for a second or so instead of just retrying over and over.
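A minimal sketch of the "delay, then retry" idea. The helper name retryWithDelay, the attempt limit, and the delay value are all hypothetical and are not taken from the test project:

```swift
import Foundation

/// Hypothetical helper: runs `operation`, sleeping `delay` seconds after each failed attempt.
/// Rethrows the last error once `maxAttempts` attempts have failed.
func retryWithDelay<T>(maxAttempts: Int, delay: TimeInterval, _ operation: () throws -> T) throws -> T {
    precondition(maxAttempts > 0, "maxAttempts must be at least 1")
    var attempt = 1
    while true {
        do {
            return try operation()
        } catch {
            if attempt >= maxAttempts { throw error }
            attempt += 1
            // Wait rather than spin: the failure appears to clear with time, not with retry count.
            Thread.sleep(forTimeInterval: delay)
        }
    }
}
```

It could be called as `try retryWithDelay(maxAttempts: 5, delay: 1) { try performSave() }`, where performSave stands in for whatever performs the complete save.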

I’m actually doing (b), since this is very fast on copy-on-write volumes such as APFS even for large files. (Copy-to-temp file is almost instant.)

I think "very fast" actually understates how significant the performance difference is. As an "industry", I'm not sure we've really processed how constant-time copying should change file management.

For slower volumes, much like Pages.app, we offer a second, package-based version of our file format which supports in-place saving.

I don't know if anyone has ever shipped a solution that worked like this, but given the performance benefit, it might be worth thinking about using DiskImages as a "file format". You can mount the disk image outside of the user’s "view", then use it as your working storage. That won't work for all cases, but it could be useful in some situations.

This is a little off-topic, but in this case— where you are doing the intensive work on your local storage and then pushing back to the slower volume when done— what is the safest way of replacing the original file?

The "replaceItem(at:...)" documentation actually answers this: copy the item to the destination volume, then use "replaceItem(at:...)" to finish the transfer. The reason "backupItemName" exists is that, if anything prevents the replace from completing, the item named by backupItemName still contains the original file. This is how we cover all the file systems where atomic file replacement doesn't exist.

Also note this is in the same reference:

"If an error occurs and the original item is not in the original location or a temporary location, the resulting error object contains a user info dictionary with the key "NSFileOriginalItemLocationKey". The value assigned to that key is an NSURL object with the location of the item."

I’m going to focus on retrying the save. I’m curious though as to whether the bug could occur twice in immediate succession, so that the resave also triggers the error.

Yes, it will, at least in my testing. More specifically, I modified your test project to this:

var finishedSave = false
var failCount = 0
while !finishedSave {
    do {
        // Use replaceItemAt to safely move the new file into place.
        _ = try fileManager.replaceItemAt(savingURL, withItemAt: tempURL)
        try? fileManager.removeItem(at: replacementDirURL) // Clean up.

        finishedSave = true
    } catch {
        failCount += 1
        if failCount == 1 {
            NSLog("First Fail on \(count)")
        }
        // Wait before retrying; the failure clears with time, not retry count.
        sleep(1)
    }
}
if failCount > 0 {
    NSLog("\(count) cleared after \(failCount) retries")
}

And... it took from 1 to 10 retries for the replace to succeed. Note that the issue does seem to be tied to time, not try count. I tried this first without the sleep, and all that changed was that I generated a lot more calls to "replaceItemAt". Now, I don't know how this would translate to more "real" save logic.

(Although I wonder if I should use .fileContentIdentifier instead of .fileResourceIdentifier.)

Mostly, you'll want .fileResourceIdentifier. fileContentIdentifier is an APFS specific[1] identifier that allows you to identify related clones. I think it actually implies identical contents, so two files with the same fileContentIdentifier have the same physical content, not just a relationship, but either way it's not really useful for what you're doing. fileResourceIdentifier is what you want, as it's basically "the inode number plus other data to deal with all the edge cases".
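As a rough illustration of using .fileResourceIdentifier to ask "are these two URLs the same file right now?", here is a minimal sketch for Apple platforms. The helper name refersToSameFile is hypothetical, and treating a missing identifier as "not the same file" is an assumption made for the example:

```swift
import Foundation

/// Hypothetical helper: true when both URLs currently resolve to the same file system object.
/// Returns false if either identifier is unavailable.
func refersToSameFile(_ first: URL, _ second: URL) throws -> Bool {
    let keys: Set<URLResourceKey> = [.fileResourceIdentifierKey]
    var first = first
    var second = second
    // Drop cached values so a file that has just been moved or swapped is looked up again.
    first.removeAllCachedResourceValues()
    second.removeAllCachedResourceValues()
    guard
        let firstID = try first.resourceValues(forKeys: keys).fileResourceIdentifier,
        let secondID = try second.resourceValues(forKeys: keys).fileResourceIdentifier
    else {
        return false
    }
    // The identifiers are opaque objects; compare them with isEqual, not ==.
    return firstID.isEqual(secondID)
}
```

Recording the identifier of the original file before a replace and comparing afterwards is one way to tell whether the item now sitting in the temp folder is the old original.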

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Thanks again!

I think "very fast" actually understates how significant the performance difference is.

Ha, true. In practice it seems “instant”, to the extent that on APFS, updating huge zip files is not much slower than in-place saving into a package.

I don't know if anyone has ever shipped a solution that worked like this, but... it might be worth thinking about using DiskImages as a "file format".

Interesting! Although cross-platform compatibility might be an issue here.

The "replaceItem(at:...)" documentation actually answers this…

Sorry, I should have been more clear, although thinking about it I have been tying myself up in knots and the solution was indeed here all along. I was referring to the circumstances we were discussing before, where we don’t want to do the temp work on the same volume as the destination because the destination volume is slow.

In other words, we have deliberately created the temp folder for updating our file on another volume (e.g. one that supports APFS), because the one created using url(for: .itemReplacementDirectory…) would be too slow, and now we need to move that temp file into place on the other volume.

From your answer I realise I was overlooking the obvious: after doing the work in the fast temp directory, I then need to create a second temp directory on the slower destination volume using url(for: .itemReplacementDirectory…), copy the file across, and then use replaceItemAt from there.
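A minimal sketch of that two-stage hand-off, assuming a hypothetical helper named publish and assuming the edited file already sits in fast working storage. It only uses FileManager calls mentioned in this thread, and it cleans up the replacement directory only after a successful replace so that nothing left behind by a failure is deleted:

```swift
import Foundation

/// Hypothetical helper: moves a finished file from fast working storage into place
/// at `destinationURL`, which may live on a different (slower) volume.
func publish(workingCopy: URL, to destinationURL: URL, fileManager: FileManager = .default) throws {
    // Stage 1: a replacement directory on the same volume as the destination.
    let replacementDirURL = try fileManager.url(for: .itemReplacementDirectory,
                                                in: .userDomainMask,
                                                appropriateFor: destinationURL,
                                                create: true)

    // Stage 2: the cross-volume copy. This is the slow step.
    let stagedURL = replacementDirURL.appendingPathComponent(destinationURL.lastPathComponent)
    try fileManager.copyItem(at: workingCopy, to: stagedURL)

    // Stage 3: both items are now on the destination volume, so replace from there.
    _ = try fileManager.replaceItemAt(destinationURL, withItemAt: stagedURL)

    // Clean up only after success.
    try? fileManager.removeItem(at: replacementDirURL)
}
```

The working copy is left untouched, so it remains available if the replace has to be attempted again.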

Yes, it will, at least in my testing. More specifically, I modified your test project to this:

while(!finishedSave) {
    _ = try fileManager.replaceItemAt(savingURL, withItemAt: tempURL)

This approach wouldn’t work anyway. The nature of this specific error means that you cannot retry replaceItemAt on the same URLs like this, because after the error, savingURL and tempURL have swapped places. So in your sample code, if the second replaceItemAt succeeds, you’ve just replaced the newer version with the older version again, so that the save has effectively done nothing. We’ll only get the result we want when failCount % 2 == 0.

You can test this by logging the expected and actual final content of the file (i.e. log the content of tempURL before the loop, and the content of savingURL after it). Whenever failCount % 2 == 1, you’ll end up with old content at the destination, because of the alternate swapping of the original and new files.
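The parity argument can be checked with a toy model that involves no file system at all. This is only a sketch of the behaviour described above (every attempt, failed or not, exchanges what sits at savingURL and tempURL); the function name is hypothetical:

```swift
/// Toy model: returns the content that ends up at the destination after `failCount`
/// failed attempts followed by one successful attempt, all on the same pair of URLs.
func destinationContent(afterFailures failCount: Int, old: String, new: String) -> String {
    precondition(failCount >= 0, "failCount cannot be negative")
    var destination = old   // stands in for the file at savingURL
    var temp = new          // stands in for the file at tempURL
    // failCount failed attempts plus the final successful one, each exchanging the two.
    for _ in 0...failCount {
        swap(&destination, &temp)
    }
    return destination
}
```

With zero or two failures the model leaves the new content at the destination; with one or three it leaves the old content there.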

The other problem with retrying replaceItemAt on the same URLs is that, as you note, tempURL (which after the initial replaceItemAt error contains the older file that was previously in the ubiquitous storage) still has the lock (?) on it which caused the permissions error. So any attempts to use that will continue to fail until the kernel (?) has finished with it.

For these reasons, we were previously talking about making a fresh copy of the updated temp file before trying replace, and calling replaceItemAt on that, so that we keep around a valid copy of the new file with which we can try again. (E.g. Have a working copy in the temp dir, update that, clone it, try replace using the clone, if that fails, try again with a fresh clone of the working copy.)

To update your code using this sort of approach:

var tempCopyURL = tempURL.deletingLastPathComponent().appending(path: UUID().uuidString)
var finishedSave = false
var failCount = 0
while !finishedSave {
    do {
        // Create a clone of our new file for replace.
        try fileManager.copyItem(at: tempURL, to: tempCopyURL)
        // Try to replace using the clone.
        _ = try fileManager.replaceItemAt(savingURL, withItemAt: tempCopyURL)
        try? fileManager.removeItem(at: replacementDirURL) // Clean up.
        finishedSave = true
    } catch {
        failCount += 1
        if failCount == 1 {
            NSLog("First Fail on \(count-1)")
        }
        // Try again on the next pass with a fresh clone.
        tempCopyURL = tempURL.deletingLastPathComponent().appending(path: UUID().uuidString)
    }
}
if failCount > 0 {
    NSLog("\(count-1) cleared after \(failCount) retries")
}

For me, this succeeds on the first retry every time, because we’re working with a fresh temp file, not the one that we’re denied access to. Out of 50,000 saves, I hit the error 150 times and each time it resolved on first retry. (It also ensures we end up with the correct version of the file being moved into place.) The disadvantage of course is that you’re adding in an extra copy of the temp file, which adds overhead on non-APFS/copy-on-write volumes.

To return to my original question:

I’m curious though as to whether the bug could occur twice in immediate succession, so that the resave also triggers the error.

Here I was wondering whether we could, on rare occasions, encounter the error twice in immediate succession even with the approach of using a fresh clone of the temp file for each attempt. My suspicion is that this shouldn’t happen, because here’s my wild (and completely uneducated!) guess as to what is happening:

  1. Given that this weird error only happens for ubiquitous files, I’m guessing that the problem occurs when the kernel is intermittently doing something cloud-related with the original file, putting some sort of lock on it that prevents us from deleting it - but not from moving it for some reason.
  2. replaceItemAt successfully swaps out the original ubiquitous file for the replacement, but the kernel still has a lock on the original file (which is now in the temp folder) and so won’t allow it to be deleted, so replaceItemAt throws an error.
  3. So if at this point we immediately retry replaceItemAt with a fresh clone, all should be good because the kernel shouldn’t be doing anything yet with the file that was, in the same run loop, just swapped into the destination URL. (At this point in fact the file at the destination URL and the fresh clone we’re replacing it with are identical.)

Does that sound reasonable?

Mostly, you'll want .fileResourceIdentifier. fileContentIdentifier is an APFS specific[1] identifier

Thank you. I realised my mistake on this late yesterday while testing.

So, given all of the above, I think my approach should be:

  1. Make a working copy in a temp dir (if destination doesn’t support cloning but local storage does, make the working copy on the local storage): workingCopyURL.
  2. On save, update the working copy.
  3. Copy the working copy to a folder created using url(for: .itemReplacementDirectory…): tempURL.
  4. Use replaceItemAt, replacing destinationURL with tempURL.
  5. If replaceItemAt fails, AND isUbiquitous is true for destinationURL, create a fresh copy of the working copy, and try replaceItemAt again with that. (If the file wasn’t ubiquitous, just throw the error.)
  6. If replaceItemAt fails the second time, examine the error to check for this very specific bug, and if it all checks out, move on.
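Steps 3 to 6 can be sketched as control flow, with the file operations injected as closures so the logic can be exercised without touching iCloud or Dropbox. Every name here (SaveOutcome, replaceWithOneRetry, makeFreshCopy, replace) is hypothetical; in a real save, makeFreshCopy would copy the working copy into an item replacement directory and replace would call replaceItemAt:

```swift
import Foundation

/// Hypothetical result type for the sketch.
enum SaveOutcome {
    case replaced               // first attempt worked
    case replacedOnRetry        // second attempt, with a fresh copy, worked
    case failedTwice(Error)     // caller should now inspect the error for the specific bug
}

/// Hypothetical helper: tries the replace once, then once more with a fresh copy,
/// but only when the destination is ubiquitous.
func replaceWithOneRetry(
    isUbiquitous: Bool,
    makeFreshCopy: () throws -> URL,
    replace: (URL) throws -> Void
) throws -> SaveOutcome {
    do {
        try replace(try makeFreshCopy())
        return .replaced
    } catch {
        // Not a cloud item: this is an ordinary failure, so just throw it.
        guard isUbiquitous else { throw error }
    }
    do {
        // Never reuse the first temp file; always start from a fresh copy.
        try replace(try makeFreshCopy())
        return .replacedOnRetry
    } catch {
        return .failedTwice(error)
    }
}
```

The .failedTwice case hands the error back rather than throwing so the caller can run the detailed error check before deciding whether the save actually landed.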

From your answer, I realise I was overlooking the obvious: after doing the work in the fast temp directory, I then need to create a second temp directory on the slower destination volume using url(for: .itemReplacementDirectory…), copy the file across, and then use replaceItemAt from there.

Sure, that makes sense. I think this is also one of those cases where the broader "context" of your app and user base is a really important factor. Some apps (smaller documents, more "consumer" focused) should do their best to hide all of these details from the user, as they're not really useful/relevant and they just end up creating "noise". Other apps (larger documents, more "expert" focused) benefit from making this visible to the user, as it lets them understand what's going on both to correct any problems (for example, space "loss" due to save failures) and take advantage of the data you "have" for them (for example, failure recovery).

So any attempts to use that will continue to fail until the kernel (?) has finished with it.

Yes, this was actually what I was trying to explore— how "sticky" failure was.

One small note here:

The disadvantage, of course, is that you’re adding in an extra copy of the temp file, which adds overhead on non-APFS/copy-on-write volumes.

The one (slightly odd) edge case where you might encounter this EXACT failure and which would probably be worth independently testing is when you're accessing the contents of an iCloud Drive over SMB. The interesting point here is that this actually IS a case where you could still take advantage of cloning and also get a nice speedup on non-APFS volumes. It probably isn't worth the trouble unless you're dealing with very large files, but see this forum post for the slightly complicated details.

So, given all of the above, I think my approach should be:

That all looks reasonable. However, the one detail I'd be careful about is where you put that temp directory. The default system configuration puts home directories in the data volume, but users can move their home directory "anywhere", and one configuration (which I happen to use) has the home directory on a third volume. This means that using system tmp can end up forcing a cross-volume copy that isn't really necessary. Similarly, for REALLY high-end installations (think big NAS/SAN setups), the volume configuration presented to the system doesn't necessarily match the physical configuration, which means directories on the same logical volume can have WILDLY different performance characteristics.

This is another case where the larger app context matters: for expert/professional apps dealing with very large files, the best option is often to simply provide some kind of configuration/override so that the user can just tell your app what to do.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Great, thanks again.

the broader "context" of your app and user base is a really important factor

It's a writing app that is a simpler version of our flagship app, and we want it as user-friendly as possible. It's therefore a bit of a balancing act in this regard (users could end up with large files because they can import research, but in general we want to hide this sort of stuff from the user as much as possible).

Anyway, I think I'm mostly there now. I have it making a clone of the temp file on systems that support cloning (.volumeSupportsFileCloning), attempting a re-save with that, or falling back on a check of the error message and whether the temp and original files have swapped places otherwise. It all seems to be working well so far.

However, the one detail I'd be careful about is where you put that temp directory.

What's the best way of being careful about this, or do you just mean by using the item replacement directory where possible? As far as I know, there are only two ways of getting a temp directory:

  1. FileManager.url(for: .itemReplacementDirectory...) - ensures the temp folder is on the same volume as the passed-in URL.
  2. FileManager.temporaryDirectory or URL.temporaryDirectory - places the temp folder in the data volume? Or home directory? (Under sandboxing at least it seems to be in the home directory.)

My current solution only uses temporaryDirectory if it supports cloning and the item replacement directory doesn't, otherwise it uses the item replacement directory to be sure that the work is done on the same volume as the file that is being replaced. (LibZip is much faster on a volume that supports cloning, and in most cases re-zipping a large file without cloning is slower than copying the file between volumes for the zip operation.)
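That selection rule can be sketched as a small pure function plus a capability probe. The names WorkLocation, chooseWorkLocation, and volumeSupportsCloning are hypothetical; the probe relies on the .volumeSupportsFileCloningKey resource value and is wrapped in a platform check because it is assumed here to be available only on Apple platforms:

```swift
import Foundation

/// Hypothetical names for the two candidate locations.
enum WorkLocation {
    case itemReplacementDirectory
    case temporaryDirectory
}

/// Prefer the item replacement directory (same volume as the destination) unless
/// only the general temporary directory sits on a volume that supports cloning.
func chooseWorkLocation(replacementSupportsCloning: Bool, temporarySupportsCloning: Bool) -> WorkLocation {
    if temporarySupportsCloning && !replacementSupportsCloning {
        return .temporaryDirectory
    }
    return .itemReplacementDirectory
}

#if canImport(Darwin)
/// Asks the volume containing `url` whether it supports file cloning.
/// Treats "unknown" as false, which is an assumption made for this sketch.
func volumeSupportsCloning(at url: URL) -> Bool {
    let values = try? url.resourceValues(forKeys: [.volumeSupportsFileCloningKey])
    return values?.volumeSupportsFileCloning ?? false
}
#endif
```

Probing each candidate directory directly, rather than inferring capabilities from its path, avoids assuming that any two directories share a volume.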

What's the best way of being careful about this, or do you just mean by using the item replacement directory where possible? As far as I know, there are only two ways of getting a temp directory:

Everything you've described sounds like you're on the right track. The big thing is just not making assumptions about the relationship between directories ("/tmp/" and "home" are on the same volume) or capabilities ("home directories are ALWAYS on volumes that support cloning"). The killer here is the long tail, as there are just SO many different edge cases.

LibZip is much faster on a volume that supports cloning

Interesting. Are you primarily "editing" the contents of the zip file (so you end up modifying the data inside, but don't really change its overall size or structure)? Cloning is a huge help if you can clone the contents and then modify, but if your modifications end up changing the fundamental contents, then I wouldn't expect the difference to be nearly as large. At large scale, this eventually devolves to "bytes moved".

One thing to be careful of is metric distortion external to the file system itself. For example, the "average" APFS volume is much faster than the "average" HFS+ volume. However, the primary reason for that isn't the volume format itself, it's that APFS has been our boot format long enough that a HUGE percentage of APFS volumes are our internal SSDs (which happen to be pretty fast).

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Everything you've described sounds like you're on the right track.

Great, thanks!

Interesting. Are you primarily "editing" the contents of the zip file (so you end up modifying the data inside, but don't really change its overall size or structure)? Cloning is a huge help if you can clone the contents and then modify, but if your modifications end up changing the fundamental contents, then I wouldn't expect the difference to be nearly as large. At large scale, this eventually devolves to "bytes moved".

Yes, I believe editing a large zip file using LibZip can indeed still be slow on APFS. However, the nice thing is that if a user edits a text file in a (zip) project in our app, only the first save to those edits would have the potential to be slow. After that, until they switched to editing another text file in the project, saving subsequent edits even into a huge zip file would be fast on APFS. (A project created in our app can contain text but also research files such as PDFs, media and images.)

This is because, on systems that support cloning, LibZip only rewrites the zip file starting with the first changed entry. And whenever I write changes to the zip file, my code deletes the old entry and then re-adds it with the new data, so that the edited text file becomes the last entry.

So, say you have a 5KB text file inside a 500MB zip file and it's the first entry. If you edit that text file, in theory the next save will rewrite the entire 500MB. But from then on, because the new data for that text file is now at the end of the zip file's entries, saving changes to it will cause only 5KB (or however large the text file is after edits) to be rewritten. (I say “in theory” because LibZip seems to be doing something smarter somehow; even if you overwrite the text file at the same position—at the first entry—saves are still a lot faster than they would be writing the entire 500MB out again.)
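The "in theory" arithmetic can be written out as a toy model. This is only a sketch of the reasoning above, not of how LibZip actually works: it ignores headers and the central directory, and the names Entry and saveEdit are hypothetical:

```swift
/// Toy stand-in for one entry in a zip file.
struct Entry {
    let name: String
    var size: Int   // bytes
}

/// Models "delete the old entry, re-add it at the end, rewrite from the first changed
/// position onward". Returns the number of bytes rewritten by this save.
func saveEdit(of name: String, newSize: Int, in entries: inout [Entry]) -> Int {
    guard let index = entries.firstIndex(where: { $0.name == name }) else { return 0 }
    entries.remove(at: index)
    entries.append(Entry(name: name, size: newSize))
    // Everything from the edited entry's old position to the end is written again.
    return entries[index...].reduce(0) { $0 + $1.size }
}
```

Starting with a 5,000-byte text entry ahead of 499,995,000 bytes of research files, the first edit rewrites all 500,000,000 bytes in this model, and later edits to the same entry rewrite only the text entry itself.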

And because only text files are editable in my app, they will drift towards the bottom of the zip file's entries as they are edited.

Anyway, thanks again for all the help and getting me back on track!
