NSFileManager getRelationship:ofDirectoryAtURL:toItemAtURL:error: returning NSURLRelationshipSame for Different Directories

I'll try to ask a question that makes sense this time :) . I'm using the following method on NSFileManager:

- (BOOL)getRelationship:(NSURLRelationship *)outRelationship ofDirectoryAtURL:(NSURL *)directoryURL toItemAtURL:(NSURL *)otherURL error:(NSError **)error;

Sets 'outRelationship' to NSURLRelationshipContains if the directory at 'directoryURL' directly or indirectly contains the item at 'otherURL', meaning 'directoryURL' is found while enumerating parent URLs starting from 'otherURL'. Sets 'outRelationship' to NSURLRelationshipSame if 'directoryURL' and 'otherURL' locate the same item, meaning they have the same NSURLFileResourceIdentifierKey value. If 'directoryURL' is not a directory, or does not contain 'otherURL' and they do not locate the same file, then sets 'outRelationship' to NSURLRelationshipOther. If an error occurs, returns NO and sets 'error'.

So this method falsely returns NSURLRelationshipSame for different directories. One is empty, one is not. Really weird behavior. Two file path urls pointing to two different file paths have the same NSURLFileResourceIdentifierKey? Could it be related to https://developer.apple.com/forums/thread/813641 ?

One url in the check lived at the same file path as the other url at one time (but no longer does). No symlinks or anything going on. Just plain directory urls.

And YES, calling -removeCachedResourceValueForKey: with NSURLFileResourceIdentifierKey causes the proper result of NSURLRelationshipOther to be returned. And I'm doing the check on a background queue.

So this method falsely returns NSURLRelationshipSame for different directories. One is empty, one is not. Really weird behavior.

Do you know where/what the directories "were"? The problem here is that there's a pretty wide variation between the "basic" case of "a bunch of files and directories sitting on a standard volume" and "the range of ALL possible edge cases".

Two file path URLs pointing to two different file paths have the same NSURLFileResourceIdentifierKey?

Yes, this is possible. As one example, the data volume basically ends up in the hierarchy "twice" meaning that, for example, the path "/System/Volumes/Data/Users/" and "/Users/" are in fact the same directory. And, yes, getRelationship returns NSURLRelationshipSame for those directories.
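
A minimal sketch of checking this yourself, assuming ARC. The helper name MMPathsLocateSameItem is hypothetical (it is not part of Foundation). It builds fresh URLs from two paths, reads NSURLFileResourceIdentifierKey for each, and compares the values with isEqual:. Because the URL objects are newly created, nothing cached earlier can influence the comparison.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns YES when both paths resolve to the same file
// system object according to NSURLFileResourceIdentifierKey.
// Returns NO if either lookup fails.
static BOOL MMPathsLocateSameItem(NSString *pathA, NSString *pathB)
{
    NSURL *urlA = [NSURL fileURLWithPath:pathA];
    NSURL *urlB = [NSURL fileURLWithPath:pathB];

    id identifierA = nil;
    id identifierB = nil;
    NSError *error = nil;

    if (![urlA getResourceValue:&identifierA forKey:NSURLFileResourceIdentifierKey error:&error] ||
        ![urlB getResourceValue:&identifierB forKey:NSURLFileResourceIdentifierKey error:&error])
    {
        NSLog(@"Could not read identifier: %@", error);
        return NO;
    }
    if (identifierA == nil || identifierB == nil)
    {
        return NO;
    }

    // The identifiers are opaque objects; isEqual: is the supported comparison.
    return [identifierA isEqual:identifierB];
}
```

Going by the description above, calling it with @"/Users" and @"/System/Volumes/Data/Users" would be the interesting case to try on a system with a separate data volume.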

Now, this:

One is empty, one is not.

...is definitely "weirder". Ignoring the cache issue below, I don't think you could do it within a standard volume, but you might be able to do it using multiple volumes, particularly duplicated disk image and/or network file systems.

However, in this case:

Could it be related to https://developer.apple.com/forums/thread/813641?

One URL in the check lived at the same file path as the other URL at one time (but no longer does). No symlinks or anything going on. Just plain directory URLs.

...yes, it's a/the cache. The proof of that is this:

And YES calling -removeCachedResourceValueForKey: with NSURLFileResourceIdentifierKey causes the proper result of NSURLRelationshipOther to be returned. And I'm doing the check on a background queue.

...since any issue that is fixed by clearing the cache is, by definition, "caused" by the cache. That's a good excuse to revisit this thread here, which I'm afraid I missed:

Could it be related to https://developer.apple.com/forums/thread/813641 ?

The core of the issue here is the inherent tension between a few facts:

  1. The entire file system is essentially a lock-free database being simultaneously modified by an unconstrained number of processes/threads.

  2. Your ability to monitor file system state is relatively limited. Basically, you can either ask for the current state and receive an answer with unknown latency or ask the system to update you as things change, at which point you'll receive a stream of events... with unknown latency.

  3. Accessing the file system is sufficiently slow that it's worth avoiding/minimizing that access.

Jumping back to here, there's actually a VERY straightforward way to do this:

Two file path URLs pointing to two different file paths have the same NSURLFileResourceIdentifierKey?

That is, have two processes where:

Process 1 calls "getRelationship".

Process 2 manipulates the file system such that the following sequence occurs:

  1. Process 1 retrieves the metadata of the source object.
  2. Process 2 deletes the existing directory at the target location.
  3. Process 2 moves the source object to the target location.
  4. Process 2 deletes the contents of the target object.
  5. Process 1 retrieves the metadata of the target object.

...and process 1 now compares #1 and #5, returning NSURLRelationshipSame because they are in fact the same. Now, you might say this seems far-fetched/impossible to time; however, I never said process 2 was running on the same system. With SMB over a slow connection, I suspect you could replicate the scenario above pretty easily.

The point here is that the system’s caching behavior is simply one dynamic among many. That is, caching increases the probability of strange behavior (like the one above) because it increases the time gap between #1 and #5, and the wider the gap between actions, the more likely it is that "something" has changed. However, you can't actually shrink the gap to the point where it goes away.

One solution to these issues is for the interested processes to communicate with each other to coordinate their actions (for example, by using "File Coordination"). However, that requires all of the processes involved to participate in that mechanism, which they definitely don't today.

Realistically, the reason this all isn't a total disaster is that most of the activity here is either:

  • Directly controlled/managed by the user, who is both being careful about what they do and moving "slow" enough that collisions don't happen.

OR

  • Happening in "private" parts of the file system where only one "entity" is manipulating the data (for example, an app’s data container).

All of which leads to the big question... what are you actually trying to do?

If this is a one-off event that you're concerned/confused about, then the answer is basically, yes, the file system can be way weirder than it looks, and sometimes that means calling removeCachedResourceValueForKey "just in case".

However, if this is something that is a recurring problem for your app, then it might be worth stepping back and rethinking your approach to minimize the possibility and consequences of these kinds of "oddities".
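
The "just in case" approach can be sketched as a small wrapper, assuming ARC. MMGetFreshRelationship is a hypothetical name; it only combines the two calls already discussed in this thread (-removeCachedResourceValueForKey: followed by -getRelationship:ofDirectoryAtURL:toItemAtURL:error:).

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: drops any cached file identifier on both URLs before
// asking NSFileManager for the relationship, so the comparison is based on
// a fresh lookup instead of values cached earlier in the URLs' lifetime.
static BOOL MMGetFreshRelationship(NSURL *directoryURL,
                                   NSURL *itemURL,
                                   NSURLRelationship *outRelationship,
                                   NSError **outError)
{
    [directoryURL removeCachedResourceValueForKey:NSURLFileResourceIdentifierKey];
    [itemURL removeCachedResourceValueForKey:NSURLFileResourceIdentifierKey];

    return [[NSFileManager defaultManager] getRelationship:outRelationship
                                          ofDirectoryAtURL:directoryURL
                                               toItemAtURL:itemURL
                                                     error:outError];
}
```

This narrows the window in which stale data can be used, but as described above it cannot close it: the file system can still change between the two lookups.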

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Thanks for the reply! I actually stumbled across this while reworking things in my app to account for NSURL caching behavior I mentioned in the other thread.

What I was doing not too long ago was using an NSCache on top of NSURL for resource values. At some point, when responding to metadata changes, I was calling -removeAllCachedResourceValues on a background thread to get refreshed data, and I discovered that -removeAllCachedResourceValues could crash if another thread was reading at the same time. I guess at some point in my frustration I just moved some stuff around to stop the crashes (and I did).

I had either forgotten or just never realized that NSURL caches only for a run loop turn (or maybe just sometimes? More on that in a second). I guess this is cool in the middle of a dragging session, but apparently at some point I must've just assumed that NSURL was caching for a more meaningful period of time (from the perspective of my app anyway), because if I didn't call -removeAllCachedResourceValues I'd get stale values sometimes. So why cache on top of a cache? I chucked my NSCache, which I never really loved, but apparently that was a mistake. My bad.

I guess my wish would be for NSURL to either cache forever until values are explicitly cleared or not cache at all, because if we're caching on a cache that may not be a cache but sometimes seems like a cache, it's hard to cache. Maybe I'm just being selfish though.

But back to the collision. So I'm reworking all this (not using NSCache this time). As I'm rewriting my caching code, I commented out a few things here and there to check some error-handling code paths that seem extremely unlikely to really occur, and I stumbled across this collision. But there are many run loop turns in between these events, so I don't understand why the cached values are living for so long in this particular case. Maybe something like cancelPreviousPerformRequestsWithTarget causes cached values to live longer, but I'm not supposed to worry about the implementation details.

I can easily reproduce this with NSFileManager using the following steps:

  1. -trashItemAtURL:resultingItemURL:error: - grab the resultingItemURL.
  2. Put an empty new folder in the exact same location you just trashed.
  3. Compare the NSURLFileResourceIdentifierKey of the URL you got from resultingItemURL with that of the new folder at the old location. They match, until you programmatically remove the cached value.

I guess my wish would be for NSURL to either cache forever until explicitly cleared values or not cache at all because if we're caching on a cache that may not be a cache but sometimes it seems like a cache it's hard to cache.

So, the first issue here is that "not caching at all" isn't really an option. Most of the data you retrieve from NSURL all came from the same API (getattrlist) and, much of the time, that data is ALWAYS retrieved in every call. getattrlist() is a "bulk" fetch API (it's designed to return a bunch of data at once) and the vast majority of the performance cost here is the cost of the syscall itself, NOT the retrieval of the data itself or the copy out of the kernel. Putting that in concrete terms, let’s say you ask for "all" of the times for a file (ATTR_CMN_CRTIME, ATTR_CMN_MODTIME, ATTR_CMN_CHGTIME, ATTR_CMN_ACCTIME, ATTR_CMN_BKUPTIME):

  • Basically "every" file system is going to end up storing all of those values inside some kind of file system-specific structure, so the only "cost" here is the act of finding that record, not the individual time.

  • All the values involved are so small that the transit cost "out" of the kernel is basically fixed.

...so asking for one of them costs exactly the same as asking for all 5.
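
As an illustration of that bulk fetch, here is a minimal sketch that asks getattrlist() for all five timestamps in one call. The struct and function names (MMTimesBuffer, MMFetchAllTimes) are hypothetical. The sketch assumes the volume supports all five attributes; a more careful version would also request ATTR_CMN_RETURNED_ATTRS to learn which attributes were actually returned.

```objc
#include <sys/attr.h>
#include <sys/time.h>
#include <unistd.h>
#include <string.h>
#include <stdint.h>

// Layout of the reply buffer for the five requested attributes.
// getattrlist() returns a leading length field followed by the attributes
// in the order of their bit values, aligned to 4 bytes.
struct MMTimesBuffer {
    uint32_t        length;
    struct timespec created;    // ATTR_CMN_CRTIME
    struct timespec modified;   // ATTR_CMN_MODTIME
    struct timespec changed;    // ATTR_CMN_CHGTIME
    struct timespec accessed;   // ATTR_CMN_ACCTIME
    struct timespec backedUp;   // ATTR_CMN_BKUPTIME
} __attribute__((aligned(4), packed));

// Hypothetical helper: fetches all five timestamps with a single syscall.
// Returns 0 on success, -1 on failure (errno is set by getattrlist).
static int MMFetchAllTimes(const char *path, struct MMTimesBuffer *outTimes)
{
    struct attrlist request;
    memset(&request, 0, sizeof(request));
    request.bitmapcount = ATTR_BIT_MAP_COUNT;
    request.commonattr  = ATTR_CMN_CRTIME | ATTR_CMN_MODTIME | ATTR_CMN_CHGTIME |
                          ATTR_CMN_ACCTIME | ATTR_CMN_BKUPTIME;

    memset(outTimes, 0, sizeof(*outTimes));
    return getattrlist(path, &request, outTimes, sizeof(*outTimes), 0);
}
```

Dropping four of the five bits from commonattr changes the size of the reply, not the number of syscalls, which is the point being made above.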

Putting that another way, there's a fundamental disconnect between how file system calls work and how NSURL works. File system APIs are built as "retrieval APIs" which return as much data as possible in a single call (stat being an obvious example). All of the data returned by each system call represents the exact state of that object at a particular "instant" in time. It may not be right "now" (the file system can be constantly changing) but it WAS right at some moment in time.

On the other hand, NSURL (and lots of other API layers) want to let you retrieve individual elements separately, but that means the API then needs to decide whether to:

  1. Return the data it retrieved in an earlier call, which is both faster and provides a more "coherent" picture of the file system state, since the data being retrieved is coming from the same "fetch".

  2. Fetch new data, which is more accurate but creates inconsistent results between the "current" state and the "previous" state.

ACTUALLY doing #2 for every call is a terrible idea for both performance and coherence issues, but that means we're basically stuck trying to sort out when to reset, not if we're going to cache.

As a side note here, an API like URL.resourceValues(forKeys:) gets you much closer to how the file system itself works, since you're retrieving a fixed dictionary from a particular instant, NOT an ambiguous data smear.
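
In Objective-C the counterpart is -resourceValuesForKeys:error:. A minimal sketch, assuming ARC; the helper name MMSnapshotOfItemAtPath and the particular set of keys are only illustrative.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns one dictionary holding several resource
// values for the item at `path`, or nil on failure. Reading everything
// through one request keeps the values together as a single snapshot
// instead of asking for each key separately at different times.
static NSDictionary<NSURLResourceKey, id> *MMSnapshotOfItemAtPath(NSString *path,
                                                                  NSError **outError)
{
    NSURL *url = [NSURL fileURLWithPath:path];
    NSArray<NSURLResourceKey> *keys = @[ NSURLIsDirectoryKey,
                                         NSURLCreationDateKey,
                                         NSURLContentModificationDateKey,
                                         NSURLFileResourceIdentifierKey ];
    return [url resourceValuesForKeys:keys error:outError];
}
```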

I can easily reproduce this with NSFileManager using the following steps:

  1. -trashItemAtURL:resultingItemURL:error: - grab the resultingItemURL.
  2. Put an empty new folder in the exact same location you just trashed.
  3. Compare the NSURLFileResourceIdentifierKey of the URLs you got from resultingItemURL with the new folder at its old location and they match - until you programmatically remove the cached value.

Huh. That's really weird. How did you construct those URLs? Are you building them from string paths or getting them from the system (like through an open panel or by enumerating the directory)? What does "isFileReferenceURL" return, and what happens if you do the same check but call "fileReferenceURL" on both URLs first?

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

ACTUALLY doing #2 for every call is a terrible idea for both performance and coherence issues, but that means we're basically stuck trying to sort out when to reset, not if we're going to cache.

I agree. IMO the problem is not that NSURL is caching; the problem is the way it caches. The way it caches forces me to cache on top of it. The documentation claims it only caches for 1 run loop turn, but as previously mentioned that is not always the case, and certain values tend to get 'stuck.'

Building a cache on top of NSURL resource values, which may or may not be stale, can cause all sorts of weird behavior if you don't call -removeAllCachedResourceValues so I can cache on the true value... but NSURL, I assume, makes its cache thread-safe, so -removeAllCachedResourceValues probably isn't so cheap. Doesn't that mean the URL cache is costing me performance by requiring me to clear it to get to the true values I really want to cache?

Huh. That's really weird. How did you construct those URLs?

Originally the URL came through NSFileManager enumeration, or maybe -createDirectoryAtURL:, I can't remember. I'll have to try it out later when I have a little bit more time.

But I just stumbled across some really weird behavior when passing a file type from Finder to my app. It could be unrelated, but I wouldn't be completely surprised if it was related to this topic. I might file a bug on that later. It would be great if this forum supported private messages; I'm not sure I'm ready to provide more details in the open yet.

The documentation claims it only caches for 1 run loop turn, but as previously mentioned, that is not always the case, and certain values tend to get 'stuck.'

FYI, I think there are actually two different issues at work here:

  1. The run loop itself doesn't actually "turn" at a predictable rate. Depending on how your app is architected and the overall app state, it's entirely possible for an app to go seconds or even minutes without the main thread ever running.

  2. The documentation says that values are "automatically removed after each pass through the run loop", but that's not quite accurate. NSURL is tracking the main loop activity through a runloop observer, but it doesn't actually flush the cache until the first time "something" tries to access that URL from the main thread. If nothing on the main thread accesses that URL, then it could theoretically return the old values "forever".

...with #2 obviously being the most significant issue.

Building a cache on top of NSURL resource values, which may or may not be stale, can cause all sorts of weird behavior if you don't call -removeAllCachedResourceValues so I can cache on the true value... but NSURL, I assume, makes its cache thread-safe, so -removeAllCachedResourceValues probably isn't so cheap. Doesn't that mean the URL cache is costing me performance by requiring me to clear it to get to the true values I really want to cache?

Hypothetically, yes, but if you ACTUALLY run into performance problems, then I think you have a bigger issue. In terms of the lock itself, there's an os_unfair_lock that's used to protect access to the data, which means the cost of uncontested access is fairly minimal. The problem here is that having contention means that you have multiple threads attempting to manage/manipulate the same file at the same time... which is a bad idea regardless of performance.

That leads back to here:

Building a cache on top of NSURL resource values

The real question here is basically "what are you trying to do"? The problem here is that NSURL is basically a low-level primitive, not really "the" solution for file tracking. For example:

  1. Document-based apps are better off using a class like NSDocument, which manages things like file coordination and safe saves.

  2. Longer-term file tracking is better done with bookmarks, since they're harder to break and allow an app to restore access to the target as needed.

  3. Apps that manipulate files "in bulk" often end up using lower-level APIs to improve performance.
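
For the bookmark option in the list above, a minimal sketch assuming ARC and an app that does not need security-scoped bookmarks (a sandboxed app would typically pass different options). MMBookmarkForURL and MMResolveBookmark are hypothetical helper names.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: creates bookmark data that can be stored and
// resolved later, even after the URL object itself is gone.
static NSData *MMBookmarkForURL(NSURL *url, NSError **outError)
{
    return [url bookmarkDataWithOptions:0
         includingResourceValuesForKeys:nil
                          relativeToURL:nil
                                  error:outError];
}

// Hypothetical helper: resolves stored bookmark data back to a URL.
// *outIsStale is set to YES when the bookmark should be recreated from
// the resolved URL.
static NSURL *MMResolveBookmark(NSData *bookmark, BOOL *outIsStale, NSError **outError)
{
    return [NSURL URLByResolvingBookmarkData:bookmark
                                     options:0
                               relativeToURL:nil
                         bookmarkDataIsStale:outIsStale
                                       error:outError];
}
```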

One final note here is that it's not difficult to get an NSURL object that doesn't have the automatic flushing behavior. All you need to do is take the NSURL you're starting with, pass it into CFURLCreateFilePathURL() (or whatever API fits what you're starting with) to create a CFURLRef, then cast that CFURLRef back to NSURL. Toll-free bridging means that CFURLRef can be used exactly like an NSURL, so the only difference is that it won't flush its own cache.
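
A minimal sketch of that round trip, assuming ARC. MMCFBackedFilePathURL is a hypothetical name, and the sketch only shows the bridging mechanics; the caching difference is as described above and is not something the code itself demonstrates.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: passes a file URL through CoreFoundation and
// returns the resulting CFURL bridged back to NSURL, or nil on failure.
static NSURL *MMCFBackedFilePathURL(NSURL *url)
{
    CFErrorRef cfError = NULL;
    CFURLRef cfURL = CFURLCreateFilePathURL(kCFAllocatorDefault,
                                            (__bridge CFURLRef)url,
                                            &cfError);
    if (cfURL == NULL)
    {
        // CFBridgingRelease hands ownership of the error (if any) to ARC.
        NSError *error = CFBridgingRelease(cfError);
        NSLog(@"CFURLCreateFilePathURL failed: %@", error);
        return nil;
    }

    // The "Create" rule means we own cfURL; transfer that ownership to ARC.
    return CFBridgingRelease(cfURL);
}
```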

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

The run loop itself doesn't actually "turn" at a predictable rate. Depending on how your app is architected and the overall app state, it's entirely possible for an app to go seconds or even minutes without the main thread ever running.

Doesn't appear to be what's going on in this case. I made this dumb little test which can easily reproduce the issue (sorry can't get code to format well on these forums).

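#import <Foundation/Foundation.h>

// Class declaration so the test below compiles on its own.
@interface MachoManURLTester : NSObject
+(MachoManURLTester*)sharedTester;
-(void)startURLTrashDance;
@end

@implementation MachoManURLTester

+(MachoManURLTester*)sharedTester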
{
	static MachoManURLTester *sharedTester = nil;
	
	static dispatch_once_t token;
	dispatch_once(&token,^{
		sharedTester = [[self alloc]init];
	});
	return sharedTester;
}

-(void)startURLTrashDance
{
	NSAssert(NSThread.currentThread.isMainThread, @"Main thread only.");
	
	NSFileManager *fm = [NSFileManager defaultManager];
	NSURL *wrapperDir = [[NSURL fileURLWithPath:NSTemporaryDirectory() isDirectory:YES] URLByAppendingPathComponent:NSUUID.UUID.UUIDString isDirectory:YES];
	if (![fm createDirectoryAtURL:wrapperDir withIntermediateDirectories:YES attributes:nil error:nil])
	{
		NSLog(@"Test failed");
		return;
	}
	
	//[[NSWorkspace sharedWorkspace] activateFileViewerSelectingURLs:@[wrapperDir]];
	
	NSURL *untitledFour = [wrapperDir URLByAppendingPathComponent:@"Untitled 4" isDirectory:YES];
	if (![fm createDirectoryAtURL:untitledFour withIntermediateDirectories:YES attributes:nil error:nil])
	{
		NSLog(@"Test failed");
		return;
	}
	
	NSLog(@"Created untitled 4.");
	
	NSURL *resultingURL = nil;
	
	if (![fm trashItemAtURL:untitledFour resultingItemURL:&resultingURL error:nil])
	{
		NSLog(@"trash failed");
		return;
	}
	
	NSLog(@"Moved Untitled 4 to the trash.");
	
	[self performSelector:@selector(replaceTrashedURL:) withObject:untitledFour afterDelay:1.0];
	[self performSelector:@selector(compareBothURLS:) withObject:@[untitledFour,resultingURL] afterDelay:4.0];
	
}


-(void)replaceTrashedURL:(NSURL*)originalURL
{
	NSFileManager *fm = [NSFileManager defaultManager];
	if ([fm createDirectoryAtURL:originalURL withIntermediateDirectories:YES attributes:nil error:nil])
	{
		NSLog(@"Recreated Untitled 4");
	}
}

-(void)compareBothURLS:(NSArray<NSURL*>*)twoURLsArray
{
	NSLog(@"4 seconds is up - let's check");
	NSFileManager *fm = [NSFileManager defaultManager];
	NSURL *untitledFour = twoURLsArray.firstObject;
	NSURL *resultingURL = twoURLsArray.lastObject;
	
	// Uncommenting these two lines fixes the relationship check:
	//[untitledFour removeCachedResourceValueForKey:NSURLFileResourceIdentifierKey];
	//[resultingURL removeCachedResourceValueForKey:NSURLFileResourceIdentifierKey];
	
	NSURLRelationship relationship;
	NSError *error = nil;
	if ([fm getRelationship:&relationship ofDirectoryAtURL:untitledFour toItemAtURL:resultingURL error:&error])
	{
		if (relationship == NSURLRelationshipSame)
		{
			NSLog(@"NSURLRelationshipSame: %@ - %@?",untitledFour,resultingURL);
		}
		else if (relationship == NSURLRelationshipContains)
		{
			NSLog(@"NSURLRelationshipContains");
		}
		else if (relationship == NSURLRelationshipOther)
		{
			NSLog(@"NSURLRelationshipOther");
		}
		else
		{
			NSLog(@"Unknown");
		}
	}
	else
	{
		NSLog(@"Error reading relationship: %@",error);
	}
}

@end

Just use that class and do this in a test program.

	MachoManURLTester *URLTester = [MachoManURLTester sharedTester];
	[URLTester startURLTrashDance];

And to answer your earlier question, YES the file reference urls do collide.

Accepted Answer

Doesn't appear to be what's going on in this case. I made this dumb little test which can easily reproduce the issue (sorry, can't get code to format well on these forums).

Interesting. So, I can actually explain what's going on, and it's actually not the cache.

So, architecturally, NSURL has two different mechanisms for tracking file location— "path" and "file reference". Path works exactly the way you'd expect (it's a string-based path to a fixed location), while file reference relies on low-level file system metadata to track files. Critically, this means that the file reference will track the object as it's moved/modified within a volume.

Secondly, keep in mind NSURLs are generally "data" objects, meaning they don't "proactively" update their content.

So, the actual issue here starts here:

if (![fm trashItemAtURL:untitledFour resultingItemURL:&resultingURL error:nil])

At the point that method returns, "untitledFour" is no longer entirely coherent, as its path points to the original location, but its reference points to the file in the trash. You can see this for yourself by running this at the top of compareBothURLS:

NSURL* pathURL = untitledFour.filePathURL;
NSURL* refURL = untitledFour.fileReferenceURL;

NSLog(@"1 %@", untitledFour.path);
NSLog(@"2 %@", pathURL.path);
NSLog(@"3 %@", refURL.path);
	
NSLog(@"A %@", untitledFour.fileReferenceURL.description);
NSLog(@"B %@", pathURL.fileReferenceURL.description);
NSLog(@"C %@", refURL.fileReferenceURL.description);

What you'll find is that:

  • In the first log set, "1" & "2" will match, both pointing to the original file location. "3" will not, pointing to the trash instead.

  • In the second log set, "A" & "C" will match, while "B" will not.

More specifically, the strings returned in the second log set will have this format:

file:///.file/id=<number>.<number>/

...and the second number will be different for "B".

With all that context:

(1) The reason getRelationship is returning "same" is that it primarily relies on file reference data, and the reference data points to the file in the trash. There's an argument that it shouldn't do this; however, in its defense, using the reference data makes it much easier to sort out issues like hard-linked files and/or symbolic links allowing multiple references to the same file.

(2) The reason "removeCachedResourceValueForKey" changed the behavior is that it deleted the file reference data, forcing NSURL to resolve the data again. You'll actually get exactly the same effect if you test with "untitledFour.filePathURL".

What I'd highlight here is that the "right" behavior here isn't entirely clear. That is, is the problem that "getRelationship" is claiming that two different paths are "the same file"? Or is the problem that NSURL is returning the wrong path value for a specific file?

That question doesn't have a direct answer because the system doesn't really "know" what you actually want: are you trying to track a particular "object" (fileReferenceURL) or are you trying to reference a particular "path" (filePathURL)? It doesn't "know", so it's ended up with a slightly different object that's tracking both...

...but you can tell it what you want, at which point the API will now do exactly what you'd expect. More specifically, you can change the behavior by forcing the URL type you want immediately after you create the directory:

    if (![fm createDirectoryAtURL:untitledFour withIntermediateDirectories:YES attributes:nil error:nil])
    {
        NSLog(@"Test failed");
        return;
    }
    
#if 1
    untitledFour = untitledFour.fileReferenceURL;
#else
    untitledFour = untitledFour.filePathURL;
#endif

Strictly speaking, you could set "filePathURL" anywhere you want, but you can't create a fileReferenceURL to a non-existent object, so it needs to be after the create. In any case, either of those two configurations works the way you'd expect.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Very interesting. Thanks a lot for the detailed responses. As I mentioned briefly in a previous post, I stumbled across this while commenting out some things to test an error-handling code path. As a result of the matching NSURLFileResourceIdentifierKey, the wrong error message was logged. In my app's case this is basically a harmless bug because I do nothing, but the behavior did spark my curiosity. In any case, I will most likely be removing the -getRelationship: calls from my app entirely soon.

I'm surprised that a fileReferenceURL would be cached in a filePathURL at all. My expectation is when calling fileReferenceURL on a file path url is to get a reference to the file at the exact file path right now if it is there (or nil) and I would have to hold the fileReferenceURL on first access to follow the file around.

If caching the fileReferenceURL in the URL itself has been determined to be necessary, I'm also somewhat surprised an existing fileReferenceURL isn't cleared/updated when files are manipulated via high-level APIs like NSFileManager -createDirectoryAtURL:... etc. After recreation, as you mentioned, if you did untitledFour.fileReferenceURL you'd be manipulating the folder in the trash, not the new folder you created. Based on your previous reply, it sounds like it would be chalked up as an app bug since you recommend grabbing the fileReferenceURL early. But IMO it isn't obvious from the public interface that NSURL may return a 'cached/stale' file reference URL. I'm not actually doing this, and I'm glad I'm aware of this possibility. It doesn't seem so far-fetched that this could be a source of a data loss bug or something worse.

In my silly example it is obvious that I'm recycling the untitledFour NSURL instance. In a real complex app where you are passing NSURLs around like hot potatoes to various objects it may not be so obvious.

I'm surprised that a fileReferenceURL would be cached in a filePathURL at all. My expectation is when calling fileReferenceURL on a file path URL is to get a reference to the file at the exact file path right now if it is there (or nil) and I would have to hold the fileReferenceURL on first access to follow the file around.

I can understand that thinking, but it's not the system’s perspective. The system’s "view" here is that the file reference is considered more "authoritative" than the path. The reason for this is pretty simple: it's easy for an app to track "a path" (just store it as a string), but the only way an app can track "a file system object" is by doing what file references "do".

The preference for the "reference" object also ends up masking all sorts of common behaviors which would otherwise be highly disruptive. For example, it allows users to rename directories without having to worry about the consequences that might have for whatever files their apps might happen to be interacting with.

Based on your previous reply it sounds like it would be chalked up as an app bug since you recommend grabbing the fileReferenceURL early.

So, my own, entirely personal and slightly radical, perspective is that string-based file paths are a fairly broken defect that's become so ingrained in how most developers think about files that it's basically become "stuck" in "all" file system APIs even though they don't actually make ANY sense. Architecturally, every file system in the world ACTUALLY tracks its objects using some kind of artificial identifier (typically "a number") which is separate from object metadata (like "names"). Paths are then constructed by mapping IDs to names and then stringing those names together to make a path.
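
A small sketch of that separation between identity and name, assuming ARC. Both function names are hypothetical. The demo creates a directory in the temporary directory, renames it, and compares the NSURLFileResourceIdentifierKey read through fresh URL objects before and after: the name changes, the identifier does not.

```objc
#import <Foundation/Foundation.h>

// Reads the identifier through a brand-new NSURL so nothing cached on an
// older URL object can influence the result.
static id MMFreshIdentifierAtPath(NSString *path)
{
    id identifier = nil;
    [[NSURL fileURLWithPath:path] getResourceValue:&identifier
                                            forKey:NSURLFileResourceIdentifierKey
                                             error:NULL];
    return identifier;
}

// Hypothetical demo: reports whether the identifier seen at the new path
// equals the one seen at the old path after a rename.
static BOOL MMIdentifierSurvivesRename(void)
{
    NSFileManager *fm = [NSFileManager defaultManager];
    NSString *base = [NSTemporaryDirectory() stringByAppendingPathComponent:NSUUID.UUID.UUIDString];
    NSString *oldPath = [base stringByAppendingPathComponent:@"Old Name"];
    NSString *newPath = [base stringByAppendingPathComponent:@"New Name"];

    if (![fm createDirectoryAtPath:oldPath withIntermediateDirectories:YES attributes:nil error:NULL])
    {
        return NO;
    }

    id before = MMFreshIdentifierAtPath(oldPath);
    if (![fm moveItemAtPath:oldPath toPath:newPath error:NULL])
    {
        return NO;
    }
    id after = MMFreshIdentifierAtPath(newPath);

    [fm removeItemAtPath:base error:NULL];   // clean up the scratch directory
    return before != nil && [before isEqual:after];
}
```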

Relying on paths as the core file identifier means that every operation you perform is going through that same mapping process, opening the door to all sorts of problems which don't really have to exist at all. For example, take "Time-of-check to time-of-use (TOCTOU)" attacks. In the file system context, those attacks all look something like this:

  1. Get the system to check a particular object.
  2. Replace the object the system checked with a different object.
  3. The system now does "something" to a different file/directory than it intended.

These attacks are possible because you can't communicate the ACTUAL object you wanted to manipulate, but are instead forced to pass in a made-up reference to it. macOS itself is still heavily reliant on paths, but fileReferenceURLs are the closest construct we have to an API that doesn't fall into this "trap".
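The three steps above can be sketched at the POSIX level, where a file descriptor stands in for "the ACTUAL object". This is only an illustration under stated assumptions: `open_checked_regular` and `write_follows_checked_object` are hypothetical names, and the "attacker" is just the demo renaming its own file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open first, then check the object behind the descriptor (not the path).
   Returns an open fd, or -1 if the object is not a regular file. */
static int open_checked_regular(const char *path) {
    assert(path != NULL);
    int fd = open(path, O_WRONLY | O_APPEND | O_NOFOLLOW);
    if (fd < 0) return -1;
    struct stat st;
    if (fstat(fd, &st) != 0 || !S_ISREG(st.st_mode)) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical demo: check `path`, swap a different file into `path`, then
   write. Returns 1 if the bytes landed in the checked object (now at
   `moved_path`) and not in the replacement, 0 if not, -1 on setup failure. */
static int write_follows_checked_object(const char *path, const char *moved_path) {
    struct stat st;
    int ok = 0;
    int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd < 0) return -1;
    close(fd);

    fd = open_checked_regular(path);                 /* step 1: check */
    if (fd < 0) {
        unlink(path);
        return -1;
    }
    if (rename(path, moved_path) == 0) {             /* step 2: swap */
        int decoy = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);
        if (decoy >= 0) close(decoy);
        if (write(fd, "hello", 5) == 5) {            /* step 3: use */
            ok = stat(moved_path, &st) == 0 && st.st_size == 5 &&
                 stat(path, &st) == 0 && st.st_size == 0;
        }
    }
    close(fd);
    unlink(path);
    unlink(moved_path);
    return ok;
}
```

Because the check and the use go through the same descriptor, swapping what the path names in between doesn't redirect the write. Had step 3 reopened the file by path, it would have hit the decoy instead.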

That leads to here:

But IMO it isn't obviously clear by the public interface that NSURL may return a 'cached/stale' file reference URL. I'm not actually doing this and I’m glad I’m aware of this possibility.

The underlying question here is which of these two objects "should" untitledFour be referencing? Is it tracking a fixed path location ("tmp" inside the app data container) or is it tracking a file system object (which has now been moved to "Trash")?

My own view is that "object tracking" is the better default, which means the bug here is that "path" is returning an incorrect value, NOT that getRelationship is returning NSURLRelationshipSame.

However, you're right that the current behavior in this particular case is a bit of a mess, as some APIs are relying on the reference (like "getRelationship") but other APIs are directly pulling the path (like "activateFileViewerSelectingURLs"). I'm not sure what's causing that behavior, but it's absolutely broken and it's not as simple as caching. Expanding on my earlier code, if you add this logging after setting fileReferenceURL:

untitledFour = untitledFour.fileReferenceURL;
NSLog(@"1 %@", untitledFour.path);

...you'll find that the code above logs "tmp" (because the directory hasn't been moved) and the logging in compareBothURLS logs "Trash". Similarly, using "filePathURL" gets you a fixed reference to "tmp". More to the point, if you run this sequence on untitledFour immediately after trashing the directory:

[[NSWorkspace sharedWorkspace] activateFileViewerSelectingURLs:@[untitledFour.fileReferenceURL.filePathURL]];
sleep(1);
[[NSWorkspace sharedWorkspace] activateFileViewerSelectingURLs:@[untitledFour]];

...then Finder will open BOTH directories (trash, then app container). In other words, the issue here isn't that NSURL has a stale path value, it's that it's ended up in a weird state that splits the behavior of those two cases and that is definitely a bug (r.171663816).

In my silly example it is obvious that I'm recycling the untitledFour NSURL instance. In a real complex app where you are passing NSURLs around like hot potatoes to various objects it may not be so obvious.

Not at all. I think you've actually found an issue that we need to fix. I think the actual lesson here is that when you actively manipulate a URL, I would recommend deciding what kind of URL you want to "end up with" and then "pull" that type out using filePathURL/fileReferenceURL.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

So, my own, entirely personal and slightly radical, perspective

I so agree with this! I can’t remember if Kevin radicalised me, or I radicalised Kevin, or perhaps we were both radicalised by the File Manager APIs that Apple introduced in Mac OS 9 (-:

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

Very interesting info!

I can understand that thinking, but it's not the system’s perspective. The system’s "view" here is that the file reference is considered more "authoritative" than the path.

I actually don't have a strong opinion about file reference URLs vs. file path URLs, but from my perspective in the sample, it is clear that after the second call to createDirectoryAtURL:... returns, the fileReferenceURL left lingering in that untitledFour instance is pointing to the wrong folder. I fed that URL directly into a method to create a folder and the operation succeeded, yet there is still a chance to accidentally access a completely different folder through that very same instance by going through the cached fileReferenceURL.

macOS itself is still heavily reliant on paths, but fileReferenceURLs are the closest construct we have to an API that doesn't fall into this "trap".

Yeah. You can't even create a file without a path, so there is that. File reference URLs are great at certain things (put this file on the pasteboard - it'll work even if it moves), but you'd really need to introduce a slew of new APIs for developers to really be able to go 'pathless.' Would every method to create a file or folder have to take a parent (fileReferenceURL) parameter?

There is a whole class of applications that display / organize files based on their file system location, and file references don't work very well with that concept. How does something like NSPathControl work in a world where there are no file paths? I guess that's the point. File reference URLs describe what, but what about where? You might be able to organize a list of files in a tree and manage parent-child relationships using mostly file reference URLs (NSURLParentDirectoryURLKey?) but I won't be the guy trying that in 2026!

I actually don't have a strong opinion about file reference URLs vs. file path URLs, but from my perspective in the sample, it is clear after the second call to createDirectoryAtURL:...,

Yes, and to be clear, that is very much a bug. More specifically, every URL should either be a file reference URL (in which case you'd reference the object in the trash) or a file path URL (in which case you'd reference the original path). You're ending up with a URL which is a mix of both, which is simply wrong. I'd need to test on older OS's, but I suspect this is actually a new bug, possibly introduced in Swift Foundation[1]

[1] Much of the Objective C "version" of Foundation is actually written in Swift now.

Yeah. You can't even create a file without a path, so there is that.

Unfortunately, yes. I'm not claiming my radical idea is something that will ever be (again) implemented. Keep in mind that I'm not claiming that "macOS" is wrong about paths; I think EVERY operating system is "wrong" about paths. Well, every operating system you expect normal people to use. It's entirely possible that paths work great in batch processing mainframes, web servers, and microwaves.

File reference URLs are great at certain things (put this file on the pasteboard - it'll work even if it moves), but you'd really need to introduce a slew of new APIs for developers to really be able to go 'pathless.'

Well, I confess I've been burying the lead a bit. There WAS an API that worked exactly the way I've described, as it's exactly the way macOS Classic handled files. That is, you never got a "path"; you got an "FSRef" - an opaque object reference to a file.

Would every method to create a file or folder have to take a parent (fileReferenceURL) parameter?

Sure. Why not? That's what the Carbon File Manager did. Isn't that better than having a file end up somewhere the user didn't select just because something in the hierarchy happened to have been renamed at roughly the same time?

There is a whole class of applications that display / organize files based on their file system location, and file references don't work very well with that concept. How does something like NSPathControl work in a world where there are no file paths?

I'm not sure what you mean here. If you want to know the current path location, then you ask the kernel to convert the reference to a path. That's exactly what happens when you call "path" on a file reference URL. That might seem like a "heavy" answer, but keep in mind that the VFS layer is not (and CANNOT be) built around paths, so "every" file system call basically starts as "convert path to vnode".

However, part of the answer here is what apps ALREADY do, which is that you avoid showing the user the full object path. The moment you show the user a full path, the app becomes responsible for keeping that path accurate, something you can't actually do with just paths. This is why the standard convention of "command click on window title to show file location" exists. It means apps don't need to actively track the current location of their files, but can instead determine a file's current location whenever the user actually asks for it.

All of that might sound crazy... but it's ALREADY how fileReferenceURL works. Try adding the following code inside your existing test:

untitledFour = untitledFour.fileReferenceURL;
NSLog(@"1 %@", untitledFour.path);
NSLog(@"1 %@", untitledFour.path);
NSLog(@"1 %@", untitledFour.path);
NSLog(@"1 %@", untitledFour.path);
NSLog(@"1 %@", untitledFour.path);
NSLog(@"1 %@", untitledFour.path);
NSLog(@"1 %@", untitledFour.path);
NSLog(@"1 %@", untitledFour.path);

...then add a breakpoint on each of those log messages and move or rename the directory at every pause. Here's what I got (home directory path removed for length):

1 ~/Library/Containers/com.dtsapple.kevine.URL-cache-test/Data/tmp/3375A20F-893D-42AC-ADF5-64E411F02951/Untitled 4
1 ~/Library/Containers/com.dtsapple.kevine.URL-cache-test/Data/tmp/3375A20F-893D-42AC-ADF5-64E411F02951/5
1 ~/Downloads/5
1 ~/Downloads/sdfeoodsfads
1 ~/Downloads/cool!
1 ~/Library/Containers/com.dtsapple.kevine.URL-cache-test/Data/tmp/3375A20F-893D-42AC-ADF5-64E411F02951/cool!
1 ~/Library/Containers/com.dtsapple.kevine.URL-cache-test/Data/tmp/3375A20F-893D-42AC-ADF5-64E411F02951/finestuff/dsfds
1 ~/Library/Containers/com.dtsapple.kevine.URL-cache-test/Data/tmp/3375A20F-893D-42AC-ADF5-64E411F02951/finestuff/Untitled

Yes, this means the path lookup isn't being cached. I haven't looked closely at the full mechanics, but I suspect it's just retrieving it every time to ensure that it's as accurate as possible.

I guess that's the point. File reference URLs describe what, but what about where?

"What" vs. "Where" isn't a bad way to think about the difference between paths and file references. However, what is less obvious here is that, in the context of a GUI OS built for users, most file system operations are better implemented in terms of "what", not "where". That is, applications are generally expected to "follow" the objects they're given as those objects change, NOT recreate and/or fail if the path hierarchy changes.

You might be able to organize a list of files in a tree and manage parent-child relationships using mostly file reference URLs (NSURLParentDirectoryURLKey?) but I won't be the guy trying that in 2026!

I'm not quite sure what you're thinking of here. If you're trying to display the contents of a directory hierarchy, then the process is exactly the same for paths and references. You start with the container and recursively iterate its contents, displaying whatever you find. If you want to show the path to an object, then you ask for its path.

That actually reminded me of yet another layer to all of this, which is that you shouldn't ACTUALLY be displaying the raw "path" to the user ANYWAY. There are a large variety of edge cases where the file system path and the user-visible location won't actually match (for example, "iCloud Drive"), which is why "componentsToDisplayForPath" and "displayNameAtPath" exist. And, yes, both of those methods should have URL variants, particularly since the first thing BOTH methods do... is generate a URL for the path.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

I'm not sure what you mean here. If you want to know the current path location, then you ask the kernel to convert the reference to a path.

[..] I'm not quite sure what you're thinking of here. If you're trying to display the contents of a directory hierarchy, then the process is exactly the same for paths and references. You start with the container and recursively iterate its contents, displaying whatever you find.

What about changes? FSEventStreamRef reports changes on a directory. Does it work on file reference URLs? I never even thought of trying. I can pick up changes on a subfolder 4 folders away from the root folder I'm watching, all using a single FSEventStreamRef.

I'm not saying it isn't possible, I just don't know? How can I parse a single change using current APIs in a deep tree with opaque file reference URLs that don't have a concept of location without first converting to path to see what level I'm at?

Sure. Why not? That's what the Carbon File Manager did.

Well I'm not totally against the idea nor was I arguing that it was impossible. I'm just saying it isn't in the API today, so I need a path to create a file. As you already mentioned previously macOS is very path based.

In my original sample the code does not indicate in any obvious way that I wanted to use a file reference URL but merely asking NSFileManager the 'relationship' caused a really undesirable side effect.

but I suspect this is actually a new bug, possibly introduced in Swift Foundation[1] [1] Much of the Objective C "version" of Foundation is actually written in Swift now.

Lord have mercy!

I have another issue that may or may not be related to this in some way but I think I'm going to have to spin up a separate thread.

It means apps don't need to actively track the current location of their files, but can instead determine its current location whenever the user actually asks for it.

What should happen if you expand a directory with four subfolders in an NSBrowser (so four columns) and the third folder moves? If an app just follows the file reference URL, the UI is broken. I mean, if there are/were APIs that handled this in file reference world, I'm with you!

The concept of file location matters to users and to a lot of apps. I hope Apple doesn't want to put apps in single file jail. Here you get that one txt document and if it moves don't worry we got you. There are other types of apps that do other things. Please don't turn macOS into iOS :)

FSEventStreamRef reports changes on a directory.

I have some comments below, but FSEvents are a complicated topic in their own right. If you've got a product that's actually focused on monitoring and updating a large hierarchy (this isn't actually all that common), then it might be worth starting a new thread that's focused on those issues.

Does it work on file reference URLs?

No, it takes the opposite approach, which is basically "force you to iterate the hierarchy until it happens to stabilize". The core underlying issue here is that the file system doesn't restrict how it's modified and the API it provides only provides a "picture" of a particular instant in time. That is, when you ask for a "directory's contents", what you actually get is the contents at that instant without any guarantee that they're still the "current" contents. Most of the time the contents don't change fast enough for that to matter, but in a frequently changing directory the "past" and "now" can diverge VERY quickly.

You'll note that FSEventStreamCreate takes a "latency" argument- that's basically "how long should I hide changes from you so that you don't notice intermediate changes".

The other problem here is that you're assuming the APIs involved here operate on paths... when they don't. For example, the way you actually read the contents of a directory is using APIs like opendir/readdir or getattrlistbulk. Those APIs don't operate on paths; they operate on references opened into the VFS system. If a directory is moved in the midst of an iteration, those APIs will continue to return the contents at its new location.
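Here's a small C sketch of that last point, using only POSIX calls (fdopendir/readdir). It's an illustration rather than anything Foundation does internally; `count_entries_fd` and `entries_seen_after_rename` are hypothetical helper names, and the demo renames a directory it created itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Counts the entries (excluding "." and "..") of the directory behind
   dir_fd. Takes ownership of dir_fd; returns -1 on error. */
static int count_entries_fd(int dir_fd) {
    assert(dir_fd >= 0);
    DIR *dir = fdopendir(dir_fd);
    if (dir == NULL) {
        close(dir_fd);
        return -1;
    }
    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, ".") != 0 && strcmp(entry->d_name, "..") != 0)
            count++;
    }
    closedir(dir); /* also closes dir_fd */
    return count;
}

/* Hypothetical demo: creates old_path containing one file, opens the
   directory, renames it to new_path, then lists it through the descriptor
   that was opened BEFORE the rename. Returns the entry count, or -1. */
static int entries_seen_after_rename(const char *old_path, const char *new_path) {
    char file_path[1024];
    int count = -1;
    if (mkdir(old_path, 0700) != 0) return -1;
    snprintf(file_path, sizeof file_path, "%s/item", old_path);
    int file_fd = open(file_path, O_CREAT | O_WRONLY, 0600);
    if (file_fd >= 0) close(file_fd);

    int dir_fd = open(old_path, O_RDONLY | O_DIRECTORY);
    if (dir_fd >= 0 && rename(old_path, new_path) == 0) {
        count = count_entries_fd(dir_fd); /* old_path no longer exists here */
        snprintf(file_path, sizeof file_path, "%s/item", new_path);
        unlink(file_path);
        rmdir(new_path);
    } else {
        if (dir_fd >= 0) close(dir_fd);
        unlink(file_path);
        rmdir(old_path);
    }
    return count;
}
```

The listing still finds the one file even though the path used to open the directory is gone by the time readdir runs.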

Note that this INCLUDES FSEvents. That is, when you call "FSEventStreamCreate", you’re creating a monitor for a directory entry in the VFS layer, NOT for a specific path. If the directory you're monitoring moves... then FSEventStreamCreate will happily continue sending you events about that directory, NOT the original path location. If you want to know when the directory you're monitoring moves, then that's what "kFSEventStreamCreateFlagWatchRoot" is for. Note that it doesn't change the monitor point, it just tells you that the object you're currently monitoring happens to have moved.

I also have to call out this line from the documentation:

"If you want to track the current location of a directory, it is best to open the directory before creating the stream so that you have a file descriptor for it and can issue an F_GETPATH fcntl() to find the current path."

That's the polite way of warning you that FSEvents isn't actually tracking its target's path and won't be able to tell you where it's located if it moves.

I'm not saying it isn't possible, I just don't know? How can I parse a single change using current APIs in a deep tree with opaque file reference URLs that don't have a concept of location without first converting to path to see what level I'm at?

Strictly speaking, iterating a deep hierarchy using PURE paths is potentially quite dangerous. If/when directories move mid-iteration, then the app ends up iterating the contents of a directory in one location while thinking it's actually iterating the contents of the original location. At a minimum, that makes the contents somewhat... confused. At worst, it opens the doors to security attacks.

Most apps can basically ignore these issues: they implicitly assume that the user isn't actively trying to "undermine" whatever they're trying to do, and they accept/ignore the fact that they'll fail if/when the user does something that changes their hierarchy in a way they don't expect.

The "safe" way to iterate and access a large hierarchy is "openat()", which takes a file descriptor as its first argument. Used recursively, this ensures that you're always "anchored" to a specific directory, so you're always accessing a consistent location even if locations move.

Summing the issues here up, the worst case here means that you only have two choices about what an iteration will return:

  1. (Using paths) The contents shown will be the "mixed" contents of 1 or more other directories, determined by exactly when the user moves which directories.

  2. (using openat) The contents shown will be derived from the actual contents of the specific directory the iteration started from. No content will ever be iterated that wasn't at some point "in" that hierarchy. All directories iterated will have been "inside" their parent directory at the point the iteration reached that directory.

Note that neither of those choices really matches what we think of as "the contents of a directory". However, #2 is MUCH more coherent than #1.
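As a rough illustration of choice #2, here's a minimal recursive walk anchored with openat(). It's a sketch under stated assumptions, not a hardened implementation: `count_files_at` is a hypothetical name, it only counts regular files, and it silently skips entries it can't stat or open.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Counts the regular files beneath the directory behind dir_fd. Takes
   ownership of dir_fd; returns -1 if the directory cannot be read. */
static long count_files_at(int dir_fd) {
    assert(dir_fd >= 0);
    DIR *dir = fdopendir(dir_fd);
    if (dir == NULL) {
        perror("fdopendir");
        close(dir_fd);
        return -1;
    }
    long total = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;
        struct stat st;
        /* Resolve the name relative to the OPEN directory, not a path string. */
        if (fstatat(dirfd(dir), entry->d_name, &st, AT_SYMLINK_NOFOLLOW) != 0)
            continue;
        if (S_ISDIR(st.st_mode)) {
            /* The child is anchored to its parent's descriptor, so it stays
               "inside" this hierarchy even if an ancestor is moved. */
            int child_fd = openat(dirfd(dir), entry->d_name,
                                  O_RDONLY | O_DIRECTORY | O_NOFOLLOW);
            if (child_fd < 0) continue;
            long nested = count_files_at(child_fd); /* consumes child_fd */
            if (nested > 0) total += nested;
        } else if (S_ISREG(st.st_mode)) {
            total++;
        }
    }
    closedir(dir); /* also closes dir_fd */
    return total;
}
```

You'd call it with something like `count_files_at(open(root, O_RDONLY | O_DIRECTORY))`. The only path string involved is the one used to open the root; every level below that is reached relative to a descriptor.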

Finally, keep in mind that the larger context here matters. The issue above is a MASSIVE problem for an app that’s acting on the contents of a particular hierarchy and a fairly minor detail for an app that’s actively monitoring and displaying directory contents.

That’s because, in the display case, you end up designing around the idea that whatever changes are happening will eventually “stop”, at which point your display engine will “catch up” with the latest state and refresh to match “the truth”. That is, if a directory moves mid-iteration, that display will eventually be “cleared out” by the update that told your engine the directory no longer existed in the monitored hierarchy. Typically, that happens fast enough that you’d never even know it happened.

I'm just saying it isn't in the API today, so I need a path to create a file. As you already mentioned previously, macOS is very path-based.

That's the somewhat subtle "truth" underneath these issues. We tend to "think" in terms of paths, but the file system DOESN'T, which means it ends up creating issues and problems we don't even recognize as problems... because paths have "masked" the truth from us.

Now, that doesn't necessarily mean paths are always "wrong". Case in point... how FSEvents uses them. FSEvents isn't actually trying to tell you what actually changed, just that "something" changed, which you should then go and scan. That's part of the reason it uses paths— if an object has changed multiple times, it doesn't want to track those other objects, so it just tells you the path that changed.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware
