Thank you very much. I'll try your suggested approach as well. Seems interesting.
I think the inefficiency comes from the basic design of the APIs I have found so far (in macOS Cocoa).
They all assume that I'm interested in the actual items, so they fetch or pre-fetch metadata for each file/item, whereas all I need to do is COUNT. Furthermore, all the existing APIs that support recursive scans of directory hierarchies AGGREGATE the results along the way and won't return until they have the full list of items.
This of course costs both CPU time and memory.
I never thought to go to the POSIX APIs until now because I dislike them (a true Mac programmer since 1987): they're all synchronous and plain dumb, and they do NOT cover the rich and strange behaviors and attributes of modern file systems.
At least in the past, Apple exposed its low-level file-system APIs, and I could work magic with them. The file system has changed since then, and I don't know which APIs exist today.
For example, I wrote the following naive code:
static NSUInteger totalCount = 0;
NSFileManager *fm = [NSFileManager defaultManager];
NSDirectoryEnumerator *dirPathEnumerator = [fm enumeratorAtPath:@"/Users/me/Documents"];
NSString *currRelativePath = nil; // path relative to the enumerated directory
while ((currRelativePath = [dirPathEnumerator nextObject]) != nil) {
    NSDictionary *fileAttributes = [dirPathEnumerator fileAttributes];
    if (![fileAttributes[NSFileType] isEqualToString:NSFileTypeDirectory])
        totalCount++;
}
For my Documents folder of ~100,000 files, it took about 6 seconds to run.
I then wrote it differently: using SHALLOW directory enumeration and, instead of recursing, dispatching "need to scan" code blocks onto a concurrent NSOperationQueue, thus removing recursion (and stacks) and also spreading the task over several cores, like this:
static NSUInteger totalCount = 0;
static NSFileManager *fmm = nil;
static NSArray *requiredProperties = nil;

-(void)countFilesInDocumentsFolder {
    NSOperationQueue *q = [[NSOperationQueue alloc] init];
    q.maxConcurrentOperationCount = 5;
    q.qualityOfService = NSQualityOfServiceUtility;
    q.name = @"file counting queue";

    NSString *dirFullPath = @"/Users/me/Documents";
    NSURL *topURL = [NSURL fileURLWithPath:dirFullPath];
    if (fmm == nil)
        fmm = [NSFileManager defaultManager];
    if (requiredProperties == nil)
        requiredProperties = @[NSURLNameKey, NSURLIsRegularFileKey, NSURLIsDirectoryKey, NSURLIsSymbolicLinkKey, NSURLIsVolumeKey, NSURLIsPackageKey];

    [self countFilesInDirectory:topURL usingQueue:q];
    [q waitUntilAllOperationsAreFinished];
    NSLog(@"Total file count in directory: %@ is: %lu", dirFullPath, (unsigned long)totalCount);
}
-(void)countFilesInDirectory:(NSURL *)directoryURL usingQueue:(NSOperationQueue *)queue {
    [queue addOperationWithBlock:^{
        NSError *error = nil;
        NSArray<NSURL *> *itemURLs = [fmm contentsOfDirectoryAtURL:directoryURL includingPropertiesForKeys:requiredProperties options:NSDirectoryEnumerationSkipsHiddenFiles error:&error];
        if (itemURLs == nil) {
            NSLog(@"Failed to get contents of: %@, Error:%@", directoryURL, error);
            return;
        }
        NSUInteger localCount = 0; // regular files found directly in this directory
        for (NSURL *url in itemURLs) {
            NSError *attrError = nil;
            NSDictionary<NSURLResourceKey, id> *fileAttributes = [url resourceValuesForKeys:requiredProperties error:&attrError];
            if (fileAttributes == nil) {
                NSLog(@"Failed to retrieve attributes for:%@ Error:%@", url, attrError);
                continue;
            }
            if ([fileAttributes[NSURLIsDirectoryKey] boolValue]) {
                // This method enqueues its own operation, so no extra wrapping block is needed.
                [self countFilesInDirectory:url usingQueue:queue];
            }
            else if ([fileAttributes[NSURLIsRegularFileKey] boolValue]) {
                localCount++;
            }
        }
        // Operations run concurrently, so the shared counter is updated under a lock.
        @synchronized (self) {
            totalCount += localCount;
        }
    }];
}
And this one, although lengthy, took about 0.45 sec to do the same job (even a little better).
So... with my ***** of a Mac and a fast-as-hell SSD, I am still very far from satisfied.
I'll go down the POSIX rabbit hole and see how it goes.
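As a starting point for that POSIX route, a count-only traversal could look roughly like the sketch below. It assumes fts(3) is an acceptable choice (it is only one of several POSIX-level options), and the function name count_regular_files is made up for illustration. It reads entries one at a time and keeps nothing but a counter, so no list of items is built. Note that fts still calls lstat on each entry in this configuration, so whether it beats the NSOperationQueue version is something to measure rather than assume.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <fts.h>
#include <stdio.h>

/* Hypothetical helper: counts regular files under `root`, recursively.
 * Returns the count, or -1 if the traversal could not be started. */
long count_regular_files(const char *root)
{
    /* fts_open expects a NULL-terminated array of paths. */
    char *const paths[] = { (char *)root, NULL };

    /* FTS_PHYSICAL: do not follow symbolic links.
     * FTS_NOCHDIR:  leave the process working directory untouched. */
    FTS *tree = fts_open(paths, FTS_PHYSICAL | FTS_NOCHDIR, NULL);
    if (tree == NULL) {
        perror("fts_open");
        return -1;
    }

    long count = 0;
    FTSENT *entry;
    /* Entries are delivered one by one; only the counter is retained. */
    while ((entry = fts_read(tree)) != NULL) {
        if (entry->fts_info == FTS_F)   /* FTS_F marks a regular file */
            count++;
    }

    fts_close(tree);
    return count;
}
```

Calling count_regular_files("/Users/me/Documents") would give a number comparable to the first snippet rather than the second, since hidden files are not skipped here.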
Thanks!
If I find POSIX faster than my current approach, I'll accept your answer as the best :)