Hi,
I’m required to identify file content type (e.g. - tell you that a file is in PDF format, even if the user forced its name to end with .docX, .txt, or even removed it altogether. In other words - identify file type by its real contents. I need to do this fast, for lots of files.
I searched in vain through the MacOS different APIs/Frameworks, from LaunchServices, via MDLS, NSWorkspace, NSURL, and NSFileManager — to no avail. These all provide wonderful APIs for identifying file types - but miserably report the file type as “Microsoft Word” if its filename extension has been set to “.doc” or “.docx”, no matter the content.
I then found the ‘file’ command-line in Terminal which does EXACTLY what I want, and reports the correct type every time (well maybe it fails somethings, but I haven’t seen it fail once so far.)
Reading ‘man file’ I leaned that it examines a file in 3 stages. stat(2) to start with (identifying Unix things like pipes, sockets, symbolic links etc.) then, it works using some 'unix style' thing called “magic number” mechanism, that employs a “compiled magic file” /usr/share/file/magic.mgc containing “binary signatures” or special “magic numbers” at known offsets that allow quick identification of file formats.
Tiny hacking into this file using ’strings’ command I found a rather huge list of formats identifiable by MacOS out of the box - plus - according to man page of file , you should be able to add more “magic” files yourself!
However, I wouldn't want to spawn a 'file' command process every time I need to identify a file. I'd rather call some code, or framework from within my process. (This process is of high sensitivity - it is an "Endpoint Security Client" and has lots of restrictions.
Is there any public API (Cocoa, Unix, Posix, Core-Foundation, anything!) that will use this "Magic" mechanism to tell me the type of a file?
Thank you very much.