After throwing just about everything I could think of at the wall, I did end up making a change that seems to have totally stopped the crashing.
For the record, I have a thread going with DTS, and they seem skeptical that this change fixes anything, so please take this with a grain of salt. They say the crash is likely the result of either a concurrency violation or memory corruption, and they recommend debugging the app with both the -com.apple.CoreData.ConcurrencyDebug 1 launch argument and Xcode's Guard Malloc turned on.
That disclaimer aside, here's what happened with our app. We have a particular managed object subclass with a couple of transient properties whose values are derived from another (persistent) property. To ensure that the transient values are reset when changes are merged from other contexts (often an update from our API), that subclass overrode -awakeFromSnapshotEvents: like so:
    - (void)awakeFromSnapshotEvents:(NSSnapshotEventType)flags
    {
        [super awakeFromSnapshotEvents:flags];
        [self setPrimitiveFoo:nil];
        [self setPrimitiveBar:nil];
    }
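For anyone unfamiliar with the pattern, clearing the primitive values only makes sense because the transient getters recompute them on demand. The following is a rough sketch of what such a getter might look like, not our actual code: the Widget class, the rawFoo attribute, and the uppercasing step are all hypothetical stand-ins, and it assumes foo is marked transient in the model.

```objc
#import <CoreData/CoreData.h>

// Hypothetical entity: `rawFoo` is a persistent attribute, `foo` is a
// transient attribute derived from it. Neither name is from our real model.
@interface Widget : NSManagedObject
@property (nonatomic, copy) NSString *rawFoo;
@property (nonatomic, copy, readonly) NSString *foo;
@end

@implementation Widget
@dynamic rawFoo;

- (NSString *)foo
{
    // Read the cached transient value through the primitive accessor.
    [self willAccessValueForKey:@"foo"];
    NSString *value = [self primitiveValueForKey:@"foo"];
    [self didAccessValueForKey:@"foo"];

    if (value == nil) {
        // The cache was cleared (or never filled), so derive the value again.
        // The uppercasing is only a placeholder for the real derivation.
        value = [self.rawFoo uppercaseString];
        [self setPrimitiveValue:value forKey:@"foo"];
    }
    return value;
}
@end
```

With a getter shaped like this, setting the primitive value to nil is enough to force the next access to rebuild the transient value from the persistent one.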
I was able to get a semi-reproducible case where I observed that merging the changes from the other context reset the affected objects, turning them back into faults (and resetting the transient properties). I added a basic conditional to that method so that it only clears the transient properties' values when the receiver is not already a fault:
    - (void)awakeFromSnapshotEvents:(NSSnapshotEventType)flags
    {
        [super awakeFromSnapshotEvents:flags];
        if (!self.isFault) {
            [self setPrimitiveFoo:nil];
            [self setPrimitiveBar:nil];
        }
    }
I still don't fully understand why this change fixed anything, but we haven't seen the app crash trying to get a snapshot since. I confess I haven't had time yet to go back to the previous state and try running with Guard Malloc enabled to see whether that would shed more light on the issue.