CoreData and NSOperation
-
Hi,
We know that with Core Data and multi-threading you need one
NSManagedObjectContext per thread to prevent trouble when saving or
accessing managed objects concurrently. How about NSOperation? Does the
same apply there? Say I want to parse a big file into managed objects: I
would chunk it into pieces and have each piece processed by an
NSOperation. After processing, however, each piece should end up as an
NSManagedObject with a bunch of properties, all saved in one
persistent store. Can you simply create a new object in a central
MOC shared between all operations, or do you again need one MOC per
NSOperation, with the appropriate refreshing of your central MOC to
update the UI? And should you save your MOC at the end of each
operation, or wait until all operations have completed? I
guess that depends on the one-vs-many MOCs question as well. Any advice
would be welcome!
Thanks,
Alex
**********************************************
** Alexander Griekspoor PhD **
**********************************************
mekentosj.com
Papers - Your Personal Library of Science
2007 Winner of the Apple Design Awards
Best Mac OS X Scientific Solution
http://www.mekentosj.com/papers
********************************************** -
Alex,
Conceptually, you should treat NSOperations as if they were on a
separate thread. The OS may take certain liberties with
implementation details based on system load and other factors, but
basically NSOperations might as well be described as a lightweight
mechanism for creating threaded tasks. So the same rule applies: each
operation works with its own NSManagedObjectContext, as in the code below.
Importing tasks are often easily parallelizable by simply importing
1/Nth of the data on a thread/operation. Here's an excerpt of some
code I've been working with recently. It's GC and non-GC compatible,
and has 3 implementations for comparison: NSOperation, NSThread, and
boring serial code. As you can see, the NSOperation version is
basically the same in terms of thread handling, but NSOperationQueue
provides some convenient out-of-box handling for finding out when the
tasks are complete. The NSThread code has whacky NSConditions and
memory barriers.
The key to making this pattern useful is that each element in the
work queue ('keyQueues' below) is sufficiently large to be worth the
overhead of queuing up. In this sample code, each key is a file
path, so this is importing from a directory of files, importing
'maxCores' files simultaneously.
This division of labor doesn't work if the data in each of the 1/N sets
has relationships to data in other import groups.
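#import <Cocoa/Cocoa.h>
#import <libkern/OSAtomic.h>

static OSSpinLock _queueLock = OS_SPINLOCK_INIT; // guards the shared key queue
static NSOperationQueue* _operationQueue;
static NSDate *_startDate;
// Used by the NSThread variant. The excerpt uses these two without showing
// their declarations, so the types here are assumptions.
static NSCondition* _condition;
static volatile int32_t _notFinished;

#define USE_NSOPERATIONS 1
// #define USE_NSTHREADS 1
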
- (IBAction)createEntities:(id)sender
{
    _startDate = [[NSDate date] retain];
    _operationQueue = [[NSOperationQueue alloc] init];
    NSUInteger j = 0;
    NSUInteger maxCores = [[NSProcessInfo processInfo] activeProcessorCount];

    // One shared work queue of file keys. Ownership of the array passes to
    // finishImportOperation:, which releases it.
    NSMutableArray* keyQueues = [[NSMutableArray alloc] init];
    for (NSString* key in s_importFileSet) {
        [keyQueues addObject:key];
    }
#if USE_NSOPERATIONS
    // maxCores operations, all pulling keys from the same shared queue.
    for (j = 0; j < maxCores; j++) {
        NSOperation* op = [[NSInvocationOperation alloc]
            initWithTarget:self selector:@selector(processFilesForKeys:)
            object:keyQueues];
        [_operationQueue addOperation:op];
        [op release];
    }
#elif USE_NSTHREADS
    _condition = [[NSCondition alloc] init];
    _notFinished = maxCores;
    OSMemoryBarrier();
    for (j = 0; j < maxCores; j++) {
        [NSThread detachNewThreadSelector:@selector(processFilesForKeys:)
                                 toTarget:self withObject:keyQueues];
    }
#else
    // Serial version: a single call drains the whole queue.
    [self processFilesForKeys:keyQueues];
#endif
    // Wait for completion on a secondary thread so this action returns promptly.
    [NSThread detachNewThreadSelector:@selector(finishImportOperation:)
                             toTarget:self withObject:keyQueues];
}

- (void)finishImportOperation:(id)keys {
    NSAutoreleasePool* pool = [[NSAutoreleasePool alloc] init];
#if USE_NSOPERATIONS
    [_operationQueue waitUntilAllOperationsAreFinished];
#elif USE_NSTHREADS
    [_condition lock];
    while (_notFinished > 0) {
        [_condition wait];
    }
    [_condition unlock];
#else
#endif
    [keys release];
    [_operationQueue release];
    _operationQueue = nil;
    NSLog(@"Total create time %f",
          [[NSDate date] timeIntervalSinceDate:_startDate]);
    [_startDate release];
    [pool drain];
}

- (void)processFilesForKeys:(NSMutableArray*)importKeys {
    NSAutoreleasePool* pool = [[NSAutoreleasePool alloc] init];
    DataDumpImporter* importer = [[DataDumpImporter alloc] init];

    // Each thread/operation gets its own coordinator and context, both
    // pointing at the same store file as the main context.
    NSPersistentStoreCoordinator* mainPSC =
        [[appDelegate managedObjectContext] persistentStoreCoordinator];
    NSPersistentStoreCoordinator* psc = [[NSPersistentStoreCoordinator alloc]
        initWithManagedObjectModel:[mainPSC managedObjectModel]];
    [psc addPersistentStoreWithType:NSSQLiteStoreType
        configuration:nil URL:[[[mainPSC persistentStores] lastObject] URL]
        options:[NSDictionary dictionaryWithObject:[NSDictionary
            dictionaryWithObject:@"0" forKey:@"synchronous"]
            forKey:NSSQLitePragmasOption] error:nil];
    // we disable synchronous because if an import fails, we can
    // delete the file and re-import.
    // if you can't just delete the file, don't do this

    NSManagedObjectContext *moc = [[NSManagedObjectContext alloc] init];
    [moc setPersistentStoreCoordinator:psc];
    [psc release];
    [importer setImportPath:[self importPath]];
    [importer setMoc:moc];
    [moc setUndoManager:nil];

    while (1) {
        // Pop the next key under the lock; stop when the queue is empty.
        NSString* key = nil;
        OSSpinLockLock(&_queueLock);
        key = [importKeys lastObject];
        if (key) {
            [importKeys removeLastObject];
        }
        OSSpinLockUnlock(&_queueLock);
        if (!key) {
            break;
        }
        @try {
            DataDumpImporterParams *params =
                [s_entityImporterParams objectForKey:key];
            [importer importFile:[params filename] usingEntity:key
                        andFlags:[params flags]];
        } @catch (id e) {
            NSLog(@"e = %@", e);
        }
    }
    [importer release];
    [moc release];
    [pool drain];

    // Only the NSThread variant waits on these; in the other variants
    // _condition is nil and the messages are no-ops.
    OSAtomicDecrement32Barrier(&_notFinished);
    [_condition lock];
    [_condition signal];
    [_condition unlock];
}
--
-Ben -
Great! Thanks for the detailed explanation Ben.
Alex
On 31 jan 2008, at 00:38, Ben Trumbull wrote:
> [...]
**********************************************
** Alexander Griekspoor PhD **
**********************************************
mekentosj.com
EnzymeX - To cut or not to cut
2006 Winner of the Apple Design Awards
Best Mac OS X Scientific Solution
http://www.mekentosj.com/enzymex
********************************************** -
A couple comments on the NSOperationQueue usage ...
(1) I would say that since you are choosing to only spawn maxCores
operations, you should divide up the s_importFileSet into maxCores
different arrays, and then you wouldn't have to share importKeys in
processFilesForKeys: and have to lock around it.
The shared queue importKeys style is more suited to the NSThread
approach. But, if the threads block on I/O, you're potentially
underutilizing cores while those maxCores threads (or operations) wait.
But, by blindly splitting s_importFileSet into N pieces, one thread
might get all the really cheap files to import and one thread might
get all the expensive ones, making the start-to-finish latency more
than it'd have to be.
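
The split suggested in (1) can be sketched as follows. This is a minimal illustration, not code from the thread: the helper name PartitionKeys is hypothetical, and dealing the keys out round-robin is just one simple choice that does nothing about the cheap-versus-expensive imbalance mentioned above.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the thread): deal the keys round-robin into
// `count` independent arrays so that each operation owns its own work list
// and no lock is needed around a shared queue.
static NSArray *PartitionKeys(NSArray *keys, NSUInteger count) {
    if (count == 0) {
        count = 1; // always return at least one bucket
    }
    NSMutableArray *buckets = [NSMutableArray arrayWithCapacity:count];
    for (NSUInteger i = 0; i < count; i++) {
        [buckets addObject:[NSMutableArray array]];
    }
    NSUInteger index = 0;
    for (id key in keys) {
        // Key number `index` goes to bucket `index % count`.
        [[buckets objectAtIndex:(index % count)] addObject:key];
        index++;
    }
    return buckets;
}
```

With something like this, each NSInvocationOperation would be created with one bucket as its object argument instead of the shared array, and processFilesForKeys: could iterate its private array without taking the spin lock.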
If one ignores all other potential issues where things might be
fighting one another (processor cache affinities, kernel file buffer
cache space, potential global locks in lower layers, per-operation
RAM usage, heat and power-management and clock speed issues in the
processors, etc.), then ...
(2) I would say that the most natural decomposition of the overall
task here (wrt NSOperationQueue) would be to create one NSOperation
per element in s_importFileSet, throw those into the
NSOperationQueue, and let it grind away. Let it and the kernel worry
about maxCores. There's bound to be plenty of blocking in I/O, and
the optimal number of operations is likely more than
-activeProcessorCount (some can run with the data they have while
others block waiting for their data). If you identify the work to be
done using operations (rather than embedding it implicitly in a
loop-until-importKeys-empty loop), the kernel can run operations
that have their data available while other operations block waiting
for the disk/network hardware to fetch data.
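
A rough sketch of the one-operation-per-file decomposition in (2), under stated assumptions: the class PerFileImporter and its methods are hypothetical, the "import" only records the key it was given, and the code uses manual retain/release like the code earlier in the thread. A real importFileForKey: would set up its own NSManagedObjectContext and import exactly one file.

```objc
#import <Foundation/Foundation.h>

// Hypothetical driver (not from the thread), manual retain/release.
@interface PerFileImporter : NSObject {
    NSMutableArray *_finishedKeys; // keys whose import has completed
}
- (void)importFileForKey:(NSString *)key;
- (NSArray *)importKeys:(NSArray *)keys;
@end

@implementation PerFileImporter

- (id)init {
    if ((self = [super init])) {
        _finishedKeys = [[NSMutableArray alloc] init];
    }
    return self;
}

- (void)dealloc {
    [_finishedKeys release];
    [super dealloc];
}

// Runs on whatever thread the queue chooses. Placeholder body: a real
// version would create a per-operation context and import one file here.
- (void)importFileForKey:(NSString *)key {
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    @synchronized (_finishedKeys) {
        [_finishedKeys addObject:key];
    }
    [pool drain];
}

// Enqueue one operation per key and let the queue decide how many run at once.
- (NSArray *)importKeys:(NSArray *)keys {
    NSOperationQueue *queue = [[NSOperationQueue alloc] init];
    for (NSString *key in keys) {
        NSOperation *op = [[NSInvocationOperation alloc]
            initWithTarget:self selector:@selector(importFileForKey:)
            object:key];
        [queue addOperation:op];
        [op release];
    }
    [queue waitUntilAllOperationsAreFinished]; // blocks the calling thread
    [queue release];

    NSArray *result = nil;
    @synchronized (_finishedKeys) {
        result = [[_finishedKeys copy] autorelease];
    }
    return result;
}

@end
```

There is no shared work queue and no lock around key selection here; each operation carries its own key. Because importKeys: blocks until the queue drains, it would be called from a secondary thread, much as finishImportOperation: is in the code quoted below.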
Chris Kane
Cocoa Frameworks, Apple
On Jan 30, 2008, at 4:38 PM, Ben Trumbull wrote:
> [...]



