CoreData and NSOperation

  • Hi,

    We know about Core Data and multithreading: you need one
    NSManagedObjectContext per thread to prevent trouble saving/accessing
    managed objects concurrently. How about NSOperation? Does the same
    apply there? Say I want to parse a big file into managed objects: I
    would chunk it into pieces and have each piece processed by an
    NSOperation. After processing, however, each piece should end up as
    an NSManagedObject with a bunch of properties, all saved in one
    persistent store. Can you simply create a new object in a central
    MOC shared between all operations, or do you again need one MOC per
    NSOperation, with the appropriate refreshing of your central MOC to
    update the UI? And should you save your MOC at the end of each
    operation, or wait until all operations have completed? I guess that
    depends on the one vs. many MOCs question as well. Any advice would
    be welcome!
    Thanks,
    Alex

    **********************************************
              ** Alexander Griekspoor  PhD **
    **********************************************
              mekentosj.com

      Papers - Your Personal Library of Science
      2007 Winner of the Apple Design Awards
              Best Mac OS X Scientific Solution
              http://www.mekentosj.com/papers
    **********************************************
  • Alex,

    Conceptually, you should treat NSOperations as if they were on a
    separate thread.  The OS may take certain liberties with
    implementation details based on system load and other factors, but
    basically NSOperations might as well be described as a lightweight
    mechanism for creating threaded tasks.
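
    In practice that means one context per operation, with the central
    context picking up each save. The following is a minimal sketch of
    that idea, not code from this thread. It assumes ARC and a current
    SDK, so it uses queue-based contexts (NSPrivateQueueConcurrencyType)
    rather than the thread-confined contexts of 2008, and it shares one
    coordinator over an in-memory store so it runs without any files.
    The "Item" entity, its "name" attribute, and the function names are
    hypothetical.

    ```objc
    #import <Foundation/Foundation.h>
    #import <CoreData/CoreData.h>

    // Build a one-entity model in code so the sketch needs no model file.
    // "Item" and "name" are hypothetical names.
    static NSManagedObjectModel *MakeModel(void) {
        NSAttributeDescription *name = [[NSAttributeDescription alloc] init];
        name.name = @"name";
        name.attributeType = NSStringAttributeType;
        name.optional = NO;

        NSEntityDescription *item = [[NSEntityDescription alloc] init];
        item.name = @"Item";
        item.properties = @[name];

        NSManagedObjectModel *model = [[NSManagedObjectModel alloc] init];
        model.entities = @[item];
        return model;
    }

    // Inserts `count` items from a background operation that uses its own
    // context, merges the save into a separate "central" context, and
    // returns how many items the central context can then count.
    static NSUInteger ImportAndMerge(NSUInteger count) {
        NSPersistentStoreCoordinator *psc = [[NSPersistentStoreCoordinator alloc]
            initWithManagedObjectModel:MakeModel()];
        NSError *error = nil;
        if (![psc addPersistentStoreWithType:NSInMemoryStoreType
                               configuration:nil URL:nil options:nil
                                       error:&error]) {
            NSLog(@"could not add store: %@", error);
            return 0;
        }

        // Stand-in for the UI's context (assumption: a real app would use
        // its main-thread context here).
        NSManagedObjectContext *central = [[NSManagedObjectContext alloc]
            initWithConcurrencyType:NSPrivateQueueConcurrencyType];
        central.persistentStoreCoordinator = psc;

        // The operation's private context.
        NSManagedObjectContext *importMOC = [[NSManagedObjectContext alloc]
            initWithConcurrencyType:NSPrivateQueueConcurrencyType];
        importMOC.persistentStoreCoordinator = psc;
        importMOC.undoManager = nil;

        // When the import context saves, merge its changes into `central`
        // on central's own queue.
        NSNotificationCenter *center = [NSNotificationCenter defaultCenter];
        id token = [center addObserverForName:NSManagedObjectContextDidSaveNotification
                                       object:importMOC
                                        queue:nil
                                   usingBlock:^(NSNotification *note) {
            [central performBlockAndWait:^{
                [central mergeChangesFromContextDidSaveNotification:note];
            }];
        }];

        NSOperationQueue *queue = [[NSOperationQueue alloc] init];
        [queue addOperationWithBlock:^{
            [importMOC performBlockAndWait:^{
                for (NSUInteger i = 0; i < count; i++) {
                    NSManagedObject *obj = [NSEntityDescription
                        insertNewObjectForEntityForName:@"Item"
                                 inManagedObjectContext:importMOC];
                    [obj setValue:[NSString stringWithFormat:@"item-%lu", (unsigned long)i]
                           forKey:@"name"];
                }
                // Save once at the end of the operation.
                NSError *saveError = nil;
                if (![importMOC save:&saveError]) {
                    NSLog(@"save failed: %@", saveError);
                }
            }];
        }];
        [queue waitUntilAllOperationsAreFinished];
        [center removeObserver:token];

        __block NSUInteger seen = 0;
        [central performBlockAndWait:^{
            NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"Item"];
            seen = [central countForFetchRequest:request error:NULL];
        }];
        return seen;
    }
    ```

    Saving once per operation keeps the number of merges small; whether
    that or one save at the very end is better probably depends on how
    much data each operation holds in memory.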

    Importing tasks are often easily parallelizable by simply importing
    1/Nth of the data on a thread/operation.  Here's an excerpt of some
    code I've been working with recently.  It's GC and non-GC compatible,
    and has 3 implementations for comparison: NSOperation, NSThread, and
    boring serial code.  As you can see, the NSOperation version is
    basically the same in terms of thread handling, but NSOperationQueue
    provides some convenient out-of-box handling for finding out when the
    tasks are complete.  The NSThread code has whacky NSConditions and
    memory barriers.

    The key to making this pattern useful is that each element in the
    work queue ('keyQueues' below) is sufficiently large to be worth the
    overhead of queuing up.  In this sample code, each key is a file
    path, so this is importing from a directory of files, importing
    'maxCores' files simultaneously.

    This division of labor doesn't work if the data in each 1/N set has
    relationships to data in other import groups.

    static OSSpinLock _queueLock;
    static NSOperationQueue* _operationQueue;
    static NSDate *_startDate;

    #define USE_NSOPERATIONS 1
    // #define USE_NSTHREADS 1

    - (IBAction)createEntities:(id)sender
    {
        _startDate = [[NSDate date] retain];
        _operationQueue = [[NSOperationQueue alloc] init];

        NSUInteger j = 0;
        NSUInteger maxCores = [[NSProcessInfo processInfo] activeProcessorCount];

        // Shared work queue: every worker pops keys from this one array.
        NSMutableArray* keyQueues = [[NSMutableArray alloc] init];
        for (NSString* key in s_importFileSet) {
            [keyQueues addObject:key];
        }

    #if USE_NSOPERATIONS
        for (j = 0; j < maxCores; j++) {
            NSOperation* op = [[NSInvocationOperation alloc]
                initWithTarget:self
                      selector:@selector(processFilesForKeys:)
                        object:keyQueues];
            [_operationQueue addOperation:op];
            [op release];
        }
    #elif USE_NSTHREADS
        _condition = [[NSCondition alloc] init];
        _notFinished = maxCores;
        OSMemoryBarrier();

        for (j = 0; j < maxCores; j++) {
            [NSThread detachNewThreadSelector:@selector(processFilesForKeys:)
                                     toTarget:self
                                   withObject:keyQueues];
        }
    #else
        // Serial: a single call drains the whole shared queue.
        [self processFilesForKeys:keyQueues];
    #endif

        // Wait for completion off the main thread, then clean up.
        [NSThread detachNewThreadSelector:@selector(finishImportOperation:)
                                 toTarget:self
                               withObject:keyQueues];
    }

    - (void)finishImportOperation:(id)keys {
        NSAutoreleasePool* pool = [[NSAutoreleasePool alloc] init];

    #if USE_NSOPERATIONS
        [_operationQueue waitUntilAllOperationsAreFinished];
    #elif USE_NSTHREADS
        [_condition lock];
        while (_notFinished > 0) {
            [_condition wait];
        }
        [_condition unlock];
    #else
    #endif

        [keys release];
        [_operationQueue release];
        _operationQueue = nil;

        NSLog(@"Total create time %f",
              [[NSDate date] timeIntervalSinceDate:_startDate]);
        [_startDate release];
        [pool drain];
    }

    - (void)processFilesForKeys:(NSMutableArray*)importKeys {
        NSAutoreleasePool* pool = [[NSAutoreleasePool alloc] init];

        DataDumpImporter* importer = [[DataDumpImporter alloc] init];

        // Each worker gets its own coordinator and context on the same store file.
        NSPersistentStoreCoordinator* mainPSC =
            [[appDelegate managedObjectContext] persistentStoreCoordinator];
        NSPersistentStoreCoordinator* psc = [[NSPersistentStoreCoordinator alloc]
            initWithManagedObjectModel:[mainPSC managedObjectModel]];

        // We disable synchronous because if an import fails, we can delete
        // the file and re-import. If you can't just delete the file, don't
        // do this.
        NSDictionary* pragmas = [NSDictionary dictionaryWithObject:@"0"
                                                            forKey:@"synchronous"];
        NSDictionary* options = [NSDictionary dictionaryWithObject:pragmas
                                                            forKey:NSSQLitePragmasOption];
        [psc addPersistentStoreWithType:NSSQLiteStoreType
                          configuration:nil
                                    URL:[[[mainPSC persistentStores] lastObject] URL]
                                options:options
                                  error:nil];

        NSManagedObjectContext* moc = [[NSManagedObjectContext alloc] init];
        [moc setPersistentStoreCoordinator:psc];
        [psc release];
        [moc setUndoManager:nil];

        [importer setImportPath:[self importPath]];
        [importer setMoc:moc];

        while (1) {
            NSString* key = nil;

            // Pop one key from the shared queue under the lock. The
            // retain/autorelease keeps the key alive after the array
            // drops it (non-GC builds).
            OSSpinLockLock(&_queueLock);
            key = [[[importKeys lastObject] retain] autorelease];
            if (key) {
                [importKeys removeLastObject];
            }
            OSSpinLockUnlock(&_queueLock);

            if (!key) {
                break;
            }

            @try {
                DataDumpImporterParams* params =
                    [s_entityImporterParams objectForKey:key];
                [importer importFile:[params filename]
                         usingEntity:key
                            andFlags:[params flags]];
            } @catch (id e) {
                NSLog(@"e = %@", e);
            }
        }

        [importer release];
        [moc release];
        [pool drain];

        // Wake the waiter in finishImportOperation: (NSThread variant).
        OSAtomicDecrement32Barrier(&_notFinished);
        [_condition lock];
        [_condition signal];
        [_condition unlock];
    }

    --

    -Ben
  • Great! Thanks for the detailed explanation Ben.
    Alex

    On 31 jan 2008, at 00:38, Ben Trumbull wrote:


  • A couple comments on the NSOperationQueue usage ...

    (1) I would say that since you are choosing to only spawn maxCores
    operations, you should divide up the s_importFileSet into maxCores
    different arrays, and then you wouldn't have to share importKeys in
    processFilesForKeys: and have to lock around it.

    The shared-queue importKeys style is more suited to the NSThread
    approach.  But, if the threads block on I/O, you're potentially
    underutilizing cores while those maxCores threads (or operations)
    wait.

    But, by blindly splitting s_importFileSet into N pieces, one thread
    might get all the really cheap files to import and another might
    get all the expensive ones, making the start-to-finish latency
    longer than it has to be.
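
    A minimal sketch of the split described in (1), assuming ARC and a
    current SDK. The function name PartitionKeys is hypothetical. It
    deals keys out round-robin so that each operation gets a private
    array and needs no lock; it does nothing about the cheap-versus-
    expensive imbalance just mentioned.

    ```objc
    #import <Foundation/Foundation.h>

    // Deal `keys` round-robin into `bucketCount` arrays, one per operation.
    // Returns an NSArray of NSMutableArrays; a count of 0 is treated as 1.
    static NSArray *PartitionKeys(NSArray *keys, NSUInteger bucketCount) {
        if (bucketCount == 0) {
            bucketCount = 1;
        }
        NSMutableArray *buckets = [NSMutableArray arrayWithCapacity:bucketCount];
        for (NSUInteger i = 0; i < bucketCount; i++) {
            [buckets addObject:[NSMutableArray array]];
        }
        NSUInteger index = 0;
        for (NSString *key in keys) {
            [[buckets objectAtIndex:(index % bucketCount)] addObject:key];
            index++;
        }
        return buckets;
    }
    ```

    Each returned array would be passed as the object: argument of its
    own NSInvocationOperation, so processFilesForKeys: could drop the
    spin lock around its queue access.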

    If one ignores all other potential issues where things might be
    fighting one another (processor cache affinities, kernel file buffer
    cache space, potential global locks in lower layers, per-operation
    RAM usage, heat and power-management and clock speed issues in the
    processors, etc.), then ...

    (2) I would say that the most natural decomposition of the overall
    task here (wrt NSOperationQueue) would be to create one NSOperation
    per element in s_importFileSet, throw those into the
    NSOperationQueue, and let it grind away.  Let it and the kernel worry
    about maxCores.  There's bound to be plenty of blocking in I/O, and
    the optimal number of operations is likely more than
    -activeProcessorCount (some can run with the data they have while
    others block waiting for their data).  If you identify the work to be
    done using operations (rather than embedding it implicitly in a
    loop-until-importKeys-empty loop), the kernel can run operations
    whose data is available while others block waiting for the
    disk/network hardware to fetch theirs.
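
    A minimal sketch of that decomposition, assuming ARC and a current
    SDK. KeyImporter and ImportAllKeys are hypothetical stand-ins: the
    importer only records the keys it is handed, where real code would
    set up its own context and parse the file.

    ```objc
    #import <Foundation/Foundation.h>

    // Hypothetical stand-in for the real importer: it only records which
    // keys it was asked to import.
    @interface KeyImporter : NSObject
    @property (readonly) NSMutableSet *importedKeys;
    - (void)importFileForKey:(NSString *)key;
    @end

    @implementation KeyImporter
    - (instancetype)init {
        if ((self = [super init])) {
            _importedKeys = [[NSMutableSet alloc] init];
        }
        return self;
    }

    - (void)importFileForKey:(NSString *)key {
        // Runs on whatever thread the queue picks, so guard shared state.
        @synchronized (self) {
            [_importedKeys addObject:key];
        }
    }
    @end

    // Enqueue one operation per key and let the queue decide how many run
    // at once. Blocks the calling thread until every operation finishes.
    static void ImportAllKeys(NSArray *keys, KeyImporter *importer) {
        NSOperationQueue *queue = [[NSOperationQueue alloc] init];
        for (NSString *key in keys) {
            NSInvocationOperation *op = [[NSInvocationOperation alloc]
                initWithTarget:importer
                      selector:@selector(importFileForKey:)
                        object:key];
            [queue addOperation:op];
        }
        [queue waitUntilAllOperationsAreFinished];
    }
    ```

    As in Ben's code, the wait should happen off the main thread in a
    real application so the UI stays responsive.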

    Chris Kane
    Cocoa Frameworks, Apple

    On Jan 30, 2008, at 4:38 PM, Ben Trumbull wrote:
