CoreData fetch is way too slow

  • I have a CoreData app that has a SQLite store holding 157,273 managed
    objects.  Each managed object has 2 attributes, input and answer,
    which are NSStrings and that's all.  No other attributes.  No
    relationships.  No fetched properties.

    A simple fetch for a single managed object takes 7 seconds!

        NSPredicate *predicate = [NSPredicate predicateWithFormat:@"%K
    like %@",
            @"input", input]; //assume NSString *input exists
        NSFetchRequest *fetchRequest = [[[NSFetchRequest alloc] init]
    autorelease];
        NSEntityDescription *entity = [NSEntityDescription
    entityForName:@"MyEntity"
                                                  inManagedObjectContext:
    [self managedObjectContext]];
        [fetchRequest setEntity:entity];
        [fetchRequest setPredicate:predicate];
        [fetchRequest setFetchLimit:1];
        NSError *error = nil;

        //The following line takes 7 seconds to execute:
        NSArray *results = [[self managedObjectContext]
    executeFetchRequest:fetchRequest error:&error];

        if ([results count]) {return [[results objectAtIndex:0]
    valueForKey:@"answer"];}
        return @"";

    How can I speed this up?

    The 157,273 inputs are unique.  I believe CoreData searches them
    linearly in no particular order, instead of sorting them so they can
    be binary searched in the store.  Is this aspect of CoreData opaque?
    Is there no way to get CoreData to efficiently search on a unique
    string attribute?

    If my experiments with CoreData continue to prove unfruitful, I may
    remain with my current implementation which uses Objective-C++  and
    an STL map.  I can load all the data into the map in about 3.6
    seconds and then retrieval from memory is near instantaneous.

    Thank you for any help speeding up this CoreData implementation or
    any other suggestions.
  • We've found fetches to be inherently slow too. Generally though we're
    not managing 150,000+ objects! Every time you do a fetch it involves a
    round-trip to the persistent store. You might want to try using an
    NSArrayController that will manage its own content, set it to manage
    your MyEntity objects, and then use an NSPredicate on the results of
    [arrayController arrangedObjects]; to filter it. We found that sped
    things up immensely for our commonly used managed objects, and we
    eliminated a lot of pointless round-trips to disk.

    -Mike

    On Feb 5, 2008, at 2:31 PM, James Hober wrote:

    > I have a CoreData app that has a SQLite store holding 157,273
    > managed objects.  Each managed object has 2 attributes, input and
    > answer, which are NSStrings and that's all.  No other attributes.
    > No relationships.  No fetched properties.
    >
    > A simple fetch for a single managed object takes 7 seconds!
    >
    > NSPredicate *predicate = [NSPredicate predicateWithFormat:@"%K
    > like %@",
    > @"input", input]; //assume NSString *input exists
    > NSFetchRequest *fetchRequest = [[[NSFetchRequest alloc] init]
    > autorelease];
    > NSEntityDescription *entity = [NSEntityDescription
    > entityForName:@"MyEntity"
    > inManagedObjectContext:
    > [self managedObjectContext]];
    > [fetchRequest setEntity:entity];
    > [fetchRequest setPredicate:predicate];
    > [fetchRequest setFetchLimit:1];
    > NSError *error = nil;
    >
    > //The following line takes 7 seconds to execute:
    > NSArray *results = [[self managedObjectContext]
    > executeFetchRequest:fetchRequest error:&error];
    >
    > if ([results count]) {return [[results objectAtIndex:0]
    > valueForKey:@"answer"];}
    > return @"";
    >
    > How can I speed this up?
    >
    > The 157,273 inputs are unique.  I believe CoreData searches them
    > linearly in no particular order, instead of sorting them so they can
    > be binary searched in the store.  Is this aspect of CoreData
    > opaque?  Is there no way to get CoreData to efficiently search on a
    > unique string attribute?
    >
    > If my experiments with CoreData continue to prove unfruitful, I may
    > remain with my current implementation which uses Objective-C++  and
    > an STL map.  I can load all the data into the map in about 3.6
    > seconds and then retrieval from memory is near instantaneous.
    >
    > Thank you for any help speeding up this CoreData implementation or
    > any other suggestions.

    Mike Glass
    Developer
    Marware, Inc.

    <mikeg...>
    http://www.projectx.com
    http://www.marware.com
  • On Feb 5, 2008, at 2:31 PM, James Hober wrote:

    > I have a CoreData app that has a SQLite store holding 157,273
    > managed objects.  Each managed object has 2 attributes, input and
    > answer, which are NSStrings and that's all.  No other attributes.
    > No relationships.  No fetched properties.
    >
    > A simple fetch for a single managed object takes 7 seconds!
    >
    > NSPredicate *predicate = [NSPredicate predicateWithFormat:@"%K
    > like %@",
    > @"input", input]; //assume NSString *input exists

    Two things come to mind:
    First, make sure that 'input' is indexed.
    Second, be careful of how/when you use 'like' in your predicates as it
    is pretty easy to take a huge performance hit (this is true even when
    using straight SQL) as you can have a clause that requests "input like
    '%sometext'" which can't make use of an index and you'll force a full
    table scan.

    I found Ben's response to one of my posts a few weeks back quite
    helpful: http://www.cocoabuilder.com/archive/message/cocoa/2008/1/16/196552

    Phil
  • Hi James,

    Though I don't know for sure it sounds to me like you don't have an
    index on the 'input' field. If I remember correctly there is checkbox
    on the field in the model editor that says 'indexed' and if you turn
    that on the SQLite store will put an index on that field which will
    make your fetches much much faster.

    There could be issues with the 'like' part of your query even after
    you make an index, but i'm not hip enough on SQLite to say one way or
    the other.

    http://www.sqlite.org/optoverview.html

    has some guidelines to optimizing things.

    HTH,

    -bd-
    http://bill.dudney.net/roller/objc

    On Feb 5, 2008, at 12:31 PM, James Hober wrote:

    > I have a CoreData app that has a SQLite store holding 157,273
    > managed objects.  Each managed object has 2 attributes, input and
    > answer, which are NSStrings and that's all.  No other attributes.
    > No relationships.  No fetched properties.
    >
    > A simple fetch for a single managed object takes 7 seconds!
    >
    > NSPredicate *predicate = [NSPredicate predicateWithFormat:@"%K
    > like %@",
    > @"input", input]; //assume NSString *input exists
    > NSFetchRequest *fetchRequest = [[[NSFetchRequest alloc] init]
    > autorelease];
    > NSEntityDescription *entity = [NSEntityDescription
    > entityForName:@"MyEntity"
    > inManagedObjectContext:
    > [self managedObjectContext]];
    > [fetchRequest setEntity:entity];
    > [fetchRequest setPredicate:predicate];
    > [fetchRequest setFetchLimit:1];
    > NSError *error = nil;
    >
    > //The following line takes 7 seconds to execute:
    > NSArray *results = [[self managedObjectContext]
    > executeFetchRequest:fetchRequest error:&error];
    >
    > if ([results count]) {return [[results objectAtIndex:0]
    > valueForKey:@"answer"];}
    > return @"";
    >
    > How can I speed this up?
    >
    > The 157,273 inputs are unique.  I believe CoreData searches them
    > linearly in no particular order, instead of sorting them so they can
    > be binary searched in the store.  Is this aspect of CoreData
    > opaque?  Is there no way to get CoreData to efficiently search on a
    > unique string attribute?
    >
    > If my experiments with CoreData continue to prove unfruitful, I may
    > remain with my current implementation which uses Objective-C++  and
    > an STL map.  I can load all the data into the map in about 3.6
    > seconds and then retrieval from memory is near instantaneous.
    >
    > Thank you for any help speeding up this CoreData implementation or
    > any other suggestions.
  • James, Mike,

    The tools on Leopard should help identify your performance problems.
    Shark is my perennial favorite, but with the Core Data template,
    Instruments is very handy.

    For quick spot checking, you can use the user defaults:
    -com.apple.CoreData.SyntaxColoredLogging 1 -com.apple.CoreData.SQLDebug 1

    For reference, if I search a 4+ million row database with no
    predicate and a limit of 1 million, sql logging shows:

    2008-02-05 14:48:27.973 CoreDataFetchTool[6916:10b] CoreData: sql:
    SELECT 0, t0.Z_PK, t0.Z_OPT, t0.ZLETTER, t0.ZTEXT, t0.ZLENGTH FROM
    ZWORD t0  LIMIT 1000000
    2008-02-05 14:48:28.818 CoreDataFetchTool[6916:10b] CoreData:
    annotation: sql connection fetch time: 0.8225s
    2008-02-05 14:48:29.029 CoreDataFetchTool[6916:10b] CoreData:
    annotation: total fetch execution time: 1.0554s for 1000000 rows.
    Fetched 1000000 items at 913297 rows per second.

    and for a limit of 1:

    2008-02-05 14:48:40.764 CoreDataFetchTool[6926:10b] CoreData: sql:
    SELECT Z_VERSION, Z_UUID, Z_PLIST FROM Z_METADATA
    2008-02-05 14:48:40.767 CoreDataFetchTool[6926:10b] CoreData: sql:
    SELECT 0, t0.Z_PK, t0.Z_OPT, t0.ZLETTER, t0.ZTEXT, t0.ZLENGTH FROM
    ZWORD t0  LIMIT 1
    2008-02-05 14:48:40.768 CoreDataFetchTool[6926:10b] CoreData:
    annotation: sql connection fetch time: 0.0007s
    2008-02-05 14:48:40.768 CoreDataFetchTool[6926:10b] CoreData:
    annotation: total fetch execution time: 0.0013s for 1 rows.
    Fetched 1 items at 67 rows per second.

    As your fetch for 1 object is 5,000x slower than no predicate (on a
    much larger database, no less), I think we can hazard a guess that
    your predicate involves significant complexity.

    Thanks to Phil for handy reference to an earlier post describing how
    indexing works with Core Data at
    <http://www.cocoabuilder.com/archive/message/cocoa/2008/1/16/196552>

    An arbitrary LIKE query is extremely expensive.  If done correctly
    with locale and Unicode support, it will be slow no matter what
    storage technology you adopt.  Databases typically do not support
    indices for pattern matching queries.  So this query is doing a full
    table scan.  That's what a full text search engine like Lucene,
    SearchKit or Spotlight is for (although many databases now offer some
    integration with FTS).

    There's only one way to make performing a locale & unicode friendly
    arbitrary LIKE operations on every row in a table faster:  Don't.

    Many developers write arbitrary LIKE queries when all they really
    need are prefix or suffix matching.  You can use a beginsWith or
    endsWith predicate instead.  As those predicates still support
    unicode matching, they're only somewhat faster than a full LIKE.

    To really speed things up, you need to go with locale & unicode
    hostile searching.  If you use the arithmetic predicate operators
    (=, >, <, etc) Core Data will treat the column as binary data, ditch
    all the unicode handling, and make your search eligible for indices.

    This is typically done by denormalizing the data and having "display
    text" and "search text" columns.  Depending on the fidelity your
    application requires, you can also do "fast lossy query on search
    text" AND "slow unicode query on display text" which will at least
    prune the number of rows evaluated by the slow query.

    A prefix search can be rewritten as "attribute >= x and attribute <
    y"  A suffix search can be rewritten similarly using a column with
    the characters reversed.

    When you supplement this approach with an index (binary collation) to
    the attribute, you should be able to get this ~100x faster.

    It's possible that your application's usage patterns work better
    loading everything into memory up front.  You can do that with Core
    Data too.

    re: Mike's caching commonly used managed objects.

    That's what we expect done.  Core Data is not a big dictionary.
    Foundation and Core Foundation already have a number of perfectly
    good collection APIs, and you can cache in an intelligent,
    application specific manner.  And NSArray and NSSet support direct
    filtering with predicates.  It's not clear what would make things
    easier for you.

    If you have a problem you think is generally applicable, please file
    a feature request.  Core Data does not assume that your process is
    the only process using the database.  But if enough developers wanted
    an option to facilitate that assumption, we could do that.

    If you file performance bugs, you can really help us understand what
    queries are most in need of improvement.  If you include a sample
    project (source) with a representative data set, we can make a
    detailed investigation.  Martin (from previous thread cited above)
    was kind enough to do so, and those issues inspired some performance
    improvements that will benefit everyone.
    --

    -Ben
  • I simply changed the predicate from using LIKE to using == and fetch
    time plummeted from 7 seconds to 0.18 seconds.

    This is very usable.  I think it is a better user experience to shave
    off the 3.5 seconds of launch time that my previous implementation
    needed in exchange for each query taking 0.18 seconds.

    Several of you suggested indexing the attribute I search against.  It
    looks like indexing an attribute is only available in Leopard and for
    now I have to support Tiger.

    I will continue to investigate further optimization.

    Thanks to all who replied.

    James