CoreData fetch is way too slow
-
I have a CoreData app that has a SQLite store holding 157,273 managed
objects. Each managed object has 2 attributes, input and answer,
which are NSStrings and that's all. No other attributes. No
relationships. No fetched properties.
A simple fetch for a single managed object takes 7 seconds!
NSPredicate *predicate = [NSPredicate predicateWithFormat:@"%K
like %@",
@"input", input]; //assume NSString *input exists
NSFetchRequest *fetchRequest = [[[NSFetchRequest alloc] init]
autorelease];
NSEntityDescription *entity = [NSEntityDescription
entityForName:@"MyEntity"
inManagedObjectContext:
[self managedObjectContext]];
[fetchRequest setEntity:entity];
[fetchRequest setPredicate:predicate];
[fetchRequest setFetchLimit:1];
NSError *error = nil;
//The following line takes 7 seconds to execute:
NSArray *results = [[self managedObjectContext]
executeFetchRequest:fetchRequest error:&error];
if ([results count]) {return [[results objectAtIndex:0]
valueForKey:@"answer"];}
return @"";
How can I speed this up?
The 157,273 inputs are unique. I believe CoreData searches them
linearly in no particular order, instead of sorting them so they can
be binary searched in the store. Is this aspect of CoreData opaque?
Is there no way to get CoreData to efficiently search on a unique
string attribute?
If my experiments with CoreData continue to prove unfruitful, I may
remain with my current implementation which uses Objective-C++ and
an STL map. I can load all the data into the map in about 3.6
seconds and then retrieval from memory is near instantaneous.
Thank you for any help speeding up this CoreData implementation or
any other suggestions. -
We've found fetches to be inherently slow too. Generally though we're
not managing 150,000+ objects! Every time you do a fetch it involves a
round-trip to the persistent store. You might want to try using an
NSArrayController that will manage its own content, set it to manage
your MyEntity objects, and then use an NSPredicate on the results of
[arrayController arrangedObjects]; to filter it. We found that sped
things up immensely for our commonly used managed objects, and we
eliminated a lot of pointless round-trips to disk.
-Mike
On Feb 5, 2008, at 2:31 PM, James Hober wrote:
> I have a CoreData app that has a SQLite store holding 157,273
> managed objects. Each managed object has 2 attributes, input and
> answer, which are NSStrings and that's all. No other attributes.
> No relationships. No fetched properties.
>
> A simple fetch for a single managed object takes 7 seconds!
>
> NSPredicate *predicate = [NSPredicate predicateWithFormat:@"%K
> like %@",
> @"input", input]; //assume NSString *input exists
> NSFetchRequest *fetchRequest = [[[NSFetchRequest alloc] init]
> autorelease];
> NSEntityDescription *entity = [NSEntityDescription
> entityForName:@"MyEntity"
> inManagedObjectContext:
> [self managedObjectContext]];
> [fetchRequest setEntity:entity];
> [fetchRequest setPredicate:predicate];
> [fetchRequest setFetchLimit:1];
> NSError *error = nil;
>
> //The following line takes 7 seconds to execute:
> NSArray *results = [[self managedObjectContext]
> executeFetchRequest:fetchRequest error:&error];
>
> if ([results count]) {return [[results objectAtIndex:0]
> valueForKey:@"answer"];}
> return @"";
>
> How can I speed this up?
>
> The 157,273 inputs are unique. I believe CoreData searches them
> linearly in no particular order, instead of sorting them so they can
> be binary searched in the store. Is this aspect of CoreData
> opaque? Is there no way to get CoreData to efficiently search on a
> unique string attribute?
>
> If my experiments with CoreData continue to prove unfruitful, I may
> remain with my current implementation which uses Objective-C++ and
> an STL map. I can load all the data into the map in about 3.6
> seconds and then retrieval from memory is near instantaneous.
>
> Thank you for any help speeding up this CoreData implementation or
> any other suggestions.
Mike Glass
Developer
Marware, Inc.
<mikeg...>
http://www.projectx.com
http://www.marware.com -
On Feb 5, 2008, at 2:31 PM, James Hober wrote:
> I have a CoreData app that has a SQLite store holding 157,273
> managed objects. Each managed object has 2 attributes, input and
> answer, which are NSStrings and that's all. No other attributes.
> No relationships. No fetched properties.
>
> A simple fetch for a single managed object takes 7 seconds!
>
> NSPredicate *predicate = [NSPredicate predicateWithFormat:@"%K
> like %@",
> @"input", input]; //assume NSString *input exists
Two things come to mind:
First, make sure that 'input' is indexed.
Second, be careful of how/when you use 'like' in your predicates as it
is pretty easy to take a huge performance hit (this is true even when
using straight SQL) as you can have a clause that requests "input like
'%sometext'" which can't make use of an index and you'll force a full
table scan.
I found Ben's response to one of my posts a few weeks back quite
helpful: http://www.cocoabuilder.com/archive/message/cocoa/2008/1/16/196552
Phil -
Hi James,
Though I don't know for sure it sounds to me like you don't have an
index on the 'input' field. If I remember correctly there is checkbox
on the field in the model editor that says 'indexed' and if you turn
that on the SQLite store will put an index on that field which will
make your fetches much much faster.
There could be issues with the 'like' part of your query even after
you make an index, but i'm not hip enough on SQLite to say one way or
the other.
http://www.sqlite.org/optoverview.html
has some guidelines to optimizing things.
HTH,
-bd-
http://bill.dudney.net/roller/objc
On Feb 5, 2008, at 12:31 PM, James Hober wrote:
> I have a CoreData app that has a SQLite store holding 157,273
> managed objects. Each managed object has 2 attributes, input and
> answer, which are NSStrings and that's all. No other attributes.
> No relationships. No fetched properties.
>
> A simple fetch for a single managed object takes 7 seconds!
>
> NSPredicate *predicate = [NSPredicate predicateWithFormat:@"%K
> like %@",
> @"input", input]; //assume NSString *input exists
> NSFetchRequest *fetchRequest = [[[NSFetchRequest alloc] init]
> autorelease];
> NSEntityDescription *entity = [NSEntityDescription
> entityForName:@"MyEntity"
> inManagedObjectContext:
> [self managedObjectContext]];
> [fetchRequest setEntity:entity];
> [fetchRequest setPredicate:predicate];
> [fetchRequest setFetchLimit:1];
> NSError *error = nil;
>
> //The following line takes 7 seconds to execute:
> NSArray *results = [[self managedObjectContext]
> executeFetchRequest:fetchRequest error:&error];
>
> if ([results count]) {return [[results objectAtIndex:0]
> valueForKey:@"answer"];}
> return @"";
>
> How can I speed this up?
>
> The 157,273 inputs are unique. I believe CoreData searches them
> linearly in no particular order, instead of sorting them so they can
> be binary searched in the store. Is this aspect of CoreData
> opaque? Is there no way to get CoreData to efficiently search on a
> unique string attribute?
>
> If my experiments with CoreData continue to prove unfruitful, I may
> remain with my current implementation which uses Objective-C++ and
> an STL map. I can load all the data into the map in about 3.6
> seconds and then retrieval from memory is near instantaneous.
>
> Thank you for any help speeding up this CoreData implementation or
> any other suggestions.
-
James, Mike,
The tools on Leopard should help identify your performance problems.
Shark is my perennial favorite, but with the Core Data template,
Instruments is very handy.
For quick spot checking, you can use the user defaults:
-com.apple.CoreData.SyntaxColoredLogging 1 -com.apple.CoreData.SQLDebug 1
For reference, if I search a 4+ million row database with no
predicate and a limit of 1 million, sql logging shows:
2008-02-05 14:48:27.973 CoreDataFetchTool[6916:10b] CoreData: sql:
SELECT 0, t0.Z_PK, t0.Z_OPT, t0.ZLETTER, t0.ZTEXT, t0.ZLENGTH FROM
ZWORD t0 LIMIT 1000000
2008-02-05 14:48:28.818 CoreDataFetchTool[6916:10b] CoreData:
annotation: sql connection fetch time: 0.8225s
2008-02-05 14:48:29.029 CoreDataFetchTool[6916:10b] CoreData:
annotation: total fetch execution time: 1.0554s for 1000000 rows.
Fetched 1000000 items at 913297 rows per second.
and for a limit of 1:
2008-02-05 14:48:40.764 CoreDataFetchTool[6926:10b] CoreData: sql:
SELECT Z_VERSION, Z_UUID, Z_PLIST FROM Z_METADATA
2008-02-05 14:48:40.767 CoreDataFetchTool[6926:10b] CoreData: sql:
SELECT 0, t0.Z_PK, t0.Z_OPT, t0.ZLETTER, t0.ZTEXT, t0.ZLENGTH FROM
ZWORD t0 LIMIT 1
2008-02-05 14:48:40.768 CoreDataFetchTool[6926:10b] CoreData:
annotation: sql connection fetch time: 0.0007s
2008-02-05 14:48:40.768 CoreDataFetchTool[6926:10b] CoreData:
annotation: total fetch execution time: 0.0013s for 1 rows.
Fetched 1 items at 67 rows per second.
As your fetch for 1 object is 5,000x slower than no predicate (on a
much larger database, no less), I think we can hazard a guess that
your predicate involves significant complexity.
Thanks to Phil for handy reference to an earlier post describing how
indexing works with Core Data at
<http://www.cocoabuilder.com/archive/message/cocoa/2008/1/16/196552>
An arbitrary LIKE query is extremely expensive. If done correctly
with locale and Unicode support, it will be slow no matter what
storage technology you adopt. Databases typically do not support
indices for pattern matching queries. So this query is doing a full
table scan. That's what a full text search engine like Lucene,
SearchKit or Spotlight is for (although many databases now offer some
integration with FTS).
There's only one way to make performing a locale & unicode friendly
arbitrary LIKE operations on every row in a table faster: Don't.
Many developers write arbitrary LIKE queries when all they really
need are prefix or suffix matching. You can use a beginsWith or
endsWith predicate instead. As those predicates still support
unicode matching, they're only somewhat faster than a full LIKE.
To really speed things up, you need to go with locale & unicode
hostile searching. If you use the arithmetic predicate operators
(=, >, <, etc) Core Data will treat the column as binary data, ditch
all the unicode handling, and make your search eligible for indices.
This is typically done by denormalizing the data and having "display
text" and "search text" columns. Depending on the fidelity your
application requires, you can also do "fast lossy query on search
text" AND "slow unicode query on display text" which will at least
prune the number of rows evaluated by the slow query.
A prefix search can be rewritten as "attribute >= x and attribute <
y" A suffix search can be rewritten similarly using a column with
the characters reversed.
When you supplement this approach with an index (binary collation) to
the attribute, you should be able to get this ~100x faster.
It's possible that your application's usage patterns work better
loading everything into memory up front. You can do that with Core
Data too.
re: Mike's caching commonly used managed objects.
That's what we expect done. Core Data is not a big dictionary.
Foundation and Core Foundation already have a number of perfectly
good collection APIs, and you can cache in an intelligent,
application specific manner. And NSArray and NSSet support direct
filtering with predicates. It's not clear what would make things
easier for you.
If you have a problem you think is generally applicable, please file
a feature request. Core Data does not assume that your process is
the only process using the database. But if enough developers wanted
an option to facilitate that assumption, we could do that.
If you file performance bugs, you can really help us understand what
queries are most in need of improvement. If you include a sample
project (source) with a representative data set, we can make a
detailed investigation. Martin (from previous thread cited above)
was kind enough to do so, and those issues inspired some performance
improvements that will benefit everyone.
--
-Ben -
I simply changed the predicate from using LIKE to using == and fetch
time plummeted from 7 seconds to 0.18 seconds.
This is very usable. I think it is a better user experience to shave
off the 3.5 seconds of launch time that my previous implementation
needed in exchange for each query taking 0.18 seconds.
Several of you suggested indexing the attribute I search against. It
looks like indexing an attribute is only available in Leopard and for
now I have to support Tiger.
I will continue to investigate further optimization.
Thanks to all who replied.
James



