CoreData model optimization? (slow on insert with fetches)

  • Hello,
    I need to design a schema for a message archive program. Each
    message, like a bug, has a unique msgid and some attributes such as
    Subject, Author, Posting date, etc.
    I've tried the schema illustrated below, where author and subject
    are separate entities and there are three kinds of message (the root
    is a simple article, then a root message that has no parent
    but can have children, and finally a child node that can have
    both).

    http://img408.imageshack.us/img408/6235/immagine1re6.png

    Why?
    I've tried to optimize the model for Core Data because it seems to be
    very slow with a large data set (ideally I could have about
    300 threads and 20,000 children).

    So I've created a rootnode entity because root nodes are few, and
    fetching them should take less time than scanning a list of all the
    child nodes to find the roots.
    I've also separated author and subject to avoid duplicate
    text (children generally have the same subject as their parent, and
    an author writes lots of messages).

    Unfortunately it doesn't work well.
    I still need to test whether the article->root/child classes are a
    good idea, but the separate entities for subject and author are not.
    Why? Because if I try to insert 10,000 test records, Core Data needs
    to search for an existing author/subject and then associate it with
    the message. It takes a lot of time, about 30-40 seconds!

    Any tips to optimize the model, or any suggestions? I think I'm doing
    something wrong, because Mail uses Core Data and its speed seems good.
    Thanks a lot
  • > Why?

      Missing information: What specifically is slow? What are you
    fetching that is slow? How ***EXACTLY*** are you fetching it? Are you
    sure you're not doing something to cause it to be fetched multiple
    times? These details directly affect performance and you've left them
    all out for some reason.

      Suggestions: Read back over the Core Data Programming Guide and the
    Key-Value Coding Programming Guide. The former so you can better
    understand how things work and the latter so you can name things a
    little better (your attribute names are bizarre).

      Regarding overall design of your model, Mail deals with "Message".
    A message may belong to a thread (and may be sorted by thread) but a
    message is still a message. I would think separating out "Subject" as
    a separate entity would only cost you more work, not less.

      How about you describe the *basic*, plain-English model you're
    dealing with (rather than the "optimized" version you posted)? It'd
    make it easier to make suggestions.

    --
    I.S.
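
To make the "a message is still a message" suggestion concrete, here is a minimal sketch of such a plain model, written in Python rather than as a Core Data model. The class name `Message`, its field names, and the `reply` helper are illustrative assumptions, not anything taken from the original schema. Subject and author are ordinary attributes, and a thread is nothing more than a chain of optional `parent` links.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass(eq=False)  # compare by identity; parent/children form a cycle
class Message:
    """One archived message (hypothetical model, not the poster's schema)."""
    msgid: str
    subject: str                      # stored inline, duplicates allowed
    author: str                       # stored inline, duplicates allowed
    posted: datetime
    parent: Optional["Message"] = None            # None means a thread root
    children: List["Message"] = field(default_factory=list)

    @property
    def is_root(self) -> bool:
        return self.parent is None

    def reply(self, msgid: str, subject: str, author: str,
              posted: datetime) -> "Message":
        """Create a child message and link it in both directions."""
        child = Message(msgid, subject, author, posted, parent=self)
        self.children.append(child)
        return child


root = Message("1@example", "Hello", "alice", datetime(2007, 10, 2))
child = root.reply("2@example", "Re: Hello", "bob", datetime(2007, 10, 2))
```

With this shape there is no separate root or child type: whether a message is a root is a property derived from the missing parent.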
  • On 02 Oct 07, at 07:47, Hell's KItchen wrote:
    <snip>
    > Any tips to optimize the model, or any suggestions? I think I'm doing
    > something wrong, because Mail uses Core Data and its speed seems good.

    Correction: Mail does not use CoreData - at least, not in any
    traditional fashion. It does use an SQLite database ("Envelope
    Index"), but it only appears to be used as a cache. Deleting the
    database doesn't cause any data to be lost.

    Regarding your database design: I'm unsure of the purpose of the
    "author" and "subject" entities. The storage overhead of storing this
    data redundantly isn't likely to be all that high, and the processing
    overhead of looking up the correct entities on insertion is
    significant. You're probably better off just storing this data within
    the parent entity and living with the fact that there's some
    duplication.
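
The trade-off described above can be sketched outside Core Data. The example below uses plain SQLite through Python's `sqlite3` module, so it is only an analogy for what Core Data does on top of its store. All table names, column names, and the generated test data are assumptions made for illustration. It loads the same messages two ways: with a find-or-create lookup for author and subject on every insert, and with both values stored directly on the message row.

```python
import sqlite3

SCHEMA = """
CREATE TABLE author  (id INTEGER PRIMARY KEY, text TEXT UNIQUE);
CREATE TABLE subject (id INTEGER PRIMARY KEY, text TEXT UNIQUE);
CREATE TABLE message_n (msgid TEXT PRIMARY KEY, author_id INTEGER, subject_id INTEGER);
CREATE TABLE message_f (msgid TEXT PRIMARY KEY, author TEXT, subject TEXT);
"""


def find_or_create(db, table, value):
    """Return the id of the row holding `value`, inserting it if missing."""
    row = db.execute(f"SELECT id FROM {table} WHERE text = ?", (value,)).fetchone()
    if row is not None:
        return row[0]
    return db.execute(f"INSERT INTO {table}(text) VALUES (?)", (value,)).lastrowid


def insert_normalized(db, messages):
    """Separate author/subject rows: two lookups for every message."""
    lookups = 0
    for msgid, author, subject in messages:
        author_id = find_or_create(db, "author", author)
        subject_id = find_or_create(db, "subject", subject)
        lookups += 2
        db.execute("INSERT INTO message_n VALUES (?, ?, ?)",
                   (msgid, author_id, subject_id))
    return lookups


def insert_flat(db, messages):
    """Author and subject stored on the message row: no lookups at all."""
    db.executemany("INSERT INTO message_f VALUES (?, ?, ?)", messages)


# Hypothetical test data: 1,000 messages, 50 authors, 300 thread subjects.
messages = [(f"msg{i}@example", f"author{i % 50}", f"Re: thread {i % 300}")
            for i in range(1000)]

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
lookups = insert_normalized(db, messages)
insert_flat(db, messages)
db.commit()
```

The normalized path issues two extra queries per message before it can insert anything, while the flat path inserts each row directly and pays for that with repeated text. How large the difference is in a real Core Data store depends on indexing, fetch setup, and save frequency, so measure before settling on either design.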