[Core Data] Lightweight Objects

  • Hi,

    Hopefully this is a simple question:

    I have a Core Data application in mind, and I'm wondering what the best
    way to organise my model would be. The bit that has me concerned is
    that I have an entity (let's call it "aLog") that has associated with
    it an ordered set of 1-byte values, variable in number. They are data
    samples from a capture device. There's no real limit on how many
    samples might exist in a log.

    So, I could have:

    aLog ----to many---> aSample

    But that seems very wasteful, especially as the sample will probably
    need an ID number to keep the sequence and possibly an inverse
    relationship to say which log a sample belongs to. Multiply that up by
    thousands of samples and it seems silly.

    So I'm thinking it would be nice to store the samples in a property of the
    "aLog" entity. A String with no max length? A BLOB?

    Any suggestions?

    Paul
  • On 10 Nov 2007, at 19:02, Paul Sargent wrote:

    > Hi,
    >
    > Hopefully this is a simple question:
    >
    > I have a Core Data application in mind, and I'm wondering what the
    > best way to organise my model would be. The bit that has me
    > concerned is that I have an entity (let's call it "aLog") that has
    > associated with it an ordered set of 1-byte values, variable in
    > number. They are data samples from a capture device. There's no real
    > limit on how many samples might exist in a log.
    >
    > So, I could have:
    >
    > aLog ----to many---> aSample
    Yes, IMO that's good; factor out the binary data. It actually ends up
    being faster in the long run. That's assuming those 1-byte values are
    going to be longer than just... one byte. Right?

    > But that seems very wasteful, especially as the sample will probably
    > need an ID number to keep the sequence and possibly an inverse
    > relationship to say which log a sample belongs to. Multiply that up
    > by thousands of samples and it seems silly.
    Have you seen the benchmarks for Core Data on Leopard? They are wicked
    fast. I wouldn't be worried.

    >
    > So I'm thinking it would be nice to store the samples in a property
    > of the "aLog" entity. A String with no max length? A BLOB?
    Arrays of bytes should be BLOBs, no?

    >
    > Any suggestions?
    Your original idea, I think, is better.
    Let's say, for some analysis you want to do later, you want to grab all
    the samples but you don't need the log information... well, how do you
    do that efficiently if they are the same entity? (This assumes the
    aLog entity ends up having lots of fields.)
    Having an inverse relationship lets you get that info only when you
    need it, say when there's an anomaly in a sample... it's best to
    factor out A) data that will be accessed very, very often, and B)
    BLOBs.
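
    (For concreteness, a minimal sketch of that kind of fetch, assuming
    "Sample" and "Log" entity names and a numeric "value" attribute that
    are purely illustrative, not from any real model:)

        #import <CoreData/CoreData.h>

        // With separate entities you can fetch just the samples that
        // interest you, without touching the rest of the log's fields.
        NSArray *anomalousSamples(NSManagedObjectContext *moc)
        {
            NSFetchRequest *request = [[[NSFetchRequest alloc] init] autorelease];
            [request setEntity:[NSEntityDescription entityForName:@"Sample"
                                           inManagedObjectContext:moc]];
            [request setPredicate:
                [NSPredicate predicateWithFormat:@"value > 200"]];
            NSError *error = nil;
            return [moc executeFetchRequest:request error:&error];
        }

        // The inverse relationship then hands you the owning log only
        // when you actually need it:
        //     NSManagedObject *log = [aSample valueForKey:@"log"];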

    That's just my opinion though!

    Andre

    >
    > Paul
  • Hi Andre,

    Thanks for your input.

    On 10 Nov 2007, at 10:37, <listposter...> wrote:

    >
    > On 10 Nov 2007, at 19:02, Paul Sargent wrote:
    >
    >> So, I could have:
    >>
    >> aLog ----to many---> aSample
    > Yes, IMO that's good; factor out the binary data. It actually ends
    > up being faster in the long run. That's assuming those 1-byte values
    > are going to be longer than just... one byte. Right?

    Well... no, the one-byte values are just one byte. That's all that's
    logged and all I need to store. Hence the question.

    To give a bit of context, this is a 'viewer/library' type app for data
    that's logged by an embedded device, so it's typically going to be
    drawing graphs of this data. The samples are just a single byte each
    because the memory on the device is limited (16 KB).

    > Have you seen the benchmarks for Core Data on Leopard? They are
    > wicked fast. I wouldn't be worried.

    I haven't, actually. It would be interesting to see numbers, but I
    can't find any myself.

    > Let's say, for some analysis you want to do later, you want to grab
    > all the samples but you don't need the log information... well, how
    > do you do that efficiently if they are the same entity? (This
    > assumes the aLog entity ends up having lots of fields.)

    I'm not seeing myself doing that type of analysis, but I agree, it
    makes sense if you are doing that kind of thing.

    ...and yes, the log has a lot more information in it that's more
    suited to Core Data: a lot of who, what, where type stuff that
    naturally fits into standard field types.

    It's just this one property of the log. It seems wrong to be creating
    so many objects (up to 16 thousand per log). If I can store the data
    in a single 16 KB BLOB, that's got to be more efficient than 16K
    objects, each 10+ times the size.

    Especially if you're talking about retrieving 16K objects to draw a
    single graph.
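
    (As a sketch: assuming the single-BLOB design with an illustrative
    "samples" binary attribute, drawing would just walk the bytes.)

        // Pull the one binary attribute back and iterate the raw
        // samples, e.g. to build the points of a graph.
        NSData *samples = [aLog valueForKey:@"samples"];
        const uint8_t *bytes = [samples bytes];
        NSUInteger i, count = [samples length];
        for (i = 0; i < count; i++) {
            // addPointToGraph(i, bytes[i]);   // hypothetical plot call
        }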

    >> Any suggestions?
    > Your original idea, I think, is better.
    >
    > That's just my opinion though!

    Fair enough. I have a feeling this might be the sort of thing that I
    have to try both ways and see how it does.

    Paul
  • On 11 Nov 2007, at 0:21, Paul Sargent wrote:

    > Hi Andre,
    >
    > Thanks for your input.
    >
    > On 10 Nov 2007, at 10:37, <listposter...> wrote:
    >
    >>
    >> On 10 Nov 2007, at 19:02, Paul Sargent wrote:
    >>
    >>> So, I could have:
    >>>
    >>> aLog ----to many---> aSample
    >> Yes, IMO that's good; factor out the binary data. It actually ends
    >> up being faster in the long run. That's assuming those 1-byte
    >> values are going to be longer than just... one byte. Right?
    >
    > Well... no, the one-byte values are just one byte. That's all
    > that's logged and all I need to store. Hence the question.
    >
    > To give a bit of context, this is a 'viewer/library' type app for
    > data that's logged by an embedded device, so it's typically going
    > to be drawing graphs of this data. The samples are just a single
    > byte each because the memory on the device is limited (16 KB).
    >
    >> Have you seen the benchmarks for Core Data on Leopard? They are
    >> wicked fast. I wouldn't be worried.
    >
    > I haven't, actually. It would be interesting to see numbers, but I
    > can't find any myself.
    Well, last I saw they had some at WWDC last year, but I'm not sure
    they actually published them... they were impressive, though.

    >
    >> Let's say, for some analysis you want to do later, you want to
    >> grab all the samples but you don't need the log information...
    >> well, how do you do that efficiently if they are the same entity?
    >> (This assumes the aLog entity ends up having lots of fields.)
    >
    > I'm not seeing myself doing that type of analysis, but I agree, it
    > makes sense if you are doing that kind of thing.
    >
    > ...and yes, the log has a lot more information in it that's more
    > suited to Core Data: a lot of who, what, where type stuff that
    > naturally fits into standard field types.
    >
    > It's just this one property of the log. It seems wrong to be
    > creating so many objects (up to 16 thousand per log). If I can
    > store the data in a single 16 KB BLOB, that's got to be more
    > efficient than 16K objects, each 10+ times the size.
    Well, then 16 KB is nothing. I would just put it in the log... but,
    as has been said elsewhere, premature optimization... blah blah, it's
    been said a million times, I know.

    > Especially if you're talking about retrieving 16K objects to draw a
    > single graph.

    Yeah.

    >
    >>> Any suggestions?
    >> Your original idea, I think, is better.
    >>
    >> That's just my opinion though!
    >
    > Fair enough. I have a feeling this might be the sort of thing that I
    > have to try both ways and see how it does.
    Yeah, try it out, but I think storing 16 KB would be trivial... it
    depends, I guess, on how many samples you're going to take versus how
    many logs you're going to make.
    And it would be infinitely faster grabbing one log and iterating over
    16 KB of data in memory... so... I think I changed my mind! If your
    samples are just going to be a single byte each...

    Andre

    >
    > Paul
  • > I have a Core Data application in mind, and I'm wondering what the
    > best way to organise my model would be. The bit that has me
    > concerned is that I have an entity (let's call it "aLog") that has
    > associated with it an ordered set of 1-byte values, variable in
    > number. They are data samples from a capture device. There's no real
    > limit on how many samples might exist in a log.
    >
    > So, I could have:
    >
    > aLog ----to many---> aSample
    >
    > But that seems very wasteful, especially as the sample will probably
    > need an ID number to keep the sequence and possibly an inverse
    > relationship to say which log a sample belongs to. Multiply that up
    > by thousands of samples and it seems silly.
    >
    > So I'm thinking it would be nice to store the samples in a property
    > of the "aLog" entity. A String with no max length? A BLOB?

    Given the limit is 16 KB, your first impression about just having a
    BLOB that's an array of 1-byte values is probably best.

    A number of factors inform that choice.  Managed objects are 48 bytes,
    with a 16-byte unique identifier.  There is some caching, and space
    needed for various features, and that brings the framework's per-row
    overhead to roughly 128 bytes.
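
    (Back of the envelope: at that rate, 16,384 one-byte samples cost
    about 16,384 × 128 bytes = 2 MB of row overhead just to represent
    16 KB of data, a roughly 128x blowup.)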

    Now, that's overhead for objects actively being used in memory.
    Avoiding fetching data you don't need, and leaving that data
    serialized on disk, is always preferable.  This is basically analogous
    to graphics rendering.  You want to clip to the visible region.  Why
    draw stuff nobody can see ?  Same thing for working with lots of
    objects.  Clip the working set.  SQLite can, without any special
    effort, handle millions of rows.
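
    (A sketch of clipping at fetch time, assuming an existing context moc;
    the entity name and limit are just examples:)

        // Ask the store for only as many rows as the view can show,
        // leaving the rest serialized on disk.
        NSFetchRequest *request = [[[NSFetchRequest alloc] init] autorelease];
        [request setEntity:[NSEntityDescription entityForName:@"Log"
                                       inManagedObjectContext:moc]];
        [request setFetchLimit:100];
        NSError *error = nil;
        NSArray *visibleLogs = [moc executeFetchRequest:request error:&error];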

    For especially fine-grained objects, like your log data, it generally
    doesn't make much sense to burn 128 bytes to hold a 1-byte value.  You
    might want to if it were really important to be able to search on each
    individual element, have inverse relationships, independently update
    each element (with conflict detection, undo, etc.), or bind them into
    the UI.

    The key features the database offers to distinct elements are
    searching and updates.  If you don't need them, you can cluster things
    together into coarser-grained units.

    So for your data, or similar data like large collections of vertices,
    it's enough to update the whole array as one semantic value.  You can
    put them into a binary property, or use a transformable property on
    Leopard.
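
    (A minimal sketch of the coarse-grained version, assuming a "Log"
    entity with a binary "samples" attribute; the names are illustrative:)

        #import <CoreData/CoreData.h>

        @interface Log : NSManagedObject
        @property (nonatomic, retain) NSData *samples;  // binary attribute
        @end

        @implementation Log
        @dynamic samples;   // Core Data supplies the accessors
        @end

        // The whole array is one semantic value: to change it, replace
        // the NSData wholesale rather than poking at individual bytes.
        void storeSamples(Log *log, const uint8_t *bytes, NSUInteger count)
        {
            [log setSamples:[NSData dataWithBytes:bytes length:count]];
        }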

    If your log data could grow much larger, like megabytes, then you
    might instead consider storing a URI to a file in the database, and
    logging directly into that file.  The filesystem is very good with
    buffered sequential operations.  But at 16 KB that's not terribly
    interesting (each file takes a minimum of 4 KB, with the memory
    overhead to access it running in the 16 KB range).
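
    (A sketch of that variant, with made-up attribute and file names;
    only the file's URI string lives in the store:)

        // Write the samples straight to a file and keep just its URI
        // in the Core Data store.
        void logToFile(NSManagedObject *aLog, NSData *samples)
        {
            NSString *path = [NSTemporaryDirectory()
                stringByAppendingPathComponent:@"log-0001.samples"];
            [samples writeToFile:path atomically:YES];
            [aLog setValue:[[NSURL fileURLWithPath:path] absoluteString]
                    forKey:@"samplesURI"];
        }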

    - Ben