Storing NSDocument data in packages/bundles

  • Hello List,

      I'm thinking of switching from single-file document storage to a
    package document with various files in it, but I have a little
    problem with that:

    My app stores a lot of data in a document, potentially hundreds of
    megabytes. The problem with using single files and NSKeyedArchiver is
    that one has to write everything, even if only a tiny part of the
    structure changed. My data is clustered into handy little pieces which
    could be saved into their own separate files within the package.

    The problem is that - if I understand it correctly - when using file
    wrapper -fileWrapperOfType: of NSDocument, I always have to provide
    the complete wrapper, including all contained wrappers. That of course
    is exactly what I hoped to avoid because it means I'd have to write
    all data (this time into separate files/wrappers).

    The NSBundle guide mentions another technique of using "traditional
    file system methods" for loading and saving when you have special
    needs. It's rather vague beyond that and I was hoping to get some
    hints as to how to tackle this problem.

    I wouldn't mind loading all the data at once. What I'm after is a
    mechanism that would allow me to update only parts of the package
    (adding, changing or deleting files as needed). Is there any chance I
    can use -fileWrapperOfType: for that?

    Looking at the NSDocument file saving message flow, it seems rather
    difficult to come up with something completely different. The document
    location you're saving to is not the final location of the real
    document on the disk. That of course prevents you from only writing
    parts of the document because the other stuff would be lost when your
    new version is moved to its final destination.

    So the question is, can I use NSDocument with an "update-relevant-
    parts-of-a-package-only" saving mechanism? Thanks for any pointers!

    Regards
    Markus
    --
  • On Wed, Dec 31, 2008 at 3:51 AM, Markus Spoettl
    <msapplelists...> wrote:
    > My app stores a lot of data in a document, potentially hundreds of
    > megabytes. The problem with using single files and NSKeyedArchiver is that
    > one has to write everything, even if only a tiny part of the structure
    > changed. My data is clustered into handy little pieces which could be saved
    > into their own separate files within the package.

    Have you thought of using Core Data instead?  This is what the SQLite
    store is designed to address.

    > The problem is that - if I understand it correctly - when using file wrapper
    > -fileWrapperOfType: of NSDocument, I always have to provide the complete
    > wrapper, including all contained wrappers. That of course is exactly what I
    > hoped to avoid because it means I'd have to write all data (this time into
    > separate files/wrappers).

    That is correct.  You have to implement -readFromURL:ofType:error: and
    -writeToURL:ofType:forSaveOperation:originalContentsURL:error: if you
    want to go the document-package route and not wind up with wholesale
    writing.  This means, of course, that you need to be able to tell what
    has changed in the document, and then be able to determine which
    fragments need to be updated on disk.

    --Kyle Sluder
  • On Dec 31, 2008, at 12:57 AM, Kyle Sluder wrote:
    > On Wed, Dec 31, 2008 at 3:51 AM, Markus Spoettl
    > <msapplelists...> wrote:
    >> My app stores a lot of data in a document, potentially hundreds of
    >> megabytes. The problem with using single files and NSKeyedArchiver
    >> is that
    >> one has to write everything, even if only a tiny part of the
    >> structure
    >> changed. My data is clustered into handy little pieces which could
    >> be saved
    >> into their own separate files within the package.
    >
    > Have you thought of using Core Data instead?  This is what the SQLite
    > store is designed to address.

    No I have not, but I have a feeling that it wouldn't be suitable. The
    data I store contains (when the document is big) millions of double
    values (amongst other things) spread across hundreds of thousands of
    objects. If the performance of NSKeyedArchiver is any indication the
    system wouldn't scale very well. It is an assumption and it might be
    totally wrong but I guess the overhead of keyed archiving is
    significantly less than that of Core Data.

    >> The problem is that - if I understand it correctly - when using
    >> file wrapper
    >> -fileWrapperOfType: of NSDocument, I always have to provide the
    >> complete
    >> wrapper, including all contained wrappers. That of course is
    >> exactly what I
    >> hoped to avoid because it means I'd have to write all data (this
    >> time into
    >> separate files/wrappers).
    >
    > That is correct.  You have to implement -readFromURL:ofType:error: and
    > -writeToURL:ofType:forSaveOperation:originalContentsURL:error: if you
    > want to go the document-package route and not wind up with wholesale
    > writing.  This means, of course, that you need to be able to tell what
    > has changed in the document, and then be able to determine which
    > fragments need to be updated on disk.

    That's no problem, that information is available. The documentation
    for -writeToURL:ofType:forSaveOperation:originalContentsURL:error:
    states:

    ----------
    The value of absoluteURL is often not the same as [self fileURL].
    Other times it is not the same as the URL for the final save
    destination. Likewise, absoluteOriginalContentsURL is often not the
    same value as [self fileURL].
    ----------

    which is a little problem because to update my packages I need the
    original location. Do you have any insights as to what "often not the
    same" in this context might mean? To write the diff into the package
    I'd have to have access to the package. It doesn't sound as if that's
    guaranteed.

    Regards
    Markus
    --
  • On Wed, Dec 31, 2008 at 4:16 AM, Markus Spoettl
    <msapplelists...> wrote:
    > No I have not, but I have a feeling that it wouldn't be suitable. The data I
    > store contains (when the document is big) millions of double values (amongst
    > other things) spread across hundreds of thousands of objects. If the
    > performance of NSKeyedArchiver is any indication the system wouldn't scale
    > very well. It is an assumption and it might be totally wrong but I guess the
    > overhead of keyed archiving is significantly less than that of Core Data.

    Quite the contrary, I'm afraid.  Although without actually testing it,
    you can't know for sure.

    > That's no problem, that information is available. The documentation for
    > -writeToURL:ofType:forSaveOperation:originalContentsURL:error: states:
    >
    > ----------
    > The value of absoluteURL is often not the same as [self fileURL]. Other
    > times it is not the same as the URL for the final save destination.
    > Likewise, absoluteOriginalContentsURL is often not the same value as [self
    > fileURL].
    > ----------
    >
    > which is a little problem because to update my packages I need the original
    > location. Do you have any insights as to what "often not the same" in this
    > context might mean? To write the diff into the package I'd have to have
    > access to the package. It doesn't sound as if that's guaranteed.

    For safe save operations, AppKit writes the data to a temporary file
    on the same volume, and then swaps the old file with the new, which is
    an atomic operation.  If it can't do that, it will rename the original
    file and write the new one with the old name.  This is why absoluteURL
    or absoluteOriginalContentsURL won't necessarily jibe with -fileURL
    (see the comment header for -[NSDocument
    writeSafelyToURL:ofType:forSaveOperation:error:] for more details).
    The upshot is that absoluteOriginalContentsURL will be a URL which you
    can use to access your existing on-disk data, whether or not AppKit
    has temporarily renamed it.
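
    The write-to-a-temporary-file-then-swap sequence described above can
    be sketched with Foundation alone. This is only an illustration under
    assumptions, not AppKit's actual implementation: SafelyReplaceFile is
    a hypothetical helper, and it relies on NSFileManager's
    -replaceItemAtURL:... method, an API newer than this thread, to stand
    in for the swap.

    ```objc
    #import <Foundation/Foundation.h>

    // Hypothetical helper: replaces the file at `original` with `newContents`
    // by writing a temporary file first and then swapping it into place.
    static BOOL SafelyReplaceFile(NSURL *original, NSData *newContents,
                                  NSError **error)
    {
        NSFileManager *fm = [NSFileManager defaultManager];

        // Ask for a scratch directory on the same volume as the original.
        NSURL *tempDir = [fm URLForDirectory:NSItemReplacementDirectory
                                    inDomain:NSUserDomainMask
                           appropriateForURL:original
                                      create:YES
                                       error:error];
        if (!tempDir) return NO;

        // Write the new version next to nothing the user can see yet...
        NSURL *tempFile =
            [tempDir URLByAppendingPathComponent:original.lastPathComponent];
        BOOL ok = [newContents writeToURL:tempFile options:0 error:error]
            // ...then swap it with the old version in one step.
            && [fm replaceItemAtURL:original
                      withItemAtURL:tempFile
                     backupItemName:nil
                            options:0
                   resultingItemURL:NULL
                              error:error];

        // Clean up the scratch directory whether or not the swap succeeded.
        [fm removeItemAtURL:tempDir error:NULL];
        return ok;
    }
    ```

    If either step fails, the original file is left as it was, which is
    the property the safe-save mechanism is after.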

    --Kyle Sluder
  • By the way, this means that you can't just write out a diff to the
    existing file on disk.  You must replace the entire file inside your
    document package.  This is what I had thought you would originally
    want to do; you are still performing wholesale writes, but not of the
    entire document, just the shards within the package whose contents are
    affected.
  • On Dec 31, 2008, at 01:32, Kyle Sluder wrote:

    > For safe save operations, AppKit writes the data to a temporary file
    > on the same volume, and then swaps the old file with the new, which is
    > an atomic operation.  If it can't do that, it will rename the original
    > file and write the new one with the old name.  This is why absoluteURL
    > or absoluteOriginalContentsURL won't necessarily jibe with -fileURL
    > (see the comment header for -[NSDocument
    > writeSafelyToURL:ofType:forSaveOperation:error:] for more details).
    > The upshot is that absoluteOriginalContentsURL will be a URL which you
    > can use to access your existing on-disk data, whether or not AppKit
    > has temporarily renamed it.

    I don't see how trying to do this in
    -writeToURL:ofType:forSaveOperation:originalContentsURL:error: is ever
    going to work.

    If you are given (basically) an old and new package location, then
    you're forced to copy everything:

    -- Trying to write changed files in the old location would be a really
    bad idea.

    -- Moving the unchanged files from the old location to the new
    location would probably work (if the save operation is a pure save,
    not a save-as or save-to), but would be a really bad idea if there was
    an error during the save (because the contents of the new location
    would presumably get thrown away).

    The *real* question here is: what's a *safe* strategy for saving a
    package by changing parts of it? You really need a single atomic
    operation to commit the changes, and most file systems don't provide
    this for an arbitrary set of files. NSDocument's answer is that there
    isn't a safe strategy, so it always saves by creating a copy. (And
    even that's not perfectly safe if the file system doesn't provide the
    equivalent of FSSwapFiles.)

    Using Core Data as the storage mechanism for blobs of data is a
    possibility, but it's also a PITA because:

    -- you likely need to turn off NSPersistentDocument's undo handling
    and provide your own

    -- Core Data doesn't have save-to, and its save-as sucks (does a store
    migration)

    -- if you have a lot of data, Core Data is going to keep copies of
    much of it in internal caches

    My suggestion would be to go ahead and use a package document format,
    and to copy the unchanged files, and see how long it takes. If the
    save times are unacceptable, then a database solution (not Core Data)
    is probably the next step.
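
    The "copy the unchanged files" approach suggested above might look
    roughly like the following. It is a minimal sketch under assumptions:
    PackageDocument, its fragments and dirtyNames properties, and the
    WritePackage helper are all hypothetical names, and each fragment is
    assumed to live in one file named after its key.

    ```objc
    #import <Cocoa/Cocoa.h>

    // Writes every fragment into the new package at `dst`. Dirty fragments
    // are written from memory; clean ones are copied from the old package
    // at `orig` (which may be nil for a never-saved document).
    static BOOL WritePackage(NSDictionary<NSString *, NSData *> *fragments,
                             NSSet<NSString *> *dirtyNames,
                             NSURL *dst, NSURL *orig, NSError **error)
    {
        NSFileManager *fm = [NSFileManager defaultManager];
        if (![fm createDirectoryAtURL:dst withIntermediateDirectories:YES
                           attributes:nil error:error])
            return NO;

        for (NSString *name in fragments) {
            NSURL *to = [dst URLByAppendingPathComponent:name];
            NSURL *from = [orig URLByAppendingPathComponent:name]; // nil if orig is nil
            BOOL clean = from != nil && ![dirtyNames containsObject:name]
                         && [fm fileExistsAtPath:from.path];
            BOOL ok = clean
                ? [fm copyItemAtURL:from toURL:to error:error]
                : [fragments[name] writeToURL:to options:0 error:error];
            if (!ok) return NO;
        }
        return YES;
    }

    // Hypothetical NSDocument subclass that keeps its data as named fragments.
    @interface PackageDocument : NSDocument
    @property (strong) NSMutableDictionary<NSString *, NSData *> *fragments;
    @property (strong) NSMutableSet<NSString *> *dirtyNames;
    @end

    @implementation PackageDocument
    - (BOOL)writeToURL:(NSURL *)absoluteURL
                ofType:(NSString *)typeName
      forSaveOperation:(NSSaveOperationType)saveOperation
    originalContentsURL:(NSURL *)absoluteOriginalContentsURL
                 error:(NSError **)outError
    {
        // dirtyNames is deliberately not cleared here: a Save To goes
        // through this method too and leaves the document's own file as is.
        return WritePackage(self.fragments, self.dirtyNames,
                            absoluteURL, absoluteOriginalContentsURL, outError);
    }
    @end
    ```

    Every file still gets written to the new location, so this only saves
    the archiving work for clean fragments, not the disk I/O. Timing it on
    a large document would show whether that is good enough.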

    FWIW
  • On Dec 31, 2008, at 1:32 AM, Kyle Sluder wrote:
    > On Wed, Dec 31, 2008 at 4:16 AM, Markus Spoettl
    > <msapplelists...> wrote:
    >> No I have not, but I have a feeling that it wouldn't be suitable.
    >> The data I
    >> store contains (when the document is big) millions of double values
    >> (amongst
    >> other things) spread across hundreds of thousands of objects. If the
    >> performance of NSKeyedArchiver is any indication the system
    >> wouldn't scale
    >> very well. It is an assumption and it might be totally wrong but I
    >> guess the
    >> overhead of keyed archiving is significantly less than that of Core
    >> Data.
    >
    > Quite the contrary, I'm afraid.  Although without actually testing it,
    > you can't know for sure.

    OK, I wouldn't have expected that. The question is rather academic for
    me anyway as the application is existing and using non-Core Data
    objects already. Changing all the innards of the application is not an
    option at this point. Unless of course this is a lot less painful
    than it sounds.

    >> That's no problem, that information is available. The documentation
    >> for
    >> -writeToURL:ofType:forSaveOperation:originalContentsURL:error:
    >> states:
    >>
    >> ----------
    >> The value of absoluteURL is often not the same as [self fileURL].
    >> Other
    >> times it is not the same as the URL for the final save destination.
    >> Likewise, absoluteOriginalContentsURL is often not the same value
    >> as [self
    >> fileURL].
    >> ----------
    >>
    >> which is a little problem because to update my packages I need the
    >> original
    >> location. Do you have any insights as to what "often not the same"
    >> in this
    >> context might mean? To write the diff into the package I'd have to
    >> have
    >> access to the package. It doesn't sound as if that's guaranteed.
    >
    > For safe save operations, AppKit writes the data to a temporary file
    > on the same volume, and then swaps the old file with the new, which is
    > an atomic operation.  If it can't do that, it will rename the original
    > file and write the new one with the old name.  This is why absoluteURL
    > or absoluteOriginalContentsURL won't necessarily jibe with -fileURL
    > (see the comment header for -[NSDocument
    > writeSafelyToURL:ofType:forSaveOperation:error:] for more details).
    > The upshot is that absoluteOriginalContentsURL will be a URL which you
    > can use to access your existing on-disk data, whether or not AppKit
    > has temporarily renamed it.

    That would not help a lot because I'd have to copy the unchanged parts
    of the old package into the new package. The whole idea was not to
    rewrite data that hasn't changed.

    Thanks for the ideas though, I'll definitely investigate this further.

    Regards
    Markus
    --
  • On Wed, Dec 31, 2008 at 1:08 PM, Markus Spoettl
    <msapplelists...> wrote:
    > That would not help a lot because I'd have to copy the unchanged parts of
    > the old package into the new package. The whole idea was not to re-write
    > data that hasn't changed.

    I wasn't thinking completely straight last night... you'll want to
    override -writeSafelyToURL: to perform the atomic writes of your
    individual components.

    --Kyle Sluder
  • On Dec 31, 2008, at 4:46 AM, Quincey Morris wrote:
    > I don't see how trying to do this in
    > -writeToURL:ofType:forSaveOperation:originalContentsURL:error: is
    > ever going to work.
    >
    > If you are given (basically) an old and new package location, then
    > you're forced to copy everything:
    >
    > -- Trying to write changed files in the old location would be a
    > really bad idea.
    >
    > -- Moving the unchanged files from the old location to the new
    > location would probably work (if the save operation is a pure save,
    > not a save-as or save-to), but would be a really bad idea if there
    > was an error during the save (because the contents of the new
    > location would presumably get thrown away).
    >
    > The *real* question here is: what's a *safe* strategy for saving a
    > package by changing parts of it? You really need a single atomic
    > operation to commit the changes, and most file systems don't provide
    > this for an arbitrary set of files. NSDocument's answer is that
    > there isn't a safe strategy, so it always saves by creating a copy.
    > (And even that's not perfectly safe if the file system doesn't
    > provide the equivalent of FSSwapFiles.)
    >
    > Using Core Data as the storage mechanism for blobs of data is a
    > possibility, but it's also a PITA because:
    >
    > -- you likely need to turn off NSPersistentDocument's undo handling
    > and provide your own
    >
    > -- Core Data doesn't have save-to, and its save-as sucks (does a
    > store migration)
    >
    > -- if you have a lot of data, Core Data is going to keep copies of
    > much of it in internal caches
    >
    > My suggestion would be to go ahead and use a package document
    > format, and to copy the unchanged files, and see how long it takes.
    > If the save times are unacceptable, then a database solution (not
    > Core Data) is probably the next step.

    Thanks for the pointers, I'll think about that. It's rather surprising
    that NSDocument's save-as-copy-and-move strategy that works so well
    for single files backfires so heavily in my case.

    Regards
    Markus
    --
  • On Dec 31, 2008, at 10:12, Markus Spoettl wrote:

    > It's rather surprising that NSDocument's save-as-copy-and-move
    > strategy that works so well for single files backfires so heavily in
    > my case.

    Well, to be accurate, it's not an NSDocument issue. The problem is
    updating in place, more than single file vs. file package.

    Incidentally, there's a fairly simple safe strategy for file packages:

    -- Use a single "index" file that lists all the files that make up the
    current version of the document.

    -- When saving, first write all the changed data to new files with new
    names.

    -- Then update the index file atomically.

    -- Then delete all the out of date files.

    But that has a few drawbacks:

    -- The individual file names are potentially different every time you
    save (which may or may not matter to you).

    -- You need periodic housekeeping to detect orphaned files (due to
    saves that failed for some reason) and delete them.

    -- Differences in file name encodings/naming rules might cause
    problems if the whole package is manually copied from one file system
    to another.
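
    The index-file steps above could be sketched as follows. This is a
    rough illustration under assumptions, not a tested design: the helper
    name SaveChangedFragments, the "index.plist" file name, and the
    name-plus-UUID naming scheme are all made up for the example, and no
    locking or orphan cleanup is attempted.

    ```objc
    #import <Foundation/Foundation.h>

    // Saves `changed` (logical name -> data) into the package at `package`.
    // Assumed layout: "index.plist" maps each logical name to the file that
    // currently holds it.
    static BOOL SaveChangedFragments(NSURL *package,
                                     NSDictionary<NSString *, NSData *> *changed,
                                     NSError **error)
    {
        NSFileManager *fm = [NSFileManager defaultManager];
        NSURL *indexURL = [package URLByAppendingPathComponent:@"index.plist"];
        NSDictionary *oldIndex =
            [NSDictionary dictionaryWithContentsOfURL:indexURL] ?: @{};
        NSMutableDictionary *newIndex = [oldIndex mutableCopy];
        NSMutableArray<NSString *> *obsolete = [NSMutableArray array];

        // 1. Write each changed fragment to a new file with a new name.
        //    A failure here leaves the old index (and document) intact.
        for (NSString *name in changed) {
            NSString *fileName = [NSString stringWithFormat:@"%@-%@",
                                  name, [NSUUID UUID].UUIDString];
            NSURL *fileURL = [package URLByAppendingPathComponent:fileName];
            if (![changed[name] writeToURL:fileURL options:0 error:error])
                return NO;
            if (oldIndex[name]) [obsolete addObject:oldIndex[name]];
            newIndex[name] = fileName;
        }

        // 2. Commit by replacing the index file atomically.
        NSData *plist = [NSPropertyListSerialization
            dataWithPropertyList:newIndex
                          format:NSPropertyListXMLFormat_v1_0
                         options:0
                           error:error];
        if (!plist || ![plist writeToURL:indexURL
                                 options:NSDataWritingAtomic
                                   error:error])
            return NO;

        // 3. Delete the out-of-date files. A failure only leaves orphans.
        for (NSString *fileName in obsolete)
            [fm removeItemAtURL:[package URLByAppendingPathComponent:fileName]
                          error:NULL];
        return YES;
    }
    ```

    Files written in step 1 before a failed save are exactly the orphans
    that the housekeeping pass mentioned above would have to find.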

    If you can come up with any acceptable safe strategy, then it's still
    an issue how to integrate it into NSDocument. writeSafelyToURL: seems
    like the obvious place, but its documentation says "must call super"
    if overridden, and calling super is probably going to mess up your
    strategy.

    BTW, before you decide that long save times are unacceptable, take a
    look at how Amadeus behaves when saving large sound files. It has the
    best "slow" save (from a usability point of view) of any app I've seen.
  • On Dec 31, 2008, at 11:29 AM, Quincey Morris wrote:
    >> It's rather surprising that NSDocument's save-as-copy-and-move
    >> strategy that works so well for single files backfires so heavily
    >> in my case.
    >
    > Well, to be accurate, it's not a NSDocument issue, it's a problem
    > with updating in place more than single file vs file package.
    >
    > Incidentally, there's a fairly simple safe strategy for file packages:
    >
    > -- Use a single "index" file that lists all the files that make up
    > the current version of the document.
    >
    > -- When saving, first write all the changed data to new files with
    > new names.
    >
    > -- Then update the index file atomically.
    >
    > -- Then delete all the out of date files.
    >
    > But that has a few drawbacks:
    >
    > -- The individual file names are potentially different every time
    > you save (which may or may not matter to you).
    >
    > -- You need periodic housekeeping to detect orphaned files (due to
    > saves that failed for some reason) and delete them.
    >
    > -- Differences in file name encodings/naming rules might cause
    > problems if the whole package is manually copied from one file
    > system to another.

    Sounds like it should be doable quite easily in my case. Thanks very
    much for the verbosity.

    > If you can come up with any acceptable safe strategy, then it's
    > still an issue how to integrate it into NSDocument.
    > writeSafelyToURL: seems like the obvious place, but its
    > documentation says "must call super" if overridden, and calling
    > super is probably going to mess up your strategy.

    That's what I thought too. What's the point of implementing a
    different way to safely save things when at the end you have to use
    the built-in behavior? It really doesn't make any sense. I believe
    this must be a documentation inaccuracy which really meant to say "be
    sure to call super unless you're doing your completely home-grown
    saving stuff on your own". Pure speculation of course.

    > BTW, before you decide that long save times are unacceptable, take a
    > look at how Amadeus behaves when saving large sound files. It has
    > the best "slow" save (from a usability point of view) of any app
    > I've seen.

    Which brings me to something else. How does one do that, meaning how
    can I provide save progress? I can't use a second thread to save the
    document because (I think) AppKit expects that -writeSafelyToURL:
    returns when it's done. Starting a thread and off-loading the work
    there so that I can update the UI while the operation is going on
    isn't going to work. An asynchronous mechanism where I can tell the
    document that it has been saved, similar to NSApplication's
    -applicationShouldTerminate: and -replyToApplicationShouldTerminate:,
    would be what I'd need for that. Currently the only way of showing
    save progress with user interaction is rolling my own save operation
    altogether, which has lots of implications (for example the behavior
    when saving in the process of app termination). Am I overlooking
    something obvious here?

    Regards
    Markus
    --
  • On Dec 31, 2008, at 12:11, Markus Spoettl wrote:

    > On Dec 31, 2008, at 11:29 AM, Quincey Morris wrote:
    >>> It's rather surprising that NSDocument's save-as-copy-and-move
    >>> strategy that works so well for single files backfires so heavily
    >>> in my case.
    >> ...
    >> -- You need periodic housekeeping to detect orphaned files (due to
    >> saves that failed for some reason) and delete them.
    >> ...

    It didn't occur to me when I wrote this that it might need an
    exclusive lock on the package to do this (and possibly other steps)
    safely.

    >> If you can come up with any acceptable safe strategy, then it's
    >> still an issue how to integrate it into NSDocument.
    >> writeSafelyToURL: seems like the obvious place, but its
    >> documentation says "must call super" if overridden, and calling
    >> super is probably going to mess up your strategy.
    >
    > That's what I thought too. What's the point of implementing a
    > different way to safely save things when at the end you have to use
    > the built-in behavior? It really doesn't make any sense. I believe
    > this must be a documentation inaccuracy which really meant to say
    > "be sure to call super unless you're doing your completely home-grown
    > saving stuff on your own". Pure speculation of course.

    Reading the comments in NSDocument.h is instructive. If it was just a
    case of replacing the built-in file-handling strategy with your own,
    I'd say go ahead and override without calling super. But the comments
    refer to mysterious "other things" that need to be done, so even if
    failing to call super works now it might break things in the future.

    > Which brings me to something else. How does one do that, meaning how
    > can I provide save progress? I can't use a second thread to save
    > the document because (I think) AppKit expects that
    > -writeSafelyToURL: returns when it's done. Starting a thread and
    > off-loading the work there so that I can update the UI while the
    > operation is going on isn't going to work. An asynchronous mechanism
    > where I can tell the document that it has been saved, similar to
    > NSApplication's -applicationShouldTerminate: and
    > -replyToApplicationShouldTerminate:, would be what I'd need for that.
    > Currently the only way of showing save progress with user
    > interaction is rolling my own save operation altogether, which has
    > lots of implications (for example the behavior when saving in the
    > process of app termination). Am I overlooking something obvious here?

    To do the save synchronously in the main thread with a progress sheet,
    I've had reasonable success sprinkling 'isCancelled' checks throughout
    the save code, and implementing 'isCancelled' like this:

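    - (BOOL)isCancelled {
        // Pump pending events so the progress sheet (and its Cancel
        // button) stays responsive during the synchronous save.
        NSEvent *event;
        while ((event = [NSApp nextEventMatchingMask:NSAnyEventMask
                                           untilDate:nil
                                              inMode:NSEventTrackingRunLoopMode
                                             dequeue:YES]))
            [NSApp sendEvent:event];
        return isOperationCancelled;
    }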

    (The progress sheet's cancel button's action routine is responsible
    for setting isOperationCancelled to YES.) It's not elegant but it
    seems to work fine so long as it's called often enough.

    Doing it asynchronously is more of a puzzle.