Storing NSDocument data in packages/bundles
-
Hello List,
I'm thinking of switching from single-file document storage to a
package document with various files in it. I have a little problem
with that, though:
My app stores a lot of data in a document, potentially hundreds of
megabytes. The problem with using single files and NSKeyedArchiver is
that one has to write everything, even if only a tiny part of the
structure changed. My data is clustered into handy little pieces which
could be saved into their own separate files within the package.
The problem is that, if I understand it correctly, when using
NSDocument's file wrapper method -fileWrapperOfType:, I always have to
provide the complete wrapper, including all contained wrappers. That,
of course, is exactly what I hoped to avoid, because it means I'd have
to write all the data (this time into separate files/wrappers).
The NSBundle guide mentions another technique of using "traditional
file system methods" for loading and saving when you have special
needs. It's rather vague beyond that and I was hoping to get some
hints as to how to tackle this problem.
I wouldn't mind loading all the data at once. What I'm after is a
mechanism that would allow me to update only parts of the package
(adding, changing or deleting files as needed). Is there any chance I
can use -fileWrapperOfType: for that?
Looking at the NSDocument file-saving message flow, it seems rather
difficult to come up with something completely different. The
location you're saving to is not the final location of the real
document on disk. That, of course, prevents you from writing only
parts of the document, because everything else would be lost when your
new version is moved to its final destination.
So the question is, can I use NSDocument with an "update-relevant-
parts-of-a-package-only" saving mechanism? Thanks for any pointers!
Regards
Markus
-- -
On Wed, Dec 31, 2008 at 3:51 AM, Markus Spoettl
<msapplelists...> wrote:
> My app stores a lot of data in a document, potentially hundreds of
> megabytes. The problem with using single files and NSKeyedArchiver is that
> one has to write everything, even if only a tiny part of the structure
> changed. My data is clustered into handy little pieces which could be saved
> into their own separate files within the package.
Have you thought of using Core Data instead? This is what the SQLite
store is designed to address.
> The problem is that - if I understand it correctly - when using file wrapper
> -fileWrapperOfType: of NSDocument, I always have to provide the complete
> wrapper, including all contained wrappers. That of course is exactly what I
> hoped to avoid because it means I'd have to write all data (this time into
> separate files/wrappers).
That is correct. You have to implement -readFromURL:ofType:error: and
-writeToURL:ofType:forSaveOperation:originalContentsURL:error: if you
want to go the document-package route and not wind up with wholesale
writing. This means, of course, that you need to be able to tell what
has changed in the document, and then be able to determine which
fragments need to be updated on disk.
--Kyle Sluder -
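Kyle's requirement that you "tell what has changed" can be made concrete with a dirty flag per fragment. This is a minimal sketch in plain C (also valid Objective-C). `Shard`, `shard_mark_dirty`, `shards_collect_dirty`, and `shards_clear_dirty` are hypothetical names, not NSDocument API. Edits set a fragment's flag, the save code enumerates only flagged fragments, and the flags are cleared only after a successful save.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory fragment ("shard") of a document. */
typedef struct {
    const char *file_name; /* name of this fragment's file inside the package */
    bool dirty;            /* set on edit, cleared after a successful save */
} Shard;

/* Call from every mutation that touches this fragment. */
static void shard_mark_dirty(Shard *shard) {
    assert(shard != NULL);
    shard->dirty = true;
}

/* Store the indices of dirty shards in out (room for cap entries).
   Returns the total number of dirty shards, which may exceed cap;
   only the first cap indices are stored. */
static size_t shards_collect_dirty(const Shard *shards, size_t count,
                                   size_t *out, size_t cap) {
    size_t n = 0;
    for (size_t i = 0; i < count; i++) {
        if (shards[i].dirty) {
            if (n < cap)
                out[n] = i;
            n++;
        }
    }
    return n;
}

/* Call only after every dirty shard has been written successfully. */
static void shards_clear_dirty(Shard *shards, size_t count) {
    for (size_t i = 0; i < count; i++)
        shards[i].dirty = false;
}
```

Clearing the flags only after all writes succeed means that a failed save leaves them set, so the next attempt rewrites the same fragments.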
On Dec 31, 2008, at 12:57 AM, Kyle Sluder wrote:
> On Wed, Dec 31, 2008 at 3:51 AM, Markus Spoettl
> <msapplelists...> wrote:
>> My app stores a lot of data in a document, potentially hundreds of
>> megabytes. The problem with using single files and NSKeyedArchiver
>> is that one has to write everything, even if only a tiny part of the
>> structure changed. My data is clustered into handy little pieces
>> which could be saved into their own separate files within the
>> package.
>
> Have you thought of using Core Data instead? This is what the SQLite
> store is designed to address.
No, I haven't, but I have a feeling it wouldn't be suitable. The
data I store contains (when the document is big) millions of double
values (amongst other things) spread across hundreds of thousands of
objects. If the performance of NSKeyedArchiver is any indication, the
system wouldn't scale very well. It is an assumption and it might be
totally wrong, but I guess the overhead of keyed archiving is
significantly less than that of Core Data.
>> The problem is that - if I understand it correctly - when using
>> file wrapper -fileWrapperOfType: of NSDocument, I always have to
>> provide the complete wrapper, including all contained wrappers.
>> That of course is exactly what I hoped to avoid because it means
>> I'd have to write all data (this time into separate
>> files/wrappers).
>
> That is correct. You have to implement -readFromURL:ofType:error: and
> -writeToURL:ofType:forSaveOperation:originalContentsURL:error: if you
> want to go the document-package route and not wind up with wholesale
> writing. This means, of course, that you need to be able to tell what
> has changed in the document, and then be able to determine which
> fragments need to be updated on disk.
That's no problem; that information is available. The documentation
for -writeToURL:ofType:forSaveOperation:originalContentsURL:error:
states:
----------
The value of absoluteURL is often not the same as [self fileURL].
Other times it is not the same as the URL for the final save
destination. Likewise, absoluteOriginalContentsURL is often not the
same value as [self fileURL].
----------
which is a little problem, because to update my packages I need the
original location. Do you have any insights as to what "often not the
same" might mean in this context? To write the diff into the package
I'd need access to the package, and that doesn't sound guaranteed.
Regards
Markus
-- -
On Wed, Dec 31, 2008 at 4:16 AM, Markus Spoettl
<msapplelists...> wrote:
> No I have not, but I have a feeling that it wouldn't be suitable. The data I
> store contains (when the document is big) millions of double values (amongst
> other things) spread across hundreds of thousands of objects. If the
> performance of NSKeyedArchiver is any indication the system wouldn't scale
> very well. It is an assumption and it might be totally wrong but I guess the
> overhead of keyed archiving is significantly less than that of Core Data.
Quite the contrary, I'm afraid. That said, without actually testing
it, you can't know for sure.
> That's no problem, that information is available. The documentation for
> -writeToURL:ofType:forSaveOperation:originalContentsURL:error: states:
>
> ----------
> The value of absoluteURL is often not the same as [self fileURL]. Other
> times it is not the same as the URL for the final save destination.
> Likewise, absoluteOriginalContentsURL is often not the same value as [self
> fileURL].
> ----------
>
> which is a little problem because to update my packages I need the original
> location. Do you have any insights as to what "often not the same" in this
> context might mean? To write the diff into the package I'd have to have
> access to the package. It doesn't sound as if that's guaranteed.
For safe save operations, AppKit writes the data to a temporary file
on the same volume, and then swaps the old file with the new, which is
an atomic operation. If it can't do that, it will rename the original
file and write the new one with the old name. This is why absoluteURL
or absoluteOriginalContentsURL won't necessarily jibe with -fileURL
(see the comment header for -[NSDocument
writeSafelyToURL:ofType:forSaveOperation:error:] for more details).
The upshot is that absoluteOriginalContentsURL will be a URL which you
can use to access your existing on-disk data, whether or not AppKit
has temporarily renamed it.
--Kyle Sluder -
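The write-to-a-temporary-file-then-swap idea Kyle describes can be approximated with plain POSIX calls. This C sketch uses a different technique from AppKit's own: POSIX `rename()` atomically replaces the destination, whereas AppKit performs a true swap (FSSwapFiles-style) where the file system supports it. `atomic_write_file` is a hypothetical helper. The temporary file is created next to the target so that both are on the same volume, which `rename()` requires. Each call replaces one file atomically, but a series of such calls across several files in a package is not atomic as a whole.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Replace the file at `path` with `len` bytes of `data` so that readers see
   either the complete old contents or the complete new contents.
   Returns 0 on success, -1 on failure (the old file is left untouched).
   Note: mkstemp() creates the new file with mode 0600. */
static int atomic_write_file(const char *path, const void *data, size_t len) {
    char tmp[4096];
    /* Temp file next to the target, so both are on the same volume. */
    int needed = snprintf(tmp, sizeof tmp, "%s.tmpXXXXXX", path);
    if (needed < 0 || (size_t)needed >= sizeof tmp)
        return -1;
    int fd = mkstemp(tmp);
    if (fd < 0)
        return -1;

    const char *p = data;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0 && errno == EINTR)
            continue; /* interrupted by a signal: retry */
        if (n < 0) {
            close(fd);
            unlink(tmp);
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    /* Get the bytes onto the disk before making them visible. */
    int sync_err = fsync(fd);
    int close_err = close(fd);
    if (sync_err != 0 || close_err != 0) {
        unlink(tmp);
        return -1;
    }

    /* Commit point: rename() atomically replaces any existing `path`. */
    if (rename(tmp, path) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

Applying this per shard file gives the "replace the whole shard, not a diff" behavior described in the next message.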
By the way, this means that you can't just write out a diff to the
existing file on disk. You must replace the entire file inside your
document package. This is what I had thought you would originally
want to do; you are still performing wholesale writes, but not of the
entire document, just the shards within the package whose contents are
affected. -
On Dec 31, 2008, at 01:32, Kyle Sluder wrote:
> For safe save operations, AppKit writes the data to a temporary file
> on the same volume, and then swaps the old file with the new, which is
> an atomic operation. If it can't do that, it will rename the original
> file and write the new one with the old name. This is why absoluteURL
> or absoluteOriginalContentsURL won't necessarily jibe with -fileURL
> (see the comment header for -[NSDocument
> writeSafelyToURL:ofType:forSaveOperation:error:] for more details).
> The upshot is that absoluteOriginalContentsURL will be a URL which you
> can use to access your existing on-disk data, whether or not AppKit
> has temporarily renamed it.
I don't see how trying to do this in
-writeToURL:ofType:forSaveOperation:originalContentsURL:error: is ever
going to work.
If you are given (basically) an old and new package location, then
you're forced to copy everything:
-- Trying to write changed files in the old location would be a really
bad idea.
-- Moving the unchanged files from the old location to the new
location would probably work (if the save operation is a pure save,
not a save-as or save-to), but would be a really bad idea if there was
an error during the save (because the contents of the new location
would presumably get thrown away).
The *real* question here is: what's a *safe* strategy for saving a
package by changing parts of it? You really need a single atomic
operation to commit the changes, and most file systems don't provide
this for an arbitrary set of files. NSDocument's answer is that there
isn't a safe strategy, so it always saves by creating a copy. (And
even that's not perfectly safe if the file system doesn't provide the
equivalent of FSSwapFiles.)
Using Core Data as the storage mechanism for blobs of data is a
possibility, but it's also a PITA because:
-- you likely need to turn off NSPersistentDocument's undo handling
and provide your own
-- Core Data doesn't have save-to, and its save-as sucks (does a store
migration)
-- if you have a lot of data, Core Data is going to keep copies of
much of it in internal caches
My suggestion would be to go ahead and use a package document format,
and to copy the unchanged files, and see how long it takes. If the
save times are unacceptable, then a database solution (not Core Data)
is probably the next step.
FWIW -
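Quincey's suggestion to copy the unchanged files and measure the cost could be tried with a sketch like the one below. It assumes a flat package directory and a caller that already knows which fragments changed. `Entry`, `copy_file`, and `build_package` are hypothetical names. Every save produces a complete new package, which fits NSDocument's write-a-copy model, and the returned wall-clock time is the number to judge.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical description of one file in the package. */
typedef struct {
    const char *name; /* file name inside the package */
    const char *data; /* new contents; used only when dirty */
    size_t len;
    int dirty;        /* nonzero if the fragment changed since the last save */
} Entry;

/* Byte-for-byte copy; a short write is treated as an error. */
static int copy_file(const char *src, const char *dst) {
    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }
    char buf[65536];
    ssize_t n;
    int rc = 0;
    while ((n = read(in, buf, sizeof buf)) > 0) {
        if (write(out, buf, (size_t)n) != n) {
            rc = -1;
            break;
        }
    }
    if (n < 0)
        rc = -1;
    if (close(out) != 0)
        rc = -1;
    close(in);
    return rc;
}

/* Build a complete new package in new_dir: dirty entries are written from
   memory, clean ones are copied from old_dir. Returns the elapsed seconds,
   or -1.0 on failure. */
static double build_package(const char *old_dir, const char *new_dir,
                            const Entry *entries, size_t count) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (mkdir(new_dir, 0755) != 0)
        return -1.0;
    char src[4096], dst[4096];
    for (size_t i = 0; i < count; i++) {
        snprintf(dst, sizeof dst, "%s/%s", new_dir, entries[i].name);
        if (entries[i].dirty) {
            FILE *f = fopen(dst, "wb");
            if (f == NULL)
                return -1.0;
            size_t w = fwrite(entries[i].data, 1, entries[i].len, f);
            if (fclose(f) != 0 || w != entries[i].len)
                return -1.0;
        } else {
            snprintf(src, sizeof src, "%s/%s", old_dir, entries[i].name);
            if (copy_file(src, dst) != 0)
                return -1.0;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Roughly, `old_dir` plays the role of absoluteOriginalContentsURL and `new_dir` the role of the absoluteURL passed to -writeToURL:ofType:forSaveOperation:originalContentsURL:error:.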
On Dec 31, 2008, at 1:32 AM, Kyle Sluder wrote:
> On Wed, Dec 31, 2008 at 4:16 AM, Markus Spoettl
> <msapplelists...> wrote:
>> No I have not, but I have a feeling that it wouldn't be suitable.
>> The data I store contains (when the document is big) millions of
>> double values (amongst other things) spread across hundreds of
>> thousands of objects. If the performance of NSKeyedArchiver is any
>> indication the system wouldn't scale very well. It is an assumption
>> and it might be totally wrong but I guess the overhead of keyed
>> archiving is significantly less than that of Core Data.
>
> Quite the contrary, I'm afraid. Although without actually testing it,
> you can't know for sure.
OK, I wouldn't have expected that. The question is rather academic
for me anyway, as the application already exists and uses non-Core
Data objects. Changing all the innards of the application is not an
option at this point, unless of course it is a lot less painful than
it sounds.
>> That's no problem, that information is available. The documentation
>> for -writeToURL:ofType:forSaveOperation:originalContentsURL:error:
>> states:
>>
>> ----------
>> The value of absoluteURL is often not the same as [self fileURL].
>> Other times it is not the same as the URL for the final save
>> destination. Likewise, absoluteOriginalContentsURL is often not the
>> same value as [self fileURL].
>> ----------
>>
>> which is a little problem because to update my packages I need the
>> original location. Do you have any insights as to what "often not
>> the same" in this context might mean? To write the diff into the
>> package I'd have to have access to the package. It doesn't sound as
>> if that's guaranteed.
>
> For safe save operations, AppKit writes the data to a temporary file
> on the same volume, and then swaps the old file with the new, which is
> an atomic operation. If it can't do that, it will rename the original
> file and write the new one with the old name. This is why absoluteURL
> or absoluteOriginalContentsURL won't necessarily jibe with -fileURL
> (see the comment header for -[NSDocument
> writeSafelyToURL:ofType:forSaveOperation:error:] for more details).
> The upshot is that absoluteOriginalContentsURL will be a URL which you
> can use to access your existing on-disk data, whether or not AppKit
> has temporarily renamed it.
That would not help a lot, because I'd have to copy the unchanged parts
of the old package into the new package. The whole idea was not to
rewrite data that hasn't changed.
Thanks for the ideas though, I'll definitely investigate this further.
Regards
Markus
-- -
On Wed, Dec 31, 2008 at 1:08 PM, Markus Spoettl
<msapplelists...> wrote:
> That would not help a lot because I'd have to copy the unchanged parts of
> the old package into the new package. The whole idea was not to rewrite
> data that hasn't changed.
I wasn't thinking completely straight last night... you'll want to
override -writeSafelyToURL: to perform the atomic writes of your
individual components.
--Kyle Sluder -
On Dec 31, 2008, at 4:46 AM, Quincey Morris wrote:
> I don't see how trying to do this in
> -writeToURL:ofType:forSaveOperation:originalContentsURL:error: is
> ever going to work.
>
> If you are given (basically) an old and new package location, then
> you're forced to copy everything:
>
> -- Trying to write changed files in the old location would be a
> really bad idea.
>
> -- Moving the unchanged files from the old location to the new
> location would probably work (if the save operation is a pure save,
> not a save-as or save-to), but would be a really bad idea if there
> was an error during the save (because the contents of the new
> location would presumably get thrown away).
>
> The *real* question here is: what's a *safe* strategy for saving a
> package by changing parts of it? You really need a single atomic
> operation to commit the changes, and most file systems don't provide
> this for an arbitrary set of files. NSDocument's answer is that
> there isn't a safe strategy, so it always saves by creating a copy.
> (And even that's not perfectly safe if the file system doesn't
> provide the equivalent of FSSwapFiles.)
>
> Using Core Data as the storage mechanism for blobs of data is a
> possibility, but it's also a PITA because:
>
> -- you likely need to turn off NSPersistentDocument's undo handling
> and provide your own
>
> -- Core Data doesn't have save-to, and its save-as sucks (does a
> store migration)
>
> -- if you have a lot of data, Core Data is going to keep copies of
> much of it in internal caches
>
> My suggestion would be to go ahead and use a package document
> format, and to copy the unchanged files, and see how long it takes.
> If the save times are unacceptable, then a database solution (not
> Core Data) is probably the next step.
Thanks for the pointers; I'll think about that. It's rather surprising
that NSDocument's save-as-copy-and-move strategy, which works so well
for single files, backfires so heavily in my case.
Regards
Markus
-- -
On Dec 31, 2008, at 10:12, Markus Spoettl wrote:
> It's rather surprising that NSDocument's save-as-copy-and-move
> strategy that works so well for single files backfires so heavily in
> my case.
Well, to be accurate, it's not an NSDocument issue; it's a problem with
updating in place, more than with single files vs. file packages.
Incidentally, there's a fairly simple safe strategy for file packages:
-- Use a single "index" file that lists all the files that make up the
current version of the document.
-- When saving, first write all the changed data to new files with new
names.
-- Then update the index file atomically.
-- Then delete all the out of date files.
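The four steps above might look like this in C. This is a minimal sketch under several assumptions. The index is a plain text file, here given the hypothetical name `index.txt`, that lists one data-file name per line. New file names are made unique by a save counter that the caller increments on every save. `Fragment`, `write_whole`, and `commit_save` are hypothetical names. The only step that must be atomic is the `rename()` of the new index over the old one. If the process crashes before that point, the previous version remains intact, and the only damage is some orphaned files.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define MAX_FRAGS 64
#define NAME_LEN 64

/* Hypothetical fragment record: its current file name in the package,
   plus new contents if it was edited since the last save. */
typedef struct {
    char name[NAME_LEN];  /* "" if never saved */
    const char *new_data; /* NULL if unchanged */
} Fragment;

/* Write `contents` to dir/name and flush it to disk. Returns 0 on success. */
static int write_whole(const char *dir, const char *name, const char *contents) {
    char path[4096];
    snprintf(path, sizeof path, "%s/%s", dir, name);
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    size_t len = strlen(contents);
    size_t written = fwrite(contents, 1, len, f);
    int synced = fflush(f) == 0 && fsync(fileno(f)) == 0;
    int closed = fclose(f) == 0;
    return (written == len && synced && closed) ? 0 : -1;
}

/* One save. save_no must increase on every save so new names are unique. */
static int commit_save(const char *dir, Fragment *frags, size_t count,
                       unsigned save_no) {
    char fresh[MAX_FRAGS][NAME_LEN];
    if (count > MAX_FRAGS)
        return -1;

    /* Step 1: write each changed fragment to a brand-new file. */
    for (size_t i = 0; i < count; i++) {
        snprintf(fresh[i], NAME_LEN, "%s", frags[i].name);
        if (frags[i].new_data == NULL)
            continue;
        snprintf(fresh[i], NAME_LEN, "frag%zu-v%u.dat", i, save_no);
        if (write_whole(dir, fresh[i], frags[i].new_data) != 0)
            return -1; /* old index still valid; new files become orphans */
    }

    /* Step 2: write the new index beside the old one, then rename() it into
       place. That rename is the single atomic commit. (A production version
       would also fsync the directory afterwards.) */
    char index_text[MAX_FRAGS * (NAME_LEN + 1) + 1] = "";
    for (size_t i = 0; i < count; i++) {
        strcat(index_text, fresh[i]);
        strcat(index_text, "\n");
    }
    char tmp[4096], idx[4096];
    snprintf(tmp, sizeof tmp, "%s/index.txt.tmp", dir);
    snprintf(idx, sizeof idx, "%s/index.txt", dir);
    if (write_whole(dir, "index.txt.tmp", index_text) != 0 || rename(tmp, idx) != 0)
        return -1;

    /* Step 3: the previous version is unreferenced now; delete its files. */
    for (size_t i = 0; i < count; i++) {
        if (frags[i].name[0] != '\0' && strcmp(frags[i].name, fresh[i]) != 0) {
            char path[4096];
            snprintf(path, sizeof path, "%s/%s", dir, frags[i].name);
            unlink(path); /* a failure only leaves an orphan behind */
        }
        snprintf(frags[i].name, NAME_LEN, "%s", fresh[i]);
        frags[i].new_data = NULL;
    }
    return 0;
}
```

Unchanged fragments keep their existing files untouched, so a save writes only the changed data plus one small index file.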
But that has a few drawbacks:
-- The individual file names are potentially different every time you
save (which may or may not matter to you).
-- You need periodic housekeeping to detect orphaned files (due to
saves that failed for some reason) and delete them.
-- Differences in file name encodings/naming rules might cause
problems if the whole package is manually copied from one file system
to another.
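The periodic housekeeping mentioned in the second drawback could be sketched as a sweep that deletes every file the index doesn't mention. The sketch makes several assumptions. The package is a flat directory. The index is a text file named `index.txt` (a hypothetical name) that lists one live file name per line. `listed_in_index` and `sweep_orphans` are hypothetical helpers. As Quincey points out later in the thread, the caller should also hold an exclusive lock on the package, so that a save in progress isn't mistaken for garbage.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define MAX_ORPHANS 256

/* Nonzero if `name` occurs as a complete line of `index_text`. */
static int listed_in_index(const char *index_text, const char *name) {
    size_t len = strlen(name);
    if (len == 0)
        return 0;
    for (const char *p = index_text; (p = strstr(p, name)) != NULL; p++) {
        int starts_line = (p == index_text) || (p[-1] == '\n');
        int ends_line = (p[len] == '\n') || (p[len] == '\0');
        if (starts_line && ends_line)
            return 1;
    }
    return 0;
}

/* Delete files in `dir` that the index doesn't list, keeping the index
   itself. Names are collected first and deleted afterwards, so the
   directory isn't modified while readdir() is walking it; more than
   MAX_ORPHANS leftovers are simply handled by the next sweep.
   Returns the number of files removed, or -1 if dir can't be opened. */
static int sweep_orphans(const char *dir, const char *index_text) {
    char orphans[MAX_ORPHANS][256];
    size_t count = 0;
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;
    struct dirent *ent;
    while (count < MAX_ORPHANS && (ent = readdir(d)) != NULL) {
        const char *name = ent->d_name;
        if (strcmp(name, ".") == 0 || strcmp(name, "..") == 0 ||
            strcmp(name, "index.txt") == 0 || listed_in_index(index_text, name))
            continue;
        /* Anything else, including a temp index left by a failed save,
           is garbage. */
        snprintf(orphans[count++], sizeof orphans[0], "%s", name);
    }
    closedir(d);

    int removed = 0;
    for (size_t i = 0; i < count; i++) {
        char path[4096];
        snprintf(path, sizeof path, "%s/%s", dir, orphans[i]);
        if (unlink(path) == 0)
            removed++;
    }
    return removed;
}
```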
If you can come up with any acceptable safe strategy, then it's still
an issue how to integrate it into NSDocument. writeSafelyToURL: seems
like the obvious place, but its documentation says "must call super"
if overridden, and calling super is probably going to mess up your
strategy.
BTW, before you decide that long save times are unacceptable, take a
look at how Amadeus behaves when saving large sound files. It has the
best "slow" save (from a usability point of view) of any app I've seen. -
On Dec 31, 2008, at 11:29 AM, Quincey Morris wrote:
>> It's rather surprising that NSDocument's save-as-copy-and-move
>> strategy that works so well for single files backfires so heavily
>> in my case.
>
> Well, to be accurate, it's not a NSDocument issue, it's a problem
> with updating in place more than single file vs file package.
>
> Incidentally, there's a fairly simple safe strategy for file packages:
>
> -- Use a single "index" file that lists all the files that make up
> the current version of the document.
>
> -- When saving, first write all the changed data to new files with
> new names.
>
> -- Then update the index file atomically.
>
> -- Then delete all the out of date files.
>
> But that has a few drawbacks:
>
> -- The individual file names are potentially different every time
> you save (which may or may not matter to you).
>
> -- You need periodic housekeeping to detect orphaned files (due to
> saves that failed for some reason) and delete them.
>
> -- Differences in file name encodings/naming rules might cause
> problems if the whole package is manually copied from one file
> system to another.
Sounds like it should be doable quite easily in my case. Thanks very
much for the detailed explanation.
> If you can come up with any acceptable safe strategy, then it's
> still an issue how to integrate it into NSDocument.
> writeSafelyToURL: seems like the obvious place, but its
> documentation says "must call super" if overridden, and calling
> super is probably going to mess up your strategy.
That's what I thought too. What's the point of implementing a
different way to save things safely when in the end you have to use
the built-in behavior? It really doesn't make any sense. I believe
this must be a documentation inaccuracy that really meant to say "be
sure to call super unless you're doing completely home-grown saving
on your own". Pure speculation, of course.
> BTW, before you decide that long save times are unacceptable, take a
> look at how Amadeus behaves when saving large sound files. It has
> the best "slow" save (from a usability point of view) of any app
> I've seen.
Which brings me to something else: how does one do that, i.e., how
can I show save progress? I can't use a second thread to save the
document because (I think) AppKit expects -writeSafelyToURL: to return
only when it's done. Starting a thread and off-loading the work there
so that I can update the UI while the operation is going on isn't
going to work. What I'd need is an asynchronous mechanism where I can
tell the document that it has been saved, similar to NSApplication's
-applicationShouldTerminate: and -replyToApplicationShouldTerminate:.
Currently the only way to show save progress with user interaction is
to roll my own save operation altogether, which has lots of
implications (for example, the behavior when saving during app
termination). Am I overlooking something obvious here?
Regards
Markus
-- -
On Dec 31, 2008, at 12:11, Markus Spoettl wrote:
> On Dec 31, 2008, at 11:29 AM, Quincey Morris wrote:
>>> It's rather surprising that NSDocument's save-as-copy-and-move
>>> strategy that works so well for single files backfires so heavily
>>> in my case.
>> ...
>> -- You need periodic housekeeping to detect orphaned files (due to
>> saves that failed for some reason) and delete them.
>> ...
It didn't occur to me when I wrote this that it might need an
exclusive lock on the package to do this (and possibly other steps)
safely.
>> If you can come up with any acceptable safe strategy, then it's
>> still an issue how to integrate it into NSDocument.
>> writeSafelyToURL: seems like the obvious place, but its
>> documentation says "must call super" if overridden, and calling
>> super is probably going to mess up your strategy.
>
> That's what I thought too. What's the point of implementing a
> different way to safely save things when at the end you have to use
> the built-in behavior. It really doesn't make any sense, I believe
> this must be a documentation inaccuracy which really meant to say
> "be sure to call super unless you're doing completely home-grown
> saving on your own". Pure speculation of course.
Reading the comments in NSDocument.h is instructive. If it were just a
case of replacing the built-in file-handling strategy with your own,
I'd say go ahead and override without calling super. But the comments
refer to mysterious "other things" that need to be done, so even if
failing to call super works now, it might break things in the future.
> Which brings me to something else. How does one do that, meaning how
> can I provide a save progress. I can't use a second thread to save
> the document because (I think) AppKit expects that
> -writeSafelyToURL: returns when it's done. Starting a thread and
> off-loading the work there so that I can update the UI while the
> operation is going on isn't going to work. An asynchronous mechanism
> where I can tell the document that it has been saved similar to
> NSApplication's -applicationShouldTerminate: and
> -replyToApplicationShouldTerminate: would be what I'd need for that.
> Currently the only way of doing a save progress with user
> interaction is rolling my own save operation altogether, which has
> lots of implications (for example the behavior when saving in the
> process of app termination). Am I overlooking something obvious here?
To do the save synchronously in the main thread with a progress sheet,
I've had reasonable success sprinkling 'isCancelled' checks throughout
the save code, and implementing 'isCancelled' like this:
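- (BOOL) isCancelled {
    NSEvent *event;
    // Dispatch any events already queued (e.g. a click on the sheet's
    // Cancel button) without blocking: a nil untilDate means don't wait.
    while ((event = [NSApp nextEventMatchingMask: NSAnyEventMask
                                       untilDate: nil
                                          inMode: NSEventTrackingRunLoopMode
                                         dequeue: YES]))
        [NSApp sendEvent: event];
    // Set to YES by the Cancel button's action method.
    return isOperationCancelled;
}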
(The progress sheet's cancel button's action routine is responsible
for setting isOperationCancelled to YES.) It's not elegant but it
seems to work fine so long as it's called often enough.
Doing it asynchronously is more of a puzzle.


