Proposal for metadata interoperability on OS X
-
A lot of discussion seems to be going on in different application user
forums regarding the exchange of metadata between different
applications. Apple has provided parts of a possible solution in the
latest versions of Mac OS X but nothing that can be seen as the final
verdict.
First some background: with the advent of Spotlight, a lot of metadata
is made available by application developers for use in Spotlight
importers. It is relatively easy to extract this from your own data
files and hand it over to Spotlight. Apple has also put a lot of
work into providing a long list of keywords that can be used by these
importers to store this data in the Spotlight database.
There are, however, problems with applications that primarily use
third-party file formats (such as PDF) and that want to add
application-specific metadata to these files. In most cases you cannot
add it without some cumbersome workarounds.
Another problem is with applications that want to import metadata from
third parties where it is difficult to parse the file format.
One mechanism that is currently used by some applications to resolve
this is the use of Finder/Spotlight comments. Elaborate formatting
options try to make some order out of what is essentially a free-form
text string. Moreover, this text string is under user control and can
be changed by the user at any point in time, thereby destroying
potentially vital data.
What is needed is a mechanism that allows application developers to
add metadata to files without having to touch the actual file data. In
the Mac OS days resource forks were used for this purpose, but these
caused problems with foreign filesystems. In OS X (since Tiger) there
is a useful mechanism called eXtended ATTRibutes (xattr) that allows
metadata to be tacked onto files. And since Leopard there is a way to
preserve these attributes while using the Cocoa or Unix file
manipulation classes/functions, even when storing files on non-HFS
disks.
What is missing is a standardized way to set and interpret the
metadata. What I'm proposing is to use the benefits of the Spotlight
indexing mechanism, that is, a dictionary of standard keywords with
arbitrary values, and to layer this on top of the extended attributes.
This would allow for transparent transfer of metadata between
applications, yet retain the use of Spotlight-based keyword searching.
This would even work with extra keywords that might have been defined
for Spotlight: because the file type and application are known, these
could be loaded from the application's bundle dictionary.
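To make the proposal a little more concrete, here is a minimal sketch in Python (chosen only for brevity; a real implementation would more likely be Cocoa). It assumes that each keyword becomes one extended attribute whose value is a binary property list. The attribute-name prefix and the function names are hypothetical, and the call that actually writes the attribute to a file is deliberately left out.

```python
import plistlib

# Hypothetical prefix for attribute names; not an agreed convention.
ATTR_PREFIX = "org.example.metadata:"


def encode_attributes(metadata):
    """Turn {keyword: value} into {attribute name: plist bytes}."""
    return {
        ATTR_PREFIX + keyword: plistlib.dumps(value, fmt=plistlib.FMT_BINARY)
        for keyword, value in metadata.items()
    }


def decode_attributes(attributes):
    """Inverse of encode_attributes; skips attributes without the prefix."""
    return {
        name[len(ATTR_PREFIX):]: plistlib.loads(payload)
        for name, payload in attributes.items()
        if name.startswith(ATTR_PREFIX)
    }


# kMDItemKeywords and kMDItemAuthors are existing Spotlight keywords;
# the values are made up for illustration.
metadata = {
    "kMDItemKeywords": ["invoice", "2008"],
    "kMDItemAuthors": ["A. Writer"],
}
attributes = encode_attributes(metadata)
roundtrip = decode_attributes(attributes)
```

Keeping one attribute per keyword means an application can read only the keys it understands and leave the rest untouched.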
An example Objective-C class that implements part of this is Uli
Kusterer's UKXattrMetadataStore, which can be found at
<http://codebeach.org/code/show/15>. Missing from it are some error
checking with regard to the limitations of xattr and a way to map
keywords to localized descriptions (these can be found in third-party
Spotlight schemas but seem to be hidden for the Apple keywords).
I'm looking to start a discussion on this list that can be of benefit
to all of us and hopefully Apple will take notice and may take our
ideas to heart while they're working on 10.6.
Annard Brouwer
(contractor for DEVONtechnologies LLC and therefore very much involved
in this subject at the moment) -
On Jul 8, 2008, at 3:52 AM, <ab_lists...> wrote:
> I'm looking to start a discussion on this list that can be of
> benefit to all of us and hopefully Apple will take notice and may
> take our ideas to heart while they're working on 10.6.
While the discussion would be interesting, I'll note two things.
One, the Spotlight list is probably more relevant than this one.
Two, the list isn't the way to provide feedback like this to Apple. It
says that right in the guidelines. You can almost guarantee that it
won't be picked up from the list. Filing enhancement requests is the
only way to bring this stuff to Apple's attention. -
I'm following up on my own post because I received two comments that
made me realise I should clarify why I want this discussion here on
this list.
1. "You should file a bug report with Apple."
This is the intention (and I know that at least one person has filed a
request already), but I would like to have a hashed out plan that
covers things we discuss here so there are not many ambiguities left.
Secondly, we at DEVONtechnologies need a working solution sooner
rather than later. I was hoping that several third parties might be
able to agree on something that we can work with in the Leopard time
frame.
2. "You should discuss this on the Spotlight list."
I thought about it, but for me this isn't directly about Spotlight. I
want to use some of the infrastructure that Spotlight brings but in a
way it's about application interoperability. I want to access metadata
from other applications' files and provide metadata to them and not
through Spotlight queries. This may be too self-centered but we can
always move to the Spotlight list if people here disagree with me.
Thanks,
Annard -
There exists a great way for all applications to share 'Address Book'
data on OS X. The same cannot be said for user-entered metadata.
Things such as tags, URLs, etc. have no conventions for
interoperability.
Right now the only 'standard' user-entered metadata is the Finder
comments and the Finder labels, both of which date from 199x in System
7 or thereabouts.
It seems that a fairly straightforward agreement on some extended
attribute keys and values would be in order.
An example seems a better way to describe what I am talking about
(all names and numbers are fanciful):
User Entered tags:
--------------------------
stored: under the keyword: kXATTR_UserTags
value: NSArray of NSStrings. No hierarchy, maximum tag length 100
chars, maximum number of tags 100, Guidelines: tags should be entered
by the user, and not be things like GUIDs, paths, etc.
URLS:
--------------------------
urls: under the keyword kXATTR_URL
value: NSArray of NSDictionaries, each dict has a url field and a name
field, etc.
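The two value shapes above can be sketched as plain data plus a small validator. This is only an illustration in Python: the limits are the fanciful numbers from the example, and the function names and the lowercase 'url' and 'name' field names are assumptions.

```python
MAX_TAG_LENGTH = 100  # fanciful limit from the example above
MAX_TAG_COUNT = 100   # fanciful limit from the example above


def validate_user_tags(tags):
    """Return a list of problems; an empty list means the value is fine."""
    if not isinstance(tags, list):
        return ["value must be a list of strings"]
    problems = []
    if len(tags) > MAX_TAG_COUNT:
        problems.append(f"too many tags: {len(tags)} > {MAX_TAG_COUNT}")
    for tag in tags:
        if not isinstance(tag, str):
            problems.append(f"not a string: {tag!r}")
        elif len(tag) > MAX_TAG_LENGTH:
            problems.append(f"tag too long: {tag[:20]}...")
    return problems


def validate_urls(urls):
    """Each entry must be a dict with string 'url' and 'name' fields."""
    if not isinstance(urls, list):
        return ["value must be a list of dictionaries"]
    problems = []
    for entry in urls:
        if not isinstance(entry, dict):
            problems.append(f"not a dictionary: {entry!r}")
            continue
        for field in ("url", "name"):
            if not isinstance(entry.get(field), str):
                problems.append(f"missing or non-string field: {field}")
    return problems


user_tags = ["work", "urgent"]
urls = [{"url": "http://example.com/", "name": "Example"}]
```

Returning a list of problems rather than raising on the first one makes it easier for an application to decide whether to repair or reject a value written by someone else.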
It would be really nice if some of these attributes made their way
into the Spotlight database, to facilitate searching.
This really has nothing to do with Apple, unless they are planning on
adding some standardized extended attributes along these lines for
Snow Leopard. It seems likely that even if Apple does add a few
'official' xattrs on files, a richer set of attributes should be
agreed on by several interested developers. We need more than an API
to set xattrs; we need to agree on names and formats.
--Tom Andersen
Ironic Software -
On Tue, Jul 8, 2008 at 10:45 AM, Tom Andersen <knobsturner...> wrote:
> User Entered tags:
> --------------------------
> stored: under the keyword: kXATTR_UserTags
> value: NSArray of NSStrings. No hierarchy, maximum tag length 100 chars,
> maximum number of tags 100, Guidelines: tags should be entered by the user,
> and not be things like GUIDs, paths, etc.
You want all of this in one extended attribute? Even assuming no
overhead for NSArray or NSStrings, wouldn't 100 chars/tag * 100 tags
already be about 2-3 times the maximum amount of storage currently
allowed in one generic extended attribute? As I understand the maximum
is roughly 4K, with the ResourceFork being a special case clearly. -
As I said, all numbers are fanciful. I just wanted to get a discussion
started.
From our point of view I would prefer a much, much smaller limit.
There need to be limits so that we can design interfaces around these
attributes, and it seems there are also technical reasons that
constrain the size of an attribute. I think that the way to go about
this is to talk this back and forth here for a bit, then have someone
go and write some code to implement the guidelines and turn them into
a nice Cocoa API. For instance, one thing that I run into with tags is
that some programs or users enter (say 6) tags as a comma-delimited
string, and this gets stored as a single string in an array of length
one. It's not impossible to catch this sort of thing and deal with it,
but if it were all in one open-source class, it would be a lot easier
and more consistent.
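As an illustration of the kind of clean-up such a shared class could do, here is a small Python sketch that splits comma-delimited entries back into individual tags. The function name is hypothetical, and two behaviours are assumptions rather than anything agreed here: duplicates are dropped case-insensitively, and a tag that legitimately contains a comma would be split as well.

```python
def normalize_tags(raw_tags):
    """Split comma-delimited entries, trim whitespace, drop empties and duplicates."""
    seen = set()
    result = []
    for entry in raw_tags:
        for tag in entry.split(","):
            tag = tag.strip()
            key = tag.casefold()  # compare case-insensitively (an assumption)
            if tag and key not in seen:
                seen.add(key)
                result.append(tag)
    return result


# One string holding several tags, stored in an array of length one.
stored = ["work, urgent, 2008,invoice , Work, todo"]
tags = normalize_tags(stored)
# tags == ["work", "urgent", "2008", "invoice", "todo"]
```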
Suggested Fields
---------------
Tags
Authors
URLs
Character encoding for text files
Checksum
+ application defined?
It seems like a small, well-defined set of attributes to use would be
better than a larger set, at least as a first step.
Where are the maximum xattr lengths documented? I read on a wiki page
that HFS+ limits xattrs to 'one b-tree node' which I take it is 4k.
http://en.wikipedia.org/wiki/Extended_file_attributes
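Until the real limit is confirmed, a quick Python check shows how the fanciful tag numbers compare with the 4k figure. The limit constant below is exactly that unconfirmed figure, the function names are hypothetical, and the binary property list is just one assumed encoding.

```python
import plistlib

ASSUMED_LIMIT = 4096  # bytes; the "roughly 4K" figure, not confirmed


def payload_size(tags):
    """Size in bytes of the tag list encoded as a binary property list."""
    return len(plistlib.dumps(tags, fmt=plistlib.FMT_BINARY))


def fits(tags, limit=ASSUMED_LIMIT):
    return payload_size(tags) <= limit


# 100 distinct tags of 100 characters each (zero-padded numbers).
worst_case = [f"{i:0100d}" for i in range(100)]
raw_bytes = sum(len(tag) for tag in worst_case)  # 100 * 100 = 10000
ratio = raw_bytes / ASSUMED_LIMIT                # about 2.44

print(raw_bytes, round(ratio, 2), fits(worst_case), fits(["work", "urgent"]))
```

The tags are made distinct on purpose, since an encoder may store identical strings only once and so understate the worst case.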
--Tom -
On Tue, Jul 8, 2008 at 4:20 PM, Tom Andersen <knobsturner...> wrote:
> I think that
> the way to go about this is to talk this back and forth here for a bit, then
> have someone go and write some code to implement the guidelines and turn
> them into a nice cocoa api. For instance, one thing that I run into with
> tags is that some programs/users enter (say 6) tags as a comma delimited
> string, and this gets stored as a single string in an array of length one.
> It's not impossible to catch this sort of thing and deal with it, but if it
> were all in one open sourced class, it would be a lot easier and more
> consistent.
>
> Suggested Fields
> ---------------
> Tags
It's worth noting that there is already something of a de facto tag
structure in SpotMeta
(http://www.fluffy.co.uk/spotmeta/spotmeta_org.html).
SpotMeta hasn't been updated for Leopard (the COM swizzling stuff has
stopped working) but it's under the GPL.
Hamish -
On 08 Jul 2008, at 17:43, Hamish Allan wrote:
>
> It's worth noting that there is already something of a de facto tag
> structure in SpotMeta
> (http://www.fluffy.co.uk/spotmeta/spotmeta_org.html).
>
> SpotMeta hasn't been updated for Leopard (the COM swizzling stuff has
> stopped working) but it's under the GPL.
>
I don't like encoding datatypes in keywords (it reminds me of very
heated discussions with certain DB admins in my WebObjects days). I
was thinking of a set of defined keywords and using a plist
(traditional or XML) to encode the values with. That's a lot more
flexible. And if you use a schema, you will also have localised names
of keywords. That can be handy if you need to populate a user
interface where you need to display attributes that are out of your
control.
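A rough Python sketch of what I mean, assuming a schema that maps each keyword to localised display names. The keywords, the translations and the function names are all hypothetical, and a real schema would probably live in a bundle resource rather than in code.

```python
import plistlib

# Hypothetical schema: keyword -> display name per language code.
SCHEMA = {
    "UserTags": {"en": "Tags", "nl": "Trefwoorden"},
    "URLs": {"en": "Links", "nl": "Koppelingen"},
}


def display_name(keyword, language, fallback="en"):
    """Localised name for a keyword, falling back to English, then the raw key."""
    names = SCHEMA.get(keyword, {})
    return names.get(language) or names.get(fallback) or keyword


def encode_value(value, xml=True):
    """Encode a value as an XML or binary property list."""
    fmt = plistlib.FMT_XML if xml else plistlib.FMT_BINARY
    return plistlib.dumps(value, fmt=fmt)


payload = encode_value(["work", "urgent"])
decoded = plistlib.loads(payload)  # the type travels with the value, not the keyword
```

Because the plist carries the type of each value, the keyword itself stays free of type information, and an interface can label an unfamiliar attribute through the schema lookup.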
Annard


