FROM : Nathan Vander Wilt
DATE : Wed Apr 30 17:00:00 2008
On Apr 28, 2008, at 5:35 AM, John Joyce wrote:
>> Graham,
>>
>> Thanks for your reply! But how can I "find the range of the word"
>> given the glyph index? I just cannot find an API that does so.
>> [snip]
> The range of the word is up to you to find and depends on the
> language. If it is any common language from Europe, your job is a
> lot easier. You mainly need to work with whitespace and punctuation.
> If you're working with Japanese, you'll need to learn a lot of
> complex tricks to identify the range of words...[snip]
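Here is a rough illustration of the whitespace-and-punctuation approach described in the quoted reply. This Python sketch returns the range of the word that contains a given character index. It assumes the glyph index has already been mapped to a character index; in Cocoa, NSLayoutManager's characterIndexForGlyphAtIndex: does that. The function name word_range is hypothetical. The sketch counts letters, digits and apostrophes as word characters, which is only a crude rule for space-delimited languages. For Japanese, as the reply warns, or for production code, the system's own word segmentation (for example CFStringTokenizer) is the better choice.

```python
from typing import Optional, Tuple


def word_range(text: str, index: int) -> Optional[Tuple[int, int]]:
    """Return the half-open (start, end) range of the word containing text[index].

    Hypothetical helper: letters, digits and apostrophes count as word
    characters; everything else (spaces, punctuation) is a boundary.
    Returns None if index is out of range or falls on a boundary character.
    """
    def is_word_char(ch: str) -> bool:
        return ch.isalnum() or ch == "'"

    if not 0 <= index < len(text) or not is_word_char(text[index]):
        return None
    # Walk left until the previous character is a boundary.
    start = index
    while start > 0 and is_word_char(text[start - 1]):
        start -= 1
    # Walk right until the next character is a boundary.
    end = index + 1
    while end < len(text) and is_word_char(text[end]):
        end += 1
    return start, end
```

For example, `word_range("Thanks for your reply!", 12)` gives `(11, 15)`, the span of "your". An index that lands on the "!" gives `None`.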
"Nota bene" that the popup/Cmd-Ctrl-D dictionary does not look up
single words only. Try it on that Latin phrase, and it's not just
because of the quotes.
The heuristic seems to be the longest main entry that matches (exactly
after normalization?).
It finds "ablative absolute" and "a cappella" and "Aaron Copland",
over "ablative" and "a" and "Aaron" (and "nota" above). Thus, there
seems to be no way to pop up a definition for "a capella" [sic], only
"a" and "capella", which confused me until I realized my misspelling.
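This minimal Python sketch illustrates the longest-match heuristic guessed at above. It assumes a hypothetical ENTRIES set of lowercase main entries and measures length in words. Starting from a given word, it tries the longest run of words first and shrinks the run until it finds a match. It only illustrates the guess; it is not Apple's implementation.

```python
# Hypothetical stand-in for the dictionary's main entries (lowercase).
ENTRIES = {"a", "a cappella", "ablative", "ablative absolute",
           "aaron", "aaron copland", "capella", "nota", "nota bene"}


def longest_entry(words, start=0):
    """Return the longest run words[start:end] that is a main entry, or None."""
    # Try the longest candidate first and shrink it one word at a time.
    for end in range(len(words), start, -1):
        phrase = " ".join(words[start:end]).lower()
        if phrase in ENTRIES:
            return phrase
    return None
```

With these entries, "a cappella" matches as a whole. For "a capella", the lookup starting at the first word yields only "a", and "capella" is reached only by starting at the second word. This matches the behaviour described.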
Any "stemming" seems to be handled either by dictionary entries
("ablative absolutes" shows up in Dictionary.app's list, although it pulls
up the singular entry) or, failing that, by word boundaries (compare "Aaron
Coplands" to "Aaron Copland's").
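The two fallbacks can be sketched together in Python, again under assumptions. A hypothetical VARIANTS table maps inflected forms such as "ablative absolutes" to their main entry. A candidate entry matches only if the text after it does not continue with a letter. That boundary rule is a guess, chosen to reproduce the difference between "Aaron Copland's" and "Aaron Coplands". Unlike the word-based sketch, it compares characters.

```python
# Hypothetical main entries and inflected-form table (variant -> main entry).
ENTRIES = {"aaron", "aaron copland", "ablative", "ablative absolute"}
VARIANTS = {"ablative absolutes": "ablative absolute"}


def lookup(text):
    """Return the main entry for the longest key that prefixes text at a boundary."""
    lowered = text.lower()
    keys = set(ENTRIES) | set(VARIANTS)
    # Longest keys first, so "aaron copland" wins over "aaron" when both fit.
    for key in sorted(keys, key=len, reverse=True):
        if lowered.startswith(key):
            rest = lowered[len(key):]
            # Guessed boundary rule: the match may not run into a following letter.
            if not rest or not rest[0].isalpha():
                return VARIANTS.get(key, key)
    return None
```

Under this rule, the apostrophe in "Aaron Copland's" counts as a boundary, so the full name matches. In "Aaron Coplands" the trailing "s" blocks that match, and the lookup falls back to "aaron".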
The main entry category doesn't seem to include phrases, as "a bit
much" is findable in Dictionary.app, but not in the popup panel. The
"exactly" part is suggested by "nota bene" (with two regular spaces)
failing. Inserting a single U+2003 EM SPACE also causes failure, but
not a U+00A0 NO-BREAK SPACE (which might be explained in terms of
normalization before matching?).
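Below is one hypothetical normalization that fits these three observations. It folds case and maps only U+00A0 to U+0020, and it does not collapse repeated spaces. Full Unicode NFKC normalization would not fit the observations, because it also folds U+2003 EM SPACE to an ordinary space. The nfkc_would_match helper demonstrates this. Both helper names are illustrative.

```python
import unicodedata

ENTRIES = {"nota", "nota bene"}


def normalize(text):
    # Hypothetical: fold case and turn NO-BREAK SPACE into a plain space, nothing more.
    return text.casefold().replace("\u00a0", " ")


def is_entry(text):
    # Exact match after the narrow normalization above.
    return normalize(text) in ENTRIES


def nfkc_would_match(text):
    # For comparison: NFKC maps both U+00A0 and U+2003 to U+0020.
    return unicodedata.normalize("NFKC", text).casefold() in ENTRIES
```

Here `is_entry("Nota\u00a0bene")` is true, while the EM SPACE and double-space variants fail, as observed. `nfkc_would_match("nota\u2003bene")` is true, so plain NFKC cannot be the whole story.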
Of course, this has all been tested only on words in the Oxford English
dictionary; testing on Japanese text may reveal subtleties that affect the
heuristic, which I deliberately left vague about whether "longest" is
measured in words or characters ("words" seems most likely, so the previous
discussion is still relevant).
What none of this explains is why Preview.app won't give a pop-up
dictionary even when a PDF has selectable text (which can be pulled up
in Dictionary.app). For me, that would be an interesting explanation
to hear.
thanks,
-natevw

Cocoa mail archive

