FROM : John Stiles
DATE : Tue Jan 29 20:38:41 2008
Douglas Davidson wrote:
>
> On Jan 29, 2008, at 11:05 AM, Citizen wrote:
>
>> You could get close with generating the characters you expect to find
>> at the word boundaries with:
>>
>> NSCharacterSet * wordBoundariesCharacterSet = [[NSCharacterSet
>> letterCharacterSet] invertedSet];
>>
>> You would need to change this accordingly if you did not want numbers
>> to be considered as a word boundary. You could of course create a
>> boundary character set with just whitespace and punctuation marks -
>> it just depends on how you would like the final feature to work.
>
> It's better not to reinvent the wheel for this sort of tokenization.
> There is API for word-boundary analysis in AppKit (doubleClickAtIndex:
> et al.), and, starting in Leopard, also in CoreFoundation
> (CFStringTokenizer), that handles this in a consistent and
> standards-appropriate fashion.
>
> The current find panel implementation works by searching for the
> string in question using any appropriate NSString compare options,
> then taking each result and determining whether it falls on word
> boundaries. If a given occurrence doesn't have the right
> word-boundary characteristics, the search continues.
>
Excellent, thank you for the information.
Actually, I've done a few brief tests with the Find panel (in Leopard)
and it appears to be fairly broken anyway :( Here's an example. Open up
TextEdit and, in a new document, type:
    this is a test.
Now search for "is a" with "Full word" selected. It won't work. Or
search for "this " with a trailing space, or "test." with the trailing
period. It also won't work. As far as I can tell, "Full word" only
succeeds when you search for a /single word/ with no punctuation or
spaces of any kind. This is a little more limited than what I was hoping
for, so maybe I really do need to roll my own implementation.
Oh well, off to Radar to file a bug on the Find panel, and I'll figure
out some sort of solution. I can probably use NSCharacterSet or
something and look at the characters on either side of the found text. I
was hoping to avoid that, but it looks like I can't.
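
For the record, a rough sketch of that "look at the characters on either
side" approach. It is written in plain C so it stands alone (it should
compile unchanged inside an Objective-C file). Everything here is an
assumption for illustration: the function names are made up, the text is
treated as ASCII, and letters and digits count as word characters. Real code
would more likely use NSCharacterSet or CFStringTokenizer to cope with
Unicode. The rule it applies is only one possible definition of "full word":
an edge of the match is acceptable unless a word character sits on both
sides of it.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a "word character" set; ASCII only. */
static bool is_word_char(char c) {
    return isalnum((unsigned char)c) != 0;
}

/* True if the range [start, start + len) of haystack does not split a word
   at either edge. An edge splits a word only when the characters on both
   sides of it are word characters. Requires len > 0. */
static bool falls_on_word_boundaries(const char *haystack, size_t start, size_t len) {
    size_t total = strlen(haystack);
    size_t end = start + len;
    assert(len > 0 && end <= total);

    bool left_ok = (start == 0) ||
        !(is_word_char(haystack[start - 1]) && is_word_char(haystack[start]));
    bool right_ok = (end == total) ||
        !(is_word_char(haystack[end - 1]) && is_word_char(haystack[end]));
    return left_ok && right_ok;
}

/* Returns the offset of the first occurrence of needle that falls on word
   boundaries, or -1 if there is none. Occurrences with the wrong boundary
   characteristics are skipped and the search continues. */
long find_whole_word(const char *haystack, const char *needle) {
    size_t len = strlen(needle);
    if (len == 0) {
        return -1;
    }
    const char *p = haystack;
    while ((p = strstr(p, needle)) != NULL) {
        size_t start = (size_t)(p - haystack);
        if (falls_on_word_boundaries(haystack, start, len)) {
            return (long)start;
        }
        p++; /* wrong boundaries, try the next occurrence */
    }
    return -1;
}
```

With the test sentence above, find_whole_word("this is a test.", "is a")
returns 5, "this " returns 0, and "test." returns 10, while "his" and "tes"
return -1 because each would cut a word in half.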






Cocoa mail archive

