FROM : Jens Alfke
DATE : Tue Mar 04 21:54:42 2008
On 4 Mar '08, at 10:46 AM, Ken Tozier wrote:
> I'm writing a generic parsing class and have one comparison method
> (isLiteral) I'd like to make as fast as possible. The basic question
> is: Does NSString use a hashing function that can be accessed
> without involving NSString? IE, is there a hashing function that can
> be used directly on an input pointer?
No. But I don't think that would help you. It's not enough to know the
hash code; you also have to compare the object for equality with all
the items in the dictionary that have the same hash code. Your
proposed rewrite seems to ignore the possibility of hash collisions;
it assumes that if the hashes match, so do the strings, which isn't
true.
I've written parsers before, and the usual technique is to first scan
the input into tokens, and then if a token is an identifier you look
it up in a dictionary of reserved words to see which one, if any, it
is. So instead of scanning every character in the string looking for a
particular literal/reserved word, the scanner simply looks for tokens,
including word boundaries. That makes this logic much simpler and
faster. (There's an NSScanner class that will help with this.)
In other words, it looks like this:
while( get next token from scanner )
if token is alphanumeric
reservedWordID = [[reservedWords objectForKey: token] intValue]
switch( reservedWordID )
...
else if token is '+'
...
...
—Jens
DATE : Tue Mar 04 21:54:42 2008
On 4 Mar '08, at 10:46 AM, Ken Tozier wrote:
> I'm writing a generic parsing class and have one comparison method
> (isLiteral) I'd like to make as fast as possible. The basic question
> is: Does NSString use a hashing function that can be accessed
> without involving NSString? IE, is there a hashing function that can
> be used directly on an input pointer?
No. But I don't think that would help you. It's not enough to know the
hash code; you also have to compare the object for equality with all
the items in the dictionary that have the same hash code. Your
proposed rewrite seems to ignore the possibility of hash collisions;
it assumes that if the hashes match, so do the strings, which isn't
true.
I've written parsers before, and the usual technique is to first scan
the input into tokens, and then if a token is an identifier you look
it up in a dictionary of reserved words to see which one, if any, it
is. So instead of scanning every character in the string looking for a
particular literal/reserved word, the scanner simply looks for tokens,
including word boundaries. That makes this logic much simpler and
faster. (There's an NSScanner class that will help with this.)
In other words, it looks like this:
while( get next token from scanner )
if token is alphanumeric
reservedWordID = [[reservedWords objectForKey: token] intValue]
switch( reservedWordID )
...
else if token is '+'
...
...
—Jens
| Related mails | Author | Date |
|---|---|---|
| Ken Tozier | Mar 4, 19:46 | |
| Jens Alfke | Mar 4, 21:54 |






Cocoa mail archive

