FROM : b.bum
DATE : Sat Nov 06 18:22:36 2004
On Nov 5, 2004, at 11:59 PM, Kevin Ballard wrote:
> When I throw at it
>
> (http://www.foo.com/foo)test.
>
> it matches
>
> http://www.foo.com/foo)test.
>
> as the URL, which isn't correct. Like I said, a single regex *cannot*
> deal with this situation.
A single regex can handle that situation; it is just a pain to write.
Using Python's regular expressions as an example (because named
subexpressions are a lot nicer than index-based subexpressions):
>>> import re
>>> r = re.compile(r'^\((?P<u1>http://[^)]*)|(?P<u2>http://.*)')
>>> r.match('(http://foo.com/baz)bar').group('u1')
'http://foo.com/baz'
>>> r.match('http://foo.com/baz/bar').group('u2')
'http://foo.com/baz/bar'
The '|' (the or operator) is the key. Ordering the expressions is
equally important: the alternatives are tried from left to right, so
you must put the most specific expression first.
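To see the ordering effect in isolation, here is a rough sketch. It is
a variant of the expression above, not the same one: it assumes a
lookbehind in place of the '^\(' anchor so that both alternatives can
start at the same position, and the names specific_first and
general_first are made up for illustration.

```python
import re

# Hypothetical variant of the pattern above: the lookbehind (?<=\() lets
# both alternatives begin matching at the same character, so the order of
# the alternatives decides which one wins.
specific_first = re.compile(r'(?<=\()(?P<u1>http://[^)]*)|(?P<u2>http://.*)')
general_first = re.compile(r'(?P<u2>http://.*)|(?<=\()(?P<u1>http://[^)]*)')

text = '(http://www.foo.com/foo)test.'

good = specific_first.search(text)
bad = general_first.search(text)

print(good.group('u1'))  # http://www.foo.com/foo
print(bad.group('u2'))   # http://www.foo.com/foo)test.
```

With the general alternative first, the greedy '.*' swallows the
closing parenthesis and the trailing text, which is exactly the bad
match Kevin described.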
Frankly, I would lean toward writing multiple expressions and
evaluating them one after the other, unless performance is critical,
at which point a single expression will be more efficient CPU-wise (but
may have a significant memory impact, as the state map used internally
can become quite large, especially if you are using a Unicode-capable
regex engine).
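A minimal sketch of the multiple-expression approach, under a few
assumptions of my own: the helper name extract_url, the PATTERNS list,
the use of search() instead of match(), and '\S*' rather than '.*' for
the bare URL are all illustrative choices, not something from the
thread.

```python
import re

# Patterns are tried in order, most specific first. Both are illustrative.
PATTERNS = [
    re.compile(r'\((http://[^)]*)\)'),  # URL wrapped in parentheses
    re.compile(r'(http://\S*)'),        # bare URL, up to the next whitespace
]

def extract_url(text):
    """Return the URL from the first pattern that matches, or None."""
    for pattern in PATTERNS:
        m = pattern.search(text)
        if m:
            return m.group(1)
    return None

print(extract_url('(http://www.foo.com/foo)test.'))   # http://www.foo.com/foo
print(extract_url('see http://foo.com/baz/bar now'))  # http://foo.com/baz/bar
```

Each expression stays small enough to read and test on its own, at the
cost of scanning the input once per pattern.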
b.bum
Related mails:

| Author | Date |
|---|---|
| Mike O'Connor | Nov 6, 04:08 |
| Joseph Heck | Nov 6, 04:13 |
| Kevin Ballard | Nov 6, 04:23 |
| John Siracusa | Nov 6, 05:05 |
| Mike O'Connor | Nov 6, 06:12 |
| Kevin Ballard | Nov 6, 08:59 |
| John Stiles | Nov 6, 17:05 |
| b.bum | Nov 6, 18:22 |
| Kevin Ballard | Nov 6, 22:54 |
| b.bum | Nov 7, 02:37 |
| Kevin Ballard | Nov 7, 02:42 |
| b.bum | Nov 7, 03:06 |
| Kevin Ballard | Nov 7, 04:01 |
| b.bum | Nov 7, 04:36 |
| Eric Ocean | Nov 7, 04:44 |
| Bertrand Mansion | Nov 7, 09:04 |
| John Siracusa | Nov 7, 14:47 |

Cocoa mail archive

