Re: Regex pattern to find URLs
FROM : b.bum
DATE : Sat Nov 06 18:22:36 2004

On Nov 5, 2004, at 11:59 PM, Kevin Ballard wrote:
> When I throw at it
>
> (http://www.foo.com/foo)test.
>
> it matches
>
> http://www.foo.com/foo)test.
>
> as the URL, which isn't correct. Like I said, a single regex *cannot*
> deal with this situation.


A single regex can handle that situation; it is just a pain to write.
Using Python's regular expressions as an example (because named
subexpressions are a lot nicer than index-based subexpressions):

>>> import re
>>> r = re.compile(r'^\((?P<u1>http://[^)]*)|(?P<u2>http://.*)')
>>> r.match('(http://foo.com/baz)bar').group('u1')
'http://foo.com/baz'
>>> r.match('http://foo.com/baz/bar').group('u2')
'http://foo.com/baz/bar'
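Kevin's input has the URL inside running text, while the session above uses match(), which only looks at the start of the string. Below is a minimal sketch of the same two-alternative idea applied with re.search(). It assumes the '^' anchor is dropped and that a bare URL ends at the next whitespace character (the original used '.*'); the helper name find_url is hypothetical.

```python
import re

# Same two alternatives as the session above, minus the '^' anchor,
# so re.search() can find a URL anywhere in the text.
# u1: a URL right after '(' stops at the ')' (or at whitespace).
# u2: any other URL runs to the next whitespace (an assumption).
URL_RE = re.compile(r'\((?P<u1>http://[^)\s]*)|(?P<u2>http://\S*)')

def find_url(text):
    """Return the first URL in text, or None (hypothetical helper)."""
    m = URL_RE.search(text)
    if m is None:
        return None
    # Exactly one of the two named groups took part in the match.
    return m.group('u1') or m.group('u2')

print(find_url('(http://www.foo.com/foo)test.'))   # http://www.foo.com/foo
print(find_url('see http://foo.com/baz/bar now'))  # http://foo.com/baz/bar
```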

The '|' (or) operator is the key. Ordering the alternatives is
equally important: the most specific expression must come first.
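Python's re tries alternatives left to right at each position and keeps the first one that succeeds, not the longest. In the session above the two alternatives start with different characters ('(' versus 'h'), so their order happens not to matter for those inputs. The sketch below is a variation not taken from the post: it uses a lookbehind so both alternatives can match at the same 'h', which makes the effect of ordering visible.

```python
import re

text = '(http://foo.com/baz)bar'

# Both alternatives can start matching at the same 'h', so the engine
# takes whichever alternative is listed first.
# (?<=\() is a lookbehind: "preceded by '('" without consuming it.
specific_first = re.compile(r'(?<=\()http://[^)]*|http://\S*')
general_first = re.compile(r'http://\S*|(?<=\()http://[^)]*')

print(specific_first.search(text).group())  # http://foo.com/baz
print(general_first.search(text).group())   # http://foo.com/baz)bar
```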

Frankly, I would lean toward writing multiple expressions and evaluating
them one after the other, unless performance is critical. In that case
a single expression will be more efficient CPU-wise, but it may have a
significant memory impact, as the state map used internally can become
quite large, especially if you are using a Unicode-capable regex
engine.
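A minimal sketch of the "multiple expressions, one after the other" approach, assuming a hypothetical PATTERNS list ordered most specific first and a hypothetical first_url helper; the patterns themselves are illustrative, not a complete URL grammar.

```python
import re

# Hypothetical patterns, most specific first. Each is tried in turn
# and the first one that matches anywhere in the text wins.
PATTERNS = [
    re.compile(r'\((http://[^)\s]*)\)'),  # URL wrapped in parentheses
    re.compile(r'(http://\S*)'),          # any other URL
]

def first_url(text):
    """Try each pattern in order; return group 1 of the first hit."""
    for pattern in PATTERNS:
        m = pattern.search(text)
        if m:
            return m.group(1)
    return None

print(first_url('(http://www.foo.com/foo)test.'))   # http://www.foo.com/foo
print(first_url('see http://foo.com/baz/bar now'))  # http://foo.com/baz/bar
```

Each pattern scans the whole string before the next one is tried, so a parenthesised URL late in the text takes priority over a bare URL earlier in it. That is one behavioural difference from the single combined expression, which picks whichever URL appears first.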

b.bum
