Re: Regular Expressions?
FROM : James Montgomerie
DATE : Fri Jun 06 20:19:50 2008

On 6 Jun 2008, at 08:03, Jens Alfke wrote:
>
> On 6 Jun '08, at 3:23 AM, Jason Stephenson wrote:
>

>> As a long time UNIX programmer, I'll suggest looking into the 
>> regexp library that already comes with OS X.
>> man regcomp on the command line to find out how to use.

>
> It doesn't look as though this library is Unicode-aware. The strings 
> it takes are C string (char*) with no indication of what encoding is 
> used, and Unicode or UTF-8 aren't mentioned in the man page. From 
> that, I'd guess that this library only works with single-byte 
> encodings (like ISO-Latin-1 or CP-1252, not UTF-8 or the various non-
> Roman encodings) and that it will treat all non-ascii characters as 
> being not spaces and not letters.
>
> In short, I think it only works correctly with plain ascii. IMHO 
> that's much too limited for most purposes nowadays. Even if you 
> don't touch user-visible text with it, it's still pretty common to 
> find non-ascii characters in HTML, XML, even source code.
>
> Of the regex libraries mentioned so far, I recommend RegexKitLite. 
> It's based on ICU, which is Unicode-savvy, already built into the 
> OS, and used by lots of Apple apps.


You are correct, but in my casual usage, feeding UTF-8 to the POSIX 
regex routines works just fine as long as you keep two things in mind: 
the predefined character classes only understand ASCII, and the 
offsets you get back are byte offsets, not character offsets. That is, 
don't convert them to NSRanges and expect them to be correct against 
the NSString you got the UTF-8 from. Similar caveats apply to match 
counts and the like: "..." will happily match two characters if 
together they take up three bytes.
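
To make the byte-offset caveat concrete, here is a minimal sketch in C, 
assuming the input is valid UTF-8 and the pattern itself is plain ASCII. 
The helper names (utf16_length_of_prefix, first_match) are made up for 
illustration; only regcomp, regexec and regfree come from the POSIX 
library. NSString indexes by UTF-16 code units, so the helper converts a 
byte offset into that unit rather than into a count of code points.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Number of UTF-16 code units needed for the first nbytes of a UTF-8
   string. Continuation bytes (10xxxxxx) add nothing; a 4-byte sequence
   (lead byte 0xF0 or above) becomes a surrogate pair, i.e. two units.
   Assumes the input is valid UTF-8 and nbytes falls on a character
   boundary. */
static size_t utf16_length_of_prefix(const char *s, size_t nbytes)
{
    size_t units = 0;
    for (size_t i = 0; i < nbytes; i++) {
        unsigned char b = (unsigned char)s[i];
        if ((b & 0xC0) == 0x80)
            continue;               /* continuation byte */
        units += (b >= 0xF0) ? 2 : 1;
    }
    return units;
}

/* Find the first match of a POSIX extended regex in UTF-8 text.
   Returns 0 on a match and -1 otherwise. *byte_start receives the raw
   offset reported by regexec; *utf16_start receives the same position
   expressed in UTF-16 code units, which is what an NSRange location
   would need. */
static int first_match(const char *pattern, const char *utf8,
                       size_t *byte_start, size_t *utf16_start)
{
    regex_t re;
    regmatch_t m;

    assert(pattern != NULL && utf8 != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    int rc = regexec(&re, utf8, 1, &m, 0);
    regfree(&re);
    if (rc != 0)
        return -1;

    *byte_start = (size_t)m.rm_so;
    *utf16_start = utf16_length_of_prefix(utf8, (size_t)m.rm_so);
    return 0;
}
```

For example, searching for "bar" in the UTF-8 bytes of "café bar" gives 
a byte offset of 6 but a UTF-16 offset of 5, because the "é" takes two 
bytes. Using the byte offset directly as an NSRange location would point 
one character too far.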

I wouldn't want to present the regexes to the user, of course, but for 
pre-defined regexes in code, it's okay (not great with those caveats 
obviously, but alright).

My main complaint about it is that it's /extremely slow/ compared to 
most modern regex libraries, but for casual usage, you at least don't 
have to link any extra libraries to use it.

I do think that good regex additions to NSString, or an NSRegex class, 
are long overdue in Cocoa.

Jamie.
