[ANN] RegexKit - An Objective-C Framework for Regular Expressions Using the PCRE Library
-
Announcing RegexKit - A framework for regular
expressions using the PCRE library.
At this point in time, this framework is being made
available as an ALPHA test release to solicit
developer comments. While there are no known major
issues with the framework, it is the first time it is
being made available to the public.
The project is hosted at SourceForge, and you can
access it via:
http://regexkit.sourceforge.net/
The framework documentation is available at:
http://regexkit.sourceforge.net/Documentation
The SourceForge project page is:
http://sourceforge.net/projects/regexkit/
While the API documentation is largely complete,
documentation for a final, end-user distribution is
still in flux. There is no sample "double-click and
go" example Xcode project with this release, although
one is planned. Right now I would appreciate feedback
from experienced Cocoa developers regarding the API
and usability of the framework.
Some highlights:
o BSD license for the framework. The PCRE library has
a BSD license as well.
o Uses PCRE 7.3, the latest release as of this
posting.
o Mac OS X 10.4 or later required.
o Caches compiled regular expressions for speed.
o Specifically designed to be multithreading safe.
o Designed to be fast and light-weight. Uses the stack
extensively for temporary items and work, rarely
resorts to malloc().
o Not just a wrapper around PCRE, but extends by
category NSArray, NSDictionary, NSSet, and especially
NSString.
o It's documented. No, I'm not kidding, it's actually
documented. Documentation style is the same, familiar
style as Apple's documentation.
As an example of what's possible with the NSString
additions, consider the following: Find, extract, and
convert to an unsigned int a hex value from an
NSString. Here's how you can do it with RegexKit:
unsigned int hexValue = 0;
[@"Conversion color: 0xFF0000, Order: 1"
    getCapturesWithRegexAndReferences:@"color: (0x[0-9a-fA-F]+)",
    @"${1:%x}", &hexValue, NULL];
// hexValue == 16711680 (0xFF0000)
getCapturesWithRegexAndReferences: allows you to
easily match a capture subpattern from a regular
expression and perform a scanf() style conversion.
Also note that there is no need to create a regular
expression object; the framework will automatically
convert NSString objects to RKRegex objects for you.
In fact, getCapturesWithRegexAndReferences: will
accept either an instantiated RKRegex object or an
NSString, which it will convert.
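For comparison, here is a rough sketch of the same find, extract, and convert task written against the POSIX regex functions in plain C (which also compiles as Objective-C). This is not RegexKit code: the function name extract_hex_color is made up for illustration, and the pattern uses POSIX extended syntax rather than PCRE.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 and stores the converted value in *out
 * when `subject` contains "color: 0x<hex digits>", otherwise returns 0. */
int extract_hex_color(const char *subject, unsigned int *out)
{
    regex_t re;
    regmatch_t m[2];   /* m[0] is the whole match, m[1] is capture 1 */
    char buf[32];
    int ok = 0;

    assert(subject != NULL && out != NULL);

    if (regcomp(&re, "color: (0x[0-9a-fA-F]+)", REG_EXTENDED) != 0)
        return 0;

    if (regexec(&re, subject, 2, m, 0) == 0) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len < sizeof buf) {
            /* Copy the captured text so it is NUL-terminated for sscanf(). */
            memcpy(buf, subject + m[1].rm_so, len);
            buf[len] = '\0';
            ok = (sscanf(buf, "%x", out) == 1);
        }
    }

    regfree(&re);
    return ok;
}
```

The compile, match, copy, and convert steps are the bookkeeping that a single call with a "${1:%x}" reference is meant to hide.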
You can also create new strings with references to
regular expression matched text:
NSString *newString = [@"Doe, John"
    stringByMatching:@"(?<last>\\S+),\\s*(?<first>\\S+)"
    withReferenceString:@"Dear ${first} ${last},\n\nHow are you today, ${first}?"];
/*
newString ==
Dear John Doe,

How are you today, John?
*/
There are similar search and replace methods as well.
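As a point of reference, the "Doe, John" rewrite above can be approximated with the POSIX regex functions and snprintf(). This is only a sketch under a few assumptions: the name greet_from_name is hypothetical, and because POSIX extended syntax has no named groups and no portable \S, it uses numbered groups and bracket expressions instead.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parses "Last, First" and writes a greeting to `out`.
 * Returns snprintf()'s result (the length the full greeting needs),
 * or -1 if the pattern fails to compile or does not match. */
int greet_from_name(const char *name, char *out, size_t out_size)
{
    regex_t re;
    regmatch_t m[3];   /* whole match, capture 1 (last), capture 2 (first) */
    int written = -1;

    assert(name != NULL && out != NULL && out_size > 0);

    if (regcomp(&re, "([^, ]+), *([^ ]+)", REG_EXTENDED) != 0)
        return -1;

    if (regexec(&re, name, 3, m, 0) == 0) {
        const char *last = name + m[1].rm_so;
        const char *first = name + m[2].rm_so;
        int last_len = (int)(m[1].rm_eo - m[1].rm_so);
        int first_len = (int)(m[2].rm_eo - m[2].rm_so);

        /* "%.*s" prints a length-limited slice, so no temporary copies. */
        written = snprintf(out, out_size,
                           "Dear %.*s %.*s,\n\nHow are you today, %.*s?",
                           first_len, first, last_len, last,
                           first_len, first);
    }

    regfree(&re);
    return written;
}
```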
With this alpha release, I'm looking for comments from
Objective-C Cocoa developers regarding the API. Any
other comments are welcome as well. I'm in the final
push to get a "1.0" general release done, so I'd like
to freeze the features of the framework and
concentrate on getting a polished release out the
door. As I mentioned, there's some conflicting
information in the current release regarding
first-time user related information, such as
references to an old "cli_test*" target/code. I don't
expect any experienced Cocoa developer to be thrown
off, though. The documentation regarding adding the
framework to your project (should?) be just fine, just
no examples yet.
-
On Sep 1, 2007, at 12:49 PM, John Engelhart wrote:
> Announcing RegexKit - A framework for regular
> expressions using the PCRE library.
Sweet!
Of course, I still want Apple to get this into the foundation kit, but
this is quite welcome.
-jcr -
On 1 Sep 2007, at 20:49, John Engelhart wrote:
> Announcing RegexKit - A framework for regular
> expressions using the PCRE library.
Hi John,
Any chance of changing the name to PCREKit or something more
specific? It's just that there are a number of regexp libraries, all
with subtly different implementations (e.g. POSIX, PCRE, Oniguruma,
ICU...). It can be a bit puzzling at times when confronted with apps
using the various different libraries (and/or with various different
options enabled), so I think it'd be good to make it *really* obvious
to people that your library is using PCRE.
Incidentally, does PCRE have good (i.e. native) support for UTF-16?
Oniguruma and ICU both do (and ICU includes a powerful implementation
of the regex character class feature that lets you query Unicode
attributes), which makes them a good choice for integration with
Cocoa, but if you have to e.g. transcode to UTF-8 in order to use
regex matching, it's going to be somewhat more expensive.
BTW, nice documentation. I was going to ask what tools you used to
do it, but it looks like you included them in the source
distribution. You should consider packaging up the doc. building
tools separately, as it looks like they're an improvement on headerdoc.
On 1 Sep 2007, at 22:20, John C. Randolph wrote:
> Sweet!
>
> Of course, I still want Apple to get this into the foundation kit,
> but this is quite welcome.
Indeed. It seems strange that they still haven't done this, given
that they've got ICU working under the covers... ICU includes a
perfectly reasonable (and Unicode-enabled) regular expression engine.
(BTW, it seems likely that an Apple-supplied implementation would be
based on ICU, whose regex engine uses an enhanced Perl-compatible
syntax.)
For those who are interested, there's also
<http://aarone.org/cocoaicu/>
though I don't know how it compares (it probably won't be as
efficient as an Apple implementation could be, as Apple have more
control over the internal representation of strings).
Kind regards,
Alastair.
--
http://alastairs-place.net -
If by the time that 10.5 is released, the Mac development community
still doesn't have basic regex support in Foundation, you have my
vote to change the name of your project to NSRegularExpression.
-- Ilan
On Sep 2, 2007, at 7:12 PM, Alastair Houghton wrote:
> On 1 Sep 2007, at 20:49, John Engelhart wrote:
>
>> Announcing RegexKit - A framework for regular
>> expressions using the PCRE library.
>
> Hi John,
>
> Any chance of changing the name to PCREKit or something more
> specific? It's just that there are a number of regexp libraries,
> all with subtly different implementations (e.g. POSIX, PCRE,
> Oniguruma, ICU...). It can be a bit puzzling at times when
> confronted with apps using the various different libraries (and/or
> with various different options enabled), so I think it'd be good to
> make it *really* obvious to people that your library is using PCRE.
Ilan Volow
"Implicit code is inherently evil, and here's the reason why:" -
Isn't NSPredicate providing what you're looking for?
http://developer.apple.com/documentation/Cocoa/Conceptual/Predicates/Articles/pUsing.html#//apple_ref/doc/uid/TP40001794-DontLinkElementID_15
Thomas
On Sep 5, 2007, at 4:08 PM, Ilan Volow wrote:
> If by the time that 10.5 is released, the mac development community
> still doesn't have basic regex support in Foundation, you have my
> vote to the change the name of your project to NSRegularExpression.
>
> -- Ilan -
On 9/3/07, Alastair Houghton <alastair...> wrote:
> Incidentally, does PCRE have good (i.e. native) support for UTF-16?
As far as I know, PCRE has only limited support for UTF-8, nothing as
feature-rich as the support in ICU or Oniguruma.
--
Alexey Zakhlestin
http://blog.milkfarmsoft.com/ -
On 5 sep 2007, at 17.29, Thomas Clément wrote:
> Isn't NSPredicate providing what you're looking for ?
Last time I looked, the support for regex in NSPredicate was not all
that great.
I'm all for the addition of a flexible and performant
NSRegularExpression to Foundation. Anyone who agrees should file
enhancement requests with Apple. That's your best bet of seeing it
happen.
j o a r -
On 5 Sep 2007, at 16:29, Thomas Clément wrote:
> Isn't NSPredicate providing what you're looking for ?
> http://developer.apple.com/documentation/Cocoa/Conceptual/
> Predicates/Articles/pUsing.html#//apple_ref/doc/uid/TP40001794-
> DontLinkElementID_15
Good point, but it's still not quite the same thing as having a regex
matcher in Foundation. For instance, I don't think there's a way to
get access to the results of capture groups.
Kind regards,
Alastair.
--
http://alastairs-place.net -
On Sep 5, 2007, at 8:29 AM, Thomas Clément wrote:
> Isn't NSPredicate providing what you're looking for ?
An NSComparisonPredicate using NSMatchesPredicateOperatorType can only
tell you whether a whole string matches a specified ICU regular
expression.
It can't get you the matched range of a substring, the number of
subexpressions in the regular expression, or the matched ranges for
individual subexpressions as a full regular expression class would be
able to.
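To make those three pieces of information concrete, here is a small sketch using the POSIX regex functions as a stand-in for a "full" regular expression API. It is not NSPredicate or ICU code, and the names match_range, match_ranges, and MAX_GROUPS are hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

#define MAX_GROUPS 10   /* arbitrary limit chosen for this sketch */

/* Hypothetical range type, loosely analogous to an NSRange. */
typedef struct {
    long start;    /* offset of the match, or -1 if the group did not match */
    long length;
} match_range;

/* Fills ranges[0] with the whole match and ranges[1..n] with each
 * subexpression. Returns n, the number of subexpressions in the pattern,
 * or -1 if the pattern does not compile, has too many groups, or
 * does not match. */
int match_ranges(const char *pattern, const char *subject,
                 match_range ranges[MAX_GROUPS])
{
    regex_t re;
    regmatch_t m[MAX_GROUPS];
    int nsub;
    int i;

    assert(pattern != NULL && subject != NULL && ranges != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    nsub = (int)re.re_nsub;   /* number of parenthesized subexpressions */
    if (nsub >= MAX_GROUPS || regexec(&re, subject, MAX_GROUPS, m, 0) != 0) {
        regfree(&re);
        return -1;
    }

    for (i = 0; i <= nsub; i++) {
        ranges[i].start = (long)m[i].rm_so;
        ranges[i].length = (long)(m[i].rm_eo - m[i].rm_so);
    }

    regfree(&re);
    return nsub;
}
```

Unlike a whole-string yes/no test, this matches a substring anywhere in the subject and reports where it and each group were found.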
-- Chris -
On Sep 5, 2007, at 8:51 AM, Alexey Zakhlestin wrote:
> On 9/3/07, Alastair Houghton <alastair...> wrote:
>
>> Incidentally, does PCRE have good (i.e. native) support for UTF-16?
>
> As far as I know, PCRE has only limited support for UTF-8, nothing as
> feature-rich as support in ICU or Oniguruma
Yeah - I'm definitely an Oniguruma fan. I've wanted to make a
lightweight but useful framework for it that wasn't as bloated as
OgreKit.
--
Seth Willits -
Isn't "NS" reserved for Apple? (So much as the namespace can be
reserved, that is.)
--
m-s
On 05 Sep, 2007, at 10:08, Ilan Volow wrote:
> If by the time that 10.5 is released, the mac development community
> still doesn't have basic regex support in Foundation, you have my
> vote to the change the name of your project to NSRegularExpression.
>
> -- Ilan -
I think this is just an expression of frustration that Apple hasn't
provided a standard Regex framework for Cocoa, despite using ICU since
Tiger.
Using ICU and building against the dylib is non-trivial -- and Apple
apparently doesn't feel addressing this issue is more important than
animating the views as they slide around the screen.
On 9/6/07, Michael Watson <mikey-san...> wrote:
> Isn't "NS" reserved for Apple? (So much as the namespace can be
> reserved, that is.)
--
Mark Munz
unmarked software
http://www.unmarked.com/ -
On Sep 6, 2007, at 7:05 AM, Mark Munz wrote:
> and Apple
> apparently doesn't feel addressing this issue is more important than
> animating the views as they slide around the screen.
Yeah also why did they waste time adding 64b support when they could
have been adding something a developer could have picked up
themselves if they really needed it.
-Shawn -
Come on. 64b support is different than flashy animations and you know it.
Apple has a habit of making every developer repeat the same basic work
that should be part of the system. Even with ICU included, you still
have to jump through a number of hoops to use it with NSStrings. They
know it (based on a conversation I had in 2006 at WWDC).
Rather than fix the problem once, they basically force each developer
to "fix" the problem. And this is a pattern that is decades old.
On 9/6/07, Shawn Erickson <shawnce...> wrote:
>
> On Sep 6, 2007, at 7:05 AM, Mark Munz wrote:
>
>> and Apple
>> apparently doesn't feel addressing this issue is more important than
>> animating the views as they slide around the screen.
>
> Yeah also why did they waste time adding 64b support when the could
> have been adding something a developer could have picked up
> themselves if they really needed it.
>
> -Shawn
>
--
Mark Munz
unmarked software
http://www.unmarked.com/ -
We definitely want this to happen, but it's a lot of work. This is
the current roadblock:
http://bugs.icu-project.org/trac/ticket/4521
Please note the number of weeks estimated for implementation.
Sorry for the long wait...
Deborah Goldsmith
Internationalization, Unicode Liaison
Apple Inc.
<goldsmit...>
On Sep 6, 2007, at 7:32 AM, Mark Munz wrote:
> Come on. 64b support is different than flashy animations and you
> know it.
>
> Apple has a habit of making every developer repeat the same basic work
> that should be part of the system. Even with ICU included, you still
> have to jump through a number of hoops to use it with NSStrings. They
> know it (based on a conversation I had in 2006 at WWDC).
>
> Rather than fix the problem once, they basically force each developer
> to "fix" the problem. And this is a pattern that is decades old. -
Hmm. Well, that's a reasonable obstacle. But still, it seems as if
Apple could have implemented it the "slow" way for now (where it
converts each string to a buffer before regex'ing it), and then once
this bug gets implemented, convert the OS to use the "fast" way. That
way the API could internally be improved over time while the
interface stays the same.
Or is there another reason why it couldn't be done this way...?
Also, FWIW, there are plenty of internal Apple projects which already
mix regular expressions and NSStrings -- Xcode comes to mind -- so again,
it doesn't seem like a total blocker. Just maybe a short-term perf
issue.
On Sep 10, 2007, at 7:09 PM, Deborah Goldsmith wrote:
> We definitely want this to happen, but it's a lot of work. This is
> the current roadblock:
>
> http://bugs.icu-project.org/trac/ticket/4521
>
> Please note the number of weeks estimated for implementation.
>
> Sorry for the long wait...
>
> Deborah Goldsmith
> Internationalization, Unicode Liaison
> Apple Inc.
> <goldsmit...> -
> We definitely want this to happen, but it's a lot of work. This is
> the current roadblock:
>
> http://bugs.icu-project.org/trac/ticket/4521
While it may be a lot of work, it also looks like the issue has been
pushed off again and again, as the bug has been open for two years.
It definitely seems like great is the enemy of good here.
Also, isn't Xcode already using ICU for its regular expression searches?
--
Mark Munz
unmarked software
http://www.unmarked.com/ -
On Sep 6, 2007, at 7:05 AM, Mark Munz wrote:
> Using ICU and building against the dylib is non-trivial -- and Apple
> apparently doesn't feel addressing this issue is more important than
> animating the views as they slide around the screen.
Think for a moment: would you expect that the people who are best
able to implement an NSRegex class are the same ones who implemented
CoreAnimation?
-jcr -
On 11 Sep 2007, at 03:46, John Stiles wrote:
> Hmm. Well, that's a reasonable obstacle. But still, it seems as if
> Apple could have implemented it the "slow" way for now (where it
> converts each string to a buffer before regex'ing it),
Actually you don't need to do that in all cases. You can use
CFStringGetCharactersPtr() to try to get a UTF-16 character pointer
for the string, and then fall back to allocating a buffer if/when
that fails.
Unfortunately I have a suspicion that the function always fails for
string constants...
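The "try for a direct pointer, otherwise copy" idea can be sketched in portable C. To keep the example self-contained, it uses a made-up sketch_string type in place of CFString. Every name below is hypothetical. In real CoreFoundation code the fast path would be the CFStringGetCharactersPtr() call mentioned above, and the fallback would, I believe, copy with CFStringGetCharacters() into a buffer sized from CFStringGetLength().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a string object; this is NOT CFString.
 * `utf16` is non-NULL only when storage is already contiguous UTF-16. */
typedef struct {
    const uint16_t *utf16;    /* direct storage, or NULL */
    const char     *ascii;    /* 8-bit storage used when utf16 is NULL */
    size_t          length;   /* length in characters */
} sketch_string;

/* A borrowed-or-owned view of UTF-16 code units. */
typedef struct {
    const uint16_t *chars;    /* what a regex engine would scan */
    uint16_t       *owned;    /* non-NULL only if a buffer was allocated */
    size_t          length;
} utf16_view;

/* Returns 1 on success, 0 if the fallback allocation failed. */
int utf16_view_open(const sketch_string *s, utf16_view *v)
{
    size_t i;

    assert(s != NULL && v != NULL);
    v->length = s->length;
    v->owned = NULL;

    if (s->utf16 != NULL) {   /* fast path: borrow the pointer, no copy */
        v->chars = s->utf16;
        return 1;
    }

    /* Slow path: widen the 8-bit storage into a temporary buffer. */
    v->owned = malloc((s->length ? s->length : 1) * sizeof *v->owned);
    if (v->owned == NULL)
        return 0;
    for (i = 0; i < s->length; i++)
        v->owned[i] = (unsigned char)s->ascii[i];
    v->chars = v->owned;
    return 1;
}

/* Releases the temporary buffer, if any; free(NULL) is a no-op. */
void utf16_view_close(utf16_view *v)
{
    free(v->owned);
    v->owned = NULL;
    v->chars = NULL;
}
```

The caller always scans v.chars and always calls utf16_view_close(), so the matching code does not need to know which path was taken.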
On 11 Sep 2007, at 04:16, Mark Munz wrote:
>> We definitely want this to happen, but it's a lot of work. This is
>> the current roadblock:
>>
>> http://bugs.icu-project.org/trac/ticket/4521
>
> While it may be a lot of work, but it also looks like the issue has
> pushed off again and again as the bug has been open for two years.
Since anyone is free to contribute to ICU, if you particularly care
about it, you could offer your assistance to fix it. Indeed, any of
us could have at any time over the past two years. It looks from the
ticket though that IBM, Apple and Google all have people working on
the problem right now---at least, I'm guessing that that's what the
"load" property means.
> It definitely seems like great is the enemy of good here.
I think that viewpoint is hard to defend.
It's always been possible for developers to use regexps in their own
code, whether by using the POSIX functions, or by linking in a third
party library (Oniguruma probably being the best match for Cocoa
strings right now as it has native UTF-16 support). Is it fair,
therefore, to criticise Apple for electing to wait until the ICU
implementation had all the features they felt they needed for really
great regexp support? I don't think so. It isn't exactly a huge
effort using a third-party library for now.
Put another way, since we have plenty of options right now, why rush
to make an official "NSRegularExpression" (or CFRegularExpression...)
before there's a significant advantage in doing so?
I would also guess that the regexp issue has had more bugs filed
against it in recent months because it seems that more people are
coming from scripting languages where regexps are widely available,
as opposed to C where they're really only available on Unix-like
systems (or with a third-party library...).
Kind regards,
Alastair.
--
http://alastairs-place.net -
If I ignore the "Beware of Leopard" signs for just a moment, can
we reasonably expect NSRegularExpression (or something equivalent) in
10.5?
-- Ilan
On Sep 10, 2007, at 10:09 PM, Deborah Goldsmith wrote:
> We definitely want this to happen, but it's a lot of work. This is
> the current roadblock:
>
> http://bugs.icu-project.org/trac/ticket/4521
>
> Please note the number of weeks estimated for implementation.
>
> Sorry for the long wait...
>
> Deborah Goldsmith
> Internationalization, Unicode Liaison
> Apple Inc.
> <goldsmit...>
Ilan Volow
"Implicit code is inherently evil, and here's the reason why:" -
> Think for a moment: would you expect that the people who are best
> able to implement an NSRegex class are the same ones who implemented
> CoreAnimation?
The point of the comment was about priority, not about the specific
engineers working on it. This stuff is already implemented in existing
Apple code (Xcode).
--
Mark Munz
unmarked software
http://www.unmarked.com/ -
>> It definitely seems like great is the enemy of good here.
>
> I think that viewpoint is hard to defend.
>
> It's always been possible for developers to use regexps in their own
> code, whether by using the POSIX functions, or by linking in a third
> party library (Oniguruma probably being the best match for Cocoa
> strings right now as it has native UTF-16 support). Is it fair,
> therefore, to criticise Apple for electing to wait until the ICU
> implementation had all the features they felt they needed for really
> great regexp support? I don't think so. It isn't exactly a huge
> effort using a third-party library for now.
I think you just made my argument for me. Apple is waiting for all the
features they felt they needed for really great regexp support. (the
bug description seemed a bit vague on what it was they were waiting
for, more like one of those high-level "we need some APIs" bugs)
> Put another way, since we have plenty of options right now, why rush
> to make an official "NSRegularExpression" (or CFRegularExpression...)
> before there's a significant advantage in doing so?
That argument could be made for virtually anything Apple does or
doesn't do. There are existing options for animation, advanced
controls, data management. Those can all be done right now, but the
new APIs make the whole task easier. That's all I'm asking for, make
the task of using regexp easier (especially given how commonplace it
now is in everything but Cocoa).
I'm just beginning to look at CocoaICU (first heard of it in this
thread), but if one guy can do that -- certainly Apple could do
something similar.
When I spoke to some folks at Apple at WWDC 2006, I was told that ICU
was the future direction. That's great! I just would like to see that
future before 2010! Unfortunately, I think that is exactly what we'll
have to wait for.
(fyi -- 2010 is just a guess as to when 10.6 might show up)
> I would also guess that the regexp issue has had more bugs filed
> against it in recent months because it seems that more people are
> coming from scripting languages where regexps are widely available,
> as opposed to C where they're really only available on Unix-like
> systems (or with a third-party library...).
Very true. In fact, I think Obj-C/C++ developers are the only group
with a high level framework that doesn't have built-in support for
regexp.
--
Mark Munz
unmarked software
http://www.unmarked.com/ -
On Sep 11, 2007, at 9:39 AM, Mark Munz wrote:
> Very true. In fact, I think Obj-c/c++ developers are the only group
> with a high level framework that doesn't have built-in support for
> regexp.
Actually, I am pretty sure that TR1 includes regular expressions. -
On 11 Sep 2007, at 17:39, Mark Munz wrote:
> I think you just made my argument for me. Apple is waiting for all the
> features they felt they needed for really great regexp support. (the
> bug description seemed a bit vague on what it was they were waiting
> for, more like one of those high-level "we need some APIs" bugs)
I believe the bug Deborah pointed at is basically talking about
changing the ICU regexp engine to be character-encoding agnostic (so
you can compile a regexp once and then use it to match against
strings in UTF-8, UTF-16 or perhaps text in other coding systems) and
to support matching against buffers that are spread across multiple
memory regions. (Currently it only supports native-endian UTF-16,
and to do a really good job of regexp integration with CF/NSString,
you'd want to be able to support 8-bit encodings and ideally---for
NSTextStorage---multi-chunk storage as well.)
I could be wrong, but I think that's what it's all about.
>> Put another way, since we have plenty of options right now, why rush
>> to make an official "NSRegularExpression" (or CFRegularExpression...)
>> before there's a significant advantage in doing so?
>
> That argument could be made for virtually anything Apple does or
> doesn't do. There are existing options for animation, advanced
> controls, data management. Those can all be done right now, but the
> new APIs make the whole task easier.
Well... there are plenty of things that *only* Apple can do, and even
more that only Apple can do without risking the use of unsupported APIs.
> That's all I'm asking for, make the task of using regexp easier
> (especially given how commonplace it now is in everything but Cocoa).
Sure, though it isn't exactly hard right now. It just requires a
little bit of effort on the part of developers, and there are several
Cocoa regexp frameworks available (including RegexKit, which I still
wish had a different name :-)).
> I'm just beginning to look at CocoaICU (first heard of it in this
> thread), but if one guy can do that -- certainly Apple could do
> something similar.
There's OGREKit as well:
http://www8.ocn.ne.jp/~sonoisa/OgreKit/
Personally, where I've needed regexps in Cocoa apps, I've just used
Oniguruma's C API together with a little bit of CoreFoundation; it's
quite easy, which is why I'm not hugely concerned about having this
feature as part of Cocoa just yet (or rather, why I'd rather Apple
did the changes mentioned above before adding it... that way there
will be really obvious advantages to using Apple's NSRegularExpression).
I do understand where those people asking for this feature are coming
from, but I also think it would be a mistake to rush a solution and
then have to deprecate APIs almost immediately.
Kind regards,
Alastair.
--
http://alastairs-place.net -
Well, I think the point of APIs is kind of the opposite of what you're
talking about, though. You can invent an API and ship it, even if
version 1.0 is maybe not perfect from a performance perspective, and
then in version 2.0, you can keep the same interface to the outside
world but improve the guts so that they run faster or use less memory or
whatever.
Alastair Houghton wrote:
> I do understand where those people asking for this feature are coming
> from, but I also think it would be a mistake to rush a solution and
> then have to deprecate APIs almost immediately. -
On 11 Sep 2007, at 18:05, John Stiles wrote:
> Well, I think the point of APIs is kind of the opposite of what
> you're talking about, though. You can invent an API and ship it,
> even if version 1.0 is maybe not perfect from a performance
> perspective, and then in version 2.0, you can keep the same
> interface to the outside world but improve the guts so that they
> run faster or use less memory or whatever.
That's all true, but why release an implementation at all if there
are third-party alternatives that are just as simple to use and if
your implementation has no performance or other advantages?
Particularly if there's a risk (and I think there is) that the API
you come up with might have to change because of major architectural
work going on on the underlying ICU code.
In such a situation, I'd rather they didn't rush. Apparently Mark
Munz feels differently, but I don't think we're going to achieve
anything useful by debating the matter further.
Right now there isn't an NSRegularExpression, and it looks like
there's more work necessary before we get one. So we can use one of
various third-party frameworks, or directly use one of various third-
party libraries.
Kind regards,
Alastair.
--
http://alastairs-place.net
-
The reason to ship it as an official API is that if you bundle in some
third-party solution, it won't improve with time. If Apple does it, they
can improve it in Leopard or 10.6, and all apps that adopted the
official API suddenly get the benefits.
Anyway, bundling in third-party libraries has plenty of downsides. If
ten apps do it, you've got the same chunk of code loaded ten times in
RAM instead of once. It takes more disk space and slows down app loading
times as well. Not all developers will keep their apps up-to-date with
the latest version of the third-party library, so security
vulnerabilities and perf issues from older versions of the third-party
code will always be an issue. All minor concerns when expressed
individually, sure, but these are the reasons we have official frameworks.
Alastair Houghton wrote:
> On 11 Sep 2007, at 18:05, John Stiles wrote:
>
>> Well, I think the point of APIs is kind of the opposite of what
>> you're talking about, though. You can invent an API and ship it,
>> even if version 1.0 is maybe not perfect from a performance
>> perspective, and then in version 2.0, you can keep the same
>> interface to the outside world but improve the guts so that they run
>> faster or use less memory or whatever.
>
>
> That's all true, but why release an implementation at all if there
> are third-party alternatives that are just as simple to use and if
> your implementation has no performance or other advantages?
> Particularly if there's a risk (and I think there is) that the API
> you come up with might have to change because of major architectural
> work going on on the underlying ICU code.
>
> In such a situation, I'd rather they didn't rush. Apparently Mark
> Munz feels differently, but I don't think we're going to achieve
> anything useful by debating the matter further.
>
> Right now there isn't an NSRegularExpression, and it looks like
> there's more work necessary before we get one. So we can use one of
> various third-party frameworks, or directly use one of various third-
> party libraries.
>
> Kind regards,
>
> Alastair.
>
> --
> http://alastairs-place.net
>
> -
On 9/11/07, Alastair Houghton <alastair...> wrote:
> That's all true, but why release an implementation at all if there
> are third-party alternatives that are just as simple to use and if
> your implementation has no performance or other advantages?
There are some 3rd party options, but they come with their own set of
issues. Just because I want to use regexps does not mean I want to be
forced to become an expert in the underlying engine code.
> Particularly if there's a risk (and I think there is) that the API
> you come up with might have to change because of major architectural
> work going on on the underlying ICU code.
Are you just assuming this might be the case or are there actual major
architectural changes taking place in the underlying ICU code with the
next release of the library?
> In such a situation, I'd rather they didn't rush. Apparently Mark
> Munz feels differently, but I don't think we're going to achieve
> anything useful by debating the matter further.
Rushing? Mind you, I'm pretty sure it's not going to be in Leopard --
so we're already looking at Leopard + 1, which means at least 2010,
maybe 2011 before support *might* show up. The issue is *at least* 2
years old. And we're assuming it doesn't get pushed back yet again (as
it has for the last 2 years).
> Right now there isn't an NSRegularExpression, and it looks like
> there's more work necessary before we get one. So we can use one of
> various third-party frameworks, or directly use one of various third-
> party libraries.
Yes, the same argument can be made for features like Source lists and
Media Browsing. Just because there is a workaround doesn't mean there
shouldn't be a solution included in the current Cocoa frameworks.
While there are 3rd party options, they aren't all consistent in their
syntax and options. So any regexp exposed to the end user is subject
to inconsistent behavior from their point of view. Having a common
library supported by the OS helps eliminate that issue.
--
Mark Munz
unmarked software
http://www.unmarked.com/
-
Although this subject may be very interesting to some readers, with well
over 4000 subscribers to cocoa-dev, it's important we keep the subjects on
topic.
I encourage you to file an enhancement request through
bugreporter.apple.com, or please take the discussion off-line.
Thanks,
Cocoa-Dev Admins
> On 9/11/07, Alastair Houghton <alastair...> wrote:
>
>> That's all true, but why release an implementation at all if there
>> are third-party alternatives that are just as simple to use and if
>> your implementation has no performance or other advantages?
>
> There are some 3rd party options, but they come with their own set of
> issues. Just because I want to use regexps does not mean I want to be
> forced to become an expert in the underlying engine code.
>
>> Particularly if there's a risk (and I think there is) that the API
>> you come up with might have to change because of major architectural
>> work going on on the underlying ICU code.
>
> Are you just assuming this might be the case or are there actual major
> architectural changes taking place in the underlying ICU code with the
> next release of the library?
>
>> In such a situation, I'd rather they didn't rush. Apparently Mark
>> Munz feels differently, but I don't think we're going to achieve
>> anything useful by debating the matter further.
>
> Rushing? Mind you, I'm pretty sure it's not going to be in Leopard --
> so we're already looking at Leopard + 1, which means at least 2010,
> maybe 2011 before support *might* show up. The issue is *at least* 2
> years old. And we're assuming it doesn't get pushed back yet again (as
> it has for the last 2 years).
>
>> Right now there isn't an NSRegularExpression, and it looks like
>> there's more work necessary before we get one. So we can use one of
>> various third-party frameworks, or directly use one of various third-
>> party libraries.
>
> Yes, the same argument can be made for features like Source lists and
> Media Browsing. Just because there is a workaround doesn't mean there
> shouldn't be a solution included in the current Cocoa frameworks.
>
> While there are 3rd party options, they aren't all consistent in their
> syntax and options. So any regexp exposed to the end user is subject
> to inconsistent behavior from their point of view. Having a common
> library supported by the OS helps eliminate that issue.
-
On 11.09.2007, at 18:42, John Stiles wrote:
> On Sep 11, 2007, at 9:39 AM, Mark Munz wrote:
>
>> Very true. In fact, I think Obj-c/c++ developers are the only group
>> with a high level framework that doesn't have built-in support for
>> regexp.
>
> Actually, I am pretty sure that TR1 includes regular expressions.
It does, but they are not implemented in the current g++ std:: library.
<http://gcc.gnu.org/onlinedocs/libstdc++/ext/tr1.html>
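For comparison with the RegexKit hex-value example in the announcement,
here is a minimal sketch of the same extraction using the TR1 regex
interface in the form it was later standardized, std::regex in C++11.
It assumes a toolchain with a complete <regex> implementation, which
the g++ library mentioned above did not have at the time. The helper
name extractHexColor is hypothetical and is not part of RegexKit or of
the standard library.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: looks for "color: 0x<hex digits>" in `text`.
// On a match, stores the parsed number in `value` and returns true.
// On no match, leaves `value` untouched and returns false.
bool extractHexColor(const std::string& text, unsigned int& value) {
    // Same pattern as the RegexKit example; capture group 1 is the hex literal.
    static const std::regex pattern("color: (0x[0-9a-fA-F]+)");
    std::smatch match;
    if (!std::regex_search(text, match, pattern)) {
        return false;
    }
    // With base 16, std::stoul accepts the leading "0x" prefix.
    value = static_cast<unsigned int>(std::stoul(match[1].str(), nullptr, 16));
    return true;
}
```

Calling extractHexColor on the string from the announcement,
"Conversion color: 0xFF0000, Order: 1", should store 0xFF0000 in the
output argument. Unlike the RegexKit call, the capture and the numeric
conversion are two separate steps here.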


