[ANN] RegexKit - An Objective-C Framework for Regular Expressions Using the PCRE Library

  • Announcing RegexKit - A framework for regular
    expressions using the PCRE library.

    At this point in time, this framework is being made
    available as an ALPHA test release to solicit
    developer comments. While there are no known major
    issues with the framework, it is the first time it is
    being made available to the public.

    The project is hosted at SourceForge, and you can
    access it via:

    http://regexkit.sourceforge.net/

    The framework documentation is available at:

    http://regexkit.sourceforge.net/Documentation

    The SourceForge project page is:

    http://sourceforge.net/projects/regexkit/

    While the API documentation is largely complete,
    documentation for a final, end-user distribution is
    still in flux. There is no sample "double-click and
    go" example Xcode project with this release, although
    one is planned. Right now I would appreciate feedback
    from experienced Cocoa developers regarding the API
    and usability of the framework.

    Some highlights:

    o BSD license for the framework. The PCRE library has
      a BSD license as well.
    o Uses PCRE 7.3, the latest release as of this
      posting.
    o Mac OS X 10.4 or later required.
    o Caches compiled regular expressions for speed.
    o Specifically designed to be multithreading safe.
    o Designed to be fast and lightweight. Uses the stack
      extensively for temporary items and work, and rarely
      resorts to malloc().
    o Not just a wrapper around PCRE: it also extends
      NSArray, NSDictionary, NSSet, and especially
      NSString by category.
    o It's documented. No, I'm not kidding, it's actually
      documented. The documentation follows the same
      familiar style as Apple's documentation.

    As an example of what's possible with the NSString
    additions, consider the following: find, extract, and
    convert to an unsigned int a hex value from an
    NSString. Here's how you can do it with RegexKit:

    unsigned int hexValue = 0;

    [@"Conversion color: 0xFF0000, Order: 1"
        getCapturesWithRegexAndReferences:@"color: (0x[0-9a-fA-F]+)",
        @"${1:%x}", &hexValue, NULL];
    // hexValue == 16711680 (0xFF0000)

    getCapturesWithRegexAndReferences: allows you to
    easily match a capture subpattern from a regular
    expression and perform a scanf()-style conversion.
    Also note that there is no need to create a regular
    expression object; the framework will automatically
    convert NSString objects to RKRegex objects for you.
    In fact, getCapturesWithRegexAndReferences: will
    accept either an instantiated RKRegex object or an
    NSString, which it will convert.
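
    For comparison, here is a minimal sketch of the same
    extraction done by hand in plain C (which also compiles as
    Objective-C). It is not RegexKit code: the helper name
    extract_hex_color is hypothetical, and the POSIX
    regcomp()/regexec() functions stand in for PCRE. It only
    illustrates the bookkeeping that the one-line call above
    is meant to save.

    ```c
    #include <assert.h>
    #include <regex.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical helper, not part of RegexKit: finds "color: 0x..."
       in text and stores the parsed value in *out.
       Returns 1 on success, 0 if nothing matched. */
    int extract_hex_color(const char *text, unsigned int *out)
    {
        regex_t re;
        regmatch_t m[2];   /* m[0] = whole match, m[1] = first capture */
        char buf[32];
        int ok = 0;

        assert(text != NULL && out != NULL);

        if (regcomp(&re, "color: (0x[0-9a-fA-F]+)", REG_EXTENDED) != 0)
            return 0;

        if (regexec(&re, text, 2, m, 0) == 0 && m[1].rm_so >= 0) {
            size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
            if (len < sizeof buf) {
                /* Copy the captured text so it is NUL-terminated. */
                memcpy(buf, text + m[1].rm_so, len);
                buf[len] = '\0';
                /* With base 16, strtoul() accepts the leading "0x". */
                *out = (unsigned int)strtoul(buf, NULL, 16);
                ok = 1;
            }
        }

        regfree(&re);
        return ok;
    }
    ```

    Called with the sample string above and the address of
    hexValue, this helper leaves 0xFF0000 in hexValue. Note
    that it recompiles the pattern on every call, whereas the
    framework caches compiled expressions.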

    You can also create new strings with references to
    regular expression matched text:

    NSString *newString = [@"Doe, John"
        stringByMatching:@"(?<last>\\S+),\\s*(?<first>\\S+)"
        withReferenceString:@"Dear ${first} ${last},\n\nHow are you today, ${first}?"];
    /*
    newString ==
    Dear John Doe,

    How are you today, John?
    */

    There are similar search and replace methods as well.

    With this alpha release, I'm looking for comments from
    Objective-C Cocoa developers regarding the API. Any
    other comments are welcome as well. I'm in the final
    push to get a "1.0" general release done, so I'd like
    to freeze the features of the framework and
    concentrate on getting a polished release out the
    door. As I mentioned, there's some conflicting
    information in the current release regarding
    first-time user related information, such as
    references to an old "cli_test*" target/code. I don't
    expect any experienced Cocoa developer to be thrown
    off, though. The documentation regarding adding the
    framework to your project (should?) be just fine, just
    no examples yet.


  • On Sep 1, 2007, at 12:49 PM, John Engelhart wrote:

    > Announcing RegexKit - A framework for regular
    > expressions using the PCRE library.

    Sweet!

    Of course, I still want Apple to get this into the foundation kit, but
    this is quite welcome.

    -jcr
  • On 1 Sep 2007, at 20:49, John Engelhart wrote:

    > Announcing RegexKit - A framework for regular
    > expressions using the PCRE library.

    Hi John,

    Any chance of changing the name to PCREKit or something more
    specific?  It's just that there are a number of regexp libraries, all
    with subtly different implementations (e.g. POSIX, PCRE, Oniguruma,
    ICU...).  It can be a bit puzzling at times when confronted with apps
    using the various different libraries (and/or with various different
    options enabled), so I think it'd be good to make it *really* obvious
    to people that your library is using PCRE.

    Incidentally, does PCRE have good (i.e. native) support for UTF-16?
    Oniguruma and ICU both do (and ICU includes a powerful implementation
    of the regex character class feature that lets you query Unicode
    attributes), which makes them a good choice for integration with
    Cocoa, but if you have to e.g. transcode to UTF-8 in order to use
    regex matching, it's going to be somewhat more expensive.
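
    To make that cost concrete, here is a minimal sketch in
    plain C of the UTF-16 to UTF-8 pass that would be needed
    before handing text to an engine that only matches UTF-8.
    The function name utf16_to_utf8 is hypothetical, and this
    says nothing about how RegexKit or PCRE actually handle
    the conversion; it only shows that every code unit has to
    be visited and copied before matching can begin.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical helper: converts n UTF-16 code units to UTF-8.
       Returns the number of bytes written, or (size_t)-1 if dst is
       too small or the input contains an unpaired surrogate. */
    size_t utf16_to_utf8(const uint16_t *src, size_t n,
                         unsigned char *dst, size_t cap)
    {
        size_t out = 0;

        assert(src != NULL && dst != NULL);

        for (size_t i = 0; i < n; i++) {
            uint32_t cp = src[i];

            if (cp >= 0xD800 && cp <= 0xDBFF) {
                /* High surrogate: must be followed by a low surrogate. */
                if (i + 1 >= n || src[i + 1] < 0xDC00 || src[i + 1] > 0xDFFF)
                    return (size_t)-1;
                cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i + 1] - 0xDC00);
                i++;
            } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
                return (size_t)-1;   /* stray low surrogate */
            }

            size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
            if (out + need > cap)
                return (size_t)-1;

            if (need == 1) {
                dst[out++] = (unsigned char)cp;
            } else if (need == 2) {
                dst[out++] = (unsigned char)(0xC0 | (cp >> 6));
                dst[out++] = (unsigned char)(0x80 | (cp & 0x3F));
            } else if (need == 3) {
                dst[out++] = (unsigned char)(0xE0 | (cp >> 12));
                dst[out++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
                dst[out++] = (unsigned char)(0x80 | (cp & 0x3F));
            } else {
                dst[out++] = (unsigned char)(0xF0 | (cp >> 18));
                dst[out++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
                dst[out++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
                dst[out++] = (unsigned char)(0x80 | (cp & 0x3F));
            }
        }
        return out;
    }
    ```

    Any match offsets reported against the UTF-8 copy would
    also have to be mapped back to UTF-16 indices before they
    could be used as an NSRange, which adds to the overhead.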

    BTW, nice documentation.  I was going to ask what tools you used to
    do it, but it looks like you included them in the source
    distribution.  You should consider packaging up the doc. building
    tools separately, as it looks like they're an improvement on headerdoc.

    On 1 Sep 2007, at 22:20, John C. Randolph wrote:

    > Sweet!
    >
    > Of course, I still want Apple to get this into the foundation kit,
    > but this is quite welcome.

    Indeed.  It seems strange that they still haven't done this, given
    that they've got ICU working under the covers... ICU includes a
    perfectly reasonable (and Unicode-enabled) regular expression engine.

    (BTW, it seems likely that an Apple-supplied implementation would be
    based on ICU, whose regex engine uses an enhanced Perl-compatible
    syntax.)

    For those who are interested, there's also

      <http://aarone.org/cocoaicu/>

    though I don't know how it compares (it probably won't be as
    efficient as an Apple implementation could be, as Apple have more
    control over the internal representation of strings).

    Kind regards,

    Alastair.

    --
    http://alastairs-place.net
  • If by the time that 10.5 is released, the Mac development community
    still doesn't have basic regex support in Foundation, you have my
    vote to change the name of your project to NSRegularExpression.

    -- Ilan

    On Sep 2, 2007, at 7:12 PM, Alastair Houghton wrote:

    > Any chance of changing the name to PCREKit or something more
    > specific?  It's just that there are a number of regexp libraries,
    > all with subtly different implementations (e.g. POSIX, PCRE,
    > Oniguruma, ICU...).  It can be a bit puzzling at times when
    > confronted with apps using the various different libraries (and/or
    > with various different options enabled), so I think it'd be good to
    > make it *really* obvious to people that your library is using PCRE.

    Ilan Volow
    "Implicit code is inherently evil, and here's the reason why:"
  • Isn't NSPredicate providing what you're looking for ?
    http://developer.apple.com/documentation/Cocoa/Conceptual/Predicates/Articles/pUsing.html#//apple_ref/doc/uid/TP40001794-DontLinkElementID_15

    Thomas

    On Sep 5, 2007, at 4:08 PM, Ilan Volow wrote:

    > If by the time that 10.5 is released, the Mac development community
    > still doesn't have basic regex support in Foundation, you have my
    > vote to change the name of your project to NSRegularExpression.
    >
    > -- Ilan
  • On 9/3/07, Alastair Houghton <alastair...> wrote:

    > Incidentally, does PCRE have good (i.e. native) support for UTF-16?

    As far as I know, PCRE has only limited support for UTF-8, nothing as
    feature-rich as support in ICU or Oniguruma

    --
    Alexey Zakhlestin
    http://blog.milkfarmsoft.com/
  • On 5 sep 2007, at 17.29, Thomas Clément wrote:

    > Isn't NSPredicate providing what you're looking for ?

    Last time I looked, the support for regex in NSPredicate was not all
    that great.

    I'm all for the addition of a flexible and performant
    NSRegularExpression to Foundation. Anyone who agrees should file
    enhancement requests with Apple. That's your best bet of seeing it
    happen.

    j o a r
  • On 5 Sep 2007, at 16:29, Thomas Clément wrote:

    > Isn't NSPredicate providing what you're looking for ?
    > http://developer.apple.com/documentation/Cocoa/Conceptual/Predicates/Articles/pUsing.html#//apple_ref/doc/uid/TP40001794-DontLinkElementID_15

    Good point, but it's still not quite the same thing as having a regex
    matcher in Foundation.  For instance, I don't think there's a way to
    get access to the results of capture groups.

    Kind regards,

    Alastair.

    --
    http://alastairs-place.net
  • On Sep 5, 2007, at 8:29 AM, Thomas Clément wrote:

    > Isn't NSPredicate providing what you're looking for ?

    An NSComparisonPredicate using NSMatchesPredicateOperatorType can only
    tell you whether a whole string matches a specified ICU regular
    expression.

    It can't get you the matched range of a substring, the number of
    subexpressions in the regular expression, or the matched ranges for
    individual subexpressions as a full regular expression class would be
    able to.
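
    As an illustration of those three capabilities, here is a
    minimal sketch in plain C using the POSIX regex functions.
    The helper name match_ranges is hypothetical, and POSIX
    regexps are used only because they are available without a
    third-party library; this is not the ICU-based matching
    that NSPredicate uses.

    ```c
    #include <assert.h>
    #include <regex.h>
    #include <stddef.h>

    /* Hypothetical helper: searches text for pattern.
       On return, *ngroups holds the number of parenthesised
       subexpressions in the pattern, ranges[0] the byte range of the
       whole match, and ranges[1..] the range of each subexpression.
       Returns 1 on a match, 0 on no match, -1 if the pattern is bad. */
    int match_ranges(const char *pattern, const char *text,
                     regmatch_t *ranges, size_t max_ranges,
                     size_t *ngroups)
    {
        regex_t re;
        int rc;

        assert(pattern != NULL && text != NULL);
        assert(ranges != NULL && ngroups != NULL);

        if (regcomp(&re, pattern, REG_EXTENDED) != 0)
            return -1;

        *ngroups = re.re_nsub;   /* count of subexpressions */
        rc = regexec(&re, text, max_ranges, ranges, 0);
        regfree(&re);

        return rc == 0 ? 1 : 0;
    }
    ```

    Each regmatch_t carries rm_so and rm_eo, the start and end
    byte offsets of that range; these are the values a
    predicate-style "does it match?" test never exposes.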

      -- Chris
  • On Sep 5, 2007, at 8:51 AM, Alexey Zakhlestin wrote:

    > On 9/3/07, Alastair Houghton <alastair...> wrote:
    >
    >> Incidentally, does PCRE have good (i.e. native) support for UTF-16?
    >
    > As far as I know, PCRE has only limited support for UTF-8, nothing as
    > feature-rich as support in ICU or Oniguruma

    Yeah - I'm definitely an Oniguruma fan. I've wanted to make a
    lightweight but useful framework for it that wasn't as bloated as
    OgreKit.

    --
    Seth Willits
  • Isn't "NS" reserved for Apple? (So much as the namespace can be
    reserved, that is.)

    --
    m-s

    On 05 Sep, 2007, at 10:08, Ilan Volow wrote:

    > If by the time that 10.5 is released, the Mac development community
    > still doesn't have basic regex support in Foundation, you have my
    > vote to change the name of your project to NSRegularExpression.
    >
    > -- Ilan
  • I think this is just an expression of frustration that Apple hasn't
    provided a standard Regex framework for Cocoa, despite using ICU since
    Tiger.

    Using ICU and building against the dylib is non-trivial -- and Apple
    apparently doesn't feel addressing this issue is more important than
    animating the views as they slide around the screen.

    On 9/6/07, Michael Watson <mikey-san...> wrote:
    > Isn't "NS" reserved for Apple? (So much as the namespace can be
    > reserved, that is.)

    --
    Mark Munz
    unmarked software
    http://www.unmarked.com/
  • On Sep 6, 2007, at 7:05 AM, Mark Munz wrote:

    > and Apple
    > apparently doesn't feel addressing this issue is more important than
    > animating the views as they slide around the screen.

    Yeah also why did they waste time adding 64b support when they could
    have been adding something a developer could have picked up
    themselves if they really needed it.

    -Shawn
  • Come on. 64b support is different than flashy animations and you know it.

    Apple has a habit of making every developer repeat the same basic work
    that should be part of the system. Even with ICU included, you still
    have to jump through a number of hoops to use it with NSStrings. They
    know it (based on a conversation I had in 2006 at WWDC).

    Rather than fix the problem once, they basically force each developer
    to "fix" the problem. And this is a pattern that is decades old.

    On 9/6/07, Shawn Erickson <shawnce...> wrote:
    >
    > On Sep 6, 2007, at 7:05 AM, Mark Munz wrote:
    >
    >> and Apple
    >> apparently doesn't feel addressing this issue is more important than
    >> animating the views as they slide around the screen.
    >
    > Yeah also why did they waste time adding 64b support when they could
    > have been adding something a developer could have picked up
    > themselves if they really needed it.
    >
    > -Shawn
    >

    --
    Mark Munz
    unmarked software
    http://www.unmarked.com/
  • We definitely want this to happen, but it's a lot of work. This is
    the current roadblock:

    http://bugs.icu-project.org/trac/ticket/4521

    Please note the number of weeks estimated for implementation.

    Sorry for the long wait...

    Deborah Goldsmith
    Internationalization, Unicode Liaison
    Apple Inc.
    <goldsmit...>

    On Sep 6, 2007, at 7:32 AM, Mark Munz wrote:

    > Come on. 64b support is different than flashy animations and you
    > know it.
    >
    > Apple has a habit of making every developer repeat the same basic work
    > that should be part of the system. Even with ICU included, you still
    > have to jump through a number of hoops to use it with NSStrings. They
    > know it (based on a conversation I had in 2006 at WWDC).
    >
    > Rather than fix the problem once, they basically force each developer
    > to "fix" the problem. And this is a pattern that is decades old.
  • Hmm. Well, that's a reasonable obstacle. But still, it seems as if
    Apple could have implemented it the "slow" way for now (where it
    converts each string to a buffer before regex'ing it), and then once
    this bug gets implemented, convert the OS to use the "fast" way. That
    way the API could internally be improved over time while the
    interface stays the same.

    Or is there another reason why it couldn't be done this way...?

    Also, FWIW, there are plenty of internal Apple projects which already
    mix regular expressions and NSStrings—Xcode comes to mind—so again,
    it doesn't seem like a total blocker. Just maybe a short-term perf
    issue.

    On Sep 10, 2007, at 7:09 PM, Deborah Goldsmith wrote:

    > We definitely want this to happen, but it's a lot of work. This is
    > the current roadblock:
    >
    > http://bugs.icu-project.org/trac/ticket/4521
    >
    > Please note the number of weeks estimated for implementation.
    >
    > Sorry for the long wait...
    >
    > Deborah Goldsmith
    > Internationalization, Unicode Liaison
    > Apple Inc.
    > <goldsmit...>
  • > We definitely want this to happen, but it's a lot of work. This is
    > the current roadblock:
    >
    > http://bugs.icu-project.org/trac/ticket/4521

    While it may be a lot of work, it also looks like the issue has been
    pushed off again and again, as the bug has been open for two years.

    It definitely seems like great is the enemy of good here.

    Also, isn't Xcode already using ICU for its regular expression searches?

    --
    Mark Munz
    unmarked software
    http://www.unmarked.com/
  • On Sep 6, 2007, at 7:05 AM, Mark Munz wrote:

    > Using ICU and building against the dylib is non-trivial -- and Apple
    > apparently doesn't feel addressing this issue is more important than
    > animating the views as they slide around the screen.

    Think for a moment:  would you expect that the people who are best
    able to implement an NSRegex class are the same ones who implemented
    CoreAnimation?

    -jcr
  • On 11 Sep 2007, at 03:46, John Stiles wrote:

    > Hmm. Well, that's a reasonable obstacle. But still, it seems as if
    > Apple could have implemented it the "slow" way for now (where it
    > converts each string to a buffer before regex'ing it),

    Actually you don't need to do that in all cases.  You can use
    CFStringGetCharactersPtr() to try to get a UTF-16 character pointer
    for the string, and then fall back to allocating a buffer if/when
    that fails.

    Unfortunately I have a suspicion that the function always fails for
    string constants...

    On 11 Sep 2007, at 04:16, Mark Munz wrote:

    >> We definitely want this to happen, but it's a lot of work. This is
    >> the current roadblock:
    >>
    >> http://bugs.icu-project.org/trac/ticket/4521
    >
    > While it may be a lot of work, it also looks like the issue has been
    > pushed off again and again, as the bug has been open for two years.

    Since anyone is free to contribute to ICU, if you particularly care
    about it, you could offer your assistance to fix it.  Indeed, any of
    us could have at any time over the past two years.  It looks from the
    ticket though that IBM, Apple and Google all have people working on
    the problem right now---at least, I'm guessing that that's what the
    "load" property means.

    > It definitely seems like great is the enemy of good here.

    I think that viewpoint is hard to defend.

    It's always been possible for developers to use regexps in their own
    code, whether by using the POSIX functions, or by linking in a third
    party library (Oniguruma probably being the best match for Cocoa
    strings right now as it has native UTF-16 support).  Is it fair,
    therefore, to criticise Apple for electing to wait until the ICU
    implementation had all the features they felt they needed for really
    great regexp support?  I don't think so.  It isn't exactly a huge
    effort using a third-party library for now.

    Put another way, since we have plenty of options right now, why rush
    to make an official "NSRegularExpression" (or CFRegularExpression...)
    before there's a significant advantage in doing so?

    I would also guess that the regexp issue has had more bugs filed
    against it in recent months because it seems that more people are
    coming from scripting languages where regexps are widely available,
    as opposed to C where they're really only available on Unix-like
    systems (or with a third-party library...).

    Kind regards,

    Alastair.

    --
    http://alastairs-place.net
  • If I ignore the "Beware of Leopard" signs for just a moment, can
    we reasonably expect NSRegularExpression (or something equivalent) in
    10.5?

    -- Ilan

    On Sep 10, 2007, at 10:09 PM, Deborah Goldsmith wrote:

    > We definitely want this to happen, but it's a lot of work. This is
    > the current roadblock:
    >
    > http://bugs.icu-project.org/trac/ticket/4521
    >
    > Please note the number of weeks estimated for implementation.
    >
    > Sorry for the long wait...
    >
    > Deborah Goldsmith
    > Internationalization, Unicode Liaison
    > Apple Inc.
    > <goldsmit...>

    Ilan Volow
    "Implicit code is inherently evil, and here's the reason why:"
  • > Think for a moment:  would you expect that the people who are best
    > able to implement an NSRegex class are the same ones who implemented
    > CoreAnimation?

    The point of the comment was about priority, not about the specific
    engineers working on it. This stuff is already implemented in existing
    Apple code (Xcode).

    --
    Mark Munz
    unmarked software
    http://www.unmarked.com/
  • >> It definitely seems like great is the enemy of good here.
    >
    > I think that viewpoint is hard to defend.
    >
    > It's always been possible for developers to use regexps in their own
    > code, whether by using the POSIX functions, or by linking in a third
    > party library (Oniguruma probably being the best match for Cocoa
    > strings right now as it has native UTF-16 support).  Is it fair,
    > therefore, to criticise Apple for electing to wait until the ICU
    > implementation had all the features they felt they needed for really
    > great regexp support?  I don't think so.  It isn't exactly a huge
    > effort using a third-party library for now.

    I think you just made my argument for me. Apple is waiting for all the
    features they felt they needed for really great regexp support. (the
    bug description seemed a bit vague on what it was they were waiting
    for, more like one of those high-level "we need some APIs" bugs)

    > Put another way, since we have plenty of options right now, why rush
    > to make an official "NSRegularExpression" (or CFRegularExpression...)
    > before there's a significant advantage in doing so?

    That argument could be made for virtually anything Apple does or
    doesn't do. There are existing options for animation, advanced
    controls, data management. Those can all be done right now, but the
    new APIs make the whole task easier. That's all I'm asking for, make
    the task of using regexp easier (especially given how commonplace it
    now is in everything but Cocoa).

    I'm just beginning to look at CocoaICU (first heard of it in this
    thread), but if one guy can do that -- certainly Apple could do
    something similar.

    When I spoke to some folks at Apple at WWDC 2006, I was told that ICU
    was the future direction. That's great! I just would like to see that
    future before 2010! Unfortunately, I think that is exactly what we'll
    have to wait for.

    (fyi -- 2010 is just a guess as to when 10.6 might show up)

    > I would also guess that the regexp issue has had more bugs filed
    > against it in recent months because it seems that more people are
    > coming from scripting languages where regexps are widely available,
    > as opposed to C where they're really only available on Unix-like
    > systems (or with a third-party library...).

    Very true. In fact, I think Obj-c/c++ developers are the only group
    with a high level framework that doesn't have built-in support for
    regexp.

    --
    Mark Munz
    unmarked software
    http://www.unmarked.com/
  • On Sep 11, 2007, at 9:39 AM, Mark Munz wrote:

    > Very true. In fact, I think Obj-c/c++ developers are the only group
    > with a high level framework that doesn't have built-in support for
    > regexp.

    Actually, I am pretty sure that TR1 includes regular expressions.
  • On 11 Sep 2007, at 17:39, Mark Munz wrote:

    > I think you just made my argument for me. Apple is waiting for all the
    > features they felt they needed for really great regexp support. (the
    > bug description seemed a bit vague on what it was they were waiting
    > for, more like one of those high-level "we need some APIs" bugs)

    I believe the bug Deborah pointed at is basically talking about
    changing the ICU regexp engine to be character-encoding agnostic (so
    you can compile a regexp once and then use it to match against
    strings in UTF-8, UTF-16 or perhaps text in other coding systems) and
    to support matching against buffers that are spread across multiple
    memory regions.  (Currently it only supports native-endian UTF-16,
    and to do a really good job of regexp integration with CF/NSString,
    you'd want to be able to support 8-bit encodings and ideally---for
    NSTextStorage---multi-chunk storage as well.)

    I could be wrong, but I think that's what it's all about.

    >> Put another way, since we have plenty of options right now, why rush
    >> to make an official "NSRegularExpression" (or CFRegularExpression...)
    >> before there's a significant advantage in doing so?
    >
    > That argument could be made for virtually anything Apple does or
    > doesn't do. There are existing options for animation, advanced
    > controls, data management. Those can all be done right now, but the
    > new APIs make the whole task easier.

    Well... there are plenty of things that *only* Apple can do, and even
    more that only Apple can do without risking the use of unsupported APIs.

    > That's all I'm asking for, make the task of using regexp easier
    > (especially given how commonplace it now is in everything but Cocoa).

    Sure, though it isn't exactly hard right now.  It just requires a
    little bit of effort on the part of developers, and there are several
    Cocoa regexp frameworks available (including RegexKit, which I still
    wish had a different name :-)).

    > I'm just beginning to look at CocoaICU (first heard of it in this
    > thread), but if one guy can do that -- certainly Apple could do
    > something similar.

    There's OGREKit as well:

      http://www8.ocn.ne.jp/~sonoisa/OgreKit/

    Personally, where I've needed regexps in Cocoa apps, I've just used
    Oniguruma's C API together with a little bit of CoreFoundation; it's
    quite easy, which is why I'm not hugely concerned about having this
    feature as part of Cocoa just yet (or rather, why I'd rather Apple
    did the changes mentioned above before adding it... that way there
    will be really obvious advantages to using Apple's NSRegularExpression).

    I do understand where those people asking for this feature are coming
    from, but I also think it would be a mistake to rush a solution and
    then have to deprecate APIs almost immediately.

    Kind regards,

    Alastair.

    --
    http://alastairs-place.net
  • Well, I think the point of APIs is kind of the opposite of what you're
    talking about, though. You can invent an API and ship it, even if
    version 1.0 is maybe not perfect from a performance perspective, and
    then in version 2.0, you can keep the same interface to the outside
    world but improve the guts so that they run faster or use less memory or
    whatever.

    Alastair Houghton wrote:

    > On 11 Sep 2007, at 17:39, Mark Munz wrote:
    >
    >> I think you just made my argument for me. Apple is waiting for all the
    >> features they felt they needed for really great regexp support. (the
    >> bug description seemed a bit vague on what it was they were waiting
    >> for, more like one of those high-level "we need some APIs" bugs)
    >
    >
    > I believe the bug Deborah pointed at is basically talking about
    > changing the ICU regexp engine to be character-encoding agnostic (so
    > you can compile a regexp once and then use it to match against
    > strings in UTF-8, UTF-16 or perhaps text in other coding systems) and
    > to support matching against buffers that are spread across multiple
    > memory regions.  (Currently it only supports native-endian UTF-16,
    > and to do a really good job of regexp integration with CF/NSString,
    > you'd want to be able to support 8-bit encodings and ideally---for
    > NSTextStorage---multi-chunk storage as well.)
    >
    > I could be wrong, but I think that's what it's all about.
    >
    >>> Put another way, since we have plenty of options right now, why rush
    >>> to make an official "NSRegularExpression" (or CFRegularExpression...)
    >>> before there's a significant advantage in doing so?
    >>
    >>
    >> That argument could be made for virtually anything Apple does or
    >> doesn't do. There are existing options for animation, advanced
    >> controls, data management. Those can all be done right now, but the
    >> new APIs make the whole task easier.
    >
    >
    > Well... there are plenty of things that *only* Apple can do, and even
    > more that only Apple can do without risking the use of unsupported APIs.
    >
    >> That's all I'm asking for, make the task of using regexp easier
    >> (especially given how commonplace it now is in everything but Cocoa).
    >
    >
    > Sure, though it isn't exactly hard right now.  It just requires a
    > little bit of effort on the part of developers, and there are several
    > Cocoa regexp frameworks available (including RegexKit, which I still
    > wish had a different name :-)).
    >
    >> I'm just beginning to look at CocoaICU (first heard of it in this
    >> thread), but if one guy can do that -- certainly Apple could do
    >> something similar.
    >
    >
    > There's OGREKit as well:
    >
    > http://www8.ocn.ne.jp/~sonoisa/OgreKit/
    >
    > Personally, where I've needed regexps in Cocoa apps, I've just used
    > Oniguruma's C API together with a little bit of CoreFoundation; it's
    > quite easy, which is why I'm not hugely concerned about having this
    > feature as part of Cocoa just yet (or rather, why I'd rather Apple
    > did the changes mentioned above before adding it... that way there
    > will be really obvious advantages to using Apple's NSRegularExpression).
    >
    > I do understand where those people asking for this feature are coming
    > from, but I also think it would be a mistake to rush a solution and
    > then have to deprecate APIs almost immediately.
    >
    > Kind regards,
    >
    > Alastair.
    >
    > --
    > http://alastairs-place.net
    >
    >
  • On 11 Sep 2007, at 18:05, John Stiles wrote:

    > Well, I think the point of APIs is kind of the opposite of what
    > you're talking about, though. You can invent an API and ship it,
    > even if version 1.0 is maybe not perfect from a performance
    > perspective, and then in version 2.0, you can keep the same
    > interface to the outside world but improve the guts so that they
    > run faster or use less memory or whatever.

    That's all true, but why release an implementation at all if there
    are third-party alternatives that are just as simple to use and if
    your implementation has no performance or other advantages?
    Particularly if there's a risk (and I think there is) that the API
    you come up with might have to change because of major architectural
    work going on on the underlying ICU code.

    In such a situation, I'd rather they didn't rush.  Apparently Mark
    Munz feels differently, but I don't think we're going to achieve
    anything useful by debating the matter further.

    Right now there isn't an NSRegularExpression, and it looks like
    there's more work necessary before we get one.  So we can use one of
    various third-party frameworks, or directly use one of various third-
    party libraries.

    Kind regards,

    Alastair.

    --
    http://alastairs-place.net
  • The reason to ship it as an official API is that if you bundle in some
    third-party solution, it won't improve with time. If Apple does it, they
    can improve it in Leopard or 10.6, and all apps that adopted the
    official API suddenly get the benefits.

    Anyway, bundling in third-party libraries has plenty of downsides. If
    ten apps do it, you've got the same chunk of code loaded ten times in
    RAM instead of once. It takes more disk space and slows down app loading
    times as well. Not all developers will keep their apps up-to-date with
    the latest version of the third-party library, so security
    vulnerabilities and perf issues from older versions of the third-party
    code will always be an issue. All minor concerns when expressed
    individually, sure, but these are the reasons we have official frameworks.

    Alastair Houghton wrote:

    > On 11 Sep 2007, at 18:05, John Stiles wrote:
    >
    >> Well, I think the point of APIs is kind of the opposite of what
    >> you're talking about, though. You can invent an API and ship it,
    >> even if version 1.0 is maybe not perfect from a performance
    >> perspective, and then in version 2.0, you can keep the same
    >> interface to the outside world but improve the guts so that they run
    >> faster or use less memory or whatever.
    >
    >
    > That's all true, but why release an implementation at all if there
    > are third-party alternatives that are just as simple to use and if
    > your implementation has no performance or other advantages?
    > Particularly if there's a risk (and I think there is) that the API
    > you come up with might have to change because of major architectural
    > work going on on the underlying ICU code.
    >
    > In such a situation, I'd rather they didn't rush.  Apparently Mark
    > Munz feels differently, but I don't think we're going to achieve
    > anything useful by debating the matter further.
    >
    > Right now there isn't an NSRegularExpression, and it looks like
    > there's more work necessary before we get one.  So we can use one of
    > various third-party frameworks, or directly use one of various third-
    > party libraries.
    >
    > Kind regards,
    >
    > Alastair.
    >
    > --
    > http://alastairs-place.net
    >
    >
  • On 9/11/07, Alastair Houghton <alastair...> wrote:

    > That's all true, but why release an implementation at all if there
    > are third-party alternatives that are just as simple to use and if
    > your implementation has no performance or other advantages?

    There are some 3rd party options, but they come with their own set of
    issues. Just because I want to use regexps does not mean I want to be
    forced to become an expert in the underlying engine code.

    > Particularly if there's a risk (and I think there is) that the API
    > you come up with might have to change because of major architectural
    > work going on on the underlying ICU code.

    Are you just assuming this might be the case or are there actual major
    architectural changes taking place in the underlying ICU code with the
    next release of the library?

    > In such a situation, I'd rather they didn't rush.  Apparently Mark
    > Munz feels differently, but I don't think we're going to achieve
    > anything useful by debating the matter further.

    Rushing? Mind you, I'm pretty sure it's not going to be in Leopard --
    so we're already looking at Leopard + 1, which means at least 2010,
    maybe 2011 before support *might* show up. The issue is *at least* 2
    years old. And we're assuming it doesn't get pushed back yet again (as
    it has for the last 2 years).

    > Right now there isn't an NSRegularExpression, and it looks like
    > there's more work necessary before we get one.  So we can use one of
    > various third-party frameworks, or directly use one of various third-
    > party libraries.

    Yes, the same argument can be made for features like Source lists and
    Media Browsing. Just because there is a workaround doesn't mean there
    shouldn't be a solution included in the current Cocoa frameworks.

    While there are 3rd party options, they aren't all consistent in their
    syntax and options. So any regexp exposed to the end user is subject
    to inconsistent behavior from their point of view. Having a common
    library supported by the OS helps eliminate that issue.

    --
    Mark Munz
    unmarked software
    http://www.unmarked.com/
  • Although this subject may be very interesting to some readers, with well
    over 4000 subscribers to cocoa-dev, it's important we keep the subjects on
    topic.

    I encourage you to file an enhancement request through
    bugreporter.apple.com, or please take the discussion off-line.

    Thanks,
    Cocoa-Dev Admins

    > On 9/11/07, Alastair Houghton <alastair...> wrote:
    >
    >> That's all true, but why release an implementation at all if there
    >> are third-party alternatives that are just as simple to use and if
    >> your implementation has no performance or other advantages?
    >
    > There are some 3rd party options, but they come with their own set of
    > issues. Just because I want to use regexps does not mean I want to be
    > forced to become an expert in the underlying engine code.
    >
    >> Particularly if there's a risk (and I think there is) that the API
    >> you come up with might have to change because of major architectural
    >> work going on on the underlying ICU code.
    >
    > Are you just assuming this might be the case or are there actual major
    > architectural changes taking place in the underlying ICU code with the
    > next release of the library?
    >
    >> In such a situation, I'd rather they didn't rush.  Apparently Mark
    >> Munz feels differently, but I don't think we're going to achieve
    >> anything useful by debating the matter further.
    >
    > Rushing? Mind you, I'm pretty sure it's not going to be in Leopard --
    > so we're already looking at Leopard + 1, which means at least 2010,
    > maybe 2011 before support *might* show up. The issue is *at least* 2
    > years old. And we're assuming it doesn't get pushed back yet again (as
    > it has for the last 2 years).
    >
    >> Right now there isn't an NSRegularExpression, and it looks like
    >> there's more work necessary before we get one.  So we can use one of
    >> various third-party frameworks, or directly use one of various third-
    >> party libraries.
    >
    > Yes, the same argument can be made for features like Source lists and
    > Media Browsing. Just because there is a workaround doesn't mean there
    > shouldn't be a solution included in the current Cocoa frameworks.
    >
    > While there are 3rd party options, they aren't all consistent in their
    > syntax and options. So any regexp exposed to the end user is subject
    > to inconsistent behavior from their point of view. Having a common
    > library supported by the OS helps eliminate that issue.
  • On 11.09.2007, at 18:42, John Stiles wrote:

    > On Sep 11, 2007, at 9:39 AM, Mark Munz wrote:
    >
    >> Very true. In fact, I think Obj-c/c++ developers are the only group
    >> with a high level framework that doesn't have built-in support for
    >> regexp.
    >
    > Actually, I am pretty sure that TR1 includes regular expressions.

    It does, but they are not implemented in the current g++ standard
    library (libstdc++).

    <http://gcc.gnu.org/onlinedocs/libstdc++/ext/tr1.html>
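
    For what it's worth, the TR1 regex interface was later standardized as
    <regex> in C++11, so on a toolchain that ships it, something like the
    following should work. This is only a minimal sketch: the helper name
    firstHexValue and the pattern are my own illustrations, not anything
    from RegexKit or the TR1 document. It finds a 0x-prefixed hex value in
    a string and converts it to an unsigned integer, similar in spirit to
    the NSString example in the announcement.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of any library discussed in this thread).
// Finds the first 0x-prefixed hex literal in `text` and stores its value
// in `out`. Returns false, leaving `out` untouched, if nothing matches.
bool firstHexValue(const std::string& text, unsigned long& out) {
    // Capture at most 8 hex digits so the value always fits in 32 bits;
    // longer runs of digits are truncated to their first 8.
    static const std::regex hexPattern("0[xX]([0-9a-fA-F]{1,8})");

    std::smatch match;
    if (!std::regex_search(text, match, hexPattern)) {
        return false;
    }

    // match[1] is the capture group holding the digits after the prefix.
    out = std::stoul(match[1].str(), nullptr, 16);
    return true;
}
```

    Calling firstHexValue("Load address: 0xdeadbeef (main)", value) would
    set value to 0xdeadbeef and return true.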