Regex

  • Hello, I have been trying to find a good regex framework for Cocoa.
    I am trying to find URLs in an HTML page. I already have a regex that
    I wrote for PHP, so all I need is a way to bring it to Cocoa. The
    regex is /<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>.*<\/a>/siU

    Thanks for the help.
    Mr. Gecko
  • http://regexkit.sourceforge.net

    I use it pretty frequently (the Lite version, anyway).

    HTH,

    Dave

    On 17 Nov, 2008, at 6:04 PM, Mr. Gecko wrote:

    > Hello I have been trying to find a good Regex framework for cocoa.
    > I am trying to find urls in an html page, I have this regex from php
    > that I made so all I would need is a way to bring it to cocoa, the
    > regex is /<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>.*<\/a>/siU
    >
    > Thanks for the help.
    > Mr. Gecko
  • On 17 Nov 08, at 17:04, Mr. Gecko wrote:
    > Hello I have been trying to find a good Regex framework for cocoa.
    > I am trying to find urls in an html page...

    Assuming that you're loading the web page into a WebView or similar,
    you'll have a much easier time doing this through the HTML DOM. Trying
    to parse HTML with regular expressions is risky, as there are numerous
    edge cases which are easy to miss.
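
    When the page is fetched as raw data rather than loaded in a WebView,
    a similar structured approach is still possible. The following is only
    a minimal sketch and is not taken from this thread: it swaps the
    WebView DOM for Foundation's NSXMLDocument with the
    NSXMLDocumentTidyHTML option, plus an XPath query. The helper name
    LinksInHTMLData is hypothetical, and the code assumes Mac OS X with
    automatic reference counting or garbage collection (under manual
    retain/release the document would also need a release).

    ```objc
    #import <Foundation/Foundation.h>

    // Hypothetical helper: returns the href value of every <a> element
    // found in the given HTML data, or an empty array if parsing fails.
    static NSArray *LinksInHTMLData(NSData *htmlData)
    {
        NSError *error = nil;

        // NSXMLDocumentTidyHTML asks NSXMLDocument to clean up real-world
        // HTML into a well-formed tree before parsing it.
        NSXMLDocument *document =
            [[NSXMLDocument alloc] initWithData:htmlData
                                        options:NSXMLDocumentTidyHTML
                                          error:&error];
        if (document == nil) {
            return [NSArray array];
        }

        // local-name() keeps the query independent of any XHTML namespace
        // that the tidy step may attach to the elements.
        NSArray *attributes =
            [document nodesForXPath:@"//*[local-name()='a']/@href"
                              error:&error];

        NSMutableArray *links = [NSMutableArray array];
        for (NSXMLNode *attribute in attributes) {
            NSString *value = [attribute stringValue];
            if (value != nil) {
                [links addObject:value];
            }
        }
        return links;
    }
    ```

    The NSData received from an NSURL download can be passed in directly;
    the returned strings may be relative URLs that still need resolving
    against the page's own URL.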
  • I've never thought of that, but I am using NSURL because I want it to
    be a crawler.
    I'll see if I can do that with what I got so far.

    On Nov 17, 2008, at 7:43 PM, Andrew Farmer wrote:

    > On 17 Nov 08, at 17:04, Mr. Gecko wrote:
    >> Hello I have been trying to find a good Regex framework for cocoa.
    >> I am trying to find urls in an html page...
    >
    > Assuming that you're loading the web page into a WebView or similar,
    > you'll have a much easier time doing this through the HTML DOM.
    > Trying to parse HTML with regular expressions is risky, as there are
    > numerous edge cases which are easy to miss.
  • > Hello I have been trying to find a good Regex framework for cocoa.
    > I am trying to find urls in an html page, I have this regex from php
    > that I made so all I would need is a way to bring it to cocoa, the
    > regex is /<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>.*<\/a>/siU

    <http://www.google.com/search?client=safari&rls=en-au&q=regexp+cocoa&ie=UTF-8&oe=UTF-8>

    gives

    <http://www.cocoadev.com/index.pl?RegularExpressions>

    Which lists lots of information.

    I've used RegexKitLite, which works well on Mac OS X.  RegexKit
    appears to have forked into two variants:

    The original RegexKit, which uses PCRE 7.6 and does not seem to have
    had further development since the start of 2008.

    And the newer RegexKitLite, which appears to be getting the bulk of
    development now. It uses the ICU library that ships with Mac OS X
    (but is not public on iPhone).

    Enjoy,
        Peter.

  • I've found RegexKit but I couldn't figure out how to get an array from
    my string.

    On Nov 17, 2008, at 7:43 PM, Andrew Farmer wrote:

    > On 17 Nov 08, at 17:04, Mr. Gecko wrote:
    >> Hello I have been trying to find a good Regex framework for cocoa.
    >> I am trying to find urls in an html page...
    >
    > Assuming that you're loading the web page into a WebView or similar,
    > you'll have a much easier time doing this through the HTML DOM.
    > Trying to parse HTML with regular expressions is risky, as there are
    > numerous edge cases which are easy to miss.
  • I was never able to compile RegexKitLite for some reason, and when I
    use the framework I get the warning
    'NSString' may not respond to '-arrayByMatchingObjectsWithRegex:'
    and when I run the code, the debugger console shows
    *** -[NSCFString arrayByMatchingObjectsWithRegex:]: unrecognized
    selector sent to instance 0x872c00
    Any help?

    Thanks,
    Mr. Gecko

    On Nov 17, 2008, at 10:09 PM, <cocoa-dev-request...> wrote:

    > gives
    >
    > <http://www.cocoadev.com/index.pl?RegularExpressions>
    >
    > Which lists lots of information.
    >
    > I've used RegexKitLite which works well on Mac OS X.  RegexKit
    > appears to have forked in to two variants:
    >
    > The original RegexKit which does not seem to be getting further
    > development (since the start of 2008) which uses PCRE 7.6
    >
    > And the newer RegexKitLite which appears to be getting the bulk of
    > development now, which uses the ICU library which is shipped with
    > Mac OS X (but not public on iPhone).
    >
    > Enjoy,
    > Peter.
  • To get it to compile you need to do two things:
    1.  Add the "Other Linker Flag" "-licucore" to your project build
    settings
    2.  Import the RKL header into whatever files you'll use it in
    (alternatively, you can import it into your .pch file so that it will
    get included into everything automatically)

    HTH,

    Dave

    On 17 Nov, 2008, at 10:33 PM, Mr. Gecko wrote:

    > I never was able to compile RegexKitLite for some reason, and when I
    > use the framework it says warning: 'NSString' may not respond to
    > '-arrayByMatchingObjectsWithRegex:' and when I run the code it gives
    > me this in the debug
    > *** -[NSCFString arrayByMatchingObjectsWithRegex:]: unrecognized
    > selector sent to instance 0x872c00
    > Any help?
    >
    > Thanks,
    > Mr. Gecko
  • Here is what I am trying now.

    NSString *recived = [[NSString alloc] initWithData:receivedData
                                              encoding:NSUTF8StringEncoding];
    RKRegex *regex = [RKRegex regexWithRegexString:@"/<a\\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>.*<\\/a>/siU"
                                           options:RKCompileNoOptions];
    RKEnumerator *recivedE = [recived matchEnumeratorWithRegex:regex];
    while([recivedE nextRanges] != NULL) {
        NSRange matchRange = [recivedE currentRange];
        NSString *link = [recived substringWithRange:matchRange];
        NSLog(@"%@", link);
    }

    It doesn't work. I get this in the debug terminal.
    CFPropertyListCreateFromXMLData(): Old-style plist parser: missing
    semicolon in dictionary.
    0xffe48 [RKRegex regexWithRegexString:options:]: (formatString is NULL)

    I don't know what could be going wrong, or why they don't have example
    code.

    Thanks for any help,
    Mr. Gecko
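
    Whatever the cause of the error above, one thing worth checking is
    that the pattern string still carries PHP's "/" delimiters and "siU"
    modifiers. PHP strips those before handing the pattern to PCRE, so
    outside PHP they have to be expressed as options instead. The sketch
    below shows one possible translation using Foundation's
    NSRegularExpression, which is not mentioned in this thread and assumes
    Mac OS X 10.7 or later; the function name LinkRegex is hypothetical.
    ICU has no counterpart to PCRE's U modifier, so the greediness of each
    quantifier is flipped by hand.

    ```objc
    #import <Foundation/Foundation.h>

    // Hypothetical helper: the thread's PHP pattern without its delimiters
    // and modifiers.
    //   s -> NSRegularExpressionDotMatchesLineSeparators
    //   i -> NSRegularExpressionCaseInsensitive
    //   U -> no ICU option; greedy and lazy quantifiers are swapped in the
    //        pattern text itself
    static NSRegularExpression *LinkRegex(void)
    {
        NSString *pattern =
            @"<a\\s[^>]*?href=(\"?)([^\" >]*)\\1[^>]*?>.*?</a>";

        NSRegularExpressionOptions options =
            NSRegularExpressionCaseInsensitive |
            NSRegularExpressionDotMatchesLineSeparators;

        NSError *error = nil;
        NSRegularExpression *regex =
            [NSRegularExpression regularExpressionWithPattern:pattern
                                                      options:options
                                                        error:&error];
        if (regex == nil) {
            NSLog(@"Pattern did not compile: %@", error);
        }
        return regex;
    }
    ```

    As in the PHP original, capture group 2 holds the URL, so
    [match rangeAtIndex:2] on a match result gives its range in the
    searched string.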

    On Nov 17, 2008, at 11:33 PM, Mr. Gecko wrote:

    > I never was able to compile RegexKitLite for some reason, and when I
    > use the framework it says warning: 'NSString' may not respond to
    > '-arrayByMatchingObjectsWithRegex:' and when I run the code it gives
    > me this in the debug
    > *** -[NSCFString arrayByMatchingObjectsWithRegex:]: unrecognized
    > selector sent to instance 0x872c00
    > Any help?
    >
    > Thanks,
    > Mr. Gecko
    >
    > On Nov 17, 2008, at 10:09 PM, <cocoa-dev-request...> wrote:
    >
    >> <http://www.google.com/search?client=safari&rls=en-au&q=regexp+cocoa&ie=UTF-8&oe=UTF-8>
    >>
    >> gives
    >>
    >> <http://www.cocoadev.com/index.pl?RegularExpressions>
    >>
    >> Which lists lots of information.
    >>
    >> I've used RegexKitLite which works well on Mac OS X.  RegexKit
    >> appears to have forked in to two variants:
    >>
    >> The original RegexKit which does not seem to be getting further
    >> development (since the start of 2008) which uses PCRE 7.6
    >>
    >> And the newer RegexKitLite which appears to be getting the bulk of
    >> development now, which uses the ICU library which is shipped with
    >> Mac OS X (but not public on iPhone).
    >>
    >> Enjoy,
    >> Peter.
    >
  • So I tried that and it worked, but it doesn't seem to do what I need.
    I need the equivalent of preg_match_all in PHP, so I can find all
    the links and put them in an NSArray to go through and add to a
    database. Any ideas on how I can do that?

    On Nov 18, 2008, at 11:59 AM, <cocoa-dev-request...> wrote:

    > To get it to compile you need to do two things:
    > 1.  Add the "Other Linker Flag" "-licucore" to your project build
    > settings
    > 2.  Import the RKL header into whatever files you'll use it in
    > (alternatively, you can import it into your .pch file so that it
    > will get included into everything automatically)
    >
    > HTH,
    >
    > Dave
  • I should've clarified that those two steps are what gets the
    RegexKitLite additions working.  When I needed enumeration, I just
    followed the steps on the docs page under "Creating a Match
    Enumerator". It basically has you copy and paste some code into new
    files, since RKL doesn't include it by default.  That has worked for me.

    Alternatively, you could use the RKL addition to NSString,
    "componentsSeparatedByRegex", to create an NSArray and then enumerate
    over the array (you'd even get fast enumeration if you're using
    Leopard). That might give you a different approach to parsing out links.

    Dave
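
    For comparison, a preg_match_all-style helper can also be written
    without RegexKit. This is a minimal sketch using Foundation's
    NSRegularExpression, which is not part of this thread and assumes
    Mac OS X 10.7 or later; the name AllCaptures and its parameters are
    hypothetical. It returns every occurrence of one capture group as an
    NSArray of strings.

    ```objc
    #import <Foundation/Foundation.h>

    // Hypothetical helper: collects the text of capture group `group` from
    // every match of `pattern` in `text`. Returns an empty array if the
    // pattern does not compile or nothing matches.
    static NSArray *AllCaptures(NSString *text, NSString *pattern,
                                NSUInteger group)
    {
        NSError *error = nil;
        NSRegularExpression *regex =
            [NSRegularExpression regularExpressionWithPattern:pattern
                                                      options:NSRegularExpressionCaseInsensitive
                                                        error:&error];
        if (regex == nil) {
            return [NSArray array];
        }

        NSArray *matches =
            [regex matchesInString:text
                           options:0
                             range:NSMakeRange(0, [text length])];

        NSMutableArray *results = [NSMutableArray array];
        for (NSTextCheckingResult *match in matches) {
            // Skip group numbers the pattern does not define.
            if (group >= [match numberOfRanges]) {
                continue;
            }
            // A group that did not take part in the match has location
            // NSNotFound.
            NSRange range = [match rangeAtIndex:group];
            if (range.location != NSNotFound) {
                [results addObject:[text substringWithRange:range]];
            }
        }
        return results;
    }
    ```

    A call such as AllCaptures(html, @"href=\"([^\"]*)\"", 1) would then
    give an array that can be walked with fast enumeration and inserted
    into a database.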

    On Nov 18, 2008, at 11:35 AM, Mr. Gecko wrote:

    > so I tried that and it worked but it doesn't seem to do what I need.
    > I am needing to do the same thing as preg_match_all in php so I can
    > find all links and have it in an NSArray to go through and add to a
    > database. Any ideas on how I can do that?
    >
    > On Nov 18, 2008, at 11:59 AM, <cocoa-dev-request...> wrote:
    >
    >> To get it to compile you need to do two things:
    >> 1.  Add the "Other Linker Flag" "-licucore" to your project build
    >> settings
    >> 2.  Import the RKL header into whatever files you'll use it in
    >> (alternatively, you can import it into your .pch file so that it
    >> will get included into everything automatically)
    >>
    >> HTH,
    >>
    >> Dave
    >
  • NSPredicate can handle regex matches that use ICU regex syntax.
    More info is in the NSPredicate docs.
    The following snippet validates a UUID.

    /*

      is UUID

      see http://www.stiefels.net/2007/01/24/regular-expressions-for-nsstring/

      */
    - (BOOL)isUUID
    {
        NSString *regex = @"^(([0-9a-fA-F]){8}-([0-9a-fA-F]){4}-([0-9a-fA-F]){4}-([0-9a-fA-F]){4}-([0-9a-fA-F]){12})$";

        // supported non standard regex format is at http://www.icu-project.org/userguide/regexp.html
        NSPredicate *regextest = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", regex];
        return [regextest evaluateWithObject:self];
    }

    Jonathan Mitchell

    Central Conscious Unit
    http://www.mugginsoft.com
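
    Note that MATCHES tests the whole string against the pattern, so a
    predicate suits validating or filtering strings more than searching
    inside a page. As a rough sketch (the function name AbsoluteHTTPLinks
    and the pattern are illustrative assumptions, not from the post
    above), a predicate could filter an array of already-extracted links
    down to absolute http or https URLs:

    ```objc
    #import <Foundation/Foundation.h>

    // Hypothetical helper: keeps only the strings that look like absolute
    // http or https URLs. Assumes every element of `candidates` is an
    // NSString.
    static NSArray *AbsoluteHTTPLinks(NSArray *candidates)
    {
        // (?i) is the ICU inline flag for case-insensitive matching.
        NSString *pattern = @"(?i)https?://[^\\s\"<>]+";

        NSPredicate *predicate =
            [NSPredicate predicateWithFormat:@"SELF MATCHES %@", pattern];

        return [candidates filteredArrayUsingPredicate:predicate];
    }
    ```

    Relative links such as "/about.html" are dropped by this filter, so a
    crawler would probably resolve them against the page URL first.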