Regex
-
Hello, I have been trying to find a good regex framework for Cocoa.
I am trying to find URLs in an HTML page. I have this regex from PHP
that I wrote, so all I need is a way to bring it to Cocoa. The
regex is /<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>.*<\/a>/siU
Thanks for the help.
Mr. Gecko -
http://regexkit.sourceforge.net
I use it pretty frequently (the Lite version, anyway).
HTH,
Dave
On 17 Nov, 2008, at 6:04 PM, Mr. Gecko wrote:
> Hello I have been trying to find a good Regex framework for cocoa.
> I am trying to find urls in an html page... -
On 17 Nov 08, at 17:04, Mr. Gecko wrote:
> Hello I have been trying to find a good Regex framework for cocoa.
> I am trying to find urls in an html page...
Assuming that you're loading the web page into a WebView or similar,
you'll have a much easier time doing this through the HTML DOM. Trying
to parse HTML with regular expressions is risky, as there are numerous
edge cases which are easy to miss. -
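For a crawler that has no WebView, one way to follow the DOM suggestion above is to let Foundation tidy the page into XML and query it with XPath. This is only a sketch under assumptions: it swaps NSXMLDocument in for the WebView DOM mentioned above, LinksInHTMLData is a made-up helper name, and it assumes ARC or garbage collection (under manual reference counting, autorelease doc).

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the thread): returns the href value of
// every <a> element found in htmlData, or an empty array on failure.
static NSArray *LinksInHTMLData(NSData *htmlData)
{
    NSError *error = nil;

    // NSXMLDocumentTidyHTML asks NSXMLDocument to clean up real-world HTML
    // into well-formed XML before building the tree.
    // Assumption: ARC or garbage collection manages doc's lifetime.
    NSXMLDocument *doc = [[NSXMLDocument alloc] initWithData:htmlData
                                                     options:NSXMLDocumentTidyHTML
                                                       error:&error];
    if (doc == nil) {
        NSLog(@"Could not parse HTML: %@", error);
        return [NSArray array];
    }

    // Select the href attribute of every anchor element.
    NSArray *nodes = [doc nodesForXPath:@"//a/@href" error:&error];
    if (nodes == nil) {
        NSLog(@"XPath query failed: %@", error);
        return [NSArray array];
    }

    NSMutableArray *links = [NSMutableArray array];
    for (NSXMLNode *node in nodes) {
        NSString *href = [node stringValue];
        if (href != nil) {
            [links addObject:href];
        }
    }
    return links;
}
```

The data handed back by the download code could be passed in directly, and the returned values may still be relative URLs that need resolving against the page URL.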
I've never thought of that, but I am using NSURL because I want it to
be a crawler.
I'll see if I can do that with what I've got so far.
On Nov 17, 2008, at 7:43 PM, Andrew Farmer wrote:
> Assuming that you're loading the web page into a WebView or similar,
> you'll have a much easier time doing this through the HTML DOM... -
> Hello I have been trying to find a good Regex framework for cocoa.
> I am trying to find urls in an html page...
<http://www.google.com/search?client=safari&rls=en-au&q=regexp+cocoa&ie=UTF-8&oe=UTF-8>
gives
<http://www.cocoadev.com/index.pl?RegularExpressions>
Which lists lots of information.
I've used RegexKitLite, which works well on Mac OS X. RegexKit
appears to have forked into two variants:

The original RegexKit, which uses PCRE 7.6 and does not seem to have
seen further development since the start of 2008.

The newer RegexKitLite, which appears to be getting the bulk of
development now, and which uses the ICU library that ships with Mac
OS X (but is not public on iPhone).
Enjoy,
Peter. -
I've found RegexKit but I couldn't figure out how to get an array from
my string.
On Nov 17, 2008, at 7:43 PM, Andrew Farmer wrote:
> Assuming that you're loading the web page into a WebView or similar,
> you'll have a much easier time doing this through the HTML DOM... -
I never was able to compile RegexKitLite for some reason, and when I
use the framework it says
warning: 'NSString' may not respond to '-arrayByMatchingObjectsWithRegex:'
and when I run the code it gives me this in the debugger console:
*** -[NSCFString arrayByMatchingObjectsWithRegex:]: unrecognized selector sent to instance 0x872c00
Any help?
Thanks,
Mr. Gecko
On Nov 17, 2008, at 10:09 PM, <cocoa-dev-request...> wrote:
> I've used RegexKitLite which works well on Mac OS X. RegexKit
> appears to have forked in to two variants... -
To get it to compile you need to do two things:
1. Add the "Other Linker Flag" "-licucore" to your project build
settings
2. Import the RKL header into whatever files you'll use it in
(alternatively, you can import it into your .pch file so that it will
get included into everything automatically)
HTH,
Dave
On 17 Nov, 2008, at 10:33 PM, Mr. Gecko wrote:
> I never was able to compile RegexKitLite for some reason... -
Here is what I am trying now.
    NSString *recived = [[NSString alloc] initWithData:receivedData
                                              encoding:NSUTF8StringEncoding];
    RKRegex *regex = [RKRegex regexWithRegexString:@"/<a\\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>.*<\\/a>/siU"
                                           options:RKCompileNoOptions];
    RKEnumerator *recivedE = [recived matchEnumeratorWithRegex:regex];
    while ([recivedE nextRanges] != NULL) {
        NSRange matchRange = [recivedE currentRange];
        NSString *link = [recived substringWithRange:matchRange];
        NSLog(@"%@", link);
    }
It doesn't work. I get this in the debug terminal.
CFPropertyListCreateFromXMLData(): Old-style plist parser: missing semicolon in dictionary.
0xffe48 [RKRegex regexWithRegexString:options:]: (formatString is NULL)
I don't know what can be happening or why they don't have example code.
Thanks for any help,
Mr. Gecko
On Nov 17, 2008, at 11:33 PM, Mr. Gecko wrote:
> I never was able to compile RegexKitLite for some reason... -
So I tried that and it worked, but it doesn't seem to do what I need.
I need to do the same thing as preg_match_all in PHP so I can
find all links and have them in an NSArray to go through and add to a
database. Any ideas on how I can do that?
On Nov 18, 2008, at 11:59 AM, <cocoa-dev-request...> wrote:
> To get it to compile you need to do two things... -
I should've clarified that those two things are to get the
RegexKitLite additions working. When I needed the enumeration, I just
followed the steps on the docs page to "Creating a Match Enumerator".
It basically has you copy and paste some stuff into new files, since
the RKL didn't include it by default. That has worked for me.
Alternatively, you could use the RKL addition to NSString to create an
NSArray using "componentsSeparatedByRegex", and then enumerate over
the array (you'd even get fast enumeration if you're using Leopard).
That might give you a different approach to parsing out links.
Dave
On Nov 18, 2008, at 11:35 AM, Mr. Gecko wrote:
> I am needing to do the same thing as preg_match_all in php so I can
> find all links and have it in an NSArray... -
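For the preg_match_all question, here is a rough sketch of collecting every link target into an NSArray. It rests on assumptions the thread does not make: it uses Foundation's NSRegularExpression in place of RegexKitLite (that class did not exist when this thread was written; it arrived with Mac OS X 10.7), and AllLinkTargets is a made-up helper name. The PHP delimiters and flags are not part of the pattern, so /i and /s become options. /U has no ICU equivalent, so each quantifier's greediness is flipped by hand.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the thread), roughly like preg_match_all():
// returns capture group 2 (the href value) of every match in html.
static NSArray *AllLinkTargets(NSString *html)
{
    // Translation of /<a\s[^>]*href=("??)([^" >]*?)\1[^>]*>.*<\/a>/siU
    //   /i -> NSRegularExpressionCaseInsensitive
    //   /s -> NSRegularExpressionDotMatchesLineSeparators
    //   /U -> greedy quantifiers made lazy and lazy ones made greedy
    NSString *pattern = @"<a\\s[^>]*?href=(\"?)([^\" >]*)\\1[^>]*?>.*?</a>";

    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:(NSRegularExpressionCaseInsensitive |
                                                           NSRegularExpressionDotMatchesLineSeparators)
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return [NSArray array];
    }

    // matchesInString: returns one NSTextCheckingResult per match.
    NSArray *matches = [regex matchesInString:html
                                      options:0
                                        range:NSMakeRange(0, [html length])];

    NSMutableArray *links = [NSMutableArray array];
    for (NSTextCheckingResult *match in matches) {
        NSRange hrefRange = [match rangeAtIndex:2];
        if (hrefRange.location != NSNotFound) {
            [links addObject:[html substringWithRange:hrefRange]];
        }
    }
    return links;
}
```

Passing the downloaded page string to AllLinkTargets should give an array that can be enumerated and written to the database. The earlier caution about parsing HTML with regular expressions still applies.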
NSPredicate can handle ICU standard regex matches.
More info in the NSPredicate docs.
The following snippet validates a UUID.
/*
 is UUID
 see http://www.stiefels.net/2007/01/24/regular-expressions-for-nsstring/
*/
- (BOOL)isUUID
{
    NSString *regex = @"^(([0-9a-fA-F]){8}-([0-9a-fA-F]){4}-([0-9a-fA-F]){4}-([0-9a-fA-F]){4}-([0-9a-fA-F]){12})$";
    // supported non standard regex format is at http://www.icu-project.org/userguide/regexp.html
    NSPredicate *regextest = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", regex];
    return [regextest evaluateWithObject:self];
}
Jonathan Mitchell
Central Conscious Unit
http://www.mugginsoft.com
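One thing worth noting about the NSPredicate approach: MATCHES succeeds only when the whole string matches the pattern, so it suits validation rather than pulling every link out of a page. A minimal sketch of that behaviour follows (StringMatchesEntirely is a hypothetical helper name, not part of Foundation):

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES only when the entire string matches pattern.
// MATCHES behaves as if the pattern were anchored at both ends.
static BOOL StringMatchesEntirely(NSString *string, NSString *pattern)
{
    NSPredicate *test = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", pattern];
    return [test evaluateWithObject:string];
}
```

For example, StringMatchesEntirely(@"abc", @"[a-z]+") is YES, while StringMatchesEntirely(@"abc123", @"[a-z]+") is NO, because the trailing digits are not covered by the pattern.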


