How to get the hyperlinks in an HTML file?

  • Hi all,

    If this is a stupid question, I apologize in advance; I am pretty
    new at this Cocoa stuff.

    I am trying to get the content of an HTML file where some of the
    content looks like this:

    <tr>
    <td>
    <p><font face="Lucida Grande,Arial, sans-serif"><a
    href="../pgs/557.html">About import settings and hard disk
    space</a></font></p>
    </td>
    </tr>
    <tr>
    <td>
    <p><font face="Lucida Grande,Arial, sans-serif"><a
    href="../pgs/629.html">About AIFF and WAV custom import
    options</a></font></p></td></tr>

    How can I search for the hyperlink strings? For example:

    Get the string "About import settings and hard disk space".

    Get the string "../pgs/557.html" and convert it to an absolute path.

    I hope someone can help me. Thanks.

    Leo
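For the second part of the question (turning "../pgs/557.html" into an absolute path), here is a minimal sketch, assuming the HTML file is on disk and you know its full path. It resolves the href against the page's own file URL with NSURL instead of manipulating path strings by hand. The function name AbsolutePathForHref and the example path /Users/leo/Help/html/index.html are hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: resolve an href taken from an HTML file (for example
// @"../pgs/557.html") against the absolute path of that same HTML file.
NSString *AbsolutePathForHref(NSString *href, NSString *htmlFilePath)
{
    // The page's own file URL is the base that relative links resolve against.
    NSURL *baseURL = [NSURL fileURLWithPath:htmlFilePath];
    NSURL *linkURL = [NSURL URLWithString:href relativeToURL:baseURL];
    if (linkURL == nil) {
        return nil; // not a valid URL string (e.g. it contains unescaped spaces)
    }
    // absoluteURL combines the base and relative parts; standardizedURL removes
    // any remaining "." and ".." components; path returns a POSIX path.
    return [[[linkURL absoluteURL] standardizedURL] path];
}
```

For a page at the made-up path /Users/leo/Help/html/index.html, "../pgs/557.html" resolves to /Users/leo/Help/pgs/557.html. Letting NSURL do the resolution avoids splitting paths on "/" and handling ".." yourself.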
  • Well, you could use NSScanner to scan up to each instance of:

    <a

    and then read in the URL from there. I'm not entirely sure it's the
    best method, though.

    Mike.

    (Replying to Leo's question above, sent 18 Oct 2006, at 04:57.)
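Mike's NSScanner suggestion can be sketched as follows; the same code also handles the nested <font>/<b> tags Leo asks about later in the thread by stripping tags from the link text. This is a minimal sketch, assuming reasonably well-formed markup with double-quoted href="..." attributes; LinksInHTML and TextByStrippingTags are hypothetical names, and entities such as &amp; are not decoded. Like any scan of the raw file, it only sees the static markup, not anything JavaScript adds at run time.

```objc
#import <Foundation/Foundation.h>

// Remove any tags (e.g. <font>, <b>) from an HTML fragment and collapse
// whitespace, so " <font> <b> xxxx</b> </font> " becomes "xxxx".
static NSString *TextByStrippingTags(NSString *fragment)
{
    NSScanner *scanner = [NSScanner scannerWithString:fragment];
    [scanner setCharactersToBeSkipped:nil];
    NSMutableString *text = [NSMutableString string];
    while (![scanner isAtEnd]) {
        NSString *chunk = nil;
        if ([scanner scanUpToString:@"<" intoString:&chunk]) {
            [text appendString:chunk];
        }
        // Skip the tag itself, up to and including its closing '>'.
        [scanner scanUpToString:@">" intoString:NULL];
        [scanner scanString:@">" intoString:NULL];
    }
    NSArray *parts = [text componentsSeparatedByCharactersInSet:
                      [NSCharacterSet whitespaceAndNewlineCharacterSet]];
    NSMutableArray *words = [NSMutableArray array];
    for (NSString *part in parts) {
        if ([part length] > 0) [words addObject:part];
    }
    return [words componentsJoinedByString:@" "];
}

// Hypothetical helper: one dictionary per <a ... href="..."> ... </a>,
// with keys @"href" and @"text".
NSArray *LinksInHTML(NSString *html)
{
    NSMutableArray *links = [NSMutableArray array];
    NSCharacterSet *space = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSScanner *scanner = [NSScanner scannerWithString:html];
    [scanner setCharactersToBeSkipped:nil]; // keep whitespace inside link text
    [scanner setCaseSensitive:NO];          // match <a, <A, </a> and </A>

    while (![scanner isAtEnd]) {
        // Jump to the next "<a"; stop when there are none left.
        [scanner scanUpToString:@"<a" intoString:NULL];
        if (![scanner scanString:@"<a" intoString:NULL]) break;

        // Require whitespace after "<a" so that tags like <abbr> are ignored.
        NSUInteger loc = [scanner scanLocation];
        if (loc >= [html length] ||
            ![space characterIsMember:[html characterAtIndex:loc]]) continue;

        // The attributes of the opening tag, e.g. "\nhref=\"../pgs/557.html\"".
        NSString *attributes = nil;
        if (![scanner scanUpToString:@">" intoString:&attributes]) continue;
        [scanner scanString:@">" intoString:NULL];

        // Everything up to </a>, possibly containing nested tags.
        NSString *inner = @"";
        [scanner scanUpToString:@"</a>" intoString:&inner];
        [scanner scanString:@"</a>" intoString:NULL];

        // Pull the double-quoted href value out of the attributes.
        NSScanner *attrScanner = [NSScanner scannerWithString:attributes];
        [attrScanner setCaseSensitive:NO];
        NSString *href = nil;
        [attrScanner scanUpToString:@"href=\"" intoString:NULL];
        if ([attrScanner scanString:@"href=\"" intoString:NULL] &&
            [attrScanner scanUpToString:@"\"" intoString:&href]) {
            [links addObject:[NSDictionary dictionaryWithObjectsAndKeys:
                              href, @"href",
                              TextByStrippingTags(inner), @"text", nil]];
        }
    }
    return links;
}
```

Run on Leo's first snippet, the first dictionary has href "../pgs/557.html" and text "About import settings and hard disk space". The href can then be passed through NSURL resolution to get an absolute path.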
  • I think the only way to do this and have it work with arbitrary web
    pages is to parse the DOM and look for links that way.
    You can't cleanly scan for text in an arbitrary HTML file, since
    JavaScript can modify the DOM structure at will (or even use
    document.write to insert a whole new section of the page!). Using
    JavaScript to construct a complicated page is not uncommon.
    I don't know whether WebKit provides the tools to analyze the DOM from
    Objective-C. If not, you may have to use JavaScript :(

    (Replying to Mike Abdullah's message above.)
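On the DOM route: one option on Mac OS X 10.4 and later that does not need WebKit is Foundation's NSXMLDocument, whose NSXMLDocumentTidyHTML option cleans up HTML into a tree you can query with XPath. This is a swapped-in technique, not one proposed in the thread, and it does not run JavaScript either, so John's document.write caveat still applies. A minimal sketch, written for manual retain/release as 2006-era Cocoa code was (drop the release under ARC); LinksViaXMLDocument and CollapseWhitespace are hypothetical names.

```objc
#import <Foundation/Foundation.h>

// Collapse runs of whitespace/newlines into single spaces.
static NSString *CollapseWhitespace(NSString *s)
{
    NSArray *parts = [s componentsSeparatedByCharactersInSet:
                      [NSCharacterSet whitespaceAndNewlineCharacterSet]];
    NSMutableArray *words = [NSMutableArray array];
    for (NSString *part in parts) {
        if ([part length] > 0) [words addObject:part];
    }
    return [words componentsJoinedByString:@" "];
}

// Hypothetical helper: one dictionary per <a href="..."> element,
// with keys @"href" and @"text". Manual retain/release (no ARC).
NSArray *LinksViaXMLDocument(NSString *html, NSError **error)
{
    // NSXMLDocumentTidyHTML runs the input through Tidy first, so sloppy
    // HTML still yields a well-formed tree.
    NSXMLDocument *doc = [[NSXMLDocument alloc] initWithXMLString:html
                                                          options:NSXMLDocumentTidyHTML
                                                            error:error];
    if (doc == nil) return nil;

    NSMutableArray *links = [NSMutableArray array];
    NSArray *anchors = [doc nodesForXPath:@"//a[@href]" error:error];
    for (NSXMLNode *node in anchors) {
        NSXMLElement *anchor = (NSXMLElement *)node;
        NSString *href = [[anchor attributeForName:@"href"] stringValue];
        // stringValue concatenates the text of all descendants, so link text
        // wrapped in <font> or <b> comes out as plain text.
        NSString *text = CollapseWhitespace([anchor stringValue]);
        [links addObject:[NSDictionary dictionaryWithObjectsAndKeys:
                          href, @"href", text, @"text", nil]];
    }
    [doc release];
    return links;
}
```

Compared with scanning, the tree view means nested tags and attribute order need no special handling; the cost is depending on Tidy's interpretation of badly broken markup.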
  • Thanks, Mike and John, but can you give me some details on how to do
    this? Maybe I can get the URL using NSScanner, but how do I get the
    text xxxx in '<a href=...>xxxx</a>'? Usually there are other HTML tags
    inside the <a> tag, such as '<a href=...> <font color> <b> xxxx</b>
    </font> </a>'. How do I scan that?

    By the way, does anybody know how to use
    stringWithContentsOfFile:usedEncoding:error:? I want to use it to get
    the content of an HTML file, but how do I detect the encoding? When I
    try it like this:

    NSString *stringFromFileAtPath =
        [NSString stringWithContentsOfFile:filePath
                              usedEncoding:NSUTF8StringEncoding
                                     error:&error];

    Xcode displays this warning after building: "warning: passing
    argument 2 of 'stringWithContentsOfFile:usedEncoding:error:' makes
    pointer from integer without a cast"

    There is no warning when using stringWithContentsOfFile:encoding:error:.

    Neither of these two methods can open some Chinese and Japanese HTML
    files.

    Can anybody tell me how to detect the encoding so that I can read the
    content of an HTML file?

    Thank you very much.

    Leo

    (Replying to John Stiles's message above.)
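On the encoding question: the usedEncoding: argument of stringWithContentsOfFile:usedEncoding:error: is an output parameter of type NSStringEncoding *, in which the method reports the encoding it detected. Passing the integer constant NSUTF8StringEncoding there is what triggers the "makes pointer from integer" warning. The method can only guess (for example from a byte-order mark), so it may return nil for a plain Shift_JIS or GB file. The sketch below falls back to trying a few candidate encodings in turn; ReadHTMLFile and the candidate list (UTF-8, Shift_JIS, EUC-JP, GB 18030, Big5) are assumptions chosen for Chinese and Japanese pages. For HTML specifically, a more reliable route is to read the charset declared in the page's <meta> tag and convert that IANA name with CFStringConvertIANACharSetNameToEncoding.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: read an HTML file whose encoding is not known.
// On success, *outEncoding (if non-NULL) receives the encoding used.
NSString *ReadHTMLFile(NSString *filePath, NSStringEncoding *outEncoding)
{
    // 1. Let Foundation guess. Note the argument is the ADDRESS of an
    //    NSStringEncoding variable; the method writes its guess there.
    NSStringEncoding used = 0;
    NSError *error = nil;
    NSString *text = [NSString stringWithContentsOfFile:filePath
                                           usedEncoding:&used
                                                  error:&error];
    if (text != nil) {
        if (outEncoding != NULL) *outEncoding = used;
        return text;
    }

    // 2. Fall back to an assumed list of candidates, tried in order. UTF-8
    //    goes first because non-UTF-8 bytes usually fail to decode as UTF-8;
    //    the others may accept bytes that really belong to another encoding.
    NSStringEncoding candidates[] = {
        NSUTF8StringEncoding,
        NSShiftJISStringEncoding,
        NSJapaneseEUCStringEncoding,
        CFStringConvertEncodingToNSStringEncoding(kCFStringEncodingGB_18030_2000),
        CFStringConvertEncodingToNSStringEncoding(kCFStringEncodingBig5),
    };
    size_t count = sizeof(candidates) / sizeof(candidates[0]);
    for (size_t i = 0; i < count; i++) {
        text = [NSString stringWithContentsOfFile:filePath
                                         encoding:candidates[i]
                                            error:&error];
        if (text != nil) {
            if (outEncoding != NULL) *outEncoding = candidates[i];
            return text;
        }
    }
    return nil; // none of the candidates could decode the file
}
```

Usage: `NSStringEncoding enc; NSString *html = ReadHTMLFile(path, &enc);` then check `html != nil` before scanning it for links.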