Detecting new line character(s) in text file?

  • My application deals with text files. Each line has to be parsed, so I
    split the text with

    [str componentsSeparatedByString:@"\n"]

    However, this only works if the text file was saved on a Mac. Is there
    an easy way to determine which newline character is used in a text
    file (either Linux, Mac or Windows) and then use it for parsing
    afterward?
  • On Nov 17, 2007, at 2:51 PM, Jean-Nicolas Jolivet wrote:

    > However, this only works if the text file was saved on a Mac. Is
    > there an easy way to determine which newline character is used in a
    > text file (either Linux, Mac or Windows) and then use it for parsing
    > afterward?

    You'll have to do it heuristically.  Probably the easiest method is to
    read the first kilobyte or so of a file, and scan for \r.  If you find
    one, look at the next character and see if it's a \n.  If so, it's a
    DOS-format (\r\n) file.  If not, it's an old-style Mac (\r) text
    file.  If you don't see a \r in the beginning of the file at all,
    assume it's a new-style Mac or Linux/Unix file (\n).
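
    The scan described above can be sketched in plain C, which also
    compiles as part of an Objective-C source file. The names
    newline_style and detect_newline are made up for this illustration,
    and the caller is assumed to have already read the first kilobyte or
    so of the file into a buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Line-ending styles the heuristic can report (illustrative names). */
typedef enum { NL_LF, NL_CR, NL_CRLF } newline_style;

/*
 * Scan the first `len` bytes of `buf` for a carriage return.
 *   "\r\n"      -> NL_CRLF (DOS/Windows)
 *   lone "\r"   -> NL_CR   (old-style Mac)
 *   no "\r"     -> NL_LF   (Unix, Linux, new-style Mac)
 */
newline_style detect_newline(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\r') {
            /* A '\r' in the very last sampled byte is ambiguous;
               this sketch reports it as a lone CR. */
            if (i + 1 < len && buf[i + 1] == '\n')
                return NL_CRLF;
            return NL_CR;
        }
    }
    return NL_LF;
}
```

    Only the first \r found decides the result, so a file with mixed line
    endings is reported according to whichever style appears first.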

    Another way would be to shell out (via NSTask) to the "file" command,
    which will tell you the kind of file (e.g., "ASCII text, with CRLF line
    terminators").  However, that's fairly expensive, so I'd be careful if
    you're doing it frequently.  Also, you'll have to parse file's output,
    and keep up with that if it changes.

    --John
  • On Nov 17, 2007, at 2:51 PM, Jean-Nicolas Jolivet wrote:

    > My application deals with text files. Each line has to be parsed, so
    > I split the text with
    >
    > [str componentsSeparatedByString:@"\n"]
    >
    > However, this only works if the text file was saved on a Mac. Is
    > there an easy way to determine which newline character is used in a
    > text file (either Linux, Mac or Windows) and then use it for parsing
    > afterward?

    What are you trying to accomplish (as opposed to how you are currently
    trying to do it)?

    Consider -[NSString getLineStart:end:contentsEnd:forRange:]

    Review...
    <http://developer.apple.com/documentation/Cocoa/Reference/Foundation/Classes/NSString_Class/Reference/NSString.html#//apple_ref/doc/uid/20000154-DontLinkElementID_25>

    -Shawn
  • I am parsing subtitle files (text subtitles for DivX movies, etc.).

    Depending on the format they need to be parsed differently, but in any
    case, I need to find out which line-break character is used so I can
    parse them.

  • On Nov 17, 2007, at 3:29 PM, Jean-Nicolas Jolivet wrote:

    > I am parsing subtitle files (text subtitles for DivX movies, etc.).
    >
    > Depending on the format they need to be parsed differently, but in
    > any case, I need to find out which line-break character is used so I
    > can parse them.

    Why do you need to care about the line ending? You just want the lines?

    Still sounds like -[NSString getLineStart:end:contentsEnd:forRange:]
    will do what you need without you having to care about which line
    ending is being used.
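
    To illustrate the idea of walking lines without detecting the line
    ending first, here is a rough sketch in plain C. It is not the
    NSString method itself, only a hypothetical helper (next_line) that
    mimics its contentsEnd/end split, and it handles only \n, \r and
    \r\n.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: examine the line that starts at offset `start`.
 * On return, *content_end is the offset just past the line's text
 * (before any terminator), and the return value is the offset where the
 * next line starts (after the terminator).
 */
size_t next_line(const char *buf, size_t len, size_t start,
                 size_t *content_end)
{
    assert(buf != NULL || len == 0);
    assert(content_end != NULL);
    assert(start <= len);

    size_t i = start;
    while (i < len && buf[i] != '\n' && buf[i] != '\r')
        i++;
    *content_end = i;

    if (i < len) {
        /* Treat "\r\n" as a single terminator. */
        if (buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n')
            i++;
        i++;
    }
    return i;
}
```

    Calling it repeatedly, starting at offset 0 and feeding each return
    value back in as the next start until it equals the buffer length,
    visits every line regardless of which of the three endings the file
    uses.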

    -Shawn
  • Thank you! I am looking into it. It seems like it will do the trick!

  • On Nov 17, 2007, at 2:51 PM, Jean-Nicolas Jolivet wrote:

    > My application deals with text files. Each line has to be parsed, so
    > I split the text with
    >
    > [str componentsSeparatedByString:@"\n"]
    >
    > However, this only works if the text file was saved on a Mac. Is
    > there an easy way to determine which newline character is used in a
    > text file (either Linux, Mac or Windows) and then use it for parsing
    > afterward?
    >
    This is covered in the String Programming Guide...

    mmalc
  • On Nov 17, 2007, at 5:35 PM, Jean-Nicolas Jolivet wrote:

    > Thank you! I am looking into it. It seems like it will do the trick!
    >
    > Shawn Erickson wrote:
    >>
    >> On Nov 17, 2007, at 3:29 PM, Jean-Nicolas Jolivet wrote:
    >>
    >>> I am parsing subtitle files (text subtitles for DivX movies, etc.).
    >>>
    >>> Depending on the format they need to be parsed differently, but in
    >>> any case, I need to find out which line-break character is used so
    >>> I can parse them.
    >>
    >> Why do you need to care about the line ending? You just want the
    >> lines?
    >>
    >> Still sounds like -[NSString
    >> getLineStart:end:contentsEnd:forRange:] will do what you need
    >> without you having to care about which line ending is being used.

    That API also has the advantage that it handles more than the three
    "old-style" end-of-line sequences (\n, \r and \r\n), such as the
    Unicode line and paragraph separators; see the docs for details.

    ___________________________________________________________
    Ricky A. Sharp        mailto:<rsharp...>
    Instant Interactive(tm)  http://www.instantinteractive.com