Reading a binary file format?

  • Hi,

    I need to read what I assume is a binary file into my
    program. I know where to expect the various parts of
    the file, I'm just not sure how to read them in -
    probably because I'm self-taught at Cocoa (Kochan &
    Hillegass) and it requires a lower level of C, I'm not
    sure.

    So, suppose you have a file that contains some text
    but also some (presumably binary) integer information.
    If you open it up in TextEdit, you can see the text,
    but there are some invisible characters that must be
    the binary information that is unrecognised by
    TextEdit. How would you go about reading it into a
    Cocoa program? For instance, say the file has the
    integers 'SCLT'  and 0 as part of its header, and some
    integers after it, and then 'PLST', and then some
    integers and then bytes representing text data. In
    TextEdit, you would see "SCLT PLST" followed by the
    text. How would I go about reading all of the
    information and accessing the integers hidden away in
    there etc? Can I do this with NSData or NSString
    methods, or do I need to delve deeper into C?

    I hope that is not too vague a question - I have a
    file format I need to read (but it is private so I
    can't post it) but don't quite know where to begin.

    Many thanks and all the best,
    Keith

  • On Oct 11, 2007, at 4:15 PM, Keith Blount wrote:

    > So, suppose you have a file that contains some text
    > but also some (presumably binary) integer information.
    > If you open it up in TextEdit, you can see the text,
    > but there are some invisible characters that must be
    > the binary information that is unrecognised by
    > TextEdit. How would you go about reading it into a
    > Cocoa program? For instance, say the file has the
    > integers 'SCLT'  and 0 as part of its header, and some
    > integers after it, and then 'PLST', and then some
    > integers and then bytes representing text data. In
    > TextEdit, you would see "SCLT PLST" followed by the
    > text. How would I go about reading all of the
    > information and accessing the integers hidden away in
    > there etc? Can I do this with NSData or NSString
    > methods, or do I need to delve deeper into C?

    The easiest way to do this is to get the contents of the file into an
    NSData, for example via dataWithContentsOfFile:.  You can then obtain
    the contents of any particular byte in the file as e.g.
    ((const uint8_t *)[data bytes])[offset].  If the value you are
    interested in reading is a number that is longer than a single byte,
    you must take endianness into account, for example using the
    functions in NSByteOrder.h.  If you wish to extract strings from the
    file, you can use methods like initWithBytes:length:encoding:.  Bear
    in mind that unfortunately a great many binary file formats are quite
    cavalier about the encoding used for any embedded strings.
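
    As a rough sketch of the byte-level access described above, in plain
    C: the helpers below (read_u8, read_u32_be and has_tag are made-up
    names) read from a buffer such as the one returned by [data bytes],
    with [data length] as its length.  The big-endian byte order, and the
    idea that a four-character tag like 'SCLT' sits at a known offset,
    are assumptions for illustration, not facts about any particular
    format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read one byte at `offset`; returns 0 on success, -1 if out of range. */
static int read_u8(const uint8_t *buf, size_t len, size_t offset, uint8_t *out)
{
    assert(buf != NULL && out != NULL);
    if (offset >= len)
        return -1;
    *out = buf[offset];
    return 0;
}

/* Read a 32-bit big-endian integer starting at `offset` (assumed byte
   order). Assembling the value byte by byte gives the same result on
   any host, whatever its native byte order. */
static int read_u32_be(const uint8_t *buf, size_t len, size_t offset,
                       uint32_t *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 4 || offset > len - 4)
        return -1;
    *out = ((uint32_t)buf[offset]     << 24) |
           ((uint32_t)buf[offset + 1] << 16) |
           ((uint32_t)buf[offset + 2] <<  8) |
            (uint32_t)buf[offset + 3];
    return 0;
}

/* Returns 1 if the four bytes at `offset` equal the four-character tag. */
static int has_tag(const uint8_t *buf, size_t len, size_t offset,
                   const char *tag)
{
    assert(buf != NULL && tag != NULL && strlen(tag) == 4);
    if (len < 4 || offset > len - 4)
        return 0;
    return memcmp(buf + offset, tag, 4) == 0;
}
```

    For a buffer that begins with 'S','C','L','T' followed by the bytes
    00 00 01 02, has_tag(buf, len, 0, "SCLT") is true and read_u32_be at
    offset 4 stores 258.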

    There are significant pitfalls involved in reading binary file
    formats.  In particular, any values encountered in the file must be
    treated with extreme caution; all offsets should be sanity-checked
    against the length of the file, calculations should be checked for
    overflow or underflow, values that are supposed to lie in a certain
    range should be tested, and so forth.  Failure to do this is an
    abundant source of security holes if you are unlucky, or simple
    crashers if you are lucky.  For every value you read from the file,
    ask yourself what would happen if some malicious person went in and
    changed those bytes in the file to some nonsensical value--because
    there are people out there who will do that.  I would advise at the
    very least a thorough understanding of C pointers and pointer
    arithmetic.
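
    To make the sanity checks above concrete, here is a minimal sketch in
    C.  The function names range_ok and checked_mul are hypothetical; the
    point is only that the comparison is arranged so that a hostile
    offset or count read from the file cannot wrap around before it is
    tested.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the byte range [offset, offset + count) lies inside a
   file of `file_len` bytes. It never computes offset + count, which
   could wrap around for nonsensical values. */
static int range_ok(size_t file_len, size_t offset, size_t count)
{
    return offset <= file_len && count <= file_len - offset;
}

/* Multiply an element count by an element size without overflowing.
   Returns 0 and stores the product on success, -1 on overflow. */
static int checked_mul(size_t count, size_t elem_size, size_t *out)
{
    assert(out != NULL);
    if (elem_size != 0 && count > SIZE_MAX / elem_size)
        return -1;
    *out = count * elem_size;
    return 0;
}
```

    A reader would typically call checked_mul on a count taken from the
    file, then pass the result to range_ok before touching any of the
    bytes in that range.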

    Douglas Davidson
  • On 12.10.2007 at 01:50, Douglas Davidson wrote:
    > For every value you read from the file, ask yourself what would
    > happen if some malicious person went in and changed those bytes in
    > the file to some nonsensical value--because there are people out
    > there who will do that.  I would advise at the very least a
    > thorough understanding of C pointers and pointer arithmetic.

      I guess this is the point where I shamelessly plug my C tutorial
    again:

    <http://www.zathras.de/angelweb/masters-of-the-void.htm>

    It takes care to introduce people to pointers and most of the other C
    stuff that you'll have to understand to do anything but the most
    simple Cocoa program, and is perfectly suitable for cases like this.

    Cheers,
    -- M. Uli Kusterer
    http://www.zathras.de
  • Keith Blount wrote:
    > I need to read what I assume is a binary file into my
    > program. I know where to expect the various parts of
    > the file, I'm just not sure how to read them in -
    > probably because I'm self-taught at Cocoa (Kochan &
    > Hillegass) and it requires a lower level of C, I'm not
    > sure.
    >
    > So, suppose you have a file that contains some text
    > but also some (presumably binary) integer information.
    > If you open it up in TextEdit, you can see the text,
    > but there are some invisible characters that must be
    > the binary information that is unrecognised by
    > TextEdit. How would you go about reading it into a
    > Cocoa program? For instance, say the file has the
    > integers 'SCLT'  and 0 as part of its header, and some
    > integers after it, and then 'PLST', and then some
    > integers and then bytes representing text data. In
    > TextEdit, you would see "SCLT PLST" followed by the
    > text. How would I go about reading all of the
    > information and accessing the integers hidden away in
    > there etc? Can I do this with NSData or NSString
    > methods, or do I need to delve deeper into C?
    >

    Do you have the original C code, or some idea of the record structure?

    If you do, the file is probably going to be trivial to fairly trivial to
    read in C.  The original programmer has probably issued a fopen() and a
    series of fwrite() calls of C record structures, then fclose() -- or a
    similar sequence from the other I/O calls available in the C standard
    library.  You'd use fopen(), and a series of fread() calls processing
    each as a C record structure into your application until feof() lights
    up, followed by an fclose().
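
    A minimal sketch of such a read loop, assuming a made-up format in
    which every record is exactly four raw bytes (RECORD_SIZE and
    read_records are hypothetical names).  The loop stops on the return
    value of fread() rather than polling feof(), because feof() only
    becomes true after a read has already come up short.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical fixed-size record: four raw bytes per entry. */
#define RECORD_SIZE 4

/* Read up to `max` records from `fp` into `out` and return how many
   complete records were read. A truncated final record is not counted. */
static size_t read_records(FILE *fp, uint8_t out[][RECORD_SIZE], size_t max)
{
    size_t n = 0;

    assert(fp != NULL && out != NULL);
    while (n < max && fread(out[n], RECORD_SIZE, 1, fp) == 1)
        n++;
    return n;
}
```

    The caller opens the file with fopen(path, "rb"), calls read_records,
    and closes it with fclose().  After the loop, ferror() distinguishes
    an I/O error from a plain end of file.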

    Depending on the requirements and the file structures, you might end up
    using fscanf() or such to process the input as you read it in.

    If the structures are not fixed, things get a bit more interesting.  Pun
    intended.  I've dealt with C programs that process the bytes in smaller
    units; where the records are variable-length, and are dependent on the
    run-time context.  This processing is made involved and more difficult
    if you don't have the source code and/or the data definitions; if you're
    reverse-engineering the files.

    It's also possible to see a combination of fixed and variable data,
    where the file contains a linefeed termination, or where there's a fixed
    header for each record and a count of bytes.
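
    As an illustration of the "fixed header plus a count of bytes" case,
    here is a sketch in C.  The layout is invented for the example (a
    1-byte type, a 1-byte payload length, then the payload), and
    record_view and next_record are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record layout: a 1-byte type, a 1-byte payload length,
   then that many payload bytes. */
struct record_view {
    uint8_t        type;
    uint8_t        length;
    const uint8_t *payload;   /* points into the caller's buffer */
};

/* Parse the record starting at *pos and advance *pos past it.
   Returns 0 on success, or -1 (leaving *pos unchanged) if the header
   or the payload would run off the end of the buffer. */
static int next_record(const uint8_t *buf, size_t len, size_t *pos,
                       struct record_view *rec)
{
    assert(buf != NULL && pos != NULL && rec != NULL);
    if (*pos > len || len - *pos < 2)
        return -1;                        /* no room for the header */
    rec->type   = buf[*pos];
    rec->length = buf[*pos + 1];
    if (len - *pos - 2 < rec->length)
        return -1;                        /* payload is truncated */
    rec->payload = buf + *pos + 2;
    *pos += 2 + (size_t)rec->length;
    return 0;
}
```

    Calling next_record in a loop until it returns -1 walks every
    complete record in the buffer; the length byte from the file is
    checked against the remaining bytes before the payload is used.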

    One wrinkle here: you'll also need to know whether the data is
    little-endian, or big-endian; what the byte order is.  If you know the
    host system that generated the file, you can usually figure that out.
    About half the systems around are big-endian, and the others are
    little-endian.  (And then there's the PDP-11, but I digress.)
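
    To show what the byte order changes in practice, here is a small
    sketch in C that decodes the same four bytes both ways.  The
    guess_order helper is a hypothetical heuristic, not a standard
    technique from any library: it assumes you know some field in the
    header must hold a small value, and picks the byte order under which
    it does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum byte_order { ORDER_BIG, ORDER_LITTLE, ORDER_UNKNOWN };

/* Interpret four bytes as big-endian (most significant byte first). */
static uint32_t u32_from_be(const uint8_t b[4])
{
    assert(b != NULL);
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] <<  8) |  (uint32_t)b[3];
}

/* Interpret the same four bytes as little-endian. */
static uint32_t u32_from_le(const uint8_t b[4])
{
    assert(b != NULL);
    return ((uint32_t)b[3] << 24) | ((uint32_t)b[2] << 16) |
           ((uint32_t)b[1] <<  8) |  (uint32_t)b[0];
}

/* Heuristic: for a field whose true value is known to be below `limit`,
   report the byte order under which the decoded value is plausible.
   Returns ORDER_UNKNOWN when both or neither interpretation fits. */
static enum byte_order guess_order(const uint8_t b[4], uint32_t limit)
{
    int be_ok = u32_from_be(b) < limit;
    int le_ok = u32_from_le(b) < limit;

    if (be_ok && !le_ok)
        return ORDER_BIG;
    if (le_ok && !be_ok)
        return ORDER_LITTLE;
    return ORDER_UNKNOWN;
}
```

    The bytes 00 00 01 02 decode to 258 as big-endian and to 33619968 as
    little-endian, so a field expected to hold a small count points
    fairly clearly at big-endian in that case.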

    And another wrinkle: some compilers can insert pad bytes into the data.
    The compiler aligns a longword within a structure at a longword
    boundary; at an address that ends in %x0, %x4, %x8 or %xc.  This can
    mean that padding bytes -- unused bytes -- are inserted into the data.
    Many compilers do this by default, though not all.  It's possible for
    a programmer to disable this alignment, and pack the record
    structures.  It's also possible that the programmer didn't realize the
    compiler inserted this data, so what you see in source code might not
    match what you see in the file dump.
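
    Whether a given compiler pads a particular structure can be checked
    directly with offsetof and sizeof.  The sketch below uses an invented
    record (struct rec, rec_padding and rec_from_packed are hypothetical
    names) and assumes the file stores it as 5 packed bytes with the
    integer in the host's byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory record: one byte followed by a 32-bit value.
   A compiler may insert pad bytes after `kind` so `value` is aligned. */
struct rec {
    uint8_t  kind;
    uint32_t value;
};

/* Number of pad bytes this compiler placed between `kind` and `value`. */
static size_t rec_padding(void)
{
    return offsetof(struct rec, value) - sizeof(uint8_t);
}

/* Decode a record stored on disk as exactly 5 packed bytes (assumed
   layout), with `value` in the host's byte order. Copying field by
   field avoids depending on the compiler's struct layout. */
static void rec_from_packed(const uint8_t packed[5], struct rec *out)
{
    assert(packed != NULL && out != NULL);
    out->kind = packed[0];
    memcpy(&out->value, packed + 1, sizeof out->value);
}
```

    If rec_padding() is non-zero, reading sizeof(struct rec) bytes from a
    packed file straight into the structure would misplace `value`, which
    is why the fields are copied one at a time here.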

    To poke around inside the file itself, use the shell od (octal dump) or
    hexdump commands.  This will show you the bytes and byte counts, from
    which the underlying structures can often be discerned.

    In summary, there are a whole pile of different ways to write a binary
    file in C.  I'd not expect to be able to read a random C data file
    directly in Cocoa without using some C code; without going into more
    effort than reading it using C and converting the fields and (I assume)
    records over into Cocoa structures as each is read in.

    Stephen Hoffman
  • I wrote a set of Cocoa classes that parse out binary or ascii files
    based on a file/record structure that you define in XML using a
    fairly small but powerful 'grammar'. The results for sections of
    'records/fields' that you define are handed back to you in a
    dictionary (or an array of dictionaries if you define 'repeat'
    sections).

    It also has a small 'test' app that lets you muck about with the XML
    rules on-the-fly, so you get immediate feedback on where to tweak the
    structure definition.

    It handles endian-ness, integer types and doubles, padding, searching
    for starts/ends/boundaries via strings or character sets, number
    arrays, plain strings, skipping etc - and uses the standard C lib calls.

    If you want to give it a whirl, let me know (probably OT, so direct
    mail). I was thinking about sprucing it up for anyone to download,
    but haven't got around to writing 'release quality' documentation yet.

    Rob

    On Oct 12, 2007, at 9:50 AM, Stephen Hoffman wrote:

    >
    > Keith Blount wrote:
    >> I need to read what I assume is a binary file into my
    >> program. I know where to expect the various parts of
    >> the file, I'm just not sure how to read them in -
    >> probably because I'm self-taught at Cocoa (Kochan &
    >> Hillegass) and it requires a lower level of C, I'm not
    >> sure.
    >>
    >> <snips...>
    >
    > To poke around inside the file itself, use the shell od (octal
    > dump) or hexdump commands.  This will show you the bytes and byte
    > counts, from which the underlying structures can often be discerned.
    >
    > In summary, there are a whole pile of different ways to write a
    > binary file in C.  I'd not expect to be able to read a random C
    > data file directly in Cocoa without using some C code; without
    > going into more effort than reading it using C and converting the
    > fields and (I assume) records over into Cocoa structures as each is
    > read in.
  • Thanks for the replies so far - and thank you Uli for
    the tutorial. I should perhaps clarify that I'm not a
    complete newbie (although I am compared to you guys :)
    ) - I'm the author of a relatively popular and
    complicated piece of Cocoa shareware, so I would hate
    anyone to come across this thread and think, "He
    doesn't know what he's doing!". :) So, although your
    tutorial looks brilliant, I am familiar with those
    aspects of C. One of the beauties of Objective-C,
    however, is that you can write rather complex pieces
    of software without worrying about too much low-level
    C stuff. Sure, I use some C arrays, structures and
    stuff here and there, but when it comes to reading in
    bytes from a binary file I fully admit it's beyond my
    comfort zone.

    Robert - I will contact you off-list because I would
    definitely be interested in your Cocoa classes.

    Douglas and Stephen - thank you for the explanations,
    much appreciated.

    Once again, many thanks for the replies so far, I
    really appreciate your taking the time to help a very
    baffled man.

    All the best,
    Keith


  • On 12.10.2007 at 21:03, Keith Blount wrote:
    > Sure, I use some C arrays, structures and
    > stuff here and there, but when it comes to reading in
    > bytes from a binary file I fully admit it's beyond my
    > comfort zone.

      I was thinking mainly of the chapter on memory management and
    dynamic arrays in there, and the article on how memory management
    works linked from that. That talks about how bytes are reinterpreted,
    and how to access individual bytes. As I don't know the skills of
    most people on this list, I just thought I'd mention this stuff is
    there.

      But you gave me a nice idea, reading and writing files is a topic
    not covered there at all, and at least in general terms, it should
    be. Thanks for the inspiration!

    Cheers,
    -- M. Uli Kusterer
    http://www.zathras.de