Problem parsing file in 64 bit build.

  • Hi,

    I'm having an issue with parsing a file that contains structures that are defined with type 'double'. These are 64-bit doubles, in little-endian format. In my code, I have typedefs of the various structures and sub-structures within the file that are defined using fixed-size types - SInt32, UInt32, Float64 and so on.

    I'm using the relevant swapping routines to read the file into my local format. Everything works when I compile for 32-bit, but it fails in 64-bit builds. It's acting as if the fields in the structures are misaligned.

    Could this be caused by the 64-bit compiler padding out my structures in a different way for 64-bit builds? If so, is there a way to tell it not to for these types?

    Here's some example code:

    typedef struct
    {
        SInt32   fileCode;    // 9994, big endian
        SInt32   unused[5];
        UInt32   fileLength;  // big endian, length expressed as # of 16-bit words, including header
        SInt32   version;     // little endian, = 1000
        UInt32   shapeType;   // all little endian from here on
        Float64  xMin;        // bbox
        Float64  yMin;
        Float64  xMax;
        Float64  yMax;
        Float64  zMin;
        Float64  zMax;
        Float64  mMin;
        Float64  mMax;
    }
    ESRIFileHeader, *ESRIFileHeaderPtr;

    typedef struct
    {
        SInt32  recordNumber;   // big endian
        UInt32  contentlength;  // big endian, length expressed as # of 16-bit words
        SInt32  shapeType;      // little endian
    }
    ESRIRecordHeader, *ESRIRecordHeaderPtr;

    // this fragment of code is where I do the pointer arithmetic:

    p = (unsigned char*)[_data bytes];
    eof = p + [_data length];
    p += sizeof( ESRIFileHeader );

    while( !done && !mShouldAbort )
    {
        // parse the record header

        recP = (ESRIRecordHeaderPtr) p;
        recNum = (NSInteger)CFSwapInt32BigToHost(recP->recordNumber);
        recLength = (NSUInteger)CFSwapInt32BigToHost(recP->contentlength) * 2;
        shapeType = (NSInteger)CFSwapInt32LittleToHost(recP->shapeType);

        NSLog( @"record #%ld, length=%lu, type=%ld", recNum, recLength, shapeType );  // displays garbage values in 64-bit build

    In the file, the header is followed by a series of records, each of which has the header structure shown and a variable-length data portion. I use pointer arithmetic to skip past the file header, using sizeof(ESRIFileHeader) to compute its size. Even from the very first record, the following header is misaligned, indicating that sizeof() gives a different result for 32-bit versus 64-bit builds. Indeed, I get 100 in the 32-bit build and 104 in the 64-bit build for the header structure.

    That's bad enough, but the values in the Float64 fields of the header appear as garbage in the debugger in the 64-bit build, so even though I believe I'm swapping them correctly, the values returned in the 64-bit version are nonsense. I'm less clear about why that should be, but again it could be padding, if the padding is not at the end of the structure but somewhere before the float values.

    Basically, I need to know if there's a way to get the 64-bit compiler to treat these structures the same way as it does in a 32-bit build.

    --Graham
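
The 100 versus 104 discrepancy reported above can be reproduced outside Cocoa. The following is a minimal C sketch, not code from the thread: `NaturalHeader` is a hypothetical stand-in for `ESRIFileHeader` that uses `<stdint.h>` types in place of `SInt32`/`UInt32`/`Float64`, and the helper names are invented for illustration. It measures where the compiler places `xMin` when no packing is requested.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>

static_assert(sizeof(double) == 8, "sketch assumes an 8-byte double");

/* Same field order as ESRIFileHeader; no packing is requested, so the
   compiler applies the natural alignment rules of the target ABI. */
typedef struct {
    int32_t  fileCode;
    int32_t  unused[5];
    uint32_t fileLength;
    int32_t  version;
    uint32_t shapeType;   /* the integer fields end at byte 36 */
    double   xMin;        /* offset 36 if double is 4-byte aligned, 40 if 8-byte aligned */
    double   yMin, xMax, yMax, zMin, zMax, mMin, mMax;
} NaturalHeader;

/* On-disk header size: nine 4-byte integers plus eight 8-byte doubles. */
enum { kOnDiskHeaderSize = 9 * 4 + 8 * 8 };

/* Number of padding bytes the compiler inserted in front of xMin. */
static size_t padding_before_xMin(void) {
    return offsetof(NaturalHeader, xMin) - 36;
}

/* Number of bytes by which sizeof() overshoots the on-disk size. */
static size_t size_excess(void) {
    return sizeof(NaturalHeader) - kOnDiskHeaderSize;
}
```

On an ABI that aligns `double` to 8 bytes, both helpers should return 4: the padding sits between `shapeType` and `xMin`, which would account for both the larger `sizeof` and the garbage bounding-box values. On an ABI that aligns `double` to 4 bytes inside structs, both return 0.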
  • On May 6, 2012, at 22:51 , Graham Cox wrote:

    > Could this be caused by the 64 bit compiler padding out my structures in a different way for 64-bit builds?

    Yes, I believe in 64-bit builds 8-byte data types are 8-byte aligned, while they're only 4-bit aligned in 32-bit builds -- or something like that.

    > If so is there a way to tell it not to for these types?

    Try this:

    typedef struct {
      ...
    } __attribute__ ((packed)) MyStruct;
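
Applied to the file header from the first post, the attribute might look like the sketch below. `PackedHeader` is a hypothetical name, the `<stdint.h>` types stand in for the original typedefs, and the attribute syntax is the GCC/Clang extension, which is assumed to be available. The `static_assert` lines make the build fail if the layout ever drifts from the 100-byte on-disk format.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Packed variant of the header: no padding between members, and the
   struct's own alignment requirement drops to 1 byte. */
typedef struct {
    int32_t  fileCode;
    int32_t  unused[5];
    uint32_t fileLength;
    int32_t  version;
    uint32_t shapeType;
    double   xMin, yMin, xMax, yMax, zMin, zMax, mMin, mMax;
} __attribute__((packed)) PackedHeader;

/* Compile-time checks against the on-disk layout. */
static_assert(sizeof(PackedHeader) == 100, "header must be 100 bytes");
static_assert(offsetof(PackedHeader, xMin) == 36, "no padding before the bounding box");
```

One caveat with this form: because the members may now be misaligned, it is safer to read them through the struct (`h->xMin`) than to take a member's address and hand it to code that expects an aligned `double *`.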
  • On May 6, 2012, at 23:30 , Quincey Morris wrote:

    > while they're only 4-bit aligned

    Er … 4-byte aligned.

    Also, be careful when debugging these with lldb. Some versions of lldb get the member offsets and struct size wrong in packed structs, so debugger member display is wrong, and so is offset/size-related pointer arithmetic in debugger expressions.
  • Thanks Quincey, that set me on the right track.

    I used #pragma pack(4) for these structures (with the relevant push and pop so it's isolated to this case only) and it all started working fine.

    Is using the syntax you suggested considered better form than #pragma?

    --Graham
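
For comparison, the push/pop form described in this post could be sketched as follows. The two struct names are hypothetical and exist only to show the scope of the pragma; this is an illustration of the technique, not the code actually used in the project.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

#pragma pack(push, 4)      /* from here on, cap member alignment at 4 bytes */
typedef struct {
    int32_t tag;
    double  value;         /* placed at offset 4 instead of being padded out to 8 */
} Pack4Example;
#pragma pack(pop)          /* restore whatever packing was in effect before the push */

/* Declared after the pop, so the ABI's default alignment applies again. */
typedef struct {
    int32_t tag;
    double  value;         /* typically offset 8 on 64-bit targets */
} DefaultExample;

static_assert(offsetof(Pack4Example, value) == 4, "pack(4) removes the padding");
static_assert(sizeof(Pack4Example) == 12, "4-byte tag + 8-byte double");
```

A difference from `__attribute__((packed))` is that `pack(4)` still keeps members on 4-byte boundaries, which happens to match a format like this one where every field starts at a multiple of 4.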

  • On May 7, 2012, at 00:22 , Graham Cox wrote:

    > Is using the syntax you suggested considered better form than #pragma?

    I have a vague recollection that #pragma pack was already deprecated in GCC, so __attribute__ ((packed)) might be the more compatible choice, but finding trustworthy, non-out-of-date documentation for compiler options can be a bit of a challenge.
  • On May 7, 2012, at 2:28 AM, Quincey Morris wrote:

    > On May 7, 2012, at 00:22 , Graham Cox wrote:
    >
    >> Is using the syntax you suggested considered better form than #pragma?
    >
    > I have a vague recollection that #pragma pack was already deprecated in GCC, so __attribute__ ((packed)) might be the more compatible choice, but finding trustworthy, non-out-of-date documentation for compiler options can be a bit of a challenge.

    Myself, I like to just spin off a method or function that takes a chunk of data and populates the fields of the struct one by one, instead of writing the data straight onto the struct. A little more code, but you know it’s going to work right without any surprises.

    Charles
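
The field-by-field approach described here might look roughly like the following C sketch for the file header. Everything in it is hypothetical: `ParsedHeader`, `parse_header` and the `read_*` helpers are invented names, the byte offsets are derived from the struct in the first post, and the code assumes an IEEE 754 `double` whose byte order matches the host's integer byte order.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>
#include <string.h>   /* memcpy */

static_assert(sizeof(double) == 8, "sketch assumes an 8-byte IEEE 754 double");

/* In-memory form of the header; its layout no longer has to match the file. */
typedef struct {
    int32_t  fileCode;
    uint32_t fileLengthBytes;   /* converted from 16-bit words to bytes */
    int32_t  version;
    uint32_t shapeType;
    double   bbox[8];           /* xMin, yMin, xMax, yMax, zMin, zMax, mMin, mMax */
} ParsedHeader;

static uint32_t read_u32_be(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static uint32_t read_u32_le(const unsigned char *p) {
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}

static double read_f64_le(const unsigned char *p) {
    uint64_t bits = 0;
    for (int i = 7; i >= 0; --i)          /* most significant byte is last in the file */
        bits = (bits << 8) | p[i];
    double d;
    memcpy(&d, &bits, sizeof d);          /* reinterpret the bit pattern without a pointer cast */
    return d;
}

/* Returns 0 on success, -1 if the buffer is shorter than the 100-byte header. */
static int parse_header(const unsigned char *buf, size_t len, ParsedHeader *out) {
    if (len < 100) return -1;
    out->fileCode        = (int32_t)read_u32_be(buf + 0);
    out->fileLengthBytes = read_u32_be(buf + 24) * 2u;
    out->version         = (int32_t)read_u32_le(buf + 28);
    out->shapeType       = read_u32_le(buf + 32);
    for (int i = 0; i < 8; ++i)
        out->bbox[i] = read_f64_le(buf + 36 + 8 * i);
    return 0;
}
```

Because every offset is written out explicitly, the result does not depend on struct padding, and the mixed big-endian and little-endian fields are handled at the point where each one is read.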
  • On Mon, May 7, 2012 at 8:06 AM, Charles Srstka <cocoadev...> wrote:
    > Myself, I like to just spin off a method or function that takes a chunk of data and populates the fields of the struct one by one, instead of writing the data straight onto the struct. A little more code, but you know it’s going to work right without any surprises.

    Not only that, but writing structs to file handles has caused security
    problems before. Consider what happens if you have a short or byte
    field and the compiler pads the struct. Now there's memory in your
    struct that never gets initialized. If you write that to a file or
    socket you're sending whatever might have been lurking there.
    Passwords, login details...
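
One way to avoid the leak described here is to serialize field by field, so that padding bytes never reach the output at all. The sketch below is illustrative only: `Message`, `encode_message` and the 5-byte little-endian wire format are hypothetical and do not come from the thread.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t  kind;     /* compilers typically insert 3 padding bytes after this member */
    uint32_t value;
} Message;

/* Wire size: 1 byte for kind + 4 bytes for value, with no padding. */
enum { kEncodedMessageSize = 5 };

static_assert(sizeof(Message) >= kEncodedMessageSize,
              "in-memory struct is at least as large as the wire format");

/* Writes only the meaningful bytes (value in little-endian order) and
   returns the number of bytes produced. Padding is never copied. */
static size_t encode_message(const Message *m, unsigned char out[kEncodedMessageSize]) {
    out[0] = m->kind;
    out[1] = (unsigned char)( m->value        & 0xFF);
    out[2] = (unsigned char)((m->value >> 8)  & 0xFF);
    out[3] = (unsigned char)((m->value >> 16) & 0xFF);
    out[4] = (unsigned char)((m->value >> 24) & 0xFF);
    return kEncodedMessageSize;
}
```

Writing `sizeof(Message)` bytes straight from the struct would instead emit whatever happened to be in the padding.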
  • Understood, but in this case I'm not writing anything. I'm reading a file into an NSData, and using these structs to put a frame onto the data in places so I can extract data from the fields.

    The alternative would be to pull out each data field one by one, which I'm sure is also considered acceptable practice, but for this file type, using structs has proved to be a lot easier, not least because of a) the strange mix of big-endian and little-endian values in the same file and b) the presence of directly formatted 'double' values that are not platform independent.

    Some discussion of the merit of #pragma pack(n) versus other methods would be useful here, it's not something I've had to deal with very much.

    --Graham
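
Whichever framing method is used, the record walk itself can be written so that it never reads past the end of the data. The following C sketch makes assumptions that go beyond the thread: it follows the published shapefile layout, in which each record starts with an 8-byte header (record number, then content length in 16-bit words, both big-endian) and the content length excludes that header. `count_records` and `be32` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Walks the records in [p, end) and returns how many there are,
   or -1 if a record header or its content would run past `end`. */
static long count_records(const unsigned char *p, const unsigned char *end) {
    assert(p <= end);                                  /* caller precondition */
    long n = 0;
    while (p < end) {
        size_t remaining = (size_t)(end - p);
        if (remaining < 8) return -1;                  /* truncated record header */
        size_t contentBytes = (size_t)be32(p + 4) * 2u; /* 16-bit words -> bytes */
        if (contentBytes > remaining - 8) return -1;   /* truncated record content */
        p += 8 + contentBytes;                         /* advance to the next record header */
        ++n;
    }
    return n;
}
```

In the code from the first post, `p` would be the pointer just past the 100-byte file header and `end` would correspond to `eof`.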

  • On May 7, 2012, at 4:35 PM, Graham Cox wrote:

    > The alternative would be to pull out each data field one by one, which I'm sure is also considered acceptable practice, but for this file type, using structs has proved to be a lot easier, not least because of a) the strange mix of big-endian and little-endian values in the same file and b) the presence of directly formatted 'double' values that are not platform independent.

    I agree; using structs for this results in the cleanest code.

    > Some discussion of the merit of #pragma pack(n) versus other methods would be useful here, it's not something I've had to deal with very much.

    Assuming this just needs to support Mac and/or iOS, go ahead and use #pragma pack. If your code really has to be fully cross-platform, then things get dicier because #pragma pack isn’t supported by all compilers.

    —Jens