Two Questions on NSStrings <-> bytes

  • Hello,

    I'm still learning about byte-level writing. Not sure why it's been so
    difficult for me. Though I can code web apps, VR tools and scripts
    galore, simple byte-level writing is a mental roadblock. :) Here are
    two questions that have stumped me...

    1. One of the specs I'm coding for requires a Big Endian 8-bit hex
    value -- meant to be the length of a C-String. So, I tried the
    following...

      // value is 20

      int8_t chapterURLSizeValue = [chapURLLengthNumber intValue];
      NSString *test = [NSString stringWithFormat:@"%x", chapterURLSizeValue];
      NSLog(@"test string: %@", test);
      CFIndex lhsLen = [test length];
      UniChar *lhsBuf = malloc(lhsLen * sizeof(UniChar));
      [test getCharacters:lhsBuf];
      data3 = [NSMutableData dataWithCapacity:1];
      [data3  appendBytes:&lhsBuf length:1];
      free(lhsBuf);
      NSLog(@"Data 3: %@", data3);

    Result...
            test string: 14
            Data 3: <e0>

    In other words, I'm getting the byte value of the string. How do I
    get that hex NSString to a literal byte array? Would there be a
    better way to get an 8-bit hex value from an NSNumber?

    2. The C-string measured above is meant to come from a URL stored in
    an NSString in an NSDictionary. To get the byte array I tried the
    following...

      // string is http://www.apple.com
      NSString *chapterURL = [resourceDict objectForKey:@"url"];
      const char *cChapterURL = [chapterURL cStringUsingEncoding:[NSString defaultCStringEncoding]];
      [data3  appendBytes:&cChapterURL length:strlen(cChapterURL)];

    Result...
      50628816 00000005 00000000 a0110090 01000000

    Very much incorrect. What would be the proper way to get a C string
    from a URL in an NSString?

    Overall, does anyone know of a tome that would be useful in learning
    byte-level writing and data conversion at that level?

    thanks for any direction,

    Jaime Magiera
    Sensory Research
    http://www.sensoryresearch.net
  • On 22 Oct 2007, at 17:10, Jaime Magiera wrote:

    > I'm still learning on byte writing stuff. Not sure why it's been so
    > difficult for me.

    You're probably over-thinking it.  That's the usual reason for
    problems like this.

    > Though I can code web apps, VR tools and scripts galore, simple
    > byte-level writing is a mental roadblock. :) Here are two questions
    > that have stumped me...
    >
    > 1. One of the specs I'm coding for requires a Big Endian 8-bit hex
    > value -- meant to be the length of a C-String. So, I tried the
    > following...

    It doesn't make sense for an eight-bit value to be Big Endian (or
    Little Endian).  Unless you're talking wire protocols (and only then
    if you actually get involved with the hardware itself), a byte is a
    byte is a byte.  It's only when you get into multi-byte units that
    endianness generally becomes a concern.
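    To make that concrete, here is a sketch in plain C, which also compiles as Objective-C. The helper names are made up for illustration. A one-byte field is simply stored, while 16- and 32-bit fields in a big-endian file are written most-significant byte first. Foundation's NSSwapHostShortToBig() and NSSwapHostIntToBig() can do the same swapping if you prefer to build the value in memory first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A single byte has no byte order: it is simply stored. */
static size_t put_u8(uint8_t *dst, uint8_t v)
{
    dst[0] = v;
    return 1;
}

/* Multi-byte values are written most-significant byte first (big-endian),
   using shifts so the result is the same on PowerPC and Intel Macs. */
static size_t put_be16(uint8_t *dst, uint16_t v)
{
    dst[0] = (uint8_t)(v >> 8);
    dst[1] = (uint8_t)v;
    return 2;
}

static size_t put_be32(uint8_t *dst, uint32_t v)
{
    dst[0] = (uint8_t)(v >> 24);
    dst[1] = (uint8_t)(v >> 16);
    dst[2] = (uint8_t)(v >> 8);
    dst[3] = (uint8_t)v;
    return 4;
}
```

    Each helper returns the number of bytes it wrote, so calls can be chained by advancing an offset.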

    Anyway, I'm unclear what you need here (and I think the reason is
    that *you're* unclear what it is that you need).  Do you want a
    binary value?  Or an ASCII string consisting of two hexadecimal
    digits?  If you need binary, e.g. in an NSData, all you need do is:

      #import <stdint.h>

      uint8_t value = 20;
      NSData *myData = [NSData dataWithBytes:&value length:1];

    (I prefer using uint8_t rather than "unsigned char", because it makes
    it clear that we're talking 8-bit bytes, not characters.  Hence the
    #import, to get that type definition.)

    If you want an NSString with the hexadecimal digits, you can do

      NSString *myString = [NSString stringWithFormat:@"%02x", value];

    If you need that as ASCII bytes instead, the simplest way is to use
    the C APIs, e.g.

      char buffer[3];

      snprintf (buffer, sizeof(buffer), "%02x", value);

      NSData *asAnNSData = [NSData dataWithBytes:buffer length:2];  // Two ASCII bytes, not NUL terminated
      NSData *withNUL = [NSData dataWithBytes:buffer length:3];     // Three bytes, last is a NUL

    though you can also use NSString, e.g.

      NSData *asciiData = [myString dataUsingEncoding:NSASCIIStringEncoding];
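    Seen byte by byte, the two answers are quite different. The following is a minimal sketch in plain C, with function names invented for illustration. The binary form of 20 is the single byte 0x14. The hexadecimal text form is the two ASCII bytes 0x31 and 0x34. A field that a spec declares as an 8-bit integer normally wants the binary form.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Binary: the number 20 occupies exactly one byte, 0x14. */
static size_t encode_binary(uint8_t value, uint8_t out[1])
{
    out[0] = value;
    return 1;
}

/* Text: the same number as two ASCII hexadecimal digits, "14", i.e. the
   bytes 0x31 ('1') and 0x34 ('4'); out also receives a terminating NUL. */
static size_t encode_hex_text(uint8_t value, char out[3])
{
    snprintf(out, 3, "%02x", (unsigned)value);
    return 2; /* digit characters written, not counting the NUL */
}
```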

    > 2. The C-string measured above is meant to come from a url stored
    > in an NSString in a NSDictionary. To get the byte array I tried the
    > following...
    >
    > // string is http://www.apple.com
    > NSString *chapterURL = [resourceDict objectForKey:@"url"];
    > const char *cChapterURL = [chapterURL cStringUsingEncoding:
    > [NSString defaultCStringEncoding]];
    > [data3  appendBytes:&cChapterURL length:strlen(cChapterURL)];
    >
    > Result...
    > 50628816 00000005 00000000 a0110090 01000000
    >
    > Very much incorrect. What would the proper way to get a c-string
    > from a URL in an NSString?

    What do you want here?  Do you want the number of code points?
    (URLs, at least in future, will be able to contain arbitrary Unicode;
    I don't know how much support there is currently in Cocoa for that
    because I haven't checked, but there seems no reason to assume that
    there will be any unnecessary restrictions.)

    Or do you want the length assuming some particular string encoding
    (e.g. UTF-8)?

    If it's just the number of UTF-16 code units that you want, you can
    use NSString's -length method.  If you need something else, you might
    consider -lengthOfBytesUsingEncoding:, with an appropriate argument,
    though if you need the data as well then you could just use
    -dataUsingEncoding: and then the length of the NSData is the length
    you are after.
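    Here is a rough plain C sketch of how those lengths diverge. It assumes the input is already valid UTF-8 and does no validation; the struct and function names are made up.

    - For an ASCII URL such as http://www.apple.com, all three counts are 20.
    - "café" is 5 UTF-8 bytes but 4 characters.
    - A character outside the Basic Multilingual Plane, such as U+1F600, is 4 bytes, 1 code point and 2 UTF-16 code units.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Lengths of the same text, measured three ways:
   bytes:      UTF-8 bytes (what -lengthOfBytesUsingEncoding:NSUTF8StringEncoding reports)
   codepoints: Unicode code points
   utf16:      UTF-16 code units (what -length reports) */
struct lengths { size_t bytes, codepoints, utf16; };

/* Input is assumed to be a NUL-terminated, well-formed UTF-8 string. */
static struct lengths utf8_lengths(const char *s)
{
    struct lengths r = { strlen(s), 0, 0 };
    for (const unsigned char *p = (const unsigned char *)s; *p; p++) {
        if ((*p & 0xC0) == 0x80)
            continue;                    /* continuation byte 10xxxxxx */
        r.codepoints++;
        r.utf16 += (*p >= 0xF0) ? 2 : 1; /* 4-byte sequence -> surrogate pair */
    }
    return r;
}
```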

    > Overall, does anyone know of a tome that would be useful is
    > learning byte-level writing and data conversion at that level?

    Well, it depends exactly what it is that you want to learn.  A good
    place to start, for lower-level things like bytes, would be learning
    plain, vanilla C (not C++, Cocoa, C#, Java or anything like that, as
    tutorials in those languages tend either to assume that you know C,
    or try to skirt around the issue entirely).  It sounds like you might
    also want to take a good look at the whole area of character
    encoding, so that you understand why the results you show above are
    as they are... you might start by taking a look at ASCII:
    <http://en.wikipedia.org/wiki/ASCII>

    Finally, much of your confusion here seems to stem from not really
    understanding what you're trying to produce.  So perhaps you should
    ask whoever wrote the spec you're following to explain what they
    meant?

    Kind regards,

    Alastair.

    --
    http://alastairs-place.net
  • On Oct 22, 2007, at 12:54 PM, Alastair Houghton wrote:

    > You're probably over-thinking it.  That's the usual reason for
    > problems like this.

    =)

    > It doesn't make sense for an eight-bit value to be Big Endian (or
    > Little Endian).

    Sorry, yes, I meant that the overall file format requires Big Endian
    values (.mov and .m4a).

    > Anyway, I'm unclear what you need here (and I think the reason is
    > that *you're* unclear what it is that you need).  Do you want a
    > binary value?  Or an ASCII string consisting of two hexadecimal
    > digits?

    > If you need binary, e.g. in an NSData, all you need do is:

    Ok, my problem here was something simple... I wasn't casting the
    [NSString length] to an int8_t. The following worked...

      int8_t chapterURLSizeValue = (int8_t)[chapterURL length];
      data3 = [NSMutableData dataWithCapacity:1];
      [data3  appendBytes:&chapterURLSizeValue length:1];

    > What do you want here?

    Here is the spec. The items I'm working on are URLLength and URL

    class TextHyperTextBox() extends TextSampleModifierBox (‘href’) {
      unsigned int(16) startcharoffset;
      unsigned int(16) endcharoffset;
      unsigned int(8) URLLength;
      unsigned int(8) URL[URLLength];
      unsigned int(8) altLength;
      unsigned int(8) altstring[altLength];
    }

    In a test file created by GarageBand, the binary output looks like so
    using xxd...

    002facd: 00085072 6f664361 73740000 00256872  ..ProfCast...%hr
    002fadd: 65660000 00081768 7474703a 2f2f7777  ef.....http://ww
    002faed: 772e7072 6f666361 73742e63 6f6d00    w.profcast.com.
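    Reading the dump against the spec field by field shows what GarageBand actually wrote. The sketch below is plain C with invented names (struct href_box, parse_href). It assumes the usual QuickTime/MPEG-4 box header: a 32-bit big-endian size followed by a four-character type, which is what the bytes 00000025 'href' suggest. The final 00 after "profcast.com" is altLength = 0, not a C-string terminator. The URL is length-prefixed (0x17 = 23 bytes), not NUL-terminated.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical decoded form of a TextHyperTextBox ('href'). */
struct href_box {
    uint32_t size;
    char     type[5];
    uint16_t start, end;
    uint8_t  url_len;
    char     url[256];
    uint8_t  alt_len;
};

static uint16_t get_be16(const uint8_t *p) { return (uint16_t)((p[0] << 8) | p[1]); }

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Parses the box up to altLength; returns 0 on success, -1 if too short. */
static int parse_href(const uint8_t *p, size_t n, struct href_box *b)
{
    if (n < 13) return -1;
    b->size = get_be32(p);
    memcpy(b->type, p + 4, 4);
    b->type[4] = '\0';
    b->start   = get_be16(p + 8);
    b->end     = get_be16(p + 10);
    b->url_len = p[12];
    if (n < 13u + b->url_len + 1u) return -1;
    memcpy(b->url, p + 13, b->url_len);
    b->url[b->url_len] = '\0';           /* added here; the box stores no NUL */
    b->alt_len = p[13 + b->url_len];
    return 0;
}
```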

    So, I need to convert an NSString value of something like
    "http://www.apple.com", to a straight c-string and write that out. I've
    tried a lot of different ways, but I keep ending up with lots of
    binary 0's. I think there is something fundamental about bytes/
    characters/etc. that I don't get.

    > Well, it depends exactly what it is that you want to learn.  A good
    > place to start, for lower-level things like bytes, would be
    > learning plain, vanilla C (not C++, Cocoa, C#, Java or anything
    > like that, as tutorials in those languages tend either to assume
    > that you know C, or try to skirt around the issue entirely).  It
    > sounds like you might also want to take a good look at the whole
    > area of character encoding, so that you understand why the results
    > you show above are as they are... you might start by taking a look
    > at ASCII: <http://en.wikipedia.org/wiki/ASCII>

    Excellent. Thank you.

    Jaime Magiera
    Sensory Research
    http://www.sensoryresearch.net
  • On 22 Oct 2007, at 18:13, Jaime Magiera wrote:

    >> What do you want here?
    >
    > Here is the spec. The items I'm working on are URLLength and URL
    >
    > class TextHyperTextBox() extends TextSampleModifierBox (‘href’) {
    > unsigned int(16) startcharoffset;
    > unsigned int(16) endcharoffset;
    > unsigned int(8) URLLength;
    > unsigned int(8) URL[URLLength];
    > unsigned int(8) altLength;
    > unsigned int(8) altstring[altLength];
    > }
    >
    > In a test file created by GarageBand, the binary output looks like
    > so using xxd...
    >
    > 002facd: 00085072 6f664361 73740000 00256872  ..ProfCast...%hr
    > 002fadd: 65660000 00081768 7474703a 2f2f7777  ef.....http://ww
    > 002faed: 772e7072 6f666361 73742e63 6f6d00    w.profcast.com.
    >
    > So, I need to convert an NSString value of something like
    > "http://www.apple.com", to a straight c-string and write that out. I've
    > tried a lot of different ways, but I keep ending up with lots of
    > binary 0's. I think there is something fundamental about bytes/
    > characters/etc. that I don't get.

    Well the first thing you need to find out is what character encoding
    should be used for that string.  It should say somewhere in the spec
    what you should be using (or there will be a field somewhere that
    specifies the character encoding explicitly).

    At the risk of doing rather too much for you (and because questions
    about bytes and characters keep popping up, so perhaps others will
    come across this post and find what they need here)...

    I think the format you quote comes from the 3GPP "Timed text format"
    specification, which starts by explaining the encoding in section 5
    (specifically, it says that you should use UTF-8 unless the text
    starts with a UTF-16 BOM, in which case you can use UTF-16; it also
    says that text should be fully composed).
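    The encoding rule described above is simple enough to sketch in plain C. The names are invented. The sketch assumes the BOM is stored big-endian as FE FF, like the rest of the file; check the spec for whether any other byte order is permitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum text_encoding { TEXT_UTF8, TEXT_UTF16 };

/* UTF-8 unless the text begins with a UTF-16 byte order mark.
   Assumption: the BOM is serialized big-endian (FE FF). */
static enum text_encoding detect_text_encoding(const uint8_t *text, size_t len)
{
    if (len >= 2 && text[0] == 0xFE && text[1] == 0xFF)
        return TEXT_UTF16;
    return TEXT_UTF8;
}
```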

    It also (section 5.17.1.5) says specifically that the URL is in UTF-8.

    Assuming that you are building this structure up in an NSMutableData
    called "boxData", and that you have already encoded startCharOffset
    and endCharOffset, you might do something like:

      NSData *utf8Data = [[chapterURL precomposedStringWithCanonicalMapping]
                              dataUsingEncoding:NSUTF8StringEncoding];
      NSUInteger len = [utf8Data length];

      NSAssert (len < 256, @"Chapter URL is too long (you might want to handle this differently)");

      uint8_t utf8URLLen = (uint8_t)len;

      [boxData appendBytes:&utf8URLLen length:1];
      [boxData appendData:utf8Data];

    (You could equally use UTF8String and strlen(), though I'd prefer the
    -dataUsingEncoding: method for this particular application I think.)
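    The UTF8String/strlen() route is also where the original attempt went wrong. [data3 appendBytes:&cChapterURL length:strlen(cChapterURL)] passes the address of the pointer variable. It therefore copies the pointer's own value (a memory address) and then whatever happens to sit next to it, not the characters the pointer refers to. Here is the same mistake reduced to plain C, with made-up function names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Correct: copy the characters that 'url' points to. */
static void copy_chars(uint8_t *dst, const char *url)
{
    memcpy(dst, url, strlen(url));
}

/* The bug: '&url' is the address of the pointer variable, so this copies
   the pointer's own bytes (a memory address). Only sizeof url bytes are
   copied here to keep the demonstration safe; copying strlen(url) bytes,
   as the original code did, reads past the pointer into unrelated memory. */
static void copy_pointer_bytes(uint8_t *dst, const char *url)
{
    memcpy(dst, &url, sizeof url);
}
```

    In Cocoa terms the fix is simply [data3 appendBytes:cChapterURL length:strlen(cChapterURL)], without the &. The same reasoning explains the <e0> from &lhsBuf in the first question.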

    I'll also note that it's likely that you'll keep wanting to encode
    strings if you're working with a spec like this, and that they will
    often be in the same format (i.e. a length byte, followed by that
    many bytes of data).  You might consider adding a category on
    NSMutableData, containing the following method (or something similar):

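      #import <Foundation/Foundation.h>

      // The category name is arbitrary
      @interface NSMutableData (ThreeGPPStrings)
      - (void)append3GPPString:(NSString *)string;
      @end

      @implementation NSMutableData (ThreeGPPStrings)

      - (void)append3GPPString:(NSString *)string
      {
        // 3GPP text is fully composed (NFC) UTF-8
        NSData *utf8Data = [[string precomposedStringWithCanonicalMapping]
                            dataUsingEncoding:NSUTF8StringEncoding];
        NSUInteger len = [utf8Data length];

        NSAssert (len < 256, @"String too long");

        // Length byte first, then that many bytes of UTF-8 data
        uint8_t utf8Len = (uint8_t)len;

        [self appendBytes:&utf8Len length:1];
        [self appendData:utf8Data];
      }

      @end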

    then you could just write

      [boxData append3GPPString:chapterURL];

    every time you want to encode a <length, string> pair like that.
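    The same length-prefixed layout can be sketched in plain C, which makes it easy to check against the GarageBand dump quoted above. The helper names put_pstring and build_href are invented for this example. The code makes two assumptions:

    - Its strings are already precomposed UTF-8. In the Cocoa version, -precomposedStringWithCanonicalMapping and -dataUsingEncoding: take care of this.
    - The box starts with a 32-bit big-endian size followed by the type 'href', as the dump suggests.

    Building the box for http://www.profcast.com with offsets 0 and 8 and an empty alt string should reproduce the 37 bytes from 00000025 through the final 00.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Writes one length-prefixed string: a single length byte, then the bytes.
   Returns the number of bytes written, or 0 if the string exceeds 255 bytes. */
static size_t put_pstring(uint8_t *dst, const char *utf8)
{
    size_t len = strlen(utf8);
    if (len > 255) return 0;
    dst[0] = (uint8_t)len;
    memcpy(dst + 1, utf8, len);      /* copy from utf8, not from &utf8 */
    return 1 + len;
}

/* Builds a complete 'href' box; dst must hold at least 524 bytes
   (8 header + 4 offsets + 2 * 256 for the two strings).
   Returns the box size, or 0 if a string is too long. */
static size_t build_href(uint8_t *dst, uint16_t start, uint16_t end,
                         const char *url, const char *alt)
{
    size_t n = 8;                                /* room for size + type */
    dst[n++] = (uint8_t)(start >> 8); dst[n++] = (uint8_t)start;
    dst[n++] = (uint8_t)(end >> 8);   dst[n++] = (uint8_t)end;
    size_t k = put_pstring(dst + n, url);
    if (k == 0) return 0;
    n += k;
    k = put_pstring(dst + n, alt);
    if (k == 0) return 0;
    n += k;
    dst[0] = (uint8_t)(n >> 24); dst[1] = (uint8_t)(n >> 16);   /* big-endian size */
    dst[2] = (uint8_t)(n >> 8);  dst[3] = (uint8_t)n;
    memcpy(dst + 4, "href", 4);
    return n;
}
```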

    Hopefully you should be able to compare that with what you have now
    and see where you were going wrong.  (You might also consider
    improving the error handling in the above code; you probably don't
    want it to just assert if the string is too long to fit... maybe
    truncating it somehow [careful; it's UTF-8 encoded
    <http://en.wikipedia.org/wiki/UTF-8>] or reporting an error to the
    user is what you want.)
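    If you go the truncation route, the UTF-8 caveat is that a cut must not land inside a multi-byte sequence. Below is a minimal sketch in plain C. The function name is invented, and the sketch assumes well-formed UTF-8. It only respects code point boundaries, so a base character could still be separated from a following combining mark.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the largest length <= max at which a well-formed UTF-8 string
   of 'len' bytes can be cut without splitting a multi-byte sequence.
   Continuation bytes look like 10xxxxxx, so back up while the first byte
   that would be dropped is one of them. */
static size_t utf8_truncation_point(const unsigned char *s, size_t len, size_t max)
{
    if (len <= max)
        return len;
    size_t cut = max;
    while (cut > 0 && (s[cut] & 0xC0) == 0x80)
        cut--;
    return cut;
}
```

    With max set to 255, the returned length always fits in the one-byte length field.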

    Kind regards,

    Alastair.

    --
    http://alastairs-place.net