FROM : Ricky Sharp
DATE : Tue Aug 29 17:23:07 2006
On Tuesday, August 29, 2006, at 06:57AM, Chris Suter <<email_removed>> wrote:
>
>On 29/08/2006, at 9:42 PM, Ricky Sharp wrote:
>
>>
>> On Tuesday, August 29, 2006, at 00:59AM, Chris Suter
>> <<email_removed>> wrote:
>>
>>>
>>> On 29/08/2006, at 3:47 PM, Donald Hall wrote:
>>>
>>>> Furthermore, I understood that "external representation" was always
>>>> big endian.
>>>
>>> No. The representation is as dictated by the encoding. Some encodings
>>> don't have an endian aspect to them (UTF-8 for example). I'm guessing
>>> if you pick kCFStringEncodingUTF16, OS X is free to choose big-endian
>>> or little-endian.
>>
>> Not quite. According to <http://www.unicode.org/faq/utf_bom.html#36>,
>> unmarked UTF-16 and UTF-32 use big-endian by default. I would expect
>> the Cocoa frameworks to honor that default.
>
>But Cocoa can write the byte order mark, and as it turns out, I'm right.
>
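>#include <CoreFoundation/CoreFoundation.h>
>#include <stdio.h>
>
>int main (void)
>{
> /* Request UTF-16 without naming a byte order. */
> CFDataRef data = CFStringCreateExternalRepresentation (NULL,
>  CFSTR ("test"), kCFStringEncodingUTF16, 0);
>
> if (data == NULL)
>  return 1;
>
> const UInt8 *bytes = CFDataGetBytePtr (data);
> CFIndex i;
>
> /* Dump the external representation one byte at a time. */
> for (i = 0; i < CFDataGetLength (data); ++i)
>  printf ("%02x ", bytes[i]);
>
> putchar ('\n');
>
> CFRelease (data); /* Create rule: the caller owns the returned CFData. */
>
> return 0;
>}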
>
>produces:
>
>ff fe 74 00 65 00 73 00 74 00
>
>on an Intel machine.
Here, the bytes are UTF-16 in little-endian order: the leading ff fe is the BOM (U+FEFF) written low byte first. With the BOM in place, receivers of the data know how to interpret it. But in cases where 16-bit data is unmarked, you should interpret the bytes as big-endian by default.
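
To make the receiving side concrete, here is a rough sketch in plain C (no CoreFoundation) of how a reader might apply that rule: look for a BOM, and fall back to big-endian when there is none. The names utf16_detect_order and utf16_decode_units are made up for illustration; they are not part of any Apple API, and the sketch only produces 16-bit code units without validating surrogate pairs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { UTF16_BIG_ENDIAN, UTF16_LITTLE_ENDIAN } utf16_byte_order;

/* Hypothetical helper: inspect the first two bytes for a BOM (U+FEFF).
   Sets *bom_len to 2 when a BOM is present, 0 otherwise.
   Unmarked data is treated as big-endian. */
static utf16_byte_order
utf16_detect_order(const uint8_t *bytes, size_t len, size_t *bom_len)
{
    assert(bytes != NULL && bom_len != NULL);

    *bom_len = 0;
    if (len >= 2) {
        if (bytes[0] == 0xFF && bytes[1] == 0xFE) {
            *bom_len = 2;
            return UTF16_LITTLE_ENDIAN;
        }
        if (bytes[0] == 0xFE && bytes[1] == 0xFF) {
            *bom_len = 2;
            return UTF16_BIG_ENDIAN;
        }
    }
    return UTF16_BIG_ENDIAN; /* no BOM: default to big-endian */
}

/* Hypothetical helper: decode the bytes into UTF-16 code units, skipping
   the BOM if there is one. Returns the number of code units written to out.
   A trailing odd byte is ignored. */
static size_t
utf16_decode_units(const uint8_t *bytes, size_t len,
                   uint16_t *out, size_t out_cap)
{
    size_t bom_len;
    utf16_byte_order order = utf16_detect_order(bytes, len, &bom_len);
    size_t n = 0;
    size_t i;

    for (i = bom_len; i + 1 < len && n < out_cap; i += 2) {
        if (order == UTF16_LITTLE_ENDIAN)
            out[n++] = (uint16_t)(bytes[i] | (bytes[i + 1] << 8));
        else
            out[n++] = (uint16_t)((bytes[i] << 8) | bytes[i + 1]);
    }
    return n;
}
```

Feeding it the ten bytes from the output above should give back the four code units for "test", and feeding it 00 74 00 65 with no BOM should give 't' and 'e' via the big-endian default.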
--
Rick Sharp
Instant Interactive(tm)
Related mails:

| Author | Date |
|---|---|
| Donald Hall | Aug 29, 07:47 |
| Chris Suter | Aug 29, 07:58 |
| Ricky Sharp | Aug 29, 13:42 |
| Chris Suter | Aug 29, 13:56 |
| Ricky Sharp | Aug 29, 17:23 |
| Donald Hall | Aug 30, 07:14 |
| Chris Suter | Aug 30, 07:58 |





