FROM : Dave Camp
DATE : Tue Mar 04 17:25:07 2008
On Mar 3, 2008, at 11:00 PM, Stuart Malin wrote:
> My problem is that I receive a function call from a C library that
> gives me a wchar_t array and its length. The unicode array is _not_
> terminated.
>
> The library defines an XML_Char type, so my code below refers to
> that, but XML_Char is wchar_t (which, I believe is UTF8 on a Mac).
You actually have two problems here:
1) wchar_t on the Mac is a 4-byte (32-bit) container per character.
2) wchar_t is just a container; it does not define the encoding of the
character it contains.
So, you need to know exactly what the encoding used in your container
is before you can get it converted to an NSString of a known encoding.
NSString infers the width of the characters from the encoding. If you
have a buffer of characters where the width does not match the
encoding you will probably have to re-buffer the characters into the
correct width before handing them to NSString.
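As a rough illustration of that re-buffering, here is a minimal sketch
for one possible case: the wchar_t buffer holds UTF-32 code points and
you want 16-bit UTF-16 units, the width of NSString's unichar. The
function name utf32_to_utf16 is made up for this example, and it assumes
a 32-bit wchar_t as on the Mac; it is not something the library provides.

```c
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical helper, assuming each wchar_t holds one UTF-32 code point.
 * Writes UTF-16 code units to out, which must have room for 2 * len units.
 * Returns the number of units written, or (size_t)-1 if an element is not
 * a valid Unicode scalar value. The input need not be terminated. */
size_t utf32_to_utf16(const wchar_t *in, size_t len, uint16_t *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        uint32_t cp = (uint32_t)in[i];
        /* Reject values outside Unicode and lone surrogate values. */
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;
        if (cp < 0x10000) {
            /* Fits in a single 16-bit unit. */
            out[n++] = (uint16_t)cp;
        } else {
            /* Needs a surrogate pair: high unit first, then low unit. */
            cp -= 0x10000;
            out[n++] = (uint16_t)(0xD800 + (cp >> 10));
            out[n++] = (uint16_t)(0xDC00 + (cp & 0x3FF));
        }
    }
    return n;
}
```

If the result is what you expect, the 16-bit buffer and the returned
count could then be given to something like
+[NSString stringWithCharacters:length:].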
If you are right that your code has UTF-8 in a wchar_t string (which
would be horribly inefficient, wasting 3 of every 4 bytes in the
string), you might need to write some code that copies the low byte of
each wchar_t (one byte out of every four) into a UTF-8 buffer that
you can then use as input to NSString.
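A minimal sketch of that byte-narrowing copy, assuming each wchar_t
really does carry exactly one UTF-8 byte. The name narrow_wchar_to_bytes
is hypothetical. Reading the value of each element, rather than picking
bytes out of memory by offset, avoids having to care about byte order.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper, assuming each wchar_t holds one UTF-8 byte (0..255).
 * Copies the low byte of each element into out, which must have room for
 * len bytes. Returns the number of bytes written, or (size_t)-1 if an
 * element does not fit in one byte, which would mean the assumption about
 * the buffer's contents is wrong. The input need not be terminated. */
size_t narrow_wchar_to_bytes(const wchar_t *in, size_t len, unsigned char *out)
{
    for (size_t i = 0; i < len; i++) {
        /* Negative values (wchar_t may be signed) become large here
         * and are rejected by the range check. */
        unsigned long v = (unsigned long)in[i];
        if (v > 0xFF)
            return (size_t)-1;
        out[i] = (unsigned char)v;
    }
    return len;
}
```

The resulting bytes and count could then go to something like
-[NSString initWithBytes:length:encoding:] with NSUTF8StringEncoding. If
the function returns (size_t)-1, the buffer is probably not UTF-8 at
all, and more likely holds full code points.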
Dave

Cocoa mail archive

