How to code a NSString literal with UTF8?
-
The Xcode editor lets you define a file encoding as UTF8. You can
then type in a line like:
NSString *foo = @"blah blah";
where "blah blah" contains Unicode characters encoded as UTF8.
But when I ask this string for its characters I get garbage back,
and the length is not correct.
What would be the best way to do this?
david -
In such cases, I code it like this:
NSString *foo = NSLocalizedString( @"blah blah", @"some comment" );
And then define the actual Unicode characters in the file
"Localizable.strings".
Satoshi
Satoshi Matsumoto <satoshi...>
816-5 Odake, Odawara, Kanagawa, Japan 256-0802 -
This has been discussed many times on the list and even recently. You
could check the archives. The bottom line is that the literal strings
which are defined with "@" cannot be relied upon for any non-ASCII
encoding. It has nothing to do with the encoding of the file; it is
dependent upon the encodings that NSConstantString supports, which
number very few (I think only one, in fact). You'll either have to use
localized strings or a specific encoding for the string in which you
are interested. You may want to look at:
http://developer.apple.com/documentation/MacOSX/Conceptual/BPInternational/Tasks/GettingStrings.html
I hope this helps,
Will -
If you really want to hard-wire a Unicode string into your application
(e.g. for odd glyphs like smilies or Apple command key characters) you
can use:
[NSString stringWithUTF8String: "blah blah"]; // Note that this is a C "" string, not an @""
If however you want to include international text in your application
you are probably better off using:
NSLocalizedString(@"key string", @"default" );
Then you can put the Unicode text into Localizable.strings under the
"key string" key and have different values for each language.
Cheers,
Nicko -
One possible misconception people come away with from these discussions
is that you need to use NSLocalizedString() if you want NSStrings with
non-ASCII characters.
NSLocalizedString and friends are an absolutely great choice if you need
localizable strings, that is, strings which will need to be
translated to different languages. The string is read from
a .strings file, which can be made per-language.
However, if all you want is an NSString with some non-ASCII chars in
it, and it's not meant to be shown to the user (and hence doesn't
need to be localized), then it's perfectly fine to create the string
programmatically. One possibility here is:
NSString *s = [NSString stringWithFormat:@"Long %C dash", 0x2014];
You can also do (since \xe2\x80\x94 is the 3-byte UTF-8 string for
0x2014):
NSString *s = [NSString stringWithUTF8String:"Long \xe2\x80\x94 dash"];
but one thing that is not very safe is to include actual high-bit
characters in your source code:
NSString *s = [NSString stringWithUTF8String:"Long — dash"]; // Not safe; you're at the mercy of any tools you use
and the following is not allowed:
NSString *s = @"Long — dash"; // Not allowed
Ali
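The byte sequence quoted above for 0x2014 can be checked by hand. The following is only a minimal sketch in plain C (which also compiles as Objective-C); the function name utf8_encode is hypothetical and not part of Foundation. It applies the standard UTF-8 bit layout to a single code point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
   `out` must have room for at least 4 bytes.
   Returns the number of bytes written, or 0 if `cp` is not encodable
   (a UTF-16 surrogate value or anything above U+10FFFF). */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    assert(out != NULL);
    if (cp < 0x80) {                      /* 0xxxxxxx: plain ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                     /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 11110xxx + three continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Calling utf8_encode(0x2014, buf) fills buf with the three bytes E2 80 94. One caution when typing such escapes by hand: a C hex escape consumes every hex digit that follows it, so "\x94dash" would not mean what it looks like; the space after \x94 in "Long \xe2\x80\x94 dash" is what ends the escape. Splitting the literal ("\x94" "dash") avoids the problem when no space is wanted.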
-
Thank you Ali!
Now, could we *please* have this added to the official documentation
(Scott + Mmalc, are you listening? Somewhere in NSString, and perhaps
to the ObjC docs?), perhaps to some example code snippet included with
the dev tools and finally to all third party Cocoa resource web sites
on the web (Mr. Alastair, Mr. Stevenson...)?
If this question is not a FAQ item, I don't know what is...
People will no doubt still ask this question, but it would be great to
have an authoritative answer to point to! We can of course always use
this:
<http://www.cocoabuilder.com/archive/message/cocoa/2005/3/30/131791>
:-)
j o a r
-
On Mar 30, 2005, at 9:20 AM, j o a r wrote:
> Now, could we *please* have this added to the official
> documentation (Scott + Mmalc, are you listening? Somewhere in
> NSString, and perhaps to the ObjC docs?),
>
As ever, the official answer to this would be, "Please file a bug
report" :-)
In this case, however, it would get returned as a duplicate...
mmalc -
On 2005-03-30 08:59, Ali Ozer said:
> NSString *s = [NSString stringWithUTF8String:"Long -- dash"]; // Not safe; you're at the mercy of any tools you use
As I understand it, the danger here is that the compiler has no way of
knowing the file's text encoding, right? The compiler will have to
assume ASCII, MacRoman, UTF8, or whatever, and if it assumes wrong,
badness ensues. (In the Xcode IDE, a user can specify a file's encoding,
which helps.) And since ASCII is the common denominator, only 7-bit
characters are 'safe'. Correct?
> and the following is not allowed:
>
> NSString *s = @"Long -- dash"; // Not allowed
"Not allowed" but still accepted by gcc 3.3 (and CW 9.4). <rdar://4073313>
This I understand to be disallowed because it is so documented:
"@"string" - Defines a constant NSString object in the current module and
initializes the object with the specified 7-bit ASCII-encoded string."
Fair enough.
But am I the only one who thinks it should be allowed? Apple could
change the Obj-C language to allow it. I doubt much code would break.
Do people really depend on such strings being NSConstantStrings? We'd
still have the problem of the compiler having to know/guess the file's
encoding, but that problem is always there.
--
____________________________________________________________
Sean McBride, B. Eng <sean...>
Rogue Research www.rogue-research.com
Mac Software Developer Montréal, Québec, Canada -
>> NSString *s = [NSString stringWithUTF8String:"Long -- dash"]; // Not safe; you're at the mercy of any tools you use
>>
>
> As I understand it, the danger here is that the compiler has no way of
> knowing the file's text encoding, right? The compiler will have to
> assume ASCII, MacRoman, UTF8, or whatever, and if it assumes wrong,
> badness ensues. (In the Xcode IDE, a user can specify a file's
> encoding, which helps.) And since ASCII is the common denominator,
> only 7-bit characters are 'safe'. Correct?
Correct. I think in the world of Xcode you'd be fine, since Xcode now
has per-file-encodings and such; but if you ever used another editor,
things might become questionable.
>> and the following is not allowed:
>>
>> NSString *s = @"Long -- dash"; // Not allowed
>>
>
> "Not allowed" but still accepted by gcc 3.3 (and CW 9.4).
> <rdar://4073313>
>
> This I understand to be disallowed because it is so documented:
> "@"string" - Defines a constant NSString object in the current
> module and initializes the object with the specified 7-bit
> ASCII-encoded string."
> Fair enough.
>
> But am I the only one who thinks it should be allowed? Apple could
> change the Obj-C language to allow it. I doubt much code would break.
> Do people really depend on such strings being NSConstantStrings? We'd
> still have the problem of the compiler having to know/guess the file's
> encoding, but that problem is always there.
You're not the only one who thinks this should be allowed and it is
something that is on several investigation/future-feature lists.
Ali -
On 30 Mar 2005, at 13:01, mmalcolm crawford wrote:
> As ever, the official answer to this would be, "Please file a bug
> report" :-)
> In this case, however, it would get returned as a duplicate...
>
This response seems to be a mixed message and one which contravenes
standard responses from Apple engineers. We are told constantly to
"file a bug report" and "it's okay if it's a duplicate because we use
duplicates when prioritizing a reported bug." But you seem to be saying
here: don't bother, it's a duplicate.
Dave
--
Chaos Werks Design
"Suburbia is where the developer bulldozes out the trees, then names
the streets after them."
- Bill Vaughn -
On 2005-03-30, at 18.59, Ali Ozer wrote:
> One possible misconception people come away in these discussions is
> that you need to use NSLocalizedString() if you want NSStrings with
> non-ASCII characters.
I don't think it is a misconception to use NSLocalizedString() for
defining non-ASCII characters in an NSString.
As the encoding of @"string" is 7-bit ASCII, if you want to define
non-ASCII characters, you do need to "localize".
> programmatically. One possibility here is:
> NSString *s = [NSString stringWithFormat:@"Long %C dash", 0x2014];
Please imagine Japanese developers defining all their Japanese strings
as above.
Even if I don't need to localize to other languages, I always use
NSLocalizedString() to define Japanese strings.
Satoshi
-----------------------------------------------------
Satoshi Matsumoto <satoshi...>
816-5 Odake, Odawara, Kanagawa, Japan 256-0802 -
On 30 Mar 2005, at 19:18, Sean McBride wrote:
> On 2005-03-30 08:59, Ali Ozer said:
>
>> NSString *s = [NSString stringWithUTF8String:"Long -- dash"]; // Not safe; you're at the mercy of any tools you use
>
> As I understand it, the danger here is that the compiler has no way of
> knowing the file's text encoding, right?
Yes, the compiler has to make an assumption about the file encoding.
Of course the compiler already makes an assumption about the coding, as
you will rapidly discover when you try saving your C source in UTF-16
encoding, let alone EBCDIC.
The C99 spec explicitly allows top-bit-set bytes in source code, but the
interpretation of them is essentially up to the application to
determine in the context of the locale. All the characters used by the
C language itself are taken from a set that codes the same in ASCII and
UTF-8, and no continuation byte of a UTF-8 coding can be the same as any
ASCII character. Personally I think that it is entirely reasonable to
rely on this aspect of the C specification when entering Unicode
characters which are going to be interpreted using
stringWithUTF8String:; it is defined to work with the default encoding
for saving source, and if the user changes the coding to anything
non-standard it could stop working for a whole stack of reasons.
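The property relied on here (bytes belonging to a non-ASCII character never collide with ASCII) can be illustrated with a short sketch in plain C. The helper names utf8_is_continuation and count_ascii_bytes are made up for this example, and the input is assumed to be well-formed UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: 1 if `b` is a UTF-8 continuation byte (10xxxxxx). */
int utf8_is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Hypothetical helper: count the bytes below 0x80 in a NUL-terminated
   string. In well-formed UTF-8 these are exactly the ASCII characters,
   because every byte of a multi-byte sequence has its top bit set. */
size_t count_ascii_bytes(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0x80) == 0)
            n++;
    }
    return n;
}
```

For "Long \xe2\x80\x94 dash" the count is 10: the ten ASCII characters, with none of the three dash bytes mistaken for one. This is why a tokenizer scanning for the closing quote of a literal cannot be fooled by UTF-8 content, whatever it later does with the bytes.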
>> and the following is not allowed:
>>
>> NSString *s = @"Long -- dash"; // Not allowed
>
> "Not allowed" but still accepted by gcc 3.3 (and CW 9.4).
> <rdar://4073313>
It's accepted but in most cases results in the wrong string.
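To make "the wrong string" concrete: if each byte of the UTF-8 source is taken as one character (an assumption about how the literal ends up being read, not something confirmed in this thread), the length comes out as the byte count rather than the character count. A rough sketch in plain C, with the hypothetical name utf8_codepoint_count:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of Unicode code points in a well-formed,
   NUL-terminated UTF-8 string. Every byte that is not a continuation
   byte (10xxxxxx) starts a new code point. */
size_t utf8_codepoint_count(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* What a one-byte-per-character reader would report as the length. */
size_t byte_count(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}
```

For "Long \xe2\x80\x94 dash", utf8_codepoint_count gives 11 while byte_count gives 13, which matches the symptom in the original question: extra garbage characters and a length that is too large.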
> This I understand to be disallowed because it is so documented:
> "@"string" - Defines a constant NSString object in the current module
> and initializes the object with the specified 7-bit ASCII-encoded
> string."
> Fair enough.
>
> But am I the only one who thinks it should be allowed?
No, I think it would be a fine thing too. That said, one problem I
can envisage is that, since top-bit-set bytes inside @"..." constants
are passed silently by the compiler at the moment, there would be a
danger of confusion arising from code that uses this paradigm
compiling cleanly on both old and new compilers while generating very
different results.
Nicko


