How to code a NSString literal with UTF8?

  • The Xcode editor lets you define a file encoding as UTF8. You can
    then type in a line like:

    NSString *foo = @"blah blah";

    where "blah blah" contains Unicode characters encoded as UTF-8.

    But, when I ask this string for its characters I get garbage back,
    and length is not correct.

    What would be the best way to do this?

    david
  • In such cases, I will code like this...

    NSString *foo = NSLocalizedString( @"blah blah", @"some comment" );

    And then define the actual Unicode characters in the file
    "Localizable.strings."

    Satoshi

    -----------------------------------------------------
    Satoshi Matsumoto <satoshi...>
    816-5 Odake, Odawara, Kanagawa, Japan 256-0802
  • This has been discussed many times on the list and even recently. You
    could check the archives. The bottom line is that the literal strings
    which are defined with "@" cannot be relied upon for any non-ASCII
    encoding. It has nothing to do with the encoding of the file; it is
    dependent upon the encodings that NSConstantString supports, which
    number very few (I think only one, in fact). You'll either have to use
    localized strings or a specific encoding for the string in which you
    are interested. You may want to look at:

    http://developer.apple.com/documentation/MacOSX/Conceptual/BPInternational/Tasks/GettingStrings.html


    I hope this helps,
    Will
  • On 30 Mar 2005, at 04:17, David Hoerl wrote:

    > The Xcode editor lets you define a file encoding as UTF8. You can then
    > type in a line like:
    >
    > NSString *foo = @"blah blah";
    >
    > where "blah blah" contains Unicode characters encoded as UTF-8.
    > But, when I ask this string for its characters I get garbage back, and
    > length is not correct.
    >
    > What would be the best way to do this?

    If you really want to hard-wire a Unicode string into your application
    (e.g. for odd glyphs like smilies or Apple command key characters) you
    can use:
    [NSString stringWithUTF8String: "blah blah"]; // Note: that is a C "" string, not an @""
    If, however, you want to include international text in your application
    you are probably better off using:
    NSLocalizedString(@"key string", @"comment for translators");
    Then you can put the Unicode text into Localizable.strings under the
    "key string" key and have different values for each language.

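    The "length is not correct" symptom in the original question comes from the difference between bytes and characters. The minimal plain-C sketch below uses a hypothetical helper, utf8_codepoint_count, and assumes valid UTF-8 input. It counts characters by skipping continuation bytes. "Long \xe2\x80\x94 dash" is 13 bytes but 11 characters, so code that treats each byte as one character (for example, by decoding the bytes as MacRoman) reports 13. NSString's -length counts UTF-16 units, which matches the code-point count for Basic Multilingual Plane characters such as U+2014.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts code points in a NUL-terminated UTF-8 string by
   counting every byte that is NOT a continuation byte (10xxxxxx).
   Assumes the input is valid UTF-8; it does not validate it. */
static size_t utf8_codepoint_count(const char *s)
{
    assert(s != NULL);
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; ++p) {
        if ((*p & 0xC0) != 0x80) /* lead byte or plain ASCII byte */
            ++count;
    }
    return count;
}
```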
    Cheers,
      Nicko
    One possible misconception people come away with from these
    discussions is that you need to use NSLocalizedString() if you want
    NSStrings with non-ASCII characters.

    NSLocalizedString and friends are an absolutely great choice if you need
    localizable strings, that is, strings which will need to be
    translated to different languages.  The string is read from
    a .strings file, which can be made per-language.

    However, if all you want is an NSString with some non-ASCII chars in
    it, and it's not meant to be shown to the user (and hence doesn't
    need to be localized), then it's perfectly fine to create the string
    programmatically. One possibility here is:

      NSString *s = [NSString stringWithFormat:@"Long %C dash", 0x2014];

    You can also do (since \xe2\x80\x94 is the 3-byte UTF-8 string for
    0x2014):

      NSString *s = [NSString stringWithUTF8String:"Long \xe2\x80\x94 dash"];

    but one thing that is not very safe is to include actual high-bit
    characters in your source code:

      NSString *s = [NSString stringWithUTF8String:"Long — dash"];    // Not safe; you're at the mercy of any tools you use

    and the following is not allowed:

      NSString *s = @"Long — dash";    // Not allowed

    Ali

    Begin forwarded message:

    > From: Satoshi Matsumoto <satoshi...>
    > Date: March 29, 2005 20:06:23 PST
    > To: David Hoerl <dhoerl...>, Cocoa-Dev <cocoa-dev...>
    > Subject: Re: How to code a NSString literal with UTF8?
  • Thank you Ali!

    Now, could we *please* have this added to the official documentation
    (Scott + Mmalc, are you listening? Somewhere in NSString, and perhaps
    in the ObjC docs?), perhaps to some example code snippet included with
    the dev tools, and finally to all third-party Cocoa resource web sites
    (Mr. Alastair, Mr. Stevenson...)?

    If this question is not a FAQ item, I don't know what is...
    People will no doubt still ask this question, but it would be great to
    have an authoritative answer to point to! We can of course always use
    this:

    <http://www.cocoabuilder.com/archive/message/cocoa/2005/3/30/131791>

    :-)

    j o a r

  • On Mar 30, 2005, at 9:20 AM, j o a r wrote:

    > Now, could we *please* have this added to the official
    > documentation (Scott + Mmalc, are you listening? Somewhere in
    > NSString, and perhaps to the ObjC docs?),
    >
    As ever, the official answer to this would be, "Please file a bug
    report"  :-)
    In this case, however, it would get returned as a duplicate...

    mmalc
  • On 2005-03-30 08:59, Ali Ozer said:

    > NSString *s = [NSString stringWithUTF8String:"Long -- dash"];    //
    > Not safe; you're at the mercy of any tools you use

    As I understand it, the danger here is that the compiler has no way of
    knowing the file's text encoding, right?  The compiler will have to
    assume ASCII, MacRoman, UTF-8, or whatever, and if it assumes wrong,
    badness ensues.  (In the Xcode IDE, a user can specify a file's encoding,
    which helps.) And since ASCII is the common denominator, only 7-bit
    characters are 'safe'.  Correct?

    > and the following is not allowed:
    >
    > NSString *s = @"Long -- dash";    // Not allowed

    "Not allowed" but still accepted by gcc 3.3 (and CW 9.4).  <rdar://4073313>

    This I understand to be disallowed because it is so documented:
    "@"string" - Defines a constant NSString object in the current module and
    initializes the object with the specified 7-bit ASCII-encoded string."
    Fair enough.

    But am I the only one who thinks it should be allowed?  Apple could
    change the Obj-C language to allow it.  I doubt much code would break.
    Do people really depend on such strings being NSConstantStrings?  We'd
    still have the problem of the compiler having to know/guess the file's
    encoding, but that problem is always there.

    --
    ____________________________________________________________
    Sean McBride, B. Eng                <sean...>
    Rogue Research                        www.rogue-research.com
    Mac Software Developer              Montréal, Québec, Canada
  • >> NSString *s = [NSString stringWithUTF8String:"Long -- dash"];    //
    >> Not safe; you're at the mercy of any tools you use
    >>
    >
    > As I understand it, the danger here is that the compiler has no way of
    > knowing the file's text encoding, right?  The compiler will have to
    > assume ASCII, MacRoman, UTF-8, or whatever, and if it assumes wrong,
    > badness ensues.  (In the Xcode IDE, a user can specify a file's encoding,
    > which helps.) And since ASCII is the common denominator, only 7-bit
    > characters are 'safe'.  Correct?

    Correct. I think in the world of Xcode you'd be fine, since Xcode now
    has per-file-encodings and such; but if you ever used another editor,
    things might become questionable.
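
    One way to avoid depending on the editor at all is to keep the source pure ASCII and write non-ASCII bytes as \x escapes, as in Ali's stringWithUTF8String: example. Below is a plain-C sketch of a hypothetical helper, escape_non_ascii, that performs this conversion. A C hex escape keeps consuming hex digits, so "\x94dash" would not end after \x94. In that case the sketch splits the literal with "", and the compiler concatenates the adjacent literals. The sketch does not escape quotes or backslashes that are already in the input.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: rewrites the NUL-terminated UTF-8 text `in` as the body
   of an ASCII-only C string literal, turning every byte >= 0x80 into a \xHH
   escape. Because a hex escape keeps consuming hex digits, a literal break ("")
   is emitted when an escape is followed by a hex digit.
   Returns 0 on success, -1 if `out` (capacity `cap`) is too small.
   Quotes and backslashes in the input are NOT escaped by this sketch. */
static int escape_non_ascii(const char *in, char *out, size_t cap)
{
    assert(in != NULL && out != NULL && cap > 0);
    size_t used = 0;
    int after_escape = 0; /* was the previous output a \x escape? */
    for (const unsigned char *p = (const unsigned char *)in; *p != '\0'; ++p) {
        char piece[8];
        int n;
        if (*p >= 0x80) {
            n = snprintf(piece, sizeof piece, "\\x%02x", (unsigned)*p);
            after_escape = 1;
        } else {
            if (after_escape && isxdigit(*p))
                n = snprintf(piece, sizeof piece, "\"\"%c", *p); /* break literal */
            else
                n = snprintf(piece, sizeof piece, "%c", *p);
            after_escape = 0;
        }
        if (n < 0 || used + (size_t)n >= cap)
            return -1; /* not enough room, counting the final NUL */
        memcpy(out + used, piece, (size_t)n);
        used += (size_t)n;
    }
    out[used] = '\0';
    return 0;
}
```

    For the em dash example, the output is Long \xe2\x80\x94 dash, which can be pasted between the quotes of a C string literal.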

    >> and the following is not allowed:
    >>
    >> NSString *s = @"Long -- dash";    // Not allowed
    >>
    >
    > "Not allowed" but still accepted by gcc 3.3 (and CW 9.4).  <rdar://4073313>
    >
    > This I understand to be disallowed because it is so documented:
    > "@"string" - Defines a constant NSString object in the current module and
    > initializes the object with the specified 7-bit ASCII-encoded string."
    > Fair enough.
    >
    > But am I the only one who thinks it should be allowed?  Apple could
    > change the Obj-C language to allow it.  I doubt much code would break.
    > Do people really depend on such strings being NSConstantStrings?  We'd
    > still have the problem of the compiler having to know/guess the file's
    > encoding, but that problem is always there.

    You're not the only one who thinks this should be allowed and it is
    something that is on several investigation/future-feature lists.

    Ali
  • On 30 Mar 2005, at 13:01, mmalcolm crawford wrote:

    > As ever, the official answer to this would be, "Please file a bug
    > report"  :-)
    > In this case, however, it would get returned as a duplicate...
    >
    This response seems to be a mixed message and one which contravenes
    standard responses from Apple engineers. We are told constantly to
    "file a bug report" and "it's okay if it's a duplicate because we use
    duplicates when prioritizing a reported bug." But you seem to be saying
    here: don't bother, it's a duplicate.

    Dave
    --
    Chaos Werks Design
    "Suburbia is where the developer bulldozes out the trees, then names
    the streets after them."
    - Bill Vaughn
  • On 2005-03-30, at 18.59, Ali Ozer wrote:
    > One possible misconception people come away in these discussions is
    > that you need to use NSLocalizedString() if you want NSStrings with
    > non-ASCII characters.

    I don't think it is a misconception to use NSLocalizedString() for
    defining non-ASCII characters in NSString.

    As the encoding of @"string" is 7-bit ASCII, if you want to define
    non-ASCII characters, you do need to "localize".

    > programmatically. One possibility here is:
    > NSString *s = [NSString stringWithFormat:@"Long %C dash", 0x2014];

    Please imagine Japanese developers defining all their Japanese strings
    like that.

    Even when I don't need to localize into other languages, I always use
    NSLocalizedString() to define Japanese strings.

    Satoshi
  • On 30 Mar 2005, at 19:18, Sean McBride wrote:
    > On 2005-03-30 08:59, Ali Ozer said:
    >
    >> NSString *s = [NSString stringWithUTF8String:"Long -- dash"];    //
    >> Not safe; you're at the mercy of any tools you use
    >
    > As I understand it, the danger here is that the compiler has no way of
    > knowing the file's text encoding, right?

    Yes, the compiler has to make an assumption about the file encoding.
    Of course the compiler already makes an assumption about the coding, as
    you will rapidly discover when you try saving your C source in UTF-16
    encoding, let alone EBCDIC.

    The C99 spec explicitly allows top-bit-set bytes in source code but the
    interpretation of them is essentially up to the application to
    determine in the context of the locale.  All the characters used by the
    C language itself are taken from a set that codes the same in ASCII and
    UTF-8 and no continuation byte of a UTF-8 coding can be the same as any
    ASCII character.  Personally I think that it is entirely reasonable to
    rely on this aspect of the C specification when entering Unicode
    characters which are going to be interpreted using
    stringWithUTF8String:; it is defined to work with the default encoding
    for saving source and if the user changes the coding to anything
    non-standard it could stop working for a whole stack of reasons.
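
    Nicko's point about UTF-8 can be checked mechanically. The plain-C sketch below uses a hypothetical utf8_encode helper that assumes code points no greater than 0x10FFFF. It encodes every non-ASCII, non-surrogate code point and confirms that every resulting byte, including lead bytes, is 0x80 or above. A byte-oriented compiler therefore cannot mistake part of a multi-byte character for a quote or a backslash. The helper also reproduces Ali's E2 80 94 for U+2014.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: encodes code point cp (<= 0x10FFFF, not a surrogate)
   as UTF-8 into out[0..3] and returns the number of bytes written. */
static int utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    } else if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    } else if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    } else {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
}

/* Returns true if, for every non-ASCII code point, every byte of its UTF-8
   encoding has the top bit set (so no byte can equal an ASCII character). */
static bool no_ascii_bytes_in_multibyte_utf8(void)
{
    unsigned char buf[4];
    for (uint32_t cp = 0x80; cp <= 0x10FFFF; ++cp) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            continue; /* surrogates are not encodable in UTF-8 */
        int n = utf8_encode(cp, buf);
        for (int i = 0; i < n; ++i) {
            if (buf[i] < 0x80)
                return false;
        }
    }
    return true;
}
```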

    >> and the following is not allowed:
    >>
    >> NSString *s = @"Long -- dash";    // Not allowed
    >
    > "Not allowed" but still accepted by gcc 3.3 (and CW 9.4).
    > <rdar://4073313>

    It's accepted but in most cases results in the wrong string.

    > This I understand to be disallowed because it is so documented:
    > "@"string" - Defines a constant NSString object in the current module and
    > initializes the object with the specified 7-bit ASCII-encoded string."
    > Fair enough.
    >
    > But am I the only one who thinks it should be allowed?

    No, I think it would be a fine thing too.  That said, one problem I
    can envisage is that since top-bit-set bytes inside @"..." constants
    are passed silently by the compiler at the moment, there would be a
    danger that confusion would arise from code using this paradigm
    compiling cleanly on both old and new compilers while generating very
    different results.

    Nicko