FROM : Aki Inoue
DATE : Tue May 06 19:45:39 2008
On 2008/05/06, at 8:56, Jens Alfke wrote:
>
> On 6 May '08, at 7:03 AM, Thomas Engelmeier wrote:
>
>> As the OP wants to create NSStrings with data created by his
>> application I'm pretty sure he will not want the Windows
>> encoding - unless he parses text documents originating from Windows.
>
> He didn't say where the data originates from, or what those APIs are
> that return the strings. If they're networking APIs, the data could
> very likely have originated on Windows.
>
> Also, you missed my point about using CP1252 (WinLatin1). It's
> useful as a fallback for any unknown C strings because (a) it's a
> superset of ISO-Latin-1, which (b) has no gaps in it (as ISO does,
> from 0x80-0x9F), so decoding text into an NSString will never fail
> and return nil. (I've debugged several crashes that stemmed from nil
> NSStrings decoded from garbage strings.)
Jens,
Actually, I don't recommend using CP1252 as the generic fallback
encoding like this.
The encoding does have gaps, and the handling of those undefined bytes
varies between conversion engines. CF/NSString treat the invalid
bytes strictly and return nil when they encounter them.
Also, compatibility with ISO Latin1 (aka ISO 8859-1) is becoming
a less compelling reason on the Net, since the overall share of that
encoding (ISO 8859-1 and cp1252 combined) is declining.
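To make the point about gaps concrete, here is a minimal sketch in Python rather than Cocoa. It uses Python's built-in codec tables, which are an assumption standing in for the CF/NSString converters, and the helper name undefined_bytes is hypothetical. It probes every single-byte value and collects the ones a codec refuses to decode. In Python's tables cp1252 rejects 0x81, 0x8D, 0x8F, 0x90 and 0x9D, while mac_roman rejects nothing. Other conversion engines may handle those bytes differently.

```python
def undefined_bytes(encoding):
    """Return the single-byte values that the named codec refuses to decode."""
    gaps = []
    for value in range(256):
        try:
            bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            # This codec has no mapping for the byte: a "gap".
            gaps.append(value)
    return gaps


# Probe the two candidate fallback encodings discussed in this thread.
cp1252_gaps = undefined_bytes("cp1252")
mac_roman_gaps = undefined_bytes("mac_roman")

print([hex(b) for b in cp1252_gaps])
print(mac_roman_gaps)
```

A strict converter fails on any input containing one of the bytes in the first list, which is why CP1252 cannot serve as a never-fails fallback.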
>> If the bytes come from MacOS text files he may want to use the
>> MacRoman encoding, otherwise creating UTF8 and passing around
>> NSStrings will be the way to go - especially in Europe where all
>> that äöüñá goodies exist.
>
> For the most part only old (pre-OS X) files would still be using
> MacRoman. Current Mac apps generally default to UTF-8.
So, our recommendation now is to try UTF-8 first; then try some other
encoding deduced from the context (user's localization, intended
source/destination of the data, etc.). If all of those fail, try
MacRoman as the ultimate fallback (the encoding has no gaps, so decoding
never fails).
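The order recommended above can be sketched as follows. This is an illustration in Python, not the NSString API. The function name decode_with_fallback and its context_encodings parameter are hypothetical, and the caller is assumed to supply whatever encodings the context suggests.

```python
def decode_with_fallback(raw, context_encodings=()):
    """Decode bytes by trying UTF-8, then context encodings, then MacRoman.

    Returns a (text, encoding_used) pair. An unknown encoding name raises
    LookupError, which is deliberately not caught here.
    """
    candidates = ["utf-8", *context_encodings]
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            # Strict decoding failed; move on to the next candidate.
            continue
    # MacRoman maps all 256 byte values in Python's codec, so this cannot fail.
    return raw.decode("mac_roman"), "mac_roman"


# Valid UTF-8 is accepted on the first attempt.
utf8_result = decode_with_fallback(b"caf\xc3\xa9")

# 0x81 is invalid as UTF-8 and undefined in cp1252, so MacRoman is used.
gap_result = decode_with_fallback(b"\x81", ["cp1252"])

# With no context encodings, non-UTF-8 bytes go straight to MacRoman.
legacy_result = decode_with_fallback(b"caf\x8e")
```

Note that the last step always yields a string, but not necessarily the intended one: bytes from another encoding will decode to the wrong characters. It only guarantees that the result is never nil.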
Aki
>
>
> —Jens
> _______________________________________________
>
> Cocoa-dev mailing list (<email_removed>)
>
> Please do not post admin requests or moderator comments to the list.
> Contact the moderators at cocoa-dev-admins(at)lists.apple.com
>
> Help/Unsubscribe/Update your Subscription:
> http://lists.apple.com/mailman/options/cocoa-dev/<email_removed>
>
> This email sent to <email_removed>







