FROM : Andrew Thompson
DATE : Wed Jan 08 02:25:37 2003
On Tuesday, Jan 7, 2003, at 02:18 America/New_York, Dietrich Epp wrote:
>>
>> As I understand it that would require I have a Big Honking Table (tm)
>> of composed char <-> surrogate pair mappings, if I want to be able to
>> handle the general case and not just a few characters of interest. Is
>> such a thing readily available?
>>
>> e.g.,
>>
>> is the mapping a mathematical function?
>> or is it a big lookup table?
>> or does it require lots of context and domain knowledge about the
>> script in question?
>>
>> If it is a table, is it available in machine readable form anywhere?
>
> Why would it require a table?
>
> Surrogates are covered in section 3.7 of the Unicode standard. Given
> a character, C, which is at least $10000, the surrogate pair is:
> (C - $10000)/$400 + $D800, (C - $10000)%$400 + $DC00
>
> This cannot encode arbitrary 32-bit numbers, since a surrogate pair
> only carries 20 bits, covering $10000 through $10FFFF. But that is
> enough: no characters are assigned above $10FFFF, and the top two
> planes ($F0000 through $10FFFF) are reserved for private use.
> Surrogates are illegal when unpaired, and are never valid in UTF-8 or
> UTF-32.
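As a minimal sketch (not from the original thread), the quoted arithmetic translates directly into code. Python is used here only for brevity; the function name `to_surrogate_pair` is hypothetical, and the `$` hex prefix from the quote is written as `0x`. The `/` in the formula is integer division and `%` is the remainder.

```python
def to_surrogate_pair(c):
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16
    surrogate pair, following the formula quoted above.
    Hypothetical helper, for illustration only."""
    if not 0x10000 <= c <= 0x10FFFF:
        raise ValueError(f"U+{c:04X} is not a supplementary code point")
    v = c - 0x10000              # 20-bit value, 0x00000..0xFFFFF
    high = v // 0x400 + 0xD800   # top 10 bits land in D800..DBFF
    low = v % 0x400 + 0xDC00     # bottom 10 bits land in DC00..DFFF
    return high, low


# MUSICAL SYMBOL G CLEF, U+1D11E
hi, lo = to_surrogate_pair(0x1D11E)
print(f"{hi:04X} {lo:04X}")      # prints "D834 DD1E"
```

Python's own UTF-16 encoder agrees: `"\U0001D11E".encode("utf-16-be")` yields the bytes `D8 34 DD 1E`, so no lookup table is involved at any point.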
Ah, it is a function. Thanks for that!
Unicode is one of those things where the more I learn about it, the
more I find there is still to learn.
Now I get what those reserved High Surrogates ($D800 to $DBFF) and Low
Surrogates ($DC00 to $DFFF) ranges are for. Cool.
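Going the other way is just as mechanical. The sketch below (Python again, not from the thread; all function names are hypothetical) recombines a pair by inverting the quoted formula, and walks a sequence of UTF-16 code units using the two reserved ranges to tell high surrogates from low ones, rejecting any surrogate that turns up unpaired.

```python
def is_high_surrogate(u):
    return 0xD800 <= u <= 0xDBFF


def is_low_surrogate(u):
    return 0xDC00 <= u <= 0xDFFF


def from_surrogate_pair(high, low):
    """Invert the encoding formula: rebuild the 20-bit value from the two
    10-bit halves, then add back the 0x10000 offset."""
    if not (is_high_surrogate(high) and is_low_surrogate(low)):
        raise ValueError("not a valid high/low surrogate pair")
    return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000


def code_points(units):
    """Yield code points from a list of UTF-16 code units, joining
    surrogate pairs and raising on unpaired surrogates."""
    i = 0
    while i < len(units):
        u = units[i]
        if is_high_surrogate(u) and i + 1 < len(units) and is_low_surrogate(units[i + 1]):
            yield from_surrogate_pair(u, units[i + 1])
            i += 2
        elif is_high_surrogate(u) or is_low_surrogate(u):
            raise ValueError(f"unpaired surrogate {u:04X} at index {i}")
        else:
            yield u
            i += 1


print([hex(cp) for cp in code_points([0x0041, 0xD834, 0xDD1E])])
# prints ['0x41', '0x1d11e']
```

Because the two ranges do not overlap, a decoder can always tell which half of a pair it is looking at, even when starting in the middle of a string.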
AndyT (lordpixel - the cat who walks through walls)
A little bigger on the inside
(see you later space cowboy ...)
_______________________________________________
cocoa-dev mailing list | <email_removed>
Help/Unsubscribe/Archives: http://www.lists.apple.com/mailman/listinfo/cocoa-dev
Do not post admin requests to the list. They will be ignored.