OASIS Mailing List Archives

cti-stix message



Subject: Re: [cti-stix] Vocab case sensitivity in STIX


Jason Keirstead wrote this message on Wed, Jun 08, 2016 at 13:44 -0300:
> Case insensitivity can get extremely complicated with non-latin characters.
> 
> The definitive example is Turkish -
> http://www.i18nguy.com/unicode/turkish-i18n.html

This is exactly why I support 3...  If we support 2, we need to define
either a limited character set (e.g. latin-1 only) with well-defined
rules, or well-defined rules on case sensitivity for ALL Unicode
characters, and be willing to break other languages like Turkish...
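
To make the Turkish problem concrete, here is a minimal sketch. It
assumes Python 3, whose built-in str methods apply the default
(locale-independent) Unicode case mappings. The helper naive_equal is
a made-up name for illustration, not anything from STIX:

```python
# Illustration only: Python's str methods use the default Unicode case
# mappings and know nothing about Turkish (tr/az) tailoring.

DOTLESS_I = "\u0131"      # LATIN SMALL LETTER DOTLESS I
DOTTED_CAP_I = "\u0130"   # LATIN CAPITAL LETTER I WITH DOT ABOVE


def naive_equal(a: str, b: str) -> bool:
    """Hypothetical 'case-insensitive' comparison via upper()."""
    return a.upper() == b.upper()


# Default rules uppercase both "i" and dotless i to "I", so two
# distinct Turkish letters compare as equal.
collides = naive_equal("i", DOTLESS_I)

# A Turkish reader expects dotted capital I and "i" to be the same
# letter, but the default mappings keep them apart.
turkish_pair_matches = naive_equal(DOTTED_CAP_I, "i")

# Lowercasing dotted capital I gives "i" followed by U+0307
# COMBINING DOT ABOVE, not a plain "i".
lowered = DOTTED_CAP_I.lower()

print(collides, turkish_pair_matches, [hex(ord(c)) for c in lowered])
```

So a default-rules comparison is both too loose and too strict for
Turkish text at the same time.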

The header of:
http://www.unicode.org/Public/UCD/latest/ucd/CaseFolding.txt

helps...  It also points out that some case mappings go from one
code point to two...
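
A quick sketch of that length change, assuming Python 3's
str.casefold() (which applies Unicode full case folding). The function
name fold_report is made up for illustration:

```python
def fold_report(ch: str) -> tuple[str, int, int]:
    """Return the folded form plus code point counts before and after."""
    folded = ch.casefold()
    return folded, len(ch), len(folded)


samples = {
    "sharp s": "\u00df",         # folds to "ss"
    "fi ligature": "\ufb01",     # folds to "fi"
    "dotted capital I": "\u0130" # folds to "i" + U+0307
}

for name, ch in samples.items():
    folded, before, after = fold_report(ch)
    points = " ".join(f"U+{ord(c):04X}" for c in folded)
    print(f"{name}: {before} code point -> {after} ({points})")
```

Anything that assumes folding preserves string length (fixed-size
buffers, index-by-index comparison) breaks on inputs like these.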

Hmm... I did find W3C's case folding page:
https://www.w3.org/International/wiki/Case_folding

So, anyone who has an opinion on this topic should read it, and then
decide if they want to change their vote...

More info on case mapping from Unicode:
http://unicode.org/faq/casemap_charprop.html

Another fun example from the Unicode page:
"For example, while the default uppercase mapping of "a" is "A" and
the default mapping of "à" is "À", the uppercase conversion of "Je
vais à Paris" in some forms of French might be "JE VAIS A PARIS"
Notice how the "à" is uppercased as "A" in this case."
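
The same example as a rough sketch, assuming Python 3. The default
mapping keeps the accent. The accent-stripping helper (a made-up name,
upper_without_accents) is just one way to imitate the unaccented
capitals convention and is not something Unicode or STIX prescribes:

```python
import unicodedata

sentence = "Je vais \u00e0 Paris"   # \u00e0 is a with grave accent

# Default Unicode uppercase mapping: the accent survives.
default_upper = sentence.upper()


def upper_without_accents(text: str) -> str:
    """Hypothetical helper: uppercase, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text.upper())
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")


stripped_upper = upper_without_accents(sentence)

print(default_upper)
print(stripped_upper)
print(default_upper == stripped_upper)
```

Two tools that each "uppercase correctly" by their own convention
would produce values that do not compare equal.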

IMO, the spec should be 3, but we provide non-normative text on how
organizations and vendor products should allow such input...  If all
the tools follow the rules, then comparison becomes a non-issue...

-- 
John-Mark

