Subject: Re: [cti-stix] Vocab case sensitivity in STIX
Jason Keirstead wrote this message on Wed, Jun 08, 2016 at 13:44 -0300:
> Case insensitivity can get extremely complicated with non-latin characters.
>
> The definitive example is Turkish -
> http://www.i18nguy.com/unicode/turkish-i18n.html

This is exactly why I support 3...

If we support 2, we need to define either a limited character set (e.g. latin-1 only) with well-defined rules, or well-defined rules on case sensitivity for ALL Unicode characters, and be willing to break other languages like Turkish...

The header on
http://www.unicode.org/Public/UCD/latest/ucd/CaseFolding.txt
helps... It also points out that some case transitions involve going from one code point to two... Hmm...

I did find W3C's case folding page:
https://www.w3.org/International/wiki/Case_folding

So anyone who has an opinion on this topic should read it, and then decide if they want to change their vote...

More info on case mapping from Unicode:
http://unicode.org/faq/casemap_charprop.html

Another fun example from the Unicode page:
"For example, while the default uppercase mapping of "a" is "A" and the default mapping of "à" is "À", the uppercase conversion of "Je vais à Paris" in some forms of French might be "JE VAIS A PARIS". Notice how the "à" is uppercased as "A" in this case."

IMO, the spec should be 3, but we should provide non-normative text on how organizations and vendor products should allow such input. If all the tools follow the rules, then comparison is a non-issue...

--
John-Mark
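The one-to-two code point mapping and the Turkish dotted/dotless i problem are easy to reproduce. The following is a minimal sketch, not from the original message, assuming Python 3, whose built-in `str.lower()` and `str.casefold()` apply the default (non-Turkic) Unicode mappings with no locale tailoring. The helper names `naive_equal` and `folded_equal` are hypothetical, for illustration only.

```python
# Sketch: case-insensitive string comparison with Python's built-in,
# locale-independent Unicode case operations. naive_equal/folded_equal are
# hypothetical helpers for illustration, not part of any STIX tooling.

def naive_equal(a: str, b: str) -> bool:
    """Compare after lower(); misses folds such as German sharp s."""
    return a.lower() == b.lower()


def folded_equal(a: str, b: str) -> bool:
    """Compare after casefold(), i.e. full default Unicode case folding."""
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    # One code point can become two: "ß" uppercases to "SS".
    print(len("ß"), len("ß".upper()))  # 1 2

    # lower() leaves "ß" alone, so it fails to match "STRASSE";
    # casefold() maps "ß" to "ss" and succeeds.
    print(naive_equal("straße", "STRASSE"))   # False
    print(folded_equal("straße", "STRASSE"))  # True

    # Turkish: default folding maps dotted capital "İ" to "i" + U+0307
    # (two code points) and leaves dotless "ı" unchanged, so two spellings
    # a Turkish reader treats as the same word do not compare equal.
    print(folded_equal("DİYARBAKIR", "diyarbakır"))  # False
```

Even full default case folding does not give the Turkish-correct answer, which is the kind of language-specific breakage raised above if the spec were to require case-insensitive matching across all of Unicode.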