Re: [cti-stix] Vocab case sensitivity in STIX

I don’t think we should mandate that values from extended vocabularies (either other values in open vocabs, or extension values in controlled vocabs) be in English. Ignoring the difficulty of actually verifying that (whether as a tool trying to produce valid content or as a validation program), it means that people doing STIX in other languages either need some ability to translate to English or can’t use extended vocab values at all, because they can’t produce English text.

The values in vocabularies we define should all be in English. They’re pre-defined, so tools can localize their interfaces with appropriate translations even in completely non-English ecosystems. They wouldn’t have that same ability for tool- or user-developed values.

Let’s schedule this topic for the call on Tuesday. If we aren’t able to resolve it then, it should probably go to a vote.

John

From: <cti-stix@lists.oasis-open.org> on behalf of Terry MacDonald <terry.macdonald@cosive.com>
Date: Wednesday, June 8, 2016 at 6:34 PM
To: John-Mark Gurney <jmg@newcontext.com>
Cc: Jason Keirstead <Jason.Keirstead@ca.ibm.com>, "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, "Wunder, John A." <jwunder@mitre.org>
Subject: Re: [cti-stix] Vocab case sensitivity in STIX

One point for vocabs.... I thought we had decided that all controlled vocabularies would be defined in the standard as English, and that it was up to the local implementation to provide translations in other languages.

If this is still the case, does this also apply to open vocabs? If this is the case then I'd go option #3 (fallback #2). Otherwise if we are still going English only then option #1 seems logical.

Cheers
Terry MacDonald

On 9/06/2016 8:13 AM, "John-Mark Gurney" <jmg@newcontext.com> wrote:

Jason Keirstead wrote this message on Wed, Jun 08, 2016 at 13:44 -0300:
> Case insensitivity can get extremely complicated with non-latin characters.
>
> The definitive example is Turkish -
> http://www.i18nguy.com/unicode/turkish-i18n.html

This is exactly why I support 3... If we support 2, we need to define
either a limited character set (e.g. latin-1 only) with well-defined
rules, or well-defined rules on case sensitivity for ALL Unicode
characters, and be willing to break other languages like Turkish...
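
As a rough illustration of the "limited character set" branch, here is a
minimal Python sketch. It assumes ASCII rather than latin-1 to keep the
rules trivial, and the function name ascii_vocab_equal is made up for this
example; nothing here comes from the STIX drafts.

```python
def ascii_vocab_equal(a: str, b: str) -> bool:
    """Case-insensitive comparison for vocab values restricted to ASCII.

    Hypothetical helper: the ASCII-only restriction is an assumption made
    for this sketch, not a rule taken from any STIX specification text.
    """
    for value in (a, b):
        if not value.isascii():
            # Outside the restricted set we refuse to guess at case rules.
            raise ValueError(f"non-ASCII vocab value not allowed: {value!r}")
    # Within ASCII, lower() only maps A-Z to a-z, independent of locale.
    return a.lower() == b.lower()
```

The cost is visible right away: any value containing a non-ASCII character
is rejected outright instead of being compared.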

The header on:
http://www.unicode.org/Public/UCD/latest/ucd/CaseFolding.txt

Helps... Also points out that some case transitions involve going
from one code point to two...
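
To make the one-to-two point concrete, here is a small Python 3 sketch using
the built-in str.lower(), str.upper() and str.casefold(), which apply
Unicode's default, locale-independent mappings. The variable names are just
for illustration.

```python
# İ, LATIN CAPITAL LETTER I WITH DOT ABOVE (used in Turkish)
dotted_capital_i = "\u0130"
lowered = dotted_capital_i.lower()
# Default lowercasing yields "i" followed by U+0307 COMBINING DOT ABOVE.
print(len(dotted_capital_i), len(lowered))  # 1 2

# ß, LATIN SMALL LETTER SHARP S
sharp_s = "\u00df"
print(sharp_s.upper())     # SS
print(sharp_s.casefold())  # ss

# Turkish expects "I" to lowercase to dotless ı (U+0131);
# the default mapping produces "i" instead.
print("I".lower() == "\u0131")  # False
```

The last line is the Turkish problem in miniature: a comparison built on the
default mappings treats "I" and "i" as the same letter, which is not how
Turkish text behaves.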

Hmm... I did find W3C's case folding page:
https://www.w3.org/International/wiki/Case_folding

So, anyone who has an opinion on this topic should read it, and then
decide if they want to change their vote...

More info on case mapping from Unicode:
http://unicode.org/faq/casemap_charprop.html

Another fun example from the Unicode page:
"For example, while the default uppercase mapping of "a" is "A" and
the default mapping of "à" is "À", the uppercase conversion of
"Je vais à Paris" in some forms of French might be "JE VAIS A PARIS".
Notice how the "à" is uppercased as "A" in this case."

IMO, the spec should be 3, but we provide non-normative text on how
organizations and vendor products should allow such input. If all
the tools follow the rules, then comparison becomes a
non-issue...

--
John-Mark
