OASIS Mailing List Archives

cti-stix message


Subject: Re: [cti-stix] Vocab case sensitivity in STIX


Case insensitivity can get extremely complicated with non-latin characters.

The definitive example is Turkish - http://www.i18nguy.com/unicode/turkish-i18n.html
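
To make the Turkish problem concrete, here is a minimal Python sketch. The helper name naive_equal_ignore_case is hypothetical and only stands in for the "lower-case both sides and compare" approach; Python's built-in str.lower() and str.upper() use the locale-independent Unicode mappings, so this shows only part of what a Turkish-locale system would do differently.

```python
# Illustrative only: how "just lower-case both sides" behaves with the
# Turkish dotted and dotless i. The helper name is hypothetical.

DOTTED_CAPITAL_I = "\u0130"   # İ, LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_SMALL_I = "\u0131"    # ı, LATIN SMALL LETTER DOTLESS I


def naive_equal_ignore_case(a: str, b: str) -> bool:
    """Compare two strings by lower-casing both (locale-independent)."""
    return a.lower() == b.lower()


# ASCII terms behave as expected.
print(naive_equal_ignore_case("Threat-Blah", "THREAT-BLAH"))

# Lower-casing İ yields "i" plus a combining dot above, so the result is
# two code points long and does not equal a plain "i".
lowered = DOTTED_CAPITAL_I.lower()
print(len(lowered), naive_equal_ignore_case(DOTTED_CAPITAL_I, "i"))

# Upper-casing ı gives "I", but lower-casing "I" gives "i", not "ı",
# so the two letters do not compare equal under this helper.
print(DOTLESS_SMALL_I.upper() == "I")
print(naive_equal_ignore_case(DOTLESS_SMALL_I, "I"))
```

A plain == on the raw strings has none of these edge cases, which is the argument for keeping comparisons case-sensitive.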

-
Jason Keirstead
STSM, Product Architect, Security Intelligence, IBM Security Systems
www.ibm.com/security | www.securityintelligence.com

Without data, all you are is just another person with an opinion - Unknown


From: "Wunder, John A." <jwunder@mitre.org>
To: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>
Date: 06/08/2016 12:33 PM
Subject: Re: [cti-stix] Vocab case sensitivity in STIX
Sent by: <cti-stix@lists.oasis-open.org>





I think that makes us roughly 50:50, with a preference towards #2 given people’s fallback choices.

I was curious how lower-casing works with non-latin characters and it seems doable, though naturally more complicated than you would hope:
http://stackoverflow.com/questions/929079/unicode-lowercase-characters

Other languages don’t really have case distinctions so the topic isn’t relevant to them. For normative requirement purposes we can probably identify an existing place where people specify upper-case Unicode characters and just prohibit them.
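
As a rough sketch of what "just prohibit them" could look like, the Python below rejects any vocabulary value containing a character in the Unicode general categories Lu (uppercase letter) or Lt (titlecase letter). The function name and the choice of those two categories are assumptions for illustration, not anything the specification defines.

```python
# Sketch of option #2: values stay case-sensitive, but a value is rejected
# if it contains any upper-case or title-case character. The function name
# and the category choice (Lu, Lt) are assumptions for illustration.
import unicodedata

PROHIBITED_CATEGORIES = {"Lu", "Lt"}  # uppercase and titlecase letters


def is_valid_vocab_value(value: str) -> bool:
    """Return True if no character in value is an upper/title-case letter."""
    return not any(
        unicodedata.category(ch) in PROHIBITED_CATEGORIES for ch in value
    )


print(is_valid_vocab_value("threat-blah"))      # all lower-case Latin
print(is_valid_vocab_value("Threat-Blah"))      # contains upper-case letters
print(is_valid_vocab_value("\u5a01\u80c1"))     # caseless script, nothing to reject
print(is_valid_vocab_value("\u0130stanbul"))    # starts with upper-case İ
```

With a check like this applied when values are produced, consumers can compare terms with a plain ==, and scripts without case distinctions pass through untouched.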

John

On 6/8/16, 10:38 AM, "cti-stix@lists.oasis-open.org on behalf of Paul Patrick" <cti-stix@lists.oasis-open.org on behalf of ppatrick@isightpartners.com> wrote:

>I’m a fan of #3
>
>On 6/8/16, 10:34 AM, "cti-stix@lists.oasis-open.org on behalf of Wunder, John A." <cti-stix@lists.oasis-open.org on behalf of jwunder@mitre.org> wrote:
>
>>I can live with #2.
>>
>>On 6/8/16, 9:54 AM, "Allan Thomson" <athomson@lookingglasscyber.com> wrote:
>>
>>>Hi John – thanks for sending the options.
>>>
>>>Although I prefer Option 1, I think Option 2 is a reasonable middle ground for me as that is easier to just map all strings to lowercase.
>>>
>>>allan
>>>
>>>On 6/8/16, 5:44 AM, "cti-stix@lists.oasis-open.org on behalf of Wunder, John A." <cti-stix@lists.oasis-open.org on behalf of jwunder@mitre.org> wrote:
>>>
>>>>This topic applies to all open and controlled vocabularies, not just kill chains, so I changed the subject line.
>>>>
>>>>To sum up, I’m hearing three options:
>>>>
>>>>1. Terms are defined as case-insensitive in the specification and implementations MUST treat Threat-Blah as == THREAT-BLAH
>>>>2. Terms are defined as case-sensitive in the specification, but values of that field MUST be lower-case (note: I don’t know what this means for non-Latin character sets, if anything. I assume there’s prior art we can use though)
>>>>3. Terms are defined as case-sensitive in the specification, and we have a SHOULD requirement to follow our naming and design rules unless there is a good reason not to (i.e. the tool has existing values for that field it can’t or doesn’t want to change). This is how the spec is written now.
>>>>
>>>>Correct me if I’m wrong, but here are the opinions I’ve heard:
>>>>
>>>>Allan prefers #1.
>>>>Bret prefers #2.
>>>>Myself, Jason, and JMG prefer #3
>>>>
>>>>Anybody else want to weigh in?
>>>>
>>>>John
>>>>
>>>>On 6/8/16, 1:31 AM, "Jordan, Bret" <bret.jordan@bluecoat.com> wrote:
>>>>
>>>>>I would greatly prefer that all vocabs are case sensitive and that they MUST be lower-case.  That makes it very simple all the way around.
>>>>>
>>>>>Bret
>>>>>
>>>>>Sent from my Commodore 64
>>>>>
>>>>>> On Jun 8, 2016, at 1:41 AM, Allan Thomson <athomson@lookingglasscyber.com> wrote:
>>>>>>
>>>>>> I think we are discussing trade-offs that impact products creating or using STIX.
>>>>>>
>>>>>> I personally much prefer lower case for all terms but that’s not the point of deciding case sensitive or not.
>>>>>>
>>>>>> I think you should also consider the users of our products in this.
>>>>>>
>>>>>> A user will not know which case the STIX spec defined the terms in and products that expose these terms in their UI will have to support case insensitive searching/use.
>>>>>>
>>>>>> Users will just type what they think the term is without regard to uppercase, lowercase, camel-case ….etc.
>>>>>>
>>>>>> By making terms case sensitive in the protocol exchange you are forcing products to know exactly which case was used in the spec, and then products will have to know how to map from what users type to what the underlying protocol uses.
>>>>>>
>>>>>> For me, not having to care about case sensitivity if a user enters a term of an open vocab in all CAPS when the spec was defined in lowercase then that would be a good thing.
>>>>>>
>>>>>> I also think for open vocabs products will have to support the option to extend the vocab and therefore unless you are careful you could end up with multiple versions of the same term just because users entered the term using different cases.
>>>>>>
>>>>>> For example, all of the following are clearly the same term:
>>>>>>
>>>>>> THREAT-BLAH
>>>>>> Threat-Blah
>>>>>> threat-blah
>>>>>> threat-Blah
>>>>>> threat-BLAH
>>>>>>
>>>>>> ….etc.
>>>>>>
>>>>>> Allan
>>>>>>
>>>>>>> On 6/7/16, 4:53 PM, "John-Mark Gurney" <jmg@newcontext.com> wrote:
>>>>>>>
>>>>>>> Jason Keirstead wrote this message on Tue, Jun 07, 2016 at 09:04 -0300:
>>>>>>>> I would vastly prefer that the standard declares that vocabularies are
>>>>>>>> case-sensitive. If vocabularies are case-insensitive it is a headache. Note
>>>>>>>> that I am *not* saying that I think that we should mandate that entries all
>>>>>>>> be lower-case - I am saying that we should mandate that the vocabulary is
>>>>>>>> case-sensitive and compares should be done that way.
>>>>>>>
>>>>>>> I agree...  Trying to do case insensitive compares introduces complexities
>>>>>>> that case sensitive does not..  Simple ==/strcmp for most uses...
>>>>>>>
>>>>>>> --
>>>>>>> John-Mark
>>>>>>
>>>>
>>>
>>
>
