Re: [cti-cybox] CybOX 3.0: HashType Refactoring

On Nov 3, 2015, at 09:38, Jason Keirstead <Jason.Keirstead@ca.ibm.com> wrote:

Agree. +1 :)

-
Jason Keirstead
Product Architect, Security Intelligence, IBM Security Systems
www.ibm.com/security | www.securityintelligence.com

Without data, all you are is just another person with an opinion - Unknown


From: "Kirillov, Ivan A." <ikirillov@mitre.org>
To: Jason Keirstead/CanEast/IBM@IBMCA, "Davidson II, Mark S" <mdavidson@mitre.org>
Cc: "cti-cybox@lists.oasis-open.org" <cti-cybox@lists.oasis-open.org>, "John Anderson" <janderson@soltra.com>
Date: 2015/11/03 11:32 AM
Subject: Re: [cti-cybox] Re: CybOX 3.0: HashType Refactoring
Sent by: <cti-cybox@lists.oasis-open.org>

Yes, I absolutely agree on the utility of enumerations, and I probably should have clarified my point accordingly. Anyhow, my thought is that the “type” field in HashType should NOT be implemented through a controlled vocabulary but should instead use a fixed enumeration that is defined as part of the CybOX 3.0 specification:

"type": {
  "enum": [ "md5", "sha1", "sha256", etc. ]
}
Regards,
Ivan
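
A minimal sketch of how an implementer might consume such a fixed enumeration, written in Python. The names `HashAlgorithm` and `parse_type` and the three members shown are illustrative assumptions; the real list would be whatever the CybOX 3.0 specification ends up defining.

```python
from enum import Enum


class HashAlgorithm(Enum):
    """Hypothetical fixed enumeration; the real list would come from the spec."""
    MD5 = "md5"
    SHA1 = "sha1"
    SHA256 = "sha256"


def parse_type(value: str) -> HashAlgorithm:
    """Return the member for an exact spelling, or raise ValueError."""
    return HashAlgorithm(value)
```

With this shape, a spelling outside the enumeration (for example "MD5" in upper case) is rejected at parse time rather than silently accepted.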

From: Jason Keirstead
Date: Tuesday, November 3, 2015 at 8:12 AM
To: Mark Davidson
Cc: "cti-cybox@lists.oasis-open.org", Ivan Kirillov, John Anderson
Subject: RE: [cti-cybox] Re: CybOX 3.0: HashType Refactoring
I think the hashing algorithms should be either a controlled vocabulary or a type enum like Jerome suggested, that is part of the specification. Anything that a coder would implement as an enumeration should be a controlled vocabulary or an enumeration.

RE:
"1. No. Controlled vocabularies make the most sense when there is no expectation that what they’re capturing is complete and/or there is a large need for it to be customized by content producers. I don’t think this is the case with cryptographic hashing algorithms; they’re largely stable and standardized for the most part."

The reason you need this is not that you expect it to be extended; it is so that everyone agrees on how it should be entered into the document, so that it can be parsed properly and efficiently: "MD5" vs "md5", "sha" vs "SHA-1" vs "sha256" vs "SHA-256".
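
To illustrate the parsing cost, here is a small Python sketch of the alias table a consumer would have to maintain if spellings were not fixed by the specification. The table contents and the function name `normalize` are assumptions made for this example only.

```python
# Hypothetical alias table a consumer would need if producers were free
# to spell algorithm names however they liked.
ALIASES = {
    "md5": "md5",
    "sha1": "sha1",
    "sha-1": "sha1",
    "sha256": "sha256",
    "sha-256": "sha256",
}


def normalize(name: str) -> str:
    """Map a free-form spelling to one canonical name, or raise ValueError."""
    key = name.strip().lower()
    if key not in ALIASES:
        # A bare "sha" lands here: it cannot be resolved to one algorithm.
        raise ValueError(f"unrecognized hash algorithm name: {name!r}")
    return ALIASES[key]
```

A fixed enumeration or required vocabulary in the specification would make a table like this unnecessary.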

-
Jason Keirstead

From: "Davidson II, Mark S" <mdavidson@mitre.org>
To: "Kirillov, Ivan A." <ikirillov@mitre.org>, Jason Keirstead/CanEast/IBM@IBMCA, John Anderson <janderson@soltra.com>
Cc: "cti-cybox@lists.oasis-open.org" <cti-cybox@lists.oasis-open.org>
Date: 2015/11/03 08:43 AM
Subject: RE: [cti-cybox] Re: CybOX 3.0: HashType Refactoring

My comment is really about controlled vocabularies in general. I tend to have a gut reaction of wanting to do away with controlled vocabularies wherever we have them because they are hard for me to implement. That said, I think changing two key factors about controlled vocabularies would change the way I feel about them.

I think we should consider improving controlled vocabularies in these two areas:
· REQUIRE a specific controlled vocabulary, allow other controlled vocabularies. Right now any vocabulary is just as valid as any other vocabulary, and this makes things more difficult. STIX/CybOX do have the notion of default vocabularies, but this seems like more of a starting point than a definition. If we made a single vocabulary required and all other vocabularies optional (perhaps calling them “third party vocabularies”) I think that would go a long way toward making controlled vocabularies easier to implement.
o As a sub-point, I think MTI vocabularies should be specified in the overall spec, as this reduces the number of overall moving parts for people to track.
· DEFINE the semantics for each value. In many places the meaning of certain controlled vocabularies is unspecified. When I get an indicator with an IndicatorType of “URL Watchlist”, what does that mean exactly? The XSD annotation has some descriptive text (in this case, “Indicator describes a set of suspected malicious URLS”), but it doesn’t tell me how the value of this field changes how I process (or not) the indicator. I can make an inference based on experience, but we should seek to improve these definitions so that implementers have an easier time. If this is just a label and not meant for processing, we should call it out as such.

If controlled vocabularies were to meet the requirements I lay out above, I would have no opinion on whether hashes use a default vocabulary or not. As controlled vocabularies currently stand, my preference is for not using them.

Thank you.
-Mark
From: cti-cybox@lists.oasis-open.org [mailto:cti-cybox@lists.oasis-open.org] On Behalf Of Kirillov, Ivan A.
Sent: Tuesday, November 03, 2015 7:12 AM
To: Jason Keirstead <Jason.Keirstead@ca.ibm.com>; John Anderson <janderson@soltra.com>
Cc: cti-cybox@lists.oasis-open.org
Subject: Re: [cti-cybox] Re: CybOX 3.0: HashType Refactoring

I need to read up more on JSON-LD myself, but I think Jason is largely correct, in that we’ll have a data model and a serialization of it that includes the corresponding schemas (likely JSON).

Going back to the original discussion, I think the broad questions are:
1. Do we need a controlled vocabulary around hashing algorithms?
2. How should non-standard/esoteric hashes be captured?
My thoughts are:
1. No. Controlled vocabularies make the most sense when there is no expectation that what they’re capturing is complete and/or there is a large need for it to be customized by content producers. I don’t think this is the case with cryptographic hashing algorithms; they’re largely stable and standardized for the most part.
2. I’m a fan of key value pairs for their simplicity, so I think having either a separate type or separate field for the custom hash name (as in my proposal below) is how I would approach it:

{
  "file": {
    "hashes": [
      {
        "hash": "3773a88f65a5e780c8dff9cdc3a056f3",
        "type": "md5"
      },
      {
        "hash": "f49125dac3:352bb35ffrca2:a123dc4599245",
        "custom_type": "superhash"  # A "custom" hash type.
      }
    ]
  }
}
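
A rough Python sketch of how a consumer could process entries shaped like the example above. The set `STANDARD_TYPES`, the function name `hash_kind`, and the rule that exactly one of the two fields must be present are assumptions for illustration, not part of the proposal text.

```python
from typing import Tuple

# Assumed subset of the fixed enumeration, for illustration only.
STANDARD_TYPES = {"md5", "sha1", "sha256"}


def hash_kind(entry: dict) -> Tuple[str, str]:
    """Return ("standard" or "custom", name) for one entry of "hashes"."""
    has_type = "type" in entry
    has_custom = "custom_type" in entry
    if has_type == has_custom:
        # Assumption: exactly one of the two fields must be present.
        raise ValueError("exactly one of 'type' or 'custom_type' is required")
    if has_type:
        if entry["type"] not in STANDARD_TYPES:
            raise ValueError(f"not in the fixed enumeration: {entry['type']!r}")
        return ("standard", entry["type"])
    return ("custom", entry["custom_type"])
```

The consumer only has to branch on which key is present; no vocabulary lookup is involved.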

Regards,
Ivan

From: Jason Keirstead
Date: Monday, November 2, 2015 at 1:04 PM
To: John Anderson
Cc: "cti-cybox@lists.oasis-open.org", Ivan Kirillov
Subject: Re: [cti-cybox] Re: CybOX 3.0: HashType Refactoring
This is not the same thing.... see JSON-Schema for how to validate against a JSON schema (http://json-schema.org/example2.html).

Namely, you define your schema in a different JSON document. That document can be used to validate any other document. Type information in the content messages is not necessary for validation against a schema; in fact, it's superfluous, as the schema defines the type.
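
A minimal Python sketch of that separation. The schema fragment, the name `validate_hash_entry`, and the enumerated values are assumptions for illustration; the hand-rolled checker below understands only three keywords and stands in for a real JSON Schema validator.

```python
import json

# Hypothetical schema fragment, kept in its own document apart from messages.
SCHEMA = json.loads("""
{
  "required": ["hash", "type"],
  "properties": {
    "hash": {"type": "string"},
    "type": {"enum": ["md5", "sha1", "sha256"]}
  }
}
""")


def validate_hash_entry(entry: dict, schema: dict = SCHEMA) -> list:
    """Return a list of error strings; an empty list means the entry conforms.

    Handles only "required", "type": "string", and "enum".
    """
    errors = []
    for key in schema.get("required", []):
        if key not in entry:
            errors.append(f"missing required property: {key}")
    for key, rule in schema.get("properties", {}).items():
        if key not in entry:
            continue
        value = entry[key]
        if rule.get("type") == "string" and not isinstance(value, str):
            errors.append(f"{key} must be a string")
        if "enum" in rule and value not in rule["enum"]:
            errors.append(f"{key} must be one of {rule['enum']}")
    return errors
```

The message being validated carries no type annotations of its own; everything the checker needs comes from the separate schema document.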

-
Jason Keirstead

From: John Anderson <janderson@soltra.com>
To: Jason Keirstead/CanEast/IBM@IBMCA
Cc: "cti-cybox@lists.oasis-open.org" <cti-cybox@lists.oasis-open.org>, "Kirillov, Ivan A." <ikirillov@mitre.org>
Date: 2015/11/02 01:54 PM
Subject: Re: [cti-cybox] Re: CybOX 3.0: HashType Refactoring
Sent by: <cti-cybox@lists.oasis-open.org>

"All a developer would do with that @type information at the top level and file level is throw it away, so it is extra bytes that are not required in the message." That sounds like throwing away all the XML namespace info, too. Sure, you don't need it...if you're not trying to validate against a published schema.

Is that what you mean?

From: cti-cybox@lists.oasis-open.org <cti-cybox@lists.oasis-open.org> on behalf of Jason Keirstead <Jason.Keirstead@ca.ibm.com>
Sent: Monday, November 2, 2015 12:50 PM
To: John Anderson
Cc: cti-cybox@lists.oasis-open.org; Kirillov, Ivan A.
Subject: Re: [cti-cybox] Re: CybOX 3.0: HashType Refactoring

I've been using JSON extensively for a very long time, but I don't know anything about JSON-LD, and will not pretend to.

All I am saying is, having superfluous information in the message should be strongly discouraged; we need to try to be as terse as possible. A big reason for the move to JSON in the first place was to reduce the message overhead affiliated with XML.

All a developer would do with that @type information at the top level and file level is throw it away, so it is extra bytes that are not required in the message. Inside the "hash" level obviously it is required and has meaning, and should be present there.

The root concept is - if the type of an attribute is defined in the specification then there is no reason to have it as part of the message.

-
Jason Keirstead

From: John Anderson <janderson@soltra.com>
To: Jason Keirstead/CanEast/IBM@IBMCA
Cc: "cti-cybox@lists.oasis-open.org" <cti-cybox@lists.oasis-open.org>, "Kirillov, Ivan A." <ikirillov@mitre.org>
Date: 2015/11/02 01:45 PM
Subject: Re: [cti-cybox] Re: CybOX 3.0: HashType Refactoring
Sent by: <cti-cybox@lists.oasis-open.org>

So, that's the question: How does JSON-LD extend vocabularies, and how does that affect the JSON representation? How would you express the idea of a custom algorithm hash, Jason?

From: cti-cybox@lists.oasis-open.org <cti-cybox@lists.oasis-open.org> on behalf of Jason Keirstead <Jason.Keirstead@ca.ibm.com>
Sent: Monday, November 2, 2015 12:36 PM
To: John Anderson
Cc: cti-cybox@lists.oasis-open.org; Kirillov, Ivan A.
Subject: Re: [cti-cybox] Re: CybOX 3.0: HashType Refactoring

I would think the schema will be defined as part of the Cybox 3.0 specification itself, will it not?

The schema cannot change once defined without revising the standard. When someone parses that "file" attribute, they will always expect the exact same data structures beneath it.
-
Jason Keirstead

From: John Anderson <janderson@soltra.com>
To: Jason Keirstead/CanEast/IBM@IBMCA
Cc: "Kirillov, Ivan A." <ikirillov@mitre.org>, "cti-cybox@lists.oasis-open.org" <cti-cybox@lists.oasis-open.org>
Date: 2015/11/02 01:28 PM
Subject: Re: [cti-cybox] Re: CybOX 3.0: HashType Refactoring

Thanks, Jason. I think I understand. Are you saying that the "@context" will define the schema, and therefore the schema for all sub-items as well?

If so, then "superhash" would be an extension to the hash "algorithm" (renamed) vocabulary, courtesy of the "mycybox++" context.

That would simplify the JSON to this:

{
  "@context": "http://cybox.example.com/mycybox++",
  "@type": "Observable",
  "file": {
    "hashes": [
      {
        "hash": "3773a88f65a5e780c8dff9cdc3a056f3",
        "algorithm": "md5"  # default type defined in CybOX
      },
      {
        "hash": "f49125dac3:352bb35ffrca2:a123dc4599245",
        "algorithm": "superhash"  # new type from my cybox++
      }
    ]
  }
}

Two observations:
1. A context (aka "schema") would be able to extend the vocabulary.
2. Users who want to use an extended vocabulary would have to create (and share!) a new context, if they want others to understand their objects.

How is this different from our current situation with custom vocabularies in XML?
JSA

From: cti-cybox@lists.oasis-open.org <cti-cybox@lists.oasis-open.org> on behalf of Jason Keirstead <Jason.Keirstead@ca.ibm.com>
Sent: Monday, November 2, 2015 11:51 AM
To: John Anderson
Cc: Kirillov, Ivan A.; cti-cybox@lists.oasis-open.org
Subject: Re: [cti-cybox] Re: CybOX 3.0: HashType Refactoring

Most of those @type sections seem totally superfluous to me.

IE - I know the object affiliated with the "file" attribute will be a File type. I do not need you to tell me this.

-
Jason Keirstead

From: John Anderson <janderson@soltra.com>
To: "Kirillov, Ivan A." <ikirillov@mitre.org>, "cti-cybox@lists.oasis-open.org" <cti-cybox@lists.oasis-open.org>
Date: 2015/11/02 12:07 PM
Subject: [cti-cybox] Re: CybOX 3.0: HashType Refactoring
Sent by: <cti-cybox@lists.oasis-open.org>

Ivan,
Could some ideas from JSON-LD help us here?

Disclaimer: I'm not sure JSON-LD allows embedding objects like this or extending a context, like I've done.

Also, there's a "@vocab" thing in JSON-LD. But once we start using vocabularies, we're heading down the road toward Ontologically-Correct Disunity (OCD).
{
  "@context": "http://cybox.example.com/mycybox++",
  "@type": "Observable",
  "file": {
    "@type": "File",
    "hashes": [
      {
        "hash": "3773a88f65a5e780c8dff9cdc3a056f3",
        "@type": "md5"  # default type defined in CybOX
      },
      {
        "hash": "f49125dac3:352bb35ffrca2:a123dc4599245",
        "@type": "superhash"  # new type from my cybox++
      }
    ]
  }
}
JSA

From: Kirillov, Ivan A. <ikirillov@mitre.org>
Sent: Monday, November 2, 2015 10:54 AM
To: John Anderson; cti-cybox@lists.oasis-open.org
Subject: Re: CybOX 3.0: HashType Refactoring

It makes sense, and I can definitely see the parallels to the IP Address refactoring :)

My main concern is that if the “type” field is intended to capture a set of default hash types and also support custom values, then it will likely need to use a controlled vocabulary, which gets us back to the original HashType implementation and its corresponding complexity:

{
  "file": {
    "hashes": [
      {
        "hash": "3773a88f65a5e780c8dff9cdc3a056f3",
        "type": {"vocabulary": "HashNameVocab-1.0", "value": "md5"}
      },
      {
        "hash": "f49125dac3:352bb35ffrca2:a123dc4599245",
        "type": "superhash"  # A "custom" hash type.
      }
    ]
  }
}

A possible middle ground is to have the “type” field set to a hard-coded enumeration (with values of “md5”, “sha1”, “sha256” etc.), and have a separate “custom_type” field for custom hash values. This negates the need for a controlled vocabulary driven approach, and thus would still be simpler. I think “custom_type” or “type” would always have to be specified though, as you can’t reliably infer the type of hash from a particular value (although you can make educated guesses – if the value is 16 bytes in length, odds are it’s MD5):

{
  "file": {
    "hashes": [
      {
        "hash": "3773a88f65a5e780c8dff9cdc3a056f3",
        "type": "md5"
      },
      {
        "hash": "f49125dac3:352bb35ffrca2:a123dc4599245",
        "custom_type": "superhash"  # A "custom" hash type.
      }
    ]
  }
}
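
The "educated guess" mentioned above can be sketched in Python as a lookup on digest length. The table `DIGEST_BYTES` and the function name `guess_algorithm` are illustrative assumptions; the result is only a hint, since other algorithms share these lengths (MD4, for instance, also produces 16 bytes).

```python
# Digest sizes in bytes for three widely used algorithms.
DIGEST_BYTES = {16: "md5", 20: "sha1", 32: "sha256"}


def guess_algorithm(hex_digest: str):
    """Return a likely algorithm name based on digest length, or None."""
    try:
        raw = bytes.fromhex(hex_digest)
    except ValueError:
        # Not a plain hex string, so no length-based guess is possible.
        return None
    return DIGEST_BYTES.get(len(raw))
```

For the two example values, the first decodes to 16 bytes and is guessed as "md5", while the custom value is not plain hex and yields no guess, which is why an explicit "type" or "custom_type" is still needed.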

What do you think?

Regards,
Ivan

From: John Anderson
Date: Monday, November 2, 2015 at 10:19 AM
To: Ivan Kirillov, "cti-cybox@lists.oasis-open.org"
Subject: Re: CybOX 3.0: HashType Refactoring

This Hash refactoring seems to parallel the IP Address refactoring. Would it make sense to treat hashes the same way we treat IP Addresses?
By applying that idea to the example on the page, we get something like this:

{
  "file": {
    "hashes": [
      {
        "hash": "3773a88f65a5e780c8dff9cdc3a056f3",
        "type": "md5"
      },
      {
        "hash": "f49125dac3:352bb35ffrca2:a123dc4599245",
        "type": "superhash"  # A "custom" hash type.
      },
      {
        "hash": "12343773a88f65a5e780c8dff9cdc3a0"
        # Default is "md5", if it's not specified.
      }
    ]
  }
}
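
A short Python sketch of the defaulting rule suggested in this example. The function name `effective_type` is hypothetical, and treating "md5" as the default is the assumption under discussion here, not settled behavior.

```python
def effective_type(entry: dict, default: str = "md5") -> str:
    """Return the entry's "type", falling back to the assumed default."""
    return entry.get("type", default)


hashes = [
    {"hash": "3773a88f65a5e780c8dff9cdc3a056f3", "type": "md5"},
    {"hash": "f49125dac3:352bb35ffrca2:a123dc4599245", "type": "superhash"},
    {"hash": "12343773a88f65a5e780c8dff9cdc3a0"},  # no "type" given
]
types = [effective_type(h) for h in hashes]
```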

Whadayathink?
JSA

From: cti-cybox@lists.oasis-open.org <cti-cybox@lists.oasis-open.org> on behalf of Kirillov, Ivan A. <ikirillov@mitre.org>
Sent: Monday, November 2, 2015 10:07 AM
To: cti-cybox@lists.oasis-open.org
Subject: [cti-cybox] CybOX 3.0: HashType Refactoring

All,

As I mentioned on last week’s call, we’ve got another proposal related to CybOX 3.0 to get your feedback on: https://github.com/CybOXProject/schemas/wiki/CybOX-3.0:-HashType-Refactoring

This one is around refactoring the way hashes (especially common ones like MD5 and SHA1) are currently captured. Accordingly, we’d love to get your general thoughts on the proposal as well as on the related questions:
1. Does it make sense to have two disparate types for capturing hashes in CybOX, one for more common hashes and one for esoteric/custom hashes?
2. As far as the list of hashes in the new HashesType – are there any that are missing? Are there any that should be pruned?
3. Are there any fields that should be added to the new CustomHashType?
Regards,
Ivan and Trey
