Re: [cti-cybox] Re: CybOX 3.0: HashType Refactoring

I need to read up more on JSON-LD myself, but I think Jason is largely correct, in that we’ll have a data model and a serialization of it that includes the corresponding schemas (likely JSON).

Going back to the original discussion, I think the broad questions are:

1. Do we need a controlled vocabulary around hashing algorithms?
2. How should non-standard/esoteric hashes be captured?

My thoughts are:

1. No. Controlled vocabularies make the most sense when there is no expectation that what they’re capturing is complete and/or there is a large need for content producers to customize it. I don’t think this is the case with cryptographic hashing algorithms; they’re largely stable and standardized.

2. I’m a fan of key value pairs for their simplicity, so I think having either a separate type or separate field for the custom hash name (as in my proposal below) is how I would approach it:

{
  "file" : {
    "hashes" : [
      {
        "hash": "3773a88f65a5e780c8dff9cdc3a056f3",
        "type": "md5"
      },
      {
        "hash": "f49125dac3:352bb35ffrca2:a123dc4599245",
        "custom_type": "superhash" # A "custom" hash type.
      }
    ]
  }
}
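
A minimal Python sketch of the rule implied by this layout, assuming each entry must carry exactly one of "type" or "custom_type". The function name validate_hash_entry and the contents of the enumeration are illustrative assumptions, not part of any CybOX specification:

```python
# Hypothetical enumeration; the thread only names md5, sha1 and sha256 as examples.
STANDARD_HASH_TYPES = {"md5", "sha1", "sha256"}


def validate_hash_entry(entry):
    """Return a list of problems found in one entry of the "hashes" array."""
    problems = []
    if not isinstance(entry.get("hash"), str) or not entry["hash"]:
        problems.append('"hash" must be a non-empty string')

    has_type = "type" in entry
    has_custom = "custom_type" in entry
    if has_type == has_custom:
        # Either both fields are present or neither is.
        problems.append('exactly one of "type" or "custom_type" is required')
    elif has_type:
        value = entry["type"]
        if not isinstance(value, str) or value not in STANDARD_HASH_TYPES:
            problems.append('unknown "type": %r' % (value,))
    elif not isinstance(entry["custom_type"], str):
        problems.append('"custom_type" must be a string')
    return problems
```

With this check, a custom name such as "superhash" is accepted under "custom_type" but rejected under "type", which keeps the enumeration closed without needing a controlled vocabulary.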

Regards,

Ivan

From: Jason Keirstead
Date: Monday, November 2, 2015 at 1:04 PM
To: John Anderson
Cc: "cti-cybox@lists.oasis-open.org", Ivan Kirillov
Subject: Re: [cti-cybox] Re: CybOX 3.0: HashType Refactoring

This is not the same thing. See JSON-Schema for how to validate against a JSON schema (http://json-schema.org/example2.html).

Namely, you define your schema in a separate JSON document, and that document can be used to validate any other document. Type information in the content messages is not necessary for validation against a schema; in fact, it's superfluous, as the schema defines the type.
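
To illustrate the idea of a schema kept apart from the message, here is a rough Python sketch. The schema text is written in the style of JSON Schema, but the check function is a hand-rolled subset (type, required, properties, items only) invented for this example; it is not a JSON Schema implementation, and the schema itself is an assumption rather than anything published for CybOX:

```python
import json

# The schema lives in its own document (a string stands in for a file here).
SCHEMA_DOC = """
{
  "type": "object",
  "required": ["file"],
  "properties": {
    "file": {
      "type": "object",
      "properties": {
        "hashes": {
          "type": "array",
          "items": {
            "type": "object",
            "required": ["hash"],
            "properties": {"hash": {"type": "string"}}
          }
        }
      }
    }
  }
}
"""

PY_TYPES = {"object": dict, "array": list, "string": str}


def check(value, schema, path="$"):
    """Return a list of error strings for value against the schema subset."""
    errors = []
    expected = schema.get("type")
    if expected and not isinstance(value, PY_TYPES[expected]):
        return ["%s: expected %s" % (path, expected)]
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                errors.append("%s: missing %r" % (path, key))
        for key, sub in schema.get("properties", {}).items():
            if key in value:
                errors += check(value[key], sub, "%s.%s" % (path, key))
    if isinstance(value, list) and "items" in schema:
        for i, item in enumerate(value):
            errors += check(item, schema["items"], "%s[%d]" % (path, i))
    return errors


schema = json.loads(SCHEMA_DOC)
# The message itself carries no type annotations for "file" or "hashes".
message = json.loads('{"file": {"hashes": [{"hash": "3773a88f65a5e780c8dff9cdc3a056f3"}]}}')
errors = check(message, schema)
```

The message validates without any "@type" fields at the top or file level, because the schema document already says what each attribute must be.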

-
Jason Keirstead
Product Architect, Security Intelligence, IBM Security Systems
www.ibm.com/security | www.securityintelligence.com

Without data, all you are is just another person with an opinion - Unknown


From: John Anderson <janderson@soltra.com>
To: Jason Keirstead/CanEast/IBM@IBMCA
Cc: "cti-cybox@lists.oasis-open.org" <cti-cybox@lists.oasis-open.org>, "Kirillov, Ivan A." <ikirillov@mitre.org>
Date: 2015/11/02 01:54 PM
Subject: Re: [cti-cybox] Re: CybOX 3.0: HashType Refactoring
Sent by: <cti-cybox@lists.oasis-open.org>

"All a developer would do with that @type information at the top level and file level is throw it away, so it is extra bytes that are not required in the message." That sounds like throwing away all the XML namespace info, too. Sure, you don't need it...if you're not trying to validate against a published schema.

Is that what you mean?

From: cti-cybox@lists.oasis-open.org <cti-cybox@lists.oasis-open.org> on behalf of Jason Keirstead <Jason.Keirstead@ca.ibm.com>
Sent: Monday, November 2, 2015 12:50 PM
To: John Anderson
Cc: cti-cybox@lists.oasis-open.org; Kirillov, Ivan A.
Subject: Re: [cti-cybox] Re: CybOX 3.0: HashType Refactoring

I've been using JSON extensively for a very long time, but I don't know anything about JSON-LD, and will not pretend to.

All I am saying is that superfluous information in the message should be strongly discouraged; we need to be as terse as possible. A big reason for the move to JSON in the first place was to reduce the message overhead associated with XML.

All a developer would do with that @type information at the top level and file level is throw it away, so it is extra bytes that are not required in the message. Inside the "hash" level obviously it is required and has meaning, and should be present there.

The root concept is - if the type of an attribute is defined in the specification then there is no reason to have it as part of the message.

-
Jason Keirstead


From: John Anderson <janderson@soltra.com>
To: Jason Keirstead/CanEast/IBM@IBMCA
Cc: "cti-cybox@lists.oasis-open.org" <cti-cybox@lists.oasis-open.org>, "Kirillov, Ivan A." <ikirillov@mitre.org>
Date: 2015/11/02 01:45 PM
Subject: Re: [cti-cybox] Re: CybOX 3.0: HashType Refactoring
Sent by: <cti-cybox@lists.oasis-open.org>

So, that's the question: How does JSON-LD extend vocabularies, and how does that affect the JSON representation? How would you express the idea of a custom algorithm hash, Jason?

Thanks, Jason. I think I understand. Are you saying that the "@context" will define the schema, and therefore the schema for all sub-items as well?

If so, then "superhash" would be an extension to the hash "algorithm" (renamed) vocabulary, courtesy of the "mycybox++" context.

That would simplify the JSON to this:

{
  "@context": "http://cybox.example.com/mycybox++",
  "@type": "Observable",
  "file" : {
    "hashes" : [
      {
        "hash": "3773a88f65a5e780c8dff9cdc3a056f3",
        "algorithm": "md5" # default type defined in CybOX
      },
      {
        "hash": "f49125dac3:352bb35ffrca2:a123dc4599245",
        "algorithm": "superhash" # new type from my cybox++
      }
    ]
  }
}

Two observations:
1. A context (aka "schema") would be able to extend the vocabulary.
2. Users who want to use an extended vocabulary would have to create (and share!) a new context, if they want others to understand their objects.
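
These two observations can be sketched in Python as follows. This only models the idea of one context extending another; the base context URL, the "extends" and "hash_algorithms" keys, and the function names are all invented for illustration and are not JSON-LD keywords or CybOX definitions:

```python
# Hypothetical context documents, keyed by URL.
CONTEXTS = {
    "http://cybox.example.com/cybox": {
        "extends": None,
        "hash_algorithms": {"md5", "sha1", "sha256"},
    },
    "http://cybox.example.com/mycybox++": {
        "extends": "http://cybox.example.com/cybox",
        "hash_algorithms": {"superhash"},
    },
}


def known_algorithms(context_url):
    """Collect the vocabulary of a context plus everything it extends."""
    names = set()
    while context_url is not None:
        context = CONTEXTS[context_url]  # KeyError means the context was never shared
        names |= context["hash_algorithms"]
        context_url = context["extends"]
    return names


def unknown_algorithms(document):
    """List algorithm names in the document that its context does not define."""
    allowed = known_algorithms(document["@context"])
    hashes = document.get("file", {}).get("hashes", [])
    return [h["algorithm"] for h in hashes if h["algorithm"] not in allowed]
```

A consumer that only knows the base context would report "superhash" as unknown, while one that has been given the extended context accepts it, which is the sharing requirement from the second observation.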

How is this different from our current situation with custom vocabularies in XML?
JSA

Ivan,
Could some ideas from JSON-LD help us here?

Disclaimer: I'm not sure JSON-LD allows embedding objects like this or extending a context, like I've done.

Also, there's a "@vocab" thing in JSON-LD. But once we start using vocabularies, we're heading down the road toward Ontologically-Correct Disunity (OCD).

{
  "@context": "http://cybox.example.com/mycybox++",
  "@type": "Observable",
  "file" : {
    "@type": "File",
    "hashes" : [
      {
        "hash": "3773a88f65a5e780c8dff9cdc3a056f3",
        "@type": "md5" # default type defined in CybOX
      },
      {
        "hash": "f49125dac3:352bb35ffrca2:a123dc4599245",
        "@type": "superhash" # new type from my cybox++
      }
    ]
  }
}
JSA

From: Kirillov, Ivan A. <ikirillov@mitre.org>
Sent: Monday, November 2, 2015 10:54 AM
To: John Anderson; cti-cybox@lists.oasis-open.org
Subject: Re: CybOX 3.0: HashType Refactoring

It makes sense, and I can definitely see the parallels to the IP Address refactoring :)

My main concern is that if the “type” field is intended to capture a set of default hash types and also support custom values, then it will likely need to use a controlled vocabulary, which gets us back to the original HashType implementation and its corresponding complexity:

{
  "file" : {
    "hashes" : [
      {
        "hash": "3773a88f65a5e780c8dff9cdc3a056f3",
        "type": {"vocabulary":"HashNameVocab-1.0", "value":"md5"}
      },
      {
        "hash": "f49125dac3:352bb35ffrca2:a123dc4599245",
        "type": "superhash" # A "custom" hash type.
      }
    ]
  }
}

A possible middle ground is to have the “type” field set to a hard-coded enumeration (with values of “md5”, “sha1”, “sha256” etc.), and have a separate “custom_type” field for custom hash values. This negates the need for a controlled vocabulary driven approach, and thus would still be simpler. I think “custom_type” or “type” would always have to be specified though, as you can’t reliably infer the type of hash from a particular value (although you can make educated guesses – if the value is 16 bytes in length, odds are it’s MD5):

{
  "file" : {
    "hashes" : [
      {
        "hash": "3773a88f65a5e780c8dff9cdc3a056f3",
        "type": "md5"
      },
      {
        "hash": "f49125dac3:352bb35ffrca2:a123dc4599245",
        "custom_type": "superhash" # A "custom" hash type.
      }
    ]
  }
}
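
To make the point about educated guesses concrete, here is a small Python sketch that proposes candidate algorithms from the length of a hex digest. The table and the function name guess_hash_types are assumptions for illustration; the digest sizes are the standard ones for these algorithms:

```python
# Digest sizes in bytes for a few common algorithms.
DIGEST_SIZES = {
    "md5": 16,
    "sha1": 20,
    "sha256": 32,
    "sha3_256": 32,  # same length as sha256, so length alone is ambiguous
}


def guess_hash_types(hex_value):
    """Return candidate algorithm names for a hex digest, possibly empty."""
    try:
        raw = bytes.fromhex(hex_value)
    except ValueError:
        return []  # not plain hex, e.g. a custom format with separators
    return sorted(name for name, size in DIGEST_SIZES.items() if size == len(raw))
```

A 16-byte value yields only "md5" from this table, but a 32-byte value yields two candidates and the "superhash" value yields none, which is why an explicit "type" or "custom_type" still seems necessary.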

What do you think?

Regards,
Ivan

From: John Anderson
Date: Monday, November 2, 2015 at 10:19 AM
To: Ivan Kirillov, "cti-cybox@lists.oasis-open.org"
Subject: Re: CybOX 3.0: HashType Refactoring

This Hash refactoring seems to parallel the IP Address refactoring. Would it make sense to treat hashes the same way we treat IP Addresses?
By applying that idea to the example on the page, we get something like this:

{
  "file" : {
    "hashes" : [
      {
        "hash": "3773a88f65a5e780c8dff9cdc3a056f3",
        "type": "md5"
      },
      {
        "hash": "f49125dac3:352bb35ffrca2:a123dc4599245",
        "type": "superhash" # A "custom" hash type.
      },
      {
        "hash": "12343773a88f65a5e780c8dff9cdc3a0"
        # Default is "md5", if it's not specified.
      }
    ]
  }
}
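
On the consuming side, the default could be applied with something like the Python sketch below. The function name normalize_hashes is hypothetical, and the "md5" default is simply the one suggested in this example:

```python
DEFAULT_HASH_TYPE = "md5"  # the default suggested in the example above


def normalize_hashes(document):
    """Return copies of the "hashes" entries with a missing "type" filled in."""
    hashes = document.get("file", {}).get("hashes", [])
    # Keys from the entry override the default, so an explicit "type" is kept.
    return [{"type": DEFAULT_HASH_TYPE, **entry} for entry in hashes]
```

The original document is left unmodified; only the returned copies carry the filled-in default.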

Whadayathink?
JSA

From: cti-cybox@lists.oasis-open.org <cti-cybox@lists.oasis-open.org> on behalf of Kirillov, Ivan A. <ikirillov@mitre.org>
Sent: Monday, November 2, 2015 10:07 AM
To: cti-cybox@lists.oasis-open.org
Subject: [cti-cybox] CybOX 3.0: HashType Refactoring

All,

As I mentioned on last week’s call, we’ve got another proposal related to CybOX 3.0 to get your feedback on: https://github.com/CybOXProject/schemas/wiki/CybOX-3.0:-HashType-Refactoring
