cti-cybox message


Subject: Re: [cti-cybox] CybOX 3.0: HashType Refactoring


+1


Thanks,

Bret



Bret Jordan CISSP
Director of Security Architecture and Standards | Office of the CTO
Blue Coat Systems
PGP Fingerprint: 63B4 FC53 680A 6B7D 1447  F2C0 74F8 ACAE 7415 0050
"Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg." 

On Nov 3, 2015, at 09:38, Jason Keirstead <Jason.Keirstead@ca.ibm.com> wrote:

Agree. +1 :)

-
Jason Keirstead
Product Architect, Security Intelligence, IBM Security Systems
www.ibm.com/security | www.securityintelligence.com

Without data, all you are is just another person with an opinion - Unknown


"Kirillov, Ivan A." ---2015/11/03 11:32:28 AM---Yes, I absolutely agree on the utility of enumerations, and I probably should have clarified my poin

From: "Kirillov, Ivan A." <ikirillov@mitre.org>
To: Jason Keirstead/CanEast/IBM@IBMCA, "Davidson II, Mark S" <mdavidson@mitre.org>
Cc: "cti-cybox@lists.oasis-open.org" <cti-cybox@lists.oasis-open.org>, "John Anderson" <janderson@soltra.com>
Date: 2015/11/03 11:32 AM
Subject: Re: [cti-cybox] Re: CybOX 3.0: HashType Refactoring
Sent by: <cti-cybox@lists.oasis-open.org>





Yes, I absolutely agree on the utility of enumerations, and I probably should have clarified my point accordingly. Anyhow, my thought is that the “type” field in HashType should NOT be implemented through a controlled vocabulary but should instead use a fixed enumeration that is defined as part of the CybOX 3.0 specification:

"type": {
"enum": [ "md5", "sha1", "sha256", etc. ]}
Regards,
Ivan
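
A minimal sketch of how a consumer might treat the fixed enumeration proposed in this message, assuming Python. The `HashAlgorithm` class, the `parse_hash_type` helper, and the member list are hypothetical and only illustrate the idea; the real list would be whatever the CybOX 3.0 specification defines.

```python
from enum import Enum


class HashAlgorithm(str, Enum):
    """Hypothetical fixed enumeration; the member list is illustrative only."""
    MD5 = "md5"
    SHA1 = "sha1"
    SHA256 = "sha256"


def parse_hash_type(value: str) -> HashAlgorithm:
    """Return the enum member for an exact match, or raise ValueError."""
    return HashAlgorithm(value)


print(parse_hash_type("sha256").value)  # sha256
```

With a fixed enumeration, a value such as "MD5" is simply rejected rather than looked up in a vocabulary, so producers have exactly one accepted spelling for each algorithm.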

From: Jason Keirstead
Date:
Tuesday, November 3, 2015 at 8:12 AM
To:
Mark Davidson
Cc:
"cti-cybox@lists.oasis-open.org", Ivan Kirillov, John Anderson
Subject:
RE: [cti-cybox] Re: CybOX 3.0: HashType Refactoring

I think the hashing algorithms should be either a controlled vocabulary or a type enum that is part of the specification, like Jerome suggested. Anything that a coder would implement as an enumeration should be a controlled vocabulary or an enumeration.

RE:

      "1. No. Controlled vocabularies make the most sense when there is no expectation that what they’re capturing is complete and/or there is a large need for it to be customized by content producers. I don’t think this is the case with cryptographic hashing algorithms; they’re largely stable and standardized for the most part."

The reason you need this is not that you expect it to be extended; it is so that everyone agrees on how a value should be entered into the document, so that it can be parsed properly and efficiently: "MD5" vs "md5", "sha" vs "SHA-1" vs "sha256" vs "SHA-256".
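
To illustrate the parsing burden described above, here is a rough sketch of the normalization code a consumer would otherwise need if the spelling were not fixed. The `CANONICAL` set and the `normalize_hash_name` helper are assumptions made for this example, not part of any CybOX specification.

```python
import re

# Hypothetical canonical tokens; the real list would come from the specification.
CANONICAL = {"md5", "sha1", "sha256"}


def normalize_hash_name(name: str) -> str:
    """Lower-case the name and drop hyphens, underscores and whitespace."""
    token = re.sub(r"[-_\s]", "", name.strip().lower())
    if token not in CANONICAL:
        # Ambiguous names such as "sha" cannot be mapped safely.
        raise ValueError(f"unrecognized hash algorithm: {name!r}")
    return token


print(normalize_hash_name("SHA-256"))  # sha256
```

Even this small helper has to refuse a bare "sha", which is the kind of guesswork an agreed list of values avoids.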

-
Jason Keirstead
Product Architect, Security Intelligence, IBM Security Systems
www.ibm.com/security | www.securityintelligence.com

Without data, all you are is just another person with an opinion - Unknown


"Davidson II, Mark S" ---2015/11/03 08:43:46 AM---My comment is really about controlled vocabularies in general. I tend to have a gut reaction of want

From:
"Davidson II, Mark S" <mdavidson@mitre.org>
To:
"Kirillov, Ivan A." <ikirillov@mitre.org>, Jason Keirstead/CanEast/IBM@IBMCA, John Anderson <janderson@soltra.com>
Cc:
"cti-cybox@lists.oasis-open.org" <cti-cybox@lists.oasis-open.org>
Date:
2015/11/03 08:43 AM
Subject:
RE: [cti-cybox] Re: CybOX 3.0: HashType Refactoring




My comment is really about controlled vocabularies in general. I tend to have a gut reaction of wanting to do away with controlled vocabularies wherever we have them because they are hard for me to implement. That said, I think changing two key factors about controlled vocabularies would change the way I feel about them.


I think we should consider improving controlled vocabularies in these two areas:
          · REQUIRE a specific controlled vocabulary, allow other controlled vocabularies. Right now any vocabulary is just as valid as any other vocabulary, and this makes things more difficult. STIX/CybOX do have the notion of default vocabularies, but this seems like more of a starting point than a definition. If we made a single vocabulary required and all other vocabularies optional (perhaps calling them “third party vocabularies”) I think that would go a long way toward making controlled vocabularies easier to implement.
                  o As a sub-point, I think MTI vocabularies should be specified in the overall spec, as this reduces the number of overall moving parts for people to track.
          · DEFINE the semantics for each value. In many places the meaning of certain controlled vocabularies is unspecified. When I get an indicator with an IndicatorType of “URL Watchlist”, what does that mean exactly? The XSD annotation has some descriptive text (in this case, “Indicator describes a set of suspected malicious URLS”), but it doesn’t tell me how the value of this field changes how I process (or not) the indicator. I can make an inference based on experience, but we should seek to improve these definitions so that implementers have an easier time. If this is just a label and not meant for processing, we should call it out as such.

If controlled vocabularies were to meet the requirements I lay out above, I would have no opinion on whether hashes use a default vocabulary or not. As controlled vocabularies currently stand, my preference is for not using them.


Thank you.
-Mark
          From: cti-cybox@lists.oasis-open.org [mailto:cti-cybox@lists.oasis-open.org] On Behalf Of Kirillov, Ivan A.
          Sent:
          Tuesday, November 03, 2015 7:12 AM
          To:
          Jason Keirstead <Jason.Keirstead@ca.ibm.com>; John Anderson <janderson@soltra.com>
          Cc:
          cti-cybox@lists.oasis-open.org
          Subject:
          Re: [cti-cybox] Re: CybOX 3.0: HashType Refactoring

          I need to read up more on JSON-LD myself, but I think Jason is largely correct, in that we’ll have a data model and a serialization of it that includes the corresponding schemas (likely JSON).


          Going back to the original discussion, I think the broad questions are:
                  1. Do we need a controlled vocabulary around hashing algorithms?
                  2. How should non-standard/esoteric hashes be captured?
          My thoughts are:
          1. No. Controlled vocabularies make the most sense when there is no expectation that what they’re capturing is complete and/or there is a large need for it to be customized by content producers. I don’t think this is the case with cryptographic hashing algorithms; they’re largely stable and standardized for the most part.
          2. I’m a fan of key value pairs for their simplicity, so I think having either a separate type or separate field for the custom hash name (as in my proposal below) is how I would approach it:


          {
            "file": {
              "hashes": [
                {
                  "hash": "3773a88f65a5e780c8dff9cdc3a056f3",
                  "type": "md5"
                },
                {
                  "hash": "f49125dac3:352bb35ffrca2:a123dc4599245",
                  "custom_type": "superhash"    # A "custom" hash type.
                }
              ]
            }
          }
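
A small sketch of how a consumer might read the key/value layout above, assuming Python and assuming (as suggested elsewhere in this thread) that exactly one of "type" or "custom_type" must be present. `ALLOWED_TYPES` and `check_hash_entry` are hypothetical names, and the allowed list is an illustrative subset.

```python
import json

ALLOWED_TYPES = {"md5", "sha1", "sha256"}  # illustrative subset, not the spec's list


def check_hash_entry(entry: dict) -> str:
    """Return the algorithm name, requiring exactly one of "type" / "custom_type"."""
    has_type = "type" in entry
    has_custom = "custom_type" in entry
    if has_type == has_custom:
        raise ValueError('exactly one of "type" or "custom_type" is required')
    if has_type:
        if entry["type"] not in ALLOWED_TYPES:
            raise ValueError(f'unknown "type": {entry["type"]!r}')
        return entry["type"]
    return entry["custom_type"]


document = json.loads("""
{
  "file": {
    "hashes": [
      {"hash": "3773a88f65a5e780c8dff9cdc3a056f3", "type": "md5"},
      {"hash": "f49125dac3:352bb35ffrca2:a123dc4599245", "custom_type": "superhash"}
    ]
  }
}
""")

algorithms = [check_hash_entry(h) for h in document["file"]["hashes"]]
print(algorithms)  # ['md5', 'superhash']
```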


          Regards,
          Ivan


          From:
          Jason Keirstead
          Date:
          Monday, November 2, 2015 at 1:04 PM
          To:
          John Anderson
          Cc:
          "cti-cybox@lists.oasis-open.org", Ivan Kirillov
          Subject:
          Re: [cti-cybox] Re: CybOX 3.0: HashType Refactoring

          This is not the same thing.... see JSON-Schema for how to validate against a JSON schema (http://json-schema.org/example2.html).

          Namely, you define your schema in a separate JSON document, which can then be used to validate any other document. Type information in the content messages is not necessary for validation against a schema; in fact, it is superfluous, because the schema defines the type.
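
The separation described above can be sketched in a few lines of Python. This is a hand-rolled check covering only a tiny subset of JSON Schema keywords (type, enum, required, properties, items), standing in for a real validator. The `SCHEMA` contents and the `validate` helper are assumptions for illustration, not a published CybOX schema.

```python
# Hypothetical schema, kept in its own document, separate from any message.
SCHEMA = {
    "type": "object",
    "required": ["file"],
    "properties": {
        "file": {
            "type": "object",
            "properties": {
                "hashes": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "required": ["hash", "type"],
                        "properties": {
                            "hash": {"type": "string"},
                            "type": {"enum": ["md5", "sha1", "sha256"]},
                        },
                    },
                }
            },
        }
    },
}

PY_TYPES = {"object": dict, "array": list, "string": str}


def validate(instance, schema, path="$"):
    """Check a tiny subset of JSON Schema and return a list of error strings."""
    errors = []
    expected = schema.get("type")
    if expected and not isinstance(instance, PY_TYPES[expected]):
        return [f"{path}: expected {expected}"]
    if "enum" in schema and instance not in schema["enum"]:
        errors.append(f"{path}: {instance!r} not in {schema['enum']}")
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing {key!r}")
        for key, sub in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(validate(instance[key], sub, f"{path}.{key}"))
    if isinstance(instance, list) and "items" in schema:
        for i, item in enumerate(instance):
            errors.extend(validate(item, schema["items"], f"{path}[{i}]"))
    return errors


# The message carries no "@type" markers; the schema alone supplies the structure.
message = {"file": {"hashes": [{"hash": "3773a88f65a5e780c8dff9cdc3a056f3", "type": "md5"}]}}
print(validate(message, SCHEMA))  # []
```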

          -
          Jason Keirstead
          Product Architect, Security Intelligence, IBM Security Systems

          www.ibm.com/security | www.securityintelligence.com

          Without data, all you are is just another person with an opinion - Unknown


          John Anderson ---2015/11/02 01:54:12 PM---"All a developer would do with that @type information at the top level and file level is throw it aw

          From:
          John Anderson <janderson@soltra.com>
          To:
          Jason Keirstead/CanEast/IBM@IBMCA
          Cc:
          "cti-cybox@lists.oasis-open.org" <cti-cybox@lists.oasis-open.org>, "Kirillov, Ivan A." <ikirillov@mitre.org>
          Date:
          2015/11/02 01:54 PM
          Subject:
          Re: [cti-cybox] Re: CybOX 3.0: HashType Refactoring
          Sent by:
          <cti-cybox@lists.oasis-open.org>







          "All a developer would do with that @type information at the top level and file level is throw it away, so it is extra bytes that are not required in the message." That sounds like throwing away all the XML namespace info, too. Sure, you don't need it...if you're not trying to validate against a published schema.

          Is that what you mean?





          From:
          cti-cybox@lists.oasis-open.org <cti-cybox@lists.oasis-open.org> on behalf of Jason Keirstead <Jason.Keirstead@ca.ibm.com>
          Sent:
          Monday, November 2, 2015 12:50 PM
          To:
          John Anderson
          Cc:
          cti-cybox@lists.oasis-open.org; Kirillov, Ivan A.
          Subject:
          Re: [cti-cybox] Re: CybOX 3.0: HashType Refactoring

          I've been using JSON extensively for a very long time, but I don't know anything about JSON-LD, and will not pretend to.

          All I am saying is that having superfluous information in the message should be strongly discouraged; we need to be as terse as possible. A big reason for the move to JSON in the first place was to reduce the message overhead associated with XML.

          All a developer would do with that @type information at the top level and file level is throw it away, so it is extra bytes that are not required in the message. Inside the "hash" level obviously it is required and has meaning, and should be present there.

          The root concept is - if the type of an attribute is defined in the specification then there is no reason to have it as part of the message.

          -
          Jason Keirstead
          Product Architect, Security Intelligence, IBM Security Systems

          www.ibm.com/security | www.securityintelligence.com

          Without data, all you are is just another person with an opinion - Unknown


          John Anderson ---2015/11/02 01:45:25 PM---So, that's the question: How does JSON-LD extend vocabularies, and how does that affect the JSON rep


          From:
          John Anderson <janderson@soltra.com>
          To:
          Jason Keirstead/CanEast/IBM@IBMCA
          Cc:
          "cti-cybox@lists.oasis-open.org" <cti-cybox@lists.oasis-open.org>, "Kirillov, Ivan A." <ikirillov@mitre.org>
          Date:
          2015/11/02 01:45 PM
          Subject:
          Re: [cti-cybox] Re: CybOX 3.0: HashType Refactoring
          Sent by:
          <cti-cybox@lists.oasis-open.org>





          So, that's the question: How does JSON-LD extend vocabularies, and how does that affect the JSON representation? How would you express the idea of a custom algorithm hash, Jason?




          From:
          cti-cybox@lists.oasis-open.org <cti-cybox@lists.oasis-open.org> on behalf of Jason Keirstead <Jason.Keirstead@ca.ibm.com>
          Sent:
          Monday, November 2, 2015 12:36 PM
          To:
          John Anderson
          Cc:
          cti-cybox@lists.oasis-open.org; Kirillov, Ivan A.
          Subject:
          Re: [cti-cybox] Re: CybOX 3.0: HashType Refactoring

          I would think the schema will be defined as part of the Cybox 3.0 specification itself, will it not?

          The schema cannot change once defined without revising the standard. When someone parses that "file" attribute, they will always expect the exact same data structures beneath it.
          -
          Jason Keirstead
          Product Architect, Security Intelligence, IBM Security Systems

          www.ibm.com/security | www.securityintelligence.com

          Without data, all you are is just another person with an opinion - Unknown



          John Anderson ---2015/11/02 01:28:38 PM---Thanks, Jason. I think I understand. Are you saying that the "@context" will define the schema, and


          From:
          John Anderson <janderson@soltra.com>
          To:
          Jason Keirstead/CanEast/IBM@IBMCA
          Cc:
          "Kirillov, Ivan A." <ikirillov@mitre.org>, "cti-cybox@lists.oasis-open.org" <cti-cybox@lists.oasis-open.org>
          Date:
          2015/11/02 01:28 PM
          Subject:
          Re: [cti-cybox] Re: CybOX 3.0: HashType Refactoring





          Thanks, Jason. I think I understand. Are you saying that the "@context" will define the schema, and therefore the schema for all sub-items as well?

          If so, then "superhash" would be an extension to the hash "algorithm" (renamed) vocabulary, courtesy of the "mycybox++" context.

          That would simplify the JSON to this:


          {
            "@context": "http://cybox.example.com/mycybox++",
            "@type": "Observable",
            "file": {
              "hashes": [
                {
                  "hash": "3773a88f65a5e780c8dff9cdc3a056f3",
                  "algorithm": "md5"          # default type defined in CybOX
                },
                {
                  "hash": "f49125dac3:352bb35ffrca2:a123dc4599245",
                  "algorithm": "superhash"    # new type from my cybox++
                }
              ]
            }
          }


          Two observations:
          1. A context (aka "schema") would be able to extend the vocabulary.
          2. Users who want to use an extended vocabulary would have to create (and share!) a new context, if they want others to understand their objects.

          How is this different from our current situation with custom vocabularies in XML?
          JSA




          From:
          cti-cybox@lists.oasis-open.org <cti-cybox@lists.oasis-open.org> on behalf of Jason Keirstead <Jason.Keirstead@ca.ibm.com>
          Sent:
          Monday, November 2, 2015 11:51 AM
          To:
          John Anderson
          Cc:
          Kirillov, Ivan A.; cti-cybox@lists.oasis-open.org
          Subject:
          Re: [cti-cybox] Re: CybOX 3.0: HashType Refactoring

          Most of those @type sections seem totally superfluous to me.

          IE - I know the object affiliated with the "file" attribute will be a File type. I do not need you to tell me this.

          -
          Jason Keirstead
          Product Architect, Security Intelligence, IBM Security Systems

          www.ibm.com/security | www.securityintelligence.com

          Without data, all you are is just another person with an opinion - Unknown



          John Anderson ---2015/11/02 12:07:40 PM---Ivan, Could some ideas from JSON-LD help us here?


          From:
          John Anderson <janderson@soltra.com>
          To:
          "Kirillov, Ivan A." <ikirillov@mitre.org>, "cti-cybox@lists.oasis-open.org" <cti-cybox@lists.oasis-open.org>
          Date:
          2015/11/02 12:07 PM
          Subject:
          [cti-cybox] Re: CybOX 3.0: HashType Refactoring
          Sent by:
          <cti-cybox@lists.oasis-open.org>





          Ivan,
          Could some ideas from JSON-LD help us here?

          Disclaimer: I'm not sure JSON-LD allows embedding objects like this or extending a context, like I've done.

          Also, there's a "@vocab" thing in JSON-LD. But once we start using vocabularies, we're heading down the road toward Ontologically-Correct Disunity (OCD).

          {
            "@context": "http://cybox.example.com/mycybox++",
            "@type": "Observable",
            "file": {
              "@type": "File",
              "hashes": [
                {
                  "hash": "3773a88f65a5e780c8dff9cdc3a056f3",
                  "@type": "md5"          # default type defined in CybOX
                },
                {
                  "hash": "f49125dac3:352bb35ffrca2:a123dc4599245",
                  "@type": "superhash"    # new type from my cybox++
                }
              ]
            }
          }

          JSA




          From:
          Kirillov, Ivan A. <ikirillov@mitre.org>
          Sent:
          Monday, November 2, 2015 10:54 AM
          To:
          John Anderson; cti-cybox@lists.oasis-open.org
          Subject:
          Re: CybOX 3.0: HashType Refactoring

          It makes sense, and I can definitely see the parallels to the IP Address refactoring :)

          My main concern is that if the “type” field is intended to capture a set of default hash types and also support custom values, then it will likely need to use a controlled vocabulary, which gets us back to the original HashType implementation and its corresponding complexity:


          {
            "file": {
              "hashes": [
                {
                  "hash": "3773a88f65a5e780c8dff9cdc3a056f3",
                  "type": {"vocabulary": "HashNameVocab-1.0", "value": "md5"}
                },
                {
                  "hash": "f49125dac3:352bb35ffrca2:a123dc4599245",
                  "type": "superhash"    # A "custom" hash type.
                }
              ]
            }
          }
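
The complexity mentioned above shows up as soon as a consumer has to read the field: "type" may be either an object or a bare string. A minimal sketch in Python, where `read_type` is a hypothetical helper and the interpretation of a bare string as a custom value is an assumption drawn from the example:

```python
def read_type(field):
    """Return (vocabulary, value) for either form of the "type" field."""
    if isinstance(field, str):
        # Bare string: no vocabulary reference, treated here as a custom value.
        return (None, field)
    if isinstance(field, dict) and "vocabulary" in field and "value" in field:
        return (field["vocabulary"], field["value"])
    raise ValueError(f"unsupported type field: {field!r}")


print(read_type({"vocabulary": "HashNameVocab-1.0", "value": "md5"}))  # ('HashNameVocab-1.0', 'md5')
print(read_type("superhash"))  # (None, 'superhash')
```

Every reader of the format would need both branches, plus a policy for what an unknown vocabulary name means.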


          A possible middle ground is to have the “type” field set to a hard-coded enumeration (with values of “md5”, “sha1”, “sha256” etc.), and have a separate “custom_type” field for custom hash values. This negates the need for a controlled vocabulary driven approach, and thus would still be simpler. I think “custom_type” or “type” would always have to be specified though, as you can’t reliably infer the type of hash from a particular value (although you can make educated guesses – if the value is 16 bytes in length, odds are it’s MD5):


          {
            "file": {
              "hashes": [
                {
                  "hash": "3773a88f65a5e780c8dff9cdc3a056f3",
                  "type": "md5"
                },
                {
                  "hash": "f49125dac3:352bb35ffrca2:a123dc4599245",
                  "custom_type": "superhash"    # A "custom" hash type.
                }
              ]
            }
          }
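
The "educated guess" mentioned above can be sketched as a lookup on digest length, assuming Python and hex-encoded values. `DIGEST_BYTES` and `guess_hash_type` are hypothetical names. The guess is inherently unreliable because other algorithms share these lengths (MD4, for instance, also produces 16 bytes), which is why an explicit "type" or "custom_type" would still be needed.

```python
import string

# Digest sizes in bytes for a few well-known algorithms.
DIGEST_BYTES = {16: "md5", 20: "sha1", 32: "sha256"}


def guess_hash_type(hex_value: str):
    """Guess the algorithm from the digest length; return None when unsure."""
    if len(hex_value) % 2 or any(c not in string.hexdigits for c in hex_value):
        return None  # not a plain hex digest
    return DIGEST_BYTES.get(len(hex_value) // 2)


print(guess_hash_type("3773a88f65a5e780c8dff9cdc3a056f3"))        # md5
print(guess_hash_type("f49125dac3:352bb35ffrca2:a123dc4599245"))  # None
```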


          What do you think?

          Regards,
          Ivan


          From:
          John Anderson
          Date:
          Monday, November 2, 2015 at 10:19 AM
          To:
          Ivan Kirillov, "cti-cybox@lists.oasis-open.org"
          Subject:
          Re: CybOX 3.0: HashType Refactoring

          This Hash refactoring seems to parallel the IP Address refactoring. Would it make sense to treat hashes the same way we treat IP Addresses?
          By applying that idea to the example on the page, we get something like this:


          {
            "file": {
              "hashes": [
                {
                  "hash": "3773a88f65a5e780c8dff9cdc3a056f3",
                  "type": "md5"
                },
                {
                  "hash": "f49125dac3:352bb35ffrca2:a123dc4599245",
                  "type": "superhash"    # A "custom" hash type.
                },
                {
                  "hash": "12343773a88f65a5e780c8dff9cdc3a0"
                  # Default is "md5", if it's not specified.
                }
              ]
            }
          }
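
A minimal sketch of the defaulting rule in the example above, assuming Python. `DEFAULT_TYPE` and `with_default_type` are hypothetical names used only for illustration.

```python
DEFAULT_TYPE = "md5"  # default suggested in the example above


def with_default_type(entry: dict) -> dict:
    """Return a copy of the entry with "type" filled in when it is absent."""
    # Keys from `entry` override the default, so an explicit "type" is preserved.
    return {"type": DEFAULT_TYPE, **entry}


hashes = [
    {"hash": "3773a88f65a5e780c8dff9cdc3a056f3", "type": "md5"},
    {"hash": "f49125dac3:352bb35ffrca2:a123dc4599245", "type": "superhash"},
    {"hash": "12343773a88f65a5e780c8dff9cdc3a0"},
]
print([with_default_type(h)["type"] for h in hashes])  # ['md5', 'superhash', 'md5']
```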


          Whadayathink?
          JSA




          From:
          cti-cybox@lists.oasis-open.org <cti-cybox@lists.oasis-open.org> on behalf of Kirillov, Ivan A. <ikirillov@mitre.org>
          Sent:
          Monday, November 2, 2015 10:07 AM
          To:
          cti-cybox@lists.oasis-open.org
          Subject:
          [cti-cybox] CybOX 3.0: HashType Refactoring

          All,

          As I mentioned on last week’s call, we’ve got another proposal related to CybOX 3.0 to get your feedback on:
          https://github.com/CybOXProject/schemas/wiki/CybOX-3.0:-HashType-Refactoring

          This one is around refactoring the way hashes (especially common ones like MD5 and SHA1) are currently captured. Accordingly, we’d love to get your general thoughts on the proposal as well as on the related questions:
          1. Does it make sense to have two disparate types for capturing hashes in CybOX, one for more common hashes and one for esoteric/custom hashes?
          2. As far as the list of hashes in the new HashesType – are there any that are missing? Are there any that should be pruned?
          3. Are there any fields that should be added to the new CustomHashType?
          Regards,
          Ivan and Trey










