RE: [cti] RE: [Non-DoD Source] RE: [cti] RE: Versioning Background Docs

Hi Terry,

I missed this response earlier.

Thanks for the feedback. We can go through these when we talk.

-Marlon

From: Terry MacDonald [mailto:terry@soltra.com]
Sent: Monday, March 14, 2016 6:49 PM
To: Taylor, Marlon; Mates, Jeffrey CIV DC3/DCCI; 'Jordan, Bret'; Mark Davidson
Cc: Jason Keirstead; cti@lists.oasis-open.org; Taylor, Marlon
Subject: RE: [cti] RE: [Non-DoD Source] RE: [cti] RE: Versioning Background Docs

Hi Marlon,

I was originally pro-hashed IDs. They seem simple enough, but the restrictions they impose create a huge headache for maintaining relationships over the lifecycle of the objects. After extensive discussion, the community decided that they cause more trouble than they solve.

Jason Keirstead earlier posted my comments from Slack describing why hashed IDs are sub-optimal and won’t work, but I’ll outline the argument in a little more depth below…

Explicit Relationships versus Implicit Relationships

In STIX v1.x we had incremental updates (via implicit relationships) and major updates (via explicit relationships).

Incremental updates related the various versions of an object by the fact that the object ID stays the same: the object ID acts as the ‘key’ that identifies the object, and all updates of that object keep the same object ID. Updated versions of the object are tracked through a separate version control field, so the versions are related implicitly.

Major updates related the various versions of an object by explicitly creating a relationship between the old object and the new object. The object ID changes between versions of objects, so the only way to show that the new object is an updated version of the old object is to create a brand new relationship object to join the old and new versions of the object together.
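A minimal Python sketch of the two STIX v1.x update styles described above. It assumes a simple integer `version` field and a hypothetical `supersedes` relationship type; the `source_ref`/`target_ref` field names are illustrative too, not settled STIX 2.0 syntax:

```python
import uuid
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CTIObject:
    id: str
    version: int  # assumption: simple integer counter
    description: str


@dataclass(frozen=True)
class Relationship:
    id: str
    source_ref: str
    target_ref: str
    relationship_type: str


def incremental_update(obj: CTIObject, description: str) -> CTIObject:
    # Implicit relationship: the ID stays the same, only the version changes.
    return replace(obj, version=obj.version + 1, description=description)


def major_update(obj: CTIObject, description: str) -> tuple[CTIObject, Relationship]:
    # Explicit relationship: the new version gets a brand new ID, so a new
    # relationship object is needed to tie it back to the old version.
    new_obj = CTIObject(id=str(uuid.uuid4()), version=1, description=description)
    link = Relationship(
        id=str(uuid.uuid4()),
        source_ref=new_obj.id,
        target_ref=obj.id,
        relationship_type="supersedes",  # hypothetical relationship name
    )
    return new_obj, link
```

Note that a major update always produces two objects (the new version plus the link), while an incremental update produces one.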

Hashed Object IDs and the knock-on effects

Hashed object IDs mean that the object ID is based on a hash computed from the contents of the object itself. This has a few really cool benefits:

· No-one can change the object contents undetected, as the recipient will be able to check the content against the object ID and will know if it has been tampered with

· This enforces immutability
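A minimal sketch of a content-hashed ID, assuming canonical JSON (sorted keys, no extra whitespace) hashed with SHA-256. The canonicalisation rule and the `sha256-` prefix are hypothetical choices for illustration, not anything the TC has agreed:

```python
import hashlib
import json


def canonical_bytes(content: dict) -> bytes:
    # Assumed canonical form: sorted keys, no insignificant whitespace.
    return json.dumps(content, sort_keys=True, separators=(",", ":")).encode("utf-8")


def hashed_id(content: dict) -> str:
    # The ID is derived entirely from the object's contents.
    return "sha256-" + hashlib.sha256(canonical_bytes(content)).hexdigest()


def verify(object_id: str, content: dict) -> bool:
    # The recipient recomputes the hash and compares it with the ID it was given.
    return hashed_id(content) == object_id


content = {"type": "threat-actor", "name": "APT-X"}
object_id = hashed_id(content)
print(verify(object_id, content))  # True
print(verify(object_id, dict(content, name="APT-Y")))  # False
```

Any change to any field yields a different ID, which is what enforces immutability. The check only detects changes relative to an ID the recipient already trusts.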

But it also has massive downsides:

· We force explicit relationships because we cannot perform incremental updates with hashed IDs

· We force every single TAXII server to always track every version of every object. (Implicit relationships don’t require this).

· All relationships sent previously now point to an outdated version of the object. Either every relationship must be republished by its creator, or we force all relationships to be transitive and require every consumer implementation to walk the chain of version updates every time it wants to follow the list of explicit relationships.

· We now have to send at least one relationship object and the new object every single time we do an update
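To illustrate the version-walking downside, here is a sketch assuming each update is published with an explicit old-ID to new-ID link, modelled as a hypothetical `supersedes` mapping. A relationship created against the first version still targets the old hashed ID, so the consumer has to walk the chain to reach the current object:

```python
import hashlib
import json


def hashed_id(content: dict) -> str:
    data = json.dumps(content, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def latest_version(object_id: str, supersedes: dict[str, str]) -> str:
    # supersedes maps old_id -> new_id (one explicit relationship per update).
    # A consumer following an older relationship must walk this chain every time.
    seen = set()
    while object_id in supersedes:
        if object_id in seen:
            raise ValueError("cycle in version chain")
        seen.add(object_id)
        object_id = supersedes[object_id]
    return object_id


v1 = {"name": "APT-X", "description": "first"}
v2 = dict(v1, description="second")
id1, id2 = hashed_id(v1), hashed_id(v2)
supersedes = {id1: id2}  # published alongside v2
old_relationship = {"source_ref": "indicator-1", "target_ref": id1}
print(latest_version(old_relationship["target_ref"], supersedes) == id2)  # True
```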

The alternative: Randomly created Object IDs and Incremental Updates

The implicit versioning scheme, where the object always keeps its Object ID and we update that one object, makes things a LOT simpler. Updates are done simply by the object creator publishing an object with the same Object ID as the previous version of the Object and increasing the version number in some way (we’re still discussing how). This has many benefits:

· It’s simple to understand (big plus for new users)

· All relationships stay valid when the object is updated

· TAXII Servers are free to only keep the last version of the object or to keep all the versions because the relationships remain valid in either case.

· There is no ‘walking the relationship hierarchy’ to find all the relationships.

· We are free to use whichever cryptographic solution we wish to sign and authenticate the objects.
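By contrast, a sketch of the implicit scheme: a hypothetical server store that keeps only the latest version of each object, keyed by its fixed random ID. An integer `version` is assumed as the ordering field, since the actual versioning field is still under discussion. Relationships reference only the ID, so they resolve to the newest copy without any chain walking:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CTIObject:
    id: str  # generated randomly once; never changes across versions
    version: int  # assumption: simple integer counter
    description: str


class LatestOnlyStore:
    """Hypothetical server-side store that keeps only the newest version of each object."""

    def __init__(self) -> None:
        self._objects: dict[str, CTIObject] = {}

    def publish(self, obj: CTIObject) -> bool:
        current = self._objects.get(obj.id)
        if current is not None and current.version >= obj.version:
            return False  # stale or duplicate update: keep what we have
        self._objects[obj.id] = obj
        return True

    def resolve(self, ref: str) -> CTIObject:
        # Relationships store only the object ID, so this always returns the
        # latest version; no walking of a version chain is needed.
        return self._objects[ref]


store = LatestOnlyStore()
store.publish(CTIObject("threat-actor-1", 1, "first description"))
store.publish(CTIObject("threat-actor-1", 2, "updated description"))
relationship = {"source_ref": "indicator-1", "target_ref": "threat-actor-1"}
print(store.resolve(relationship["target_ref"]).description)  # updated description
```

A server that wants full history could store a list per ID instead; the relationship would stay valid either way.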

There is one downside as far as I can tell:

· Anyone can change the object contents (which we can stop by adding an HMAC when we do cryptography in a later version)

My vote is for Incremental Versioning and Implicit relationships.

There should only be one way to version an object. It should always keep the same ID during its lifecycle, and should have a version field (or some other field that changes per version) that tracks which version of the object it is.

We had two ways to version in STIX v1.x (i.e. major updates or incremental) and it didn't work.

Cheers

Terry MacDonald

Senior STIX Subject Matter Expert

SOLTRA | An FS-ISAC and DTCC Company

+61 (407) 203 206 | terry@soltra.com

From: cti@lists.oasis-open.org [mailto:cti@lists.oasis-open.org] On Behalf Of Taylor, Marlon
Sent: Tuesday, 15 March 2016 4:00 AM
To: Mates, Jeffrey CIV DC3/DCCI <Jeffrey.Mates@dc3.mil>; 'Jordan, Bret' <bret.jordan@bluecoat.com>; Mark Davidson <mdavidson@soltra.com>
Cc: Jason Keirstead <Jason.Keirstead@ca.ibm.com>; cti@lists.oasis-open.org; marlon.taylor@us-cert.gov
Subject: RE: [cti] RE: [Non-DoD Source] RE: [cti] RE: Versioning Background Docs

Sorry for any confusion. I do remember the TC talking about GUIDs and hashes, but I don't remember the use cases/scenarios/etc. that led the TC to decide against hash-based IDs. Are they available for TC consideration in this discussion?

I'm unaware of any conceptual workflows/use cases that this change would break beyond repair. I think it's more important for the TC to be able to review the rationale behind its decisions, so that we as a whole can quickly reference and re-evaluate its stance on any decision.

Why are HASH IDs bad?

-Marlon

From: cti@lists.oasis-open.org on behalf of Mates, Jeffrey CIV DC3/DCCI
Sent: Monday, March 14, 2016 12:43:46 PM
To: 'Jordan, Bret'; Mark Davidson
Cc: Jason Keirstead; Taylor, Marlon; cti@lists.oasis-open.org; marlon.taylor@us-cert.gov
Subject: RE: [cti] RE: [Non-DoD Source] RE: [cti] RE: Versioning Background Docs

I certainly understand concerns about deterministic IDs breaking workflows and not working in a number of potential use cases. It might make sense to simply allow IDs to follow either the UUID v4 or the UUID v5 spec. That way, organizations that want to use deterministic IDs can, while those that don't have no need to. Because of how the UUID spec works, both will have the same length, and an outside observer will notice only a single-character difference between the two (the version digit).

From a parsing standpoint, handling something like xxxxxxxx-xxxx-4xxx-xxxx-xxxxxxxxxxxx instead of xxxxxxxx-xxxx-5xxx-xxxx-xxxxxxxxxxxx is pretty trivial, as both will accomplish the same thing.
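Python's standard `uuid` module can illustrate the point: `uuid4()` is random, while `uuid5()` is derived from a namespace UUID and a name, so identical input always yields the same ID. The namespace and the content string below are hypothetical placeholders:

```python
import uuid

# Hypothetical namespace for deterministic IDs; an organization would pick its own.
ORG_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.org")


def random_id() -> uuid.UUID:
    return uuid.uuid4()


def deterministic_id(canonical_content: str) -> uuid.UUID:
    # Same namespace + same content always gives the same UUID (SHA-1 based).
    return uuid.uuid5(ORG_NAMESPACE, canonical_content)


def id_kind(text: str) -> str:
    # Both forms parse identically; only the version digit differs.
    version = uuid.UUID(text).version
    return {4: "random", 5: "deterministic"}.get(version, "other")


print(id_kind(str(random_id())))  # random
print(id_kind(str(deterministic_id('{"name":"APT-X"}'))))  # deterministic
```

Both forms are 36 characters long, and the version digit at index 14 (`4` or `5`) is the only structural difference a parser needs to look at.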

Jeffrey Mates, Civ DC3/DCCI
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Computer Scientist
Defense Cyber Crime Institute
jeffrey.mates@dc3.mil
410-694-4335

-----Original Message-----
From: cti@lists.oasis-open.org [mailto:cti@lists.oasis-open.org] On Behalf Of Jordan, Bret
Sent: Monday, March 14, 2016 12:09 PM
To: Mark Davidson
Cc: Mates, Jeffrey CIV DC3/DCCI; Jason Keirstead; Taylor, Marlon; cti@lists.oasis-open.org; marlon.taylor@us-cert.gov
Subject: Re: [cti] RE: [Non-DoD Source] RE: [cti] RE: Versioning Background Docs

And for further clarification, and to support Trey's statements: this TC talked about deterministic IDs at great length, and it was decided that we would not go down that path. Like Mark, I believe we have strong consensus to stick with the current ID patterns we have. If this is not the case, then we will need to take this to a ballot. Things like IDs are fundamental, and we need to figure these out before we do anything else. That is why we had this discussion a few months ago.

Deterministic IDs may offer interesting use cases, but they also run the risk of breaking a lot of the workflows that we are now building.

Thanks,

Bret

Bret Jordan CISSP
Director of Security Architecture and Standards | Office of the CTO
Blue Coat Systems
PGP Fingerprint: 63B4 FC53 680A 6B7D 1447 F2C0 74F8 ACAE 7415 0050
"Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg."

        On Mar 14, 2016, at 10:03, Mark Davidson <mdavidson@soltra.com> wrote:

        Jeff,

        Can you help me understand your perspective? In STIX 1.x, versioning was handled using the timestamp field (and would seem to align with your post, unless I’m mis-reading it) but I’m not sure I’ve seen any discussion about using timestamp for versioning in 2.0. Are you proposing that we use timestamps for versioning in 2.0, or am I misunderstanding your comment?

        Thank you.
        -Mark



        On 3/14/16, 11:52 AM, "Mates, Jeffrey CIV DC3/DCCI" <cti@lists.oasis-open.org on behalf of Jeffrey.Mates@dc3.mil> wrote:



                My understanding is that in general versioning should be handled using the
                CTI Core "created_at" attribute, which exists on both objects and
                relationships. If this changes, any object with a deterministic hash would
                also have its GUID change. As such, different versions of an object would
                each have their own unique GUID, thus protecting referential integrity.

                Even without a deterministic hash this would still be possible by simply
                generating a new GUID every time a new version of an object or relationship
                is produced.

                Jeffrey Mates, Civ DC3/DCCI
                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                Computer Scientist
                Defense Cyber Crime Institute
                jeffrey.mates@dc3.mil
                410-694-4335


                -----Original Message-----
                From: cti@lists.oasis-open.org [mailto:cti@lists.oasis-open.org] On Behalf
                Of Jason Keirstead
                Sent: Monday, March 14, 2016 11:27 AM
                To: Taylor, Marlon
                Cc: cti@lists.oasis-open.org; Mates, Jeffrey CIV DC3/DCCI;
                marlon.taylor@us-cert.gov
                Subject: [Non-DoD Source] RE: [cti] RE: Versioning Background Docs

                Are you saying that versions will only exist on relationship objects? How
                will that help me figure out if a given threat actor's description is the
                most recent?


                -
                Jason Keirstead
                STSM, Product Architect, Security Intelligence, IBM Security Systems
                www.ibm.com/security | www.securityintelligence.com

                Without data, all you are is just another person with an opinion - Unknown



                From: "Taylor, Marlon" <Marlon.Taylor@hq.dhs.gov>
                To: Jason Keirstead/CanEast/IBM@IBMCA
                Cc: "Mates, Jeffrey CIV DC3/DCCI" <Jeffrey.Mates@dc3.mil>,
                "cti@lists.oasis-open.org" <cti@lists.oasis-open.org>,
                "marlon.taylor@us-cert.gov" <marlon.taylor@us-cert.gov>
                Date: 03/14/2016 12:07 PM
                Subject: RE: [cti] RE: Versioning Background Docs

                ________________________________




                Correct. Hashing won't provide that capability.

                Relationships will provide what you're looking for.

                -Marlon



                ________________________________

                From: Jason Keirstead
                Sent: Monday, March 14, 2016 10:56:04 AM
                To: Taylor, Marlon
                Cc: Mates, Jeffrey CIV DC3/DCCI; cti@lists.oasis-open.org;
                marlon.taylor@us-cert.gov
                Subject: RE: [cti] RE: Versioning Background Docs


                Apologies for my confusion, but I don't really understand what is being
                discussed in this thread.

                Are people talking about IDs or Versions? What does hashing have to do with
                versioning?

                I hope people are not advocating simply hashing the contents of the
                object and using that as a version. That is not workable. A version has to
                be continually incrementing. I need to be able to look at a version and know
                if it is the latest version or if it is stale. You can't do that with hashes.
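A small sketch of the ordering problem, assuming an integer `version` field (a timestamp would behave the same way). Comparing versions tells you which copy is newer; comparing content hashes only tells you that two copies differ:

```python
import hashlib
import json


def content_hash(obj: dict) -> str:
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode("utf-8")).hexdigest()


def compare_by_version(held: dict, incoming: dict) -> str:
    # A monotonically increasing version answers "which copy is newer?".
    if incoming["version"] > held["version"]:
        return "held copy is stale"
    if incoming["version"] < held["version"]:
        return "incoming copy is stale"
    return "same version"


def compare_by_hash(held: dict, incoming: dict) -> str:
    # A content hash can only say whether two copies differ, not which is newer.
    if content_hash(held) == content_hash(incoming):
        return "same content"
    return "different content (order unknown)"


held = {"id": "ta-1", "version": 1, "description": "first"}
incoming = {"id": "ta-1", "version": 2, "description": "second"}
print(compare_by_version(held, incoming))  # held copy is stale
print(compare_by_hash(held, incoming))  # different content (order unknown)
```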

                -
                Jason Keirstead
                STSM, Product Architect, Security Intelligence, IBM Security Systems
                www.ibm.com/security | www.securityintelligence.com

                Without data, all you are is just another person with an opinion - Unknown



                From: "Taylor, Marlon" <Marlon.Taylor@hq.dhs.gov>
                To: "Mates, Jeffrey CIV DC3/DCCI" <Jeffrey.Mates@dc3.mil>,
                "cti@lists.oasis-open.org" <cti@lists.oasis-open.org>
                Cc: "marlon.taylor@us-cert.gov" <marlon.taylor@us-cert.gov>
                Date: 03/14/2016 11:42 AM
                Subject: RE: [cti] RE: Versioning Background Docs
                Sent by: <cti@lists.oasis-open.org>

                ________________________________




                Hi All,

                Jeff and I spoke offline and we are in agreement with the hash-based
                approach. Some takeaways:
                - cleared up "shallowness" of shallow objects
                - conveyed the idea of relationships which contain arrays of ids (he calls
                them link aggregators)

                As we finalize objects across the TC, we can go into object-specific required
                fields. For example: should every Indicator have an observable?

                Keep up the feedback.

                -Marlon



                ________________________________
