I agree with Allan's statements completely. But I think the scale we need to think about is a few orders of magnitude bigger, thus the problem with option 2 is even worse.
Bret Jordan CISSP
Director of Security Architecture and Standards | Office of the CTO
Blue Coat Systems
PGP Fingerprint: 63B4 FC53 680A 6B7D 1447 F2C0 74F8 ACAE 7415 0050
"Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg."

Option #1 is preferred for the following reasons:

a) Efficiency (Size & Processing Load)

- Example: suppose we have 3 million observations to share, each defining a hash, IP, domain, or any other IOC. With Option #1 we have 3 million objects; with Option #2 we have 6 million objects.
- With Option #2, to load the 3 million observations with their 3 million CybOX definition objects, I have to store all of the objects, connect each of the 3 million observations to its CybOX container, and do the lookup for those objects 3 million times. With Option #1, I just have one object and store it in the db. Done.
- In most of the use cases that we (lg) care about, the CybOX pattern or definition is a statement that only makes sense within the object that contains it. It does not make sense as a TLO that is itself referred to by other objects. If it becomes a TLO, then I have to maintain the TLO lifecycle for that object, along with all of the associated memory and processing needed to handle it.
- If a pattern that represents an indicator or observation is shared, then the indicator or observation object is what gets referred to, not the CybOX object itself.

One of the topics we talked about at the F2F in DC was the question of how CybOX gets included in STIX. There was a very long discussion with a couple of different options, but the two primary options were these:

1. CybOX has a container (defined by CybOX) that gets directly embedded in STIX TLOs as a field. So Observation and Asset would have a field called “cybox” (or something) that was a CybOX container. I’ll call this approach “embedded cybox”.
2. CybOX has a container (defined by CybOX) that gets included as a STIX TLO itself. Things like observations and assets reference that “cybox-container” TLO by ID for their CybOX content. I’ll call this approach “referenced cybox”.

There was decent consensus at the F2F that #2 was the preferred approach (we took an informal straw poll). The thought was that it would allow people to reference existing CybOX containers rather than having to duplicate content. F2F consensus is not SC consensus though: not everyone was at the F2F, and we need to reconfirm things more broadly before moving forward.

To slightly complicate things, since the F2F a decent chunk of people, on Slack and in person, have been converging on the other approach (#1). The feeling is that the referenced approach is overly verbose (an observation is 2 TLOs rather than 1) and that containers are in reality not going to be referenced often enough to make it worth it.

Either way, I wanted to have us make a clear decision on the full list. So, to show you an example, here’s an Observation via approach #2 (F2F consensus):

    {
      "id": "observation--b67d30ff-02ac-498a-92f9-32f845f448cf",
      "created_time": "2016-04-06T19:58:16Z",
      "created_by_ref": "source--f431f809-377b-45e0-aa1c-6a4751cae5ff",
      "start": "2015-12-21T19:00:00Z",
      "end": "2015-12-21T19:00:00Z",
      "cybox_container_ref": "cybox-container--e7130ab5-8c91-4660-8d89-033b1a2fc280"
    }

    {
      "type": "cybox-container",
      "id": "cybox-container--e7130ab5-8c91-4660-8d89-033b1a2fc280",
      "created_time": "2016-04-06T19:58:16Z",
      "created_by_ref": "source--f431f809-377b-45e0-aa1c-6a4751cae5ff",
      "file_name": "malware.exe",
      "md5": "3773a88f65a5e780c8dff9cdc3a056f3",
      "sha1": "cac35ec206d868b7d7cb0b55f31d9425b075082b"
    }

Note how it takes two TLOs to report the file observation, but you could reference that cybox-container from a different observation if you wanted to.
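
To make the extra step in the referenced approach concrete, here is a minimal Python sketch of what a consumer might do to read the file observation above. The `store` dictionary and the `resolve_observation` helper are hypothetical names used only for illustration; they are not part of STIX or CybOX, and a real tool would presumably use a database rather than a dict.

```python
# Hypothetical in-memory store keyed by object ID (assumption: a real tool
# would use a database or index instead of a plain dict).
store = {
    "observation--b67d30ff-02ac-498a-92f9-32f845f448cf": {
        "id": "observation--b67d30ff-02ac-498a-92f9-32f845f448cf",
        "start": "2015-12-21T19:00:00Z",
        "end": "2015-12-21T19:00:00Z",
        "cybox_container_ref": "cybox-container--e7130ab5-8c91-4660-8d89-033b1a2fc280",
    },
    "cybox-container--e7130ab5-8c91-4660-8d89-033b1a2fc280": {
        "type": "cybox-container",
        "id": "cybox-container--e7130ab5-8c91-4660-8d89-033b1a2fc280",
        "file_name": "malware.exe",
        "md5": "3773a88f65a5e780c8dff9cdc3a056f3",
    },
}


def resolve_observation(store, observation_id):
    """Return (observation, container) for a referenced-style observation."""
    observation = store[observation_id]
    # This second lookup is the ID resolution that the embedded approach avoids.
    container = store[observation["cybox_container_ref"]]
    return observation, container


observation, container = resolve_observation(
    store, "observation--b67d30ff-02ac-498a-92f9-32f845f448cf"
)
print(container["file_name"])
```

The second lookup fails with a `KeyError` if the container has not arrived yet, so a consumer of referenced content would also need some policy for observations whose container is missing.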

On the other hand, option #1 (recent consensus on Slack):

    {
      "id": "observation--b67d30ff-02ac-498a-92f9-32f845f448cf",
      "created_time": "2016-04-06T19:58:16Z",
      "created_by_ref": "source--f431f809-377b-45e0-aa1c-6a4751cae5ff",
      "start": "2015-12-21T19:00:00Z",
      "end": "2015-12-21T19:00:00Z",
      "file_name": "malware.exe",
      "md5": "3773a88f65a5e780c8dff9cdc3a056f3",
      "sha1": "cac35ec206d868b7d7cb0b55f31d9425b075082b"
    }

While you can’t reference the CybOX from another observation, it’s only one TLO to report the observation and you don’t need to resolve an ID reference.

IMO it comes down to whether we see these CybOX containers commonly being reused by sensors and repositories across different producers. If they will be, then the referenced approach probably makes sense. If they won’t be (sensors and other tools generate CybOX from scratch for each observation rather than looking up existing content in a database), then the embedded approach is probably better.

So, what do you think, and why?

Personally, I like approach #1 (as I did at the F2F). I don’t think that tools will in reality re-use CybOX content (especially sensors), and so the extra overhead of having to look up those cybox-containers by ID (as well as the extra conceptual complexity for people to understand this approach) is not worth the tradeoff. I also feel like Observation is probably the STIX TLO that will be passed around the most at scale, and so having to resolve that ID reference to point to the container is a high cost (unlike, for example, Campaign, which will be at a much lower scale).
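
The object counts quoted earlier in the thread (3 million versus 6 million) and the reuse question can be put side by side with a rough Python sketch. The function names and the simple cost model (one stored object per TLO, one ID lookup per referenced observation) are assumptions made for illustration, not measurements of any real tool.

```python
def embedded_cost(n_observations):
    """Option #1: one TLO per observation, no ID references to resolve."""
    return {"objects": n_observations, "lookups": 0}


def referenced_cost(n_observations, n_unique_containers=None):
    """Option #2: each observation points at a cybox-container TLO.

    Assumption: if containers are never reused, there is one container
    per observation; pass n_unique_containers to model reuse.
    """
    if n_unique_containers is None:
        n_unique_containers = n_observations
    return {
        "objects": n_observations + n_unique_containers,
        "lookups": n_observations,  # one reference resolved per observation
    }


n = 3_000_000
print(embedded_cost(n))                                  # no reuse needed
print(referenced_cost(n))                                # no reuse: twice the objects
print(referenced_cost(n, n_unique_containers=1_000))     # heavy reuse
```

Under this model the referenced approach only approaches the embedded object count when containers are heavily reused, and it still pays one lookup per observation either way. The model ignores the duplicated content that the embedded approach stores when the same CybOX does recur.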