RE: [cti] Finalizing the STIX 2.1 Malware Object

Thanks for putting this together. However, I don’t think it quite captured what Gary and I had proposed, as we actually used single “ref” relationships to observed data elements rather than “refs”. This means that we can capture the entire static or dynamic analysis result as a single graph.

While it is still a bit larger than the other files (8,175 bytes vs. the current 5,006 bytes), it is a fair bit smaller than the original version at 11,936 bytes, while still following existing rules and allowing for a potentially greater degree of fidelity.

I’ve attached an example of what this looks like, although I did sort of cheat by throwing in an “action_extension” for the file object as I couldn’t find any way to say that one file created another. I also included an image showing how this functions as a graph.

If you received this already, I apologize. I’ve been having email issues today.

Jeffrey Mates, Civ DC3/DCCI

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Computer Scientist

Defense Cyber Crime Institute

jeffrey.mates@dc3.mil

410-694-4335

From: cti@lists.oasis-open.org <cti@lists.oasis-open.org> On Behalf Of Kirillov, Ivan A.
Sent: Friday, June 29, 2018 11:26 AM
To: cti@lists.oasis-open.org
Subject: [Non-DoD Source] Re: [cti] Finalizing the STIX 2.1 Malware Object

To continue the discussion on the capture of Cyber Observables as part of the Malware SDO, I’ve attached 3 example JSON instances outlining the various approaches that we’ve looked at:

current: the approach currently defined in the STIX 2.1 specification, using observable-objects dictionaries nested as dictionary values.

Positives: simplifies parsing, as objects are always embedded, so no need for dereferencing.
Negatives: complicated data model/specification (some values are object dictionaries, some are not), different design pattern than used elsewhere.

top_level_observables: the approach I had proposed on the June 19 working call of embedding all observables in a property at the top level of the Malware SDO, and then referencing them elsewhere via their ID.

Positives: allows object re-use.
Negatives: makes parsing more difficult, different design pattern than used elsewhere.

observed_data: the approach that Gary Katz and Jeff Mates presented on the June 26th working call of capturing all observables in Observed Data SDOs that are referenced accordingly.

Positives: re-uses existing object (Observed Data), which results in less effort on the part of consumers and producers to use since they already support it.
Negatives: Observed Data contains other required properties which may not be suitable for this context (number_observed, etc.), results in significantly larger JSON representations.

As you’ll see, one of the downsides of the Observed Data-based approach is that each individual Cyber Observable object (file, software, etc.) has to be captured in its own Observed Data SDO per the current language in the specification (i.e., a single Observed Data cannot capture multiple unrelated objects). This means that this approach will significantly increase the size of the JSON that we’ll need to generate for Malware SDOs that make use of many cyber observables. The other issue with using Observed Data here is that “first_observed/last_observed” and “number_observed” are rather meaningless, since these are non-traditional observations; in my example, I set “first_observed/last_observed” to the same timestamp as “created/modified”, and I always set “number_observed” to 1.
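The wrapping described above can be sketched in Python. This is only an illustration, not the attached example: it assumes the Observed Data properties named in this message plus the usual "type", "id", "created", "modified", and "objects" properties, and the helper name wrap_in_observed_data and the sample observables are hypothetical.

```python
import json
import uuid


def wrap_in_observed_data(observable, timestamp):
    """Wrap one Cyber Observable object in its own Observed Data SDO.

    Mirrors the workaround described in the message: first_observed and
    last_observed reuse the created/modified timestamp, and
    number_observed is always 1.
    """
    return {
        "type": "observed-data",
        "id": "observed-data--" + str(uuid.uuid4()),
        "created": timestamp,
        "modified": timestamp,
        "first_observed": timestamp,
        "last_observed": timestamp,
        "number_observed": 1,
        "objects": {"0": observable},
    }


# Hypothetical observables produced by analyzing one malware sample.
observables = [
    {"type": "file", "name": "dropper.exe"},
    {"type": "software", "name": "Example Packer"},
]
timestamp = "2018-06-29T00:00:00.000Z"

# One Observed Data SDO per observable, as the current language requires.
wrapped = [wrap_in_observed_data(o, timestamp) for o in observables]

# Compare the serialized size of the bare observables with the wrapped form.
bare_size = len(json.dumps(observables))
wrapped_size = len(json.dumps(wrapped))
print(len(wrapped), bare_size, wrapped_size)
```

Every wrapper repeats the same seven bookkeeping properties, which is where the size increase for observable-heavy Malware SDOs comes from.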

Another option we discussed briefly at the June 26th call was to create a new “Observed Data-like” SDO that could capture multiple objects and be better suited for use cases such as these. It seems like this would essentially be identical to the Observed Data SDO but without the first_observed/last_observed/number_observed properties.

Let me know your thoughts and preferences as far as these approaches – personally I’m rather torn, as I don’t see a clear winner here. Also, since this issue is currently holding up the release of STIX 2.1 CSD01 with no immediate resolution in sight, I think we need to seriously consider whether we should include these Malware SDO updates in CSD01 or instead push them out to CSD02.

Regards,

Ivan

From: <cti@lists.oasis-open.org> on behalf of Ivan Kirillov <ikirillov@mitre.org>
Date: Tuesday, June 19, 2018 at 1:02 PM
To: Sean Barnum <sean.barnum@FireEye.com>, Bret Jordan <Bret_Jordan@symantec.com>, "cti@lists.oasis-open.org" <cti@lists.oasis-open.org>
Subject: Re: [cti] Re: [EXT] Re: [cti] Finalizing the STIX 2.1 Malware Object

That seems reasonable to me – I’ll bring it up on the working call. Thanks!

-Ivan

From: Sean Barnum <sean.barnum@FireEye.com>
Date: Tuesday, June 19, 2018 at 12:54 PM
To: Ivan Kirillov <ikirillov@mitre.org>, Bret Jordan <Bret_Jordan@symantec.com>, "cti@lists.oasis-open.org" <cti@lists.oasis-open.org>
Subject: Re: [cti] Re: [EXT] Re: [cti] Finalizing the STIX 2.1 Malware Object

Yes, that is basically what I am proposing.

Something along the lines of:

result (required)

string

The classification result or name assigned to the malware instance by the AV scanner tool. If no resulting context-specific classification value or name is provided by the AV scanner tool then the result SHOULD come from the av-result-general-ov open vocabulary.

where av-result-general-ov is something like “malicious”, “suspicious”, “benign”, “unknown”, “error”
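A minimal Python sketch of how a producer might apply this definition. The vocabulary values are the examples listed above; the function name make_av_result, its parameters, and the product name are hypothetical, and rejecting values outside the vocabulary is a stricter choice than an open vocabulary actually requires.

```python
# Example values for the proposed av-result-general-ov open vocabulary,
# taken from the list in the message (not a finalized vocabulary).
AV_RESULT_GENERAL_OV = {"malicious", "suspicious", "benign", "unknown", "error"}


def make_av_result(product, result=None, general_result="unknown"):
    """Build an AV result entry in which "result" is always present.

    If the scanner supplied its own classification name, use it as-is.
    Otherwise fall back to a value from the general vocabulary.
    """
    if not product:
        raise ValueError("product is required")
    if result is None:
        # Illustration-only strictness: an open vocabulary would also
        # permit values that are not in this set.
        if general_result not in AV_RESULT_GENERAL_OV:
            raise ValueError("not in av-result-general-ov: " + general_result)
        result = general_result
    return {"product": product, "result": result}


# Scanner reported a specific classification name.
specific = make_av_result("ExampleAV", result="Trojan.Zeus")
# Scanner completed without naming anything, so a general value is used.
generic = make_av_result("ExampleAV", general_result="benign")
print(specific["result"], generic["result"])
```

With this shape, a consumer never has to infer the outcome of a scan from a missing property.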

Sean Barnum

Principal Architect

FireEye

M: 703.473.8262

E: sean.barnum@fireeye.com

From: "Kirillov, Ivan A." <ikirillov@mitre.org>
Date: Tuesday, June 19, 2018 at 2:36 PM
To: Sean Barnum <sean.barnum@FireEye.com>, Bret Jordan <Bret_Jordan@symantec.com>, "cti@lists.oasis-open.org" <cti@lists.oasis-open.org>
Subject: Re: [cti] Re: [EXT] Re: [cti] Finalizing the STIX 2.1 Malware Object

Thanks Sean - no worries about the delayed reply. So as far as 2), are you suggesting that we make “results” required and that it can capture either the actual result or something more generic (e.g., malicious/benign/etc.) that could come from a vocabulary? I do agree with you that the current language around “results” being not required if there is no result is rather confusing and I would also rather make it required in all cases.

Regards,

Ivan

From: Sean Barnum <sean.barnum@FireEye.com>
Date: Tuesday, June 19, 2018 at 8:35 AM
To: Bret Jordan <Bret_Jordan@symantec.com>, Ivan Kirillov <ikirillov@mitre.org>, "cti@lists.oasis-open.org" <cti@lists.oasis-open.org>
Subject: Re: [cti] Re: [EXT] Re: [cti] Finalizing the STIX 2.1 Malware Object

Sorry for the delayed response, Ivan.

This week I am actually in the midst of working through some significant evolution on our Malware object and its use.

I plan to attend today’s working call but am not sure what level of definitive opinions I will be ready to offer by then on very specific details. If not on today’s call we still should hopefully be able to offer some constructive input this week.

On your two items that started this thread I can offer the following though:

1. FireEye would definitely support consolidating the expression of the observables into a single location and referencing those from the various other places as appropriate. This is MUCH cleaner, simpler and more resilient.
2. I would agree that it makes sense to make “product” required as it does not really make sense to capture/convey an av_results entry where you don’t convey which product was used. I would disagree with making “scanned” required. There are not uncommon use cases where you may wish to convey that samples were scanned with particular AV but specifically do not want to expose when those scans occurred as it exposes details of when you knew about them. Lastly, I would suggest that we modify the current definition of “result” slightly and make it required. Currently, the definition allows the lack of the “results” property to imply that the scan was successfully completed but did not classify the sample as malicious. This sort of implication seems to present significant risk of confusion. Rather, I propose that the “result” property be defined to explicitly convey the result of the scan (whether malicious or otherwise) and that it be required. We could also define a simple vocab for general results that could apply across any scanners (e.g. “malicious”, “suspicious”, “benign”, “unknown”, “error”, etc). Looking across the full set of properties currently in av-results-type, the two properties that seem to be necessary (any av-results instance would not really make any sense or be of value without them) are product and results. Telling people that a scan occurred (even if all the other details are included) but not saying which product was used is not very useful. The “scan” could have been my 5 year old niece looking at the file. Similarly, telling people that the sample was scanned (even if all the other details are included) but not saying the result of the scan is not very useful.

Sean Barnum

Principal Architect

FireEye

M: 703.473.8262

E: sean.barnum@fireeye.com

From: <cti@lists.oasis-open.org> on behalf of Bret Jordan <Bret_Jordan@symantec.com>
Date: Monday, June 18, 2018 at 5:04 PM
To: "Kirillov, Ivan A." <ikirillov@mitre.org>, "cti@lists.oasis-open.org" <cti@lists.oasis-open.org>
Subject: [cti] Re: [EXT] Re: [cti] Finalizing the STIX 2.1 Malware Object

I will try and review this change this week.

Bret

From: cti@lists.oasis-open.org <cti@lists.oasis-open.org> on behalf of Kirillov, Ivan A. <ikirillov@mitre.org>
Sent: Monday, June 18, 2018 10:26:15 AM
To: cti@lists.oasis-open.org
Subject: [EXT] Re: [cti] Finalizing the STIX 2.1 Malware Object

Are there any other thoughts on these topics? It would be great to close them out so we can finish up CSD01 of STIX 2.1.

Regards,

Ivan

From: <cti@lists.oasis-open.org> on behalf of Ivan Kirillov <ikirillov@mitre.org>
Date: Wednesday, June 13, 2018 at 2:47 PM
To: Allan Thomson <athomson@lookingglasscyber.com>, "cti@lists.oasis-open.org" <cti@lists.oasis-open.org>
Cc: "Kumar, Subodh" <subodh.kumar@jpmorgan.com>
Subject: Re: [cti] Finalizing the STIX 2.1 Malware Object

Sorry, that should read “Conversely, parsing the SDO may become more difficult because…”

Regards,

Ivan

From: Ivan Kirillov <ikirillov@mitre.org>
Date: Wednesday, June 13, 2018 at 2:44 PM
To: Allan Thomson <athomson@lookingglasscyber.com>, "cti@lists.oasis-open.org" <cti@lists.oasis-open.org>
Cc: "Kumar, Subodh" <subodh.kumar@jpmorgan.com>
Subject: Re: [cti] Finalizing the STIX 2.1 Malware Object

Hi Allan,

This approach doesn’t fundamentally change how we capture static/dynamic analysis data, but rather where and how the Cyber Observable Objects that correspond to that data are stored. If you have multiple observables from different analyses, you’ll just reference their corresponding objects that are stored in the “observable_objects” dictionary (which may or may not be the same objects across different analyses).

As far as being easier, it’s kind of a wash – it may simplify the generation of content because any Cyber Observable Objects would have to be stored in this top-level dictionary. Conversely, parsing the SDO because you’ll have to dereference the objects as you come across their usage. However, I do think that the simplification to the data model and the ability to re-use objects are worthwhile changes.
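The dereferencing step mentioned above can be sketched in Python. This assumes the top-level “observable_objects” dictionary from the proposal; the "object_refs" property name, the resolve_refs helper, and the file names are hypothetical and are not taken from the attached example.

```python
# Hypothetical Malware SDO fragment: observables are stored once in a
# top-level "observable_objects" dictionary, and each analysis result
# points at them by key.
malware = {
    "type": "malware",
    "observable_objects": {
        "0": {"type": "file", "name": "sample.exe"},
        "1": {"type": "file", "name": "dropped.dll"},
    },
    "static_analysis_results": {"object_refs": ["0"]},
    "dynamic_analysis_results": {"object_refs": ["0", "1"]},
}


def resolve_refs(sdo, refs):
    """Return the observable objects that the given keys refer to."""
    objects = sdo["observable_objects"]
    missing = [ref for ref in refs if ref not in objects]
    if missing:
        raise KeyError("unresolved references: " + ", ".join(missing))
    return [objects[ref] for ref in refs]


static_objects = resolve_refs(
    malware, malware["static_analysis_results"]["object_refs"]
)
dynamic_objects = resolve_refs(
    malware, malware["dynamic_analysis_results"]["object_refs"]
)

# The file found by both analyses is stored once and shared, not duplicated.
print(static_objects[0] is dynamic_objects[0])
```

The extra lookup is the parsing cost; storing the shared file only once is the re-use benefit.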

Regards,

Ivan

From: Allan Thomson <athomson@lookingglasscyber.com>
Date: Wednesday, June 13, 2018 at 2:22 PM
To: Ivan Kirillov <ikirillov@mitre.org>, "cti@lists.oasis-open.org" <cti@lists.oasis-open.org>
Cc: "Kumar, Subodh" <subodh.kumar@jpmorgan.com>
Subject: Re: [cti] Finalizing the STIX 2.1 Malware Object

Ivan – regarding 1.

What if I have multiple observables for the same malware from different analyses (e.g., static + dynamic results)?

Would consolidating them into a single place really make it easier? You would still want to indicate that you have a list of observables and indicate where those were ‘observed’ from either static or dynamic or other.

So I’m not sure consolidating it makes it easier but so long as the same things are possible with the consolidated design then I don’t have a strong preference either way.

Allan Thomson

CTO (+1-408-331-6646)

LookingGlass Cyber Solutions

From: "cti@lists.oasis-open.org" <cti@lists.oasis-open.org> on behalf of "Kirillov, Ivan" <ikirillov@mitre.org>
Date: Wednesday, June 13, 2018 at 12:57 PM
To: "cti@lists.oasis-open.org" <cti@lists.oasis-open.org>
Cc: "Kumar, Subodh" <subodh.kumar@jpmorgan.com>
Subject: [cti] Finalizing the STIX 2.1 Malware Object

All,

As we’re wrapping up work on STIX 2.1 CSD01, we need to finalize what we have for the updated Malware SDO. Accordingly, I have two topics I’d like to bring up in this regard:

1. Thanks to the work done by Subodh Kumar and his team, I’ve been wondering if there’s a better way to capture the Observable Objects associated with the Malware SDO. Right now, there are three places where you may encounter a Cyber Observable Object: samples (a dict of observable objects), static_analysis_results/results (certain keys have a corresponding dict of observable objects), and dynamic_analysis_results/results (each key has a corresponding dict of observable objects).

Instead of having these observable object dictionaries all over the place, I believe it would make more sense to have a single property at the top level of the object (let’s call it “observable_objects”), where any Cyber Observable Objects associated with the SDO (samples, analysis results, etc.) could be captured and then referenced from wherever they are used. There are a number of advantages to this: a simpler data model (fewer embedded observable object dicts everywhere), the ability to re-use objects (e.g., if static and dynamic analysis find the same objects, you can create one object and just reference it accordingly), and a more compact serialization. See the attached JSON example for what this looks like in practice – this is a modified version of the “Malware Instance with Analysis Data” example currently in the 2.1 spec.

2. Currently, the “av-results-type”, used to capture AV classification results, has only optional properties, and the text specifies that at least one must be included. This allows you to construct some odd but spec-valid instances, such as an AV classification with only the engine version. In order to make this type more useful, I’d suggest that we make “product” (the name of the tool performing the scan) and “scanned” (the date/time the scan occurred) required, so that you’ll at least have this minimum set of useful data for each instance. In addition, we should probably add some text stating that the “result” property (the actual AV classification result, e.g., “Trojan.Zeus”) must be included if the tool reports some classification during the scan.
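The difference between the current rule and the proposed one for “av-results-type” can be illustrated with a small Python check. This is a sketch only: the function names are hypothetical, “engine_version” is an assumed key name for the engine version property, and the product name and timestamp are made-up sample values.

```python
def valid_under_current_rule(av_result):
    """Current text as described: all properties optional, at least one present."""
    return len(av_result) >= 1


def valid_under_proposed_rule(av_result):
    """Proposed text: "product" and "scanned" must both be present."""
    return "product" in av_result and "scanned" in av_result


# The odd instance from the message: only the engine version is given.
# "engine_version" is an assumed property name.
only_engine_version = {"engine_version": "1.2.3"}

# An instance carrying the proposed minimum set plus a classification.
full = {
    "product": "ExampleAV",
    "scanned": "2018-06-13T00:00:00Z",
    "result": "Trojan.Zeus",
}

for instance in (only_engine_version, full):
    print(valid_under_current_rule(instance), valid_under_proposed_rule(instance))
```

The engine-version-only instance passes the current rule but fails the proposed one, which is the case the change is meant to rule out.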

Let me know what you think – if we can get these final things wrapped up, we’re that much closer to getting STIX 2.1 out the door.

Regards,

Ivan
