RE: [Non-DoD Source] [cti-stix] Eight Arguments for an Infrastructure SD

Jane,

Sorry for taking so long to respond. Since it seemed you put a considerable amount of time into your email, I wanted to make sure I had time to put some thought into the problem before responding. I agree that we need a way to represent malicious infrastructure within 2.1, although I believe there are multiple ways we can achieve this goal.

There are three options that I believe are possible (four if you include not doing anything about infrastructure in 2.1, which I do not believe is an option):

Option 1: Create a New Infrastructure Object

Option 2: Modify the Observed Data TLO to also model Infrastructure

Option 3: Make Cyber Observables Top Level Objects

There are pros and cons to each of these, and I’d be interested in hearing everyone’s input. My personal order of preference is Option 3, then Option 2, then Option 1. My personal opinion is that we should not release 2.1 without a way to handle Infrastructure.

I think some of you will find it ironic that I would push for Option 3, since I was one of those who fought against making cyber observables top level objects, but the reasons for keeping them separate no longer exist. CybOX has been integrated into STIX, so in my mind there is no longer a reason to keep them as non-TLOs. The main disadvantage of making Cyber Observables TLOs is that it is a major breaking change from STIX 2.0. We can debate how big a deal this is, but I’d rather rip the bandaid off now than live with this issue long term.

From a model standpoint, embedding cyber observables within Observed Data results in graphs within graphs, which are inherently difficult to deal with for systems receiving STIX data. Making Cyber Observables TLOs would fix this issue and simplify the model.
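To make the graphs-within-graphs point concrete, here is a rough Python sketch, not a proposal for the spec. The input follows the STIX 2.0 Observed Data layout, where embedded objects are keyed "0", "1", and so on and reference each other by those local keys. The function name, the placeholder namespace, the ID scheme, and the "object_refs" property are all my own assumptions for illustration.

```python
import uuid

# Hypothetical namespace, used only to derive repeatable illustrative IDs.
EXAMPLE_NAMESPACE = uuid.UUID("00000000-0000-0000-0000-000000000000")


def promote_observables(observed_data):
    """Lift embedded observables out of an Observed Data object.

    Returns (promoted, slim): the observables as stand-alone objects with
    global IDs, and the Observed Data object pointing at them by ID.
    """
    embedded = observed_data["objects"]

    # Map each local key ("0", "1", ...) to an illustrative global ID.
    id_map = {}
    for key, obj in embedded.items():
        seed = f'{obj["type"]}:{obj.get("value", key)}'
        id_map[key] = f'{obj["type"]}--{uuid.uuid5(EXAMPLE_NAMESPACE, seed)}'

    promoted = []
    for key, obj in embedded.items():
        top_level = {"id": id_map[key]}
        for prop, value in obj.items():
            if prop.endswith("_ref"):
                # Single local reference becomes a global reference.
                top_level[prop] = id_map[value]
            elif prop.endswith("_refs"):
                # List of local references becomes a list of global ones.
                top_level[prop] = [id_map[v] for v in value]
            else:
                top_level[prop] = value
        promoted.append(top_level)

    # The Observed Data object keeps its own properties but no inner graph.
    slim = {k: v for k, v in observed_data.items() if k != "objects"}
    slim["object_refs"] = [id_map[key] for key in embedded]
    return promoted, slim


# Made-up example data: a domain that resolves to an IP address.
observed = {
    "type": "observed-data",
    "id": "observed-data--00000000-0000-4000-8000-000000000001",
    "first_observed": "2017-11-01T00:00:00Z",
    "last_observed": "2017-11-01T00:00:00Z",
    "number_observed": 1,
    "objects": {
        "0": {"type": "domain-name", "value": "example.com",
              "resolves_to_refs": ["1"]},
        "1": {"type": "ipv4-addr", "value": "198.51.100.7"},
    },
}

promoted, slim = promote_observables(observed)
```

In the embedded form, a receiving system has to resolve "1" inside the scope of one particular Observed Data object. After promotion, the domain-to-IP link is an ordinary reference between two objects in a single graph.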

From an analyst standpoint, I spent (parts of) the last several days manually ingesting text documents into our own cyber analytic system. The model we use is a graph and is based closely on the STIX model. I modeled a custom malware analysis report, an internal incident/event report, a short external government finished intelligence report on an event, and a longer external government intelligence report on multiple events. I manually ingested a variety of reports to form my own opinions about how I think about the data and would like to model it, what was useful, and what information cluttered the graph.

In the end, it did not matter to me whether an object represented an IP address that was seen in a malware object, observed on my network, or found to be infrastructure through other methods. That context was contained in the linkages. An IP address is an IP address is an IP address, and storing it within Observed Data or an Infrastructure object did not make much sense to me. The fact that the IP was infrastructure used by the adversary was shown in my link types, not in the object itself. For example, an IP address may relate to adversary infrastructure because one adversary took over the system for a time, and then at a later time another adversary started using it. It was infrastructure for both adversaries at different times, which is shown through the links and the times attached to those links, not through the IP address itself.
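A minimal sketch of what I mean, in Python. It is not our system's actual schema: the object IDs, the "start"/"end" fields on the links, and the helper name are made up for illustration. There is one IP address object, and the adversary context lives entirely on the time-bounded links.

```python
from datetime import datetime

# One IP address object, stored once (placeholder ID).
ip = {"type": "ipv4-addr", "id": "ipv4-addr--example", "value": "203.0.113.5"}

# Two time-bounded "uses" links from two different adversaries to the same IP.
links = [
    {"source": "threat-actor--a", "type": "uses", "target": ip["id"],
     "start": datetime(2017, 1, 1), "end": datetime(2017, 4, 1)},
    {"source": "threat-actor--b", "type": "uses", "target": ip["id"],
     "start": datetime(2017, 6, 1), "end": datetime(2017, 9, 1)},
]


def users_at(links, target_id, when):
    """Return the sources whose "uses" link to target_id was active at `when`."""
    return [
        link["source"]
        for link in links
        if link["type"] == "uses"
        and link["target"] == target_id
        and link["start"] <= when < link["end"]
    ]


in_february = users_at(links, ip["id"], datetime(2017, 2, 1))
in_july = users_at(links, ip["id"], datetime(2017, 7, 1))
in_may = users_at(links, ip["id"], datetime(2017, 5, 1))
```

Querying February returns only the first adversary, July only the second, and May neither, all without touching or duplicating the IP address object.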

If we decide not to make Cyber Observables TLOs, my second option would be to extend the Observed Data TLO to meet the Infrastructure needs. This would mean vastly increasing the number of relationship types assigned to Observed Data and adding some extra properties. One could be a Boolean stating whether the Cyber Observables were observed on your network or obtained in some other fashion. By not creating an Infrastructure object, we avoid having one organization model the data using the Observed Data object while another uses the Infrastructure object. We also avoid the worst-case scenario, where an analyst feels they need to model the data as both Observed Data and Infrastructure. That would leave them extremely frustrated with us, and receiving analysts would see four objects in their system for every IP: one Observed Data object, one IP address object embedded in the Observed Data, one Infrastructure object, and one IP address object in the Infrastructure object. (Yes, some systems could optimize how this data is viewed, but some will not.) I understand that we would be bastardizing the Observed Data object by doing this, but that’s another reason why making Cyber Observables TLOs is a better option in my view.

Lastly, I would suggest an Infrastructure object. It would be similar to an Observed Data object but with more relationships associated with it. This would still allow us to complete the initial STIX model and provide the functionality that many users would expect to see within the model.

I agree with Bret, Jane and many others on the crucial need to model infrastructure, but I wanted to provide some additional options for how we can meet that need.

Apologies for the long email,

-Gary

From: cti-stix@lists.oasis-open.org [mailto:cti-stix@lists.oasis-open.org] On Behalf Of JG on CTI-TC
Sent: Tuesday, November 7, 2017 7:19 PM
To: cti-stix@lists.oasis-open.org
Subject: [Non-DoD Source] [cti-stix] Eight Arguments for an Infrastructure SDO for STIX 2.1

At present the CTI TC does not appear to be of one mind on the need for an Infrastructure SDO for 2.1. After months of debate in Slack and on the email list, after two intense working sessions at both the Bethesda, MD and Austin, TX face-to-face meetings, and after numerous discussions during working meetings and review of the working draft developed by Richard Struse and Bret Jordan, a Straw Man poll at Austin led to an almost even tie on whether or not to include an Infrastructure SDO in 2.1. I'm writing today to outline eight reasons why I believe we should seriously consider including an Infrastructure SDO for the STIX 2.1 release. Note that my view on the topic is as a threat hunter, educator, and analyst; therefore, I'll be relying on insights from the programmers, data architects, and MRTI aficionados to actually make it work. It will make the human-to-machine interface more effective during this period of rapid ecosystem expansion, ISAO/ISAC build-out, market/product definition and trust-building between private sector entities and law enforcement for critical infrastructure protection.

1. An argument has been made that the Indicator SDO could serve as a series of interconnected buckets for a malicious infrastructure, and that specific Cyber Observables could be linked to such Indicators to define a malicious infrastructure, with a Boolean property indicating the goodness or badness of a set of interconnected Indicators. I believe this would not be a suitable approach for the following reasons: 1) it would overload the Indicator SDO, which already suffers from overuse and misunderstanding; 2) relationships would have to be drawn through the Observed Data SDO to the specific Cyber Observables. Given how timestamps are handled, this would add a layer of complexity that we could avoid with carefully designed properties on the Infrastructure SDO.

2. A wide range of SDOs and Cyber Observables will need to be strung together in an interrelated complex of potentially rapidly changing data elements by producers seeking to convey rich detail about observations, sightings, TTPs, malware, network effects, and cyber observables operating as a single unified entity with a single purpose. Once issued by a Producer, sightings of one or more SDOs or cyber observables associated with this multi-headed Hydra will enable other members of a sharing community to quickly assess kill chain phases or other clues on their own networks that may help expedite discovery. And when operating within a truly effective and skilled sharing community this could also lead to more rapid crowdsourced threat analysis with accompanying remediation recommendations.

3. Foundational literature on tradecraft in cyber threat analysis includes an Infrastructure vertex as part of the analytical toolset. I'm referring here to the Diamond Model (Caltagirone, 2013), which directly juxtaposes threat actor capabilities with the infrastructure the actor uses. The origins and utility of the Diamond Model within the analyst community stand on their own merits, regardless of the fact that the STIX2 data model has moved on from this foundational concept.

4. Advanced NoSQL graph database techniques are well suited to visualizing the interconnectedness of a malicious infrastructure, expediting pattern recognition by human analysts seeking to perform higher level analysis and synthesis of STIX2 data. The power of this type of tooling should not be underestimated as we look towards the future of CTI and sharing communities. Indeed, such notable companies as OpenDNS (acquired by OASIS member Cisco) have used such visualizations to great success. Further, the use of data visualization techniques for enabling higher-order pattern recognition as a tool for analysis has been well documented by Tufte (2001), among others. Importantly, we need to build the Infrastructure SDO with sufficient metadata properties to enable these higher-order analytics. For example, it will be important to link back to Threat Actor SDOs within a boxed time-frame to move closer to attribution.

5. The larger global community of network defenders and cyber threat analysts are developing siloed versions of classification and enumeration systems for infrastructure as they are seeing it. However, we do not have a generally agreed upon system as we do have for malware (MAEC), exposures and vulnerabilities (CVE), and attack patterns (ATT&CK). By creating an Infrastructure SDO in STIX 2.1 we might be able to kick-start such a development.

6. One of the key insights gleaned during the Austin, TX face-to-face meeting was the need for more effective outreach and marketing to the broader CTI community, beyond those actively participating in OASIS. The addition of an Infrastructure SDO will send a positive market signal to this broader community, which may speed adoption. This is because the inclusion of such an object, in conjunction with the fully vetted Malware object, will convey a level of maturity of the STIX2 data model that heretofore has been lacking. The perception of a data model that is actually reflective of reality will greatly enhance its reputation during this phase of the market innovation, adoption, diffusion and transformation cycle.

7. In research presented at the ENISA CTI Bonding event in Rome, Italy (ENISA, 2017) an analyst from CyberDefCon reported that the worst performing ASNs from its Shadowserver Foundation (2017) database over a multi-year period were AS29182 ISPSYSTEM (located in RU) and AS5577 ROOT (located in LU). This exemplifies how longitudinal data aggregated from proprietary and open sources can demonstrate that the Infrastructure of a large-scale operation can be used to identify bad actors at the Regional Registry level. Since one of the stated objectives of CTI is to facilitate public/private sharing this example shows how the research community can provide evidence that can be used by the jurisdictional law enforcement authorities for enforcement action. With an explicit “Infrastructure SDO” the evidentiary quality of the data for law enforcement can be improved.

8. During a Sports-ISAO sponsored Internship program run during the World Championship games in London in August 2017, a group of 60+ Interns from over 30 Universities across the U.S. working to support the program identified the “digital exhaust” of multiple attack patterns targeting sports organizations and their sponsors. As a trainer for these novice threat hunters, I found it useful to provide visualizations of attack infrastructures to help them wrap their minds around the ideas of threat actors, campaigns, intrusion sets, indicators, cyber observables and other concepts we tried to capture in STIX2. I am able to generate such visualizations from several sources other than STIX. However, if I had had tangible evidence stemming from an Infrastructure SDO in STIX 2.X, the teaching would have been more streamlined. In summary, I needed the Infrastructure SDO in order to tie all of the pieces of the puzzle together.

If any of these arguments make sense to you, please let your voice be heard so that we can expedite the build towards consensus before an official Ballot on STIX 2.1. Also note that I recognize that the STIX Subcommittee is seeking a more orderly scheduling of discussions around Version 2.1 SDOs. Therefore, I’m requesting that we reopen discussions on this object when it would fit into the existing schedule and SDO priorities.

_______________________________________________________________________

References:

Caltagirone, S., Pendergast, A., Betz, C. (2013, July 5). The Diamond Model of Intrusion Analysis. http://www.dtic.mil/get-tr-doc/pdf?AD=ADA586960

ENISA (2017). https://www.enisa.europa.eu/events/cti-eu-event/enisa-cti-eu-event

Hutchins, E., Cloppert, M., Amin, R. (2011). Intelligence-Driven Computer Network Defense Informed by Analysis of Adversary Campaigns and Intrusion Kill Chains. Lockheed Martin.

Shadowserver (2017). https://www.shadowserver.org/wiki/pmwiki.php/Main/HomePage

Tufte, E.R. (2001). The Visual Display of Quantitative Information (2nd ed.). Graphics Press: Cheshire, CT.

--

Jane Ginn, MSIA, MRP

CTI TC Secretary, OASIS

Co-Founder of Cyber Threat Intelligence Network, Inc.

jg@ctin.us
