Re: [cti] Thoughts on STIX and some of the other threads on this list

I would strongly agree with Sean that data format wars are a “False Dilemma” and with Patrick that a viable and attractive path forward is in hand.

As some of you may know, I have been utilizing model driven methods for years as well as helping to develop some of the standards. I would suggest that this is the viable path forward for the seeming dilemma as well as a way to better resolve some other issues.

The essential pattern is to use a technology- and format-independent representation of the conceptual and/or logical model of the CTI domain and, based on well-defined and fully automated production rules, produce the schema, APIs and other artifacts supporting the CTI domain in multiple technologies. These models then become the “single source of the truth” with respect to the semantics and structure of CTI information sharing. The production rules become the single source of the truth for how this information is represented in XML, JSON, Python, Java or any other technology of interest. It provides for a more rigorous and understandable domain model, consistency in how it is represented and a clear path for evolution.

Since the same model is the “source” for the various technologies, transformations between them can then also be automated. This can include exchange schema as well as APIs that simplify access to the technology specific exchange formats. Maintenance is easier and the content is more understandable to stakeholders.
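As a rough illustration of the pattern described above, the following Python sketch keeps one technology-independent model and applies two production rules to it, one emitting a JSON Schema and one emitting an XSD fragment. The model contents, attribute names, type tables and function names are all assumptions made up for this example; they are not taken from STIX or from any existing tool.

```python
import json

# Hypothetical, technology-independent model of one CTI concept.
# Names and attributes are illustrative only, not taken from STIX.
MODEL = {
    "name": "Indicator",
    "attributes": [
        {"name": "id", "type": "string", "required": True},
        {"name": "title", "type": "string", "required": True},
        {"name": "confidence", "type": "integer", "required": False},
    ],
}

# Per-technology type tables: part of the production rules, not the model.
JSON_TYPES = {"string": "string", "integer": "integer"}
XSD_TYPES = {"string": "xs:string", "integer": "xs:int"}


def to_json_schema(model):
    """Production rule: model -> JSON Schema, returned as a dict."""
    return {
        "title": model["name"],
        "type": "object",
        "properties": {
            a["name"]: {"type": JSON_TYPES[a["type"]]}
            for a in model["attributes"]
        },
        "required": [a["name"] for a in model["attributes"] if a["required"]],
    }


def to_xsd(model):
    """Production rule: model -> XSD complexType fragment, returned as text."""
    lines = [f'<xs:complexType name="{model["name"]}Type">', "  <xs:sequence>"]
    for a in model["attributes"]:
        min_occurs = "1" if a["required"] else "0"
        lines.append(
            f'    <xs:element name="{a["name"]}" '
            f'type="{XSD_TYPES[a["type"]]}" minOccurs="{min_occurs}"/>'
        )
    lines += ["  </xs:sequence>", "</xs:complexType>"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(json.dumps(to_json_schema(MODEL), indent=2))
    print(to_xsd(MODEL))
```

Adding an attribute to `MODEL` changes both outputs at once, which is the sense in which the model, rather than either schema, is the single source.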

The “costs” are that there is more up-front work on getting the models right as well as the production rules. Of course, work has already started on UML models for CTI. One mistake people make is expecting that “default” mappings in some tools will produce exactly what they need in some technology. CTI is complex and has too much legacy for default mappings. Tuning of the production rules will be required to get quality XML/JSON (or whatever) out of the model. These mappings can include markings in the model (utilizing a profile) to indicate specific design decisions that may impact the best way to produce the technology artifacts. Once these markings and production rules are in place you reap the benefits for years. If you look at all the artifacts that will be required for CTI-2, a model driven approach will be cheaper even in the first iteration and much easier over time.
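To make the idea of markings concrete, here is a small Python sketch in which a marking on a model attribute steers the XML production rule while the JSON rule ignores it. The marking name `xml_as`, the model contents and the function names are hypothetical, invented for this example; a real UML profile would define its own stereotypes and tags.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical model with markings. "xml_as" is an invented marking that
# records a design decision relevant only to the XML production rule.
MARKED_MODEL = {
    "name": "Indicator",
    "attributes": [
        {"name": "id", "markings": {"xml_as": "attribute"}},
        {"name": "title", "markings": {}},
    ],
}


def to_xml(model, instance):
    """XML rule: marked attributes become XML attributes, others child elements."""
    root = ET.Element(model["name"])
    for attr in model["attributes"]:
        value = instance[attr["name"]]
        if attr["markings"].get("xml_as") == "attribute":
            root.set(attr["name"], value)
        else:
            ET.SubElement(root, attr["name"]).text = value
    return ET.tostring(root, encoding="unicode")


def to_json(model, instance):
    """JSON rule: the XML-specific marking is irrelevant and is ignored."""
    return json.dumps(
        {a["name"]: instance[a["name"]] for a in model["attributes"]},
        sort_keys=True,
    )


EXAMPLE = {"id": "indicator-1", "title": "Phishing domain"}

if __name__ == "__main__":
    print(to_xml(MARKED_MODEL, EXAMPLE))
    print(to_json(MARKED_MODEL, EXAMPLE))
```

The design decision lives in the model next to the attribute it affects, so a default mapping can be overridden without editing the generated artifacts by hand.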

There is also the question of representation of this higher level of abstraction. The primary candidates these days are UML and OWL. While OWL has some interesting capabilities, we have seen a longer history of greater success with UML as it is more tuned to this paradigm. It is also well understood across most organizations with a visual notation. For this reason we would encourage considering a small UML profile for CTI (to capture those design choices) combined with the CTI domain model and mappings to XML & JSON as exchange formats and Java and Python for API access.

Regards,
Cory Casanave

Model Driven Solutions

OMG Representative to CTI

From: cti@lists.oasis-open.org [mailto:cti@lists.oasis-open.org] On Behalf Of Barnum, Sean D.
Sent: Monday, August 31, 2015 6:17 PM
To: Jordan, Bret; Mark Clancy
Cc: cti@lists.oasis-open.org
Subject: Re: [cti] Thoughts on STIX and some of the other threads on this list

I would like to once again strongly and hopefully politely make a request.

Can we please stop promulgating and spending cycles on this False Dilemma?

How to represent STIX information in the implementation of an information exchange or a repository or an analytic system is NOT an either-or decision.

Since the beginning of the STIX community it has NOT been an either-or decision.

When STIX began, when things were very nascent and we knew we didn’t know what we didn’t know, we as a community decided to use XML Schema to capture and think through ideas because it was widely understood by everyone, gave us an explicit mechanism for defining and validating structure and syntax, and it had a broad ecosystem of tooling available for it. This was in no way a declaration that STIX was only XML or ever would be. To the contrary, it was discussed that we would use it to figure out what we needed to and to test out ideas with real content and once we felt we had reached an appropriate level of maturity and stability that we would abstract out our consensus on structure and semantics to a non-implementation-dependent form and define different implementations against it as appropriate. It was always said that we would likely maintain the XSD implementation as a reference implementation since the work was already done and that there would be a body of implementations already in place using it. However, we also very clearly agreed that other implementations would/could be created as appropriate for particular use cases and technical contexts that required different implementation formats. JSON was explicitly identified as a likely option as were things like protobuf, OWL/RDF, etc. This has always been the plan and is still the plan. It is fundamental to how the charter for the CTI TC was set up and is now only a week or two away from that long-ago envisioned milestone where other implementations like JSON could effectively be defined in such a way that we could have some confidence that they would lead to technology implementations that are actually conformant to the same language standard structure and semantics as any other implementation.

It would be inappropriate to assert that XML is the one and only way that STIX should be represented (such an assertion has NEVER been made by the STIX community or the DHS/MITRE teams supporting it).

It would be equally inappropriate to assert that JSON is the one and only way that STIX should be represented.

The reality is that different data representation formats exist for a reason. It is not simply that each group decided to "roll their own” rather than reuse what others had developed for their use cases. Such unnecessary redundancy may exist in some cases but for the most part different data representation formats exist because different technical contexts and use cases have different requirements for things like size, speed, rigor, expressivity, flexibility, technical dependencies, etc. And different data representation formats have different advantages and disadvantages related to these sorts of requirements.

There does not exist any single data representation format that is the “right” answer or even an adequate answer for all technical contexts and use cases. Different situations call for different data representation formats.

Trying to force such a single option, while making natural supporters of that option happy, would almost certainly drive a good portion of potential adopters away from the table and for many of those who chose to stay at the table and compromise would deliver a reduced capability to whatever format may be appropriate to them. That reduced capability then gets passed on to the users.

Recognizing that no single data representation format (syntax and lexicality) is “right” for every context does not mean that we cannot have a single agreed-to set of structure and semantics for the domain of information being represented. That is exactly what STIX is intended to be, a single standardized language specifying agreed-to structure and semantics for cyber threat information supporting a broad range of technical contexts and use cases. Since the beginning of STIX a foundational principle of STIX has been that STIX is NOT a system or a repository or a sharing program but rather IS a language specifying structure and semantics for cyber threat information that can support any number of systems or repositories or sharing programs implemented by any number of organizations using whatever technologies are appropriate for them. This objective does not require a limitation to one and only one data representation format, and in fact precludes such a limitation.

So, you are correct Bret in saying that “format impacts adoption”. If people feel that they have no option to leverage the format that is most appropriate for their situation then they are less likely to adopt.

You make a first argument that this is the situation that exists today with people who want to use JSON and I certainly agree with you that there are many people who say that they only want to do STIX in JSON.

You then make a second argument that JSON is the only “right” solution and that nobody wants XML and we should remove support for it.

Unfortunately, the factual statements in the first two sentences of this paragraph that support argument #1 completely invalidate argument #2.

We cannot twist logic to support a preferred option.

I am not sure what I can say about your assertion that nobody uses or cares about XML other than that it is very inaccurate.

While there are many players that support and desire JSON, there are also those who support and desire (or may even be required to use due to regulatory or policy issues) XML. It is in no way a unanimous opinion either way.

And that is only looking at two options. For use cases requiring low latency and high speed, neither JSON nor textual XML is really adequate. For situations like those, options like Cap'n Proto, protobuf, Thrift, EXI, etc. are far more appropriate.

For anyone who might read my statements above and interpret me as an XML fan-boy, please know that nothing could be further from the truth. If I were to go write a system today using STIX information, I would choose the appropriate data representation format based on the needs and context of that system. It is very possible that I would choose JSON or some other format based on their advantages and disadvantages. We initially used XSD to model STIX early on as it provided the appropriate advantages to support the sort of exploration we needed to do as a community at that time. That does not mean that it is the best option for someone creating any particular product today. It may be but it may not be.

One option that has been discussed to address both the need to support a variety of potential data representation format options and the desire to coalesce toward a single option is to select and specify a Mandatory To Implement (MTI) data representation format within the Conformance section of the STIX language specifications. This would mean that any implementation claiming to be conformant with the STIX language must at least support the MTI data representation format but could also support other additional formats as appropriate to its context. This would enable a minimum bar of interoperability at the format level between implementations but would not prevent people from doing what they need to do other than the MTI. I respectfully suggest that discussion around forming a separate working group to focus on selecting which format should be used for STIX be recentered around investigating and selecting an MTI format rather than an “only” format.
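The MTI idea can be sketched in a few lines of Python: an implementation is conformant if its supported formats include the MTI format, and any two conformant implementations therefore share at least that one format. The class, the function names and the choice of "xml" as the MTI below are arbitrary assumptions for illustration; the thread does not say which format an MTI would be.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Implementation:
    """Hypothetical description of a product and the formats it supports."""
    name: str
    formats: frozenset


def is_conformant(impl, mti):
    """Conformant means: supports the MTI format, plus any others it likes."""
    return mti in impl.formats


def shared_formats(a, b):
    """Formats two implementations could use to talk to each other."""
    return a.formats & b.formats


# The MTI value here is an arbitrary placeholder, not a recommendation.
MTI = "xml"
tool_a = Implementation("tool-a", frozenset({"xml", "json"}))
tool_b = Implementation("tool-b", frozenset({"xml", "protobuf"}))
tool_c = Implementation("tool-c", frozenset({"json"}))

if __name__ == "__main__":
    for tool in (tool_a, tool_b, tool_c):
        print(tool.name, "conformant:", is_conformant(tool, MTI))
    print("tool-a and tool-b share:", sorted(shared_formats(tool_a, tool_b)))
```

In this sketch `tool_c` is free to keep using JSON but could not claim conformance until it also supported the MTI format, which is the "minimum bar" without an "only" format.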

Net-net:

1. Any argument posing that STIX must select a single data representation format for all implementations (whether arguing for JSON or for XML or whatever) is a FALSE argument.
2. The ecosystem which STIX is and has always been intended to support requires the flexibility to support multiple potential data representation formats.
   a. No single data representation format is the “right” choice for all contexts.
3. The important thing for STIX is specifying/modeling standardized structure and semantics of cyber threat information.
4. Any specific data representation format binding and reference implementation must conform to the standardized structure and semantics of the STIX language.
5. We are well along the path to support #2, #3 & #4 above. We should reach that point in the next couple weeks with the release of the STIX 1.2.1 language specs and the STIX 1.2.1 XML Binding Spec (with accompanying reference implementation).
6. Any specific data representation format binding and reference implementation should be capable of supporting automated and lossless transformation between itself and other conformant data representation format binding and reference implementations, including round-tripping. If the bindings/implementations are truly conformant to the STIX language/model then this should be possible.
7. One option is to specify a Mandatory To Implement (MTI) data representation format for STIX-conformant implementations that they must support at a minimum but does not preclude others.
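The lossless round-tripping requirement in the summary above can be checked mechanically. The Python sketch below assumes a deliberately tiny, flat record with three string fields shared by a JSON binding and an XML binding; the field names, element names and functions are hypothetical and much simpler than any real STIX binding.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical shared model: every binding must carry exactly these fields.
FIELDS = ("id", "title", "description")


def to_json(record):
    """JSON binding: record -> text."""
    return json.dumps({f: record[f] for f in FIELDS}, sort_keys=True)


def from_json(text):
    """JSON binding: text -> record."""
    data = json.loads(text)
    return {f: data[f] for f in FIELDS}


def to_xml(record):
    """XML binding: record -> text, one child element per field."""
    root = ET.Element("Indicator")
    for f in FIELDS:
        ET.SubElement(root, f).text = record[f]
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """XML binding: text -> record."""
    root = ET.fromstring(text)
    return {f: root.findtext(f) for f in FIELDS}


def round_trips(record):
    """True if JSON -> XML -> JSON returns the record unchanged."""
    via_xml = from_xml(to_xml(from_json(to_json(record))))
    return from_json(to_json(via_xml)) == record


SAMPLE = {
    "id": "indicator-1",
    "title": "Suspicious <script> & redirect",
    "description": "Seen in phishing mail",
}

if __name__ == "__main__":
    print("lossless round trip:", round_trips(SAMPLE))
```

Both bindings read and write the same set of fields, so a test like `round_trips` can run over a corpus of real content to catch a binding that silently drops or alters information.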

I truly hope we can stop spending cycles arguing this False Dilemma and as Mark and Pat suggest focus on the fundamental structure and semantic issues that will make STIX more effective at supporting cyber threat information use cases and the users that use them. ;-)

Sorry for the length of this message. I started out attempting brevity but found that some factual explanation was required rather than just making brief statements of hyperbole.

Thanks for spending the time to consider this. I wish I had an Easter egg to hide down here for your effort. :-)

Sean

From: <cti@lists.oasis-open.org> on behalf of "Jordan, Bret" <bret.jordan@bluecoat.com>
Date: Friday, August 28, 2015 at 6:22 PM
To: Mark Clancy <mclancy@soltra.com>
Cc: "cti@lists.oasis-open.org" <cti@lists.oasis-open.org>
Subject: Re: [cti] Thoughts on STIX and some of the other threads on this list

Format impacts adoption, plain and simple. Why do you think Facebook went off and did their solution in JSON? Why does Soltra do JSON on the back end? Why does Intelworks do JSON? Why are other threat intel solutions doing JSON? Why are other yet to be released solutions similar to Soltra Edge that have not yet been announced also doing JSON?

As I have said before, all of the code that has been written and that will be written by this group, in the end, will account for probably only 5% of the total code that needs to be written. If those web developers, app developers, and open source developers that are going to write the other 95% hate the format, and refuse to work with it, then they will not write code for it. The Python libraries only go so far. We need libraries in C, C++, Objective-C, Swift, PHP, Ruby, Android-Java, C#, etc.

Everyone that does not think this is an issue, please write some C code using existing STIX in XML... Then let's talk...

Let me copy in some of my thoughts from another thread and down grade my own TLP as well.

Most vendors I talk to, ones that we would want to be on board with STIX and TAXII, always complain about XML. I did not start this effort with a bias against XML, as I too was an academic. But everything I hear, and every vendor I talk to, says the same thing... So we should just do it and be done with it.

The religious debate is one-sided for sure. Meaning, people will avoid using STIX because of XML. But I doubt anyone at the end of the day would care if we stopped using XML. There is no one out there that is pushing for XML and will refuse to use STIX if it is NOT in XML.

Let's solve this problem and be done with it.

Thanks,

Bret

Bret Jordan CISSP

Director of Security Architecture and Standards | Office of the CTO

Blue Coat Systems

PGP Fingerprint: 63B4 FC53 680A 6B7D 1447 F2C0 74F8 ACAE 7415 0050

"Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg."

On Aug 28, 2015, at 15:29, Mark Clancy <mclancy@soltra.com> wrote:

I posted this to another threat intel list and it probably makes sense to have y'all see my comments. I can't copy the whole thread from the other list due to their rules (everyone else's comments are TLP amber, but I can downgrade my own TLP). It is a group of people who live and breathe CTI on the defending-things-from-badness side. I bet there is a good amount of overlap.

-SNIP-

1. Structure and Context are what we need. Format is just that: format. XML vs. JSON etc. don’t matter in the end. Heck, CSV files had the same problem. If the data is flat then the human puncher has to build the context so miscreants get a free lunch again. If every spreadsheet, JSON, or XML source has different columns or definitions we have a bloody mess. (Oh wait, we did have that mess already and the approach was to say let’s create a standard to fight that out...) I still have not seen notepad die as an essential tool to defend a network as cut & paste is still state of the art in transporting threat data to security tools in most shops…

2. STIX regardless if over XML/JSON should not be manufactured/consumed by a human but a machine.

3. If you are hand crafting STIX then stop and go back to spreadsheets for your cut, paste, share, & consume fix. If spreadsheets into JSON is your thing then do that too, but don’t confuse those home brew formats as being “structured”.

4. If you are writing code to do it then STIX vs. JSON probably doesn’t really matter as each has its pluses and minuses and there are libraries to make STIX go between XML and JSON anyway. I view this fundamentally as a Coke vs. Pepsi kind of debate as to which cola you like best. Both have plenty of sugar and caffeine, but in the end they do the same thing…

5. STIX Complexity – yeah this is a mixed blessing. Lots of ways to do related things. The real problem is there is no implementation guidance and most implementations are just dealing with IOCs (indicators/observables), so all the interesting and useful context doesn’t show up in STIX output today, and then plenty of people trying to do that get it wrong.

a. A federal law enforcement group for example confused “indicator” and instead published everything as “incidents” in their STIX package

b. An ISAC published a really decent description of a Threat Actor, but did it as an Indicator

c. Lots of groups publish one Observable per Indicator instead of linking them

d. Almost none of the OSINT has anything other than Observables, Indicators, or TTPs today.

e. Simple conventions like what should I put in the “Short Description” vs. “Description” fields. Should these overlap or be unique?

6. One thing I am going to try to do with OASIS is on the “implementation and usage” side vs. schema or format issue. Plenty of passionate technical folks beating that drum, but I am looking at the practitioner usage and finding all we need today if we just agree on HOW we do it within the spec.

7. I am working on getting OSINT into properly composed STIX objects linking Observable, to Indicator, to Campaign, to TTP, to Threat Actor etc. IMHO this is a most excellent use of university programs under fair use provisions or open source licenses. I’ll put some Soltra money and my own personal funds towards that objective. So happy to help coordinate others’ interest on this too.
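The difference between flat output and properly linked objects (points 5 and 7 above) can be shown with a minimal Python sketch. It assumes a generic object with a kind, an id and a list of related ids; the class, the link direction and the identifiers are hypothetical and do not reproduce the actual STIX schema or any STIX library API.

```python
from dataclasses import dataclass, field


@dataclass
class CtiObject:
    """Hypothetical CTI object: a kind, an id, and ids of related objects."""
    kind: str
    id: str
    related: list = field(default_factory=list)


def reachable_kinds(start_id, index):
    """Walk the links from one object and list the kinds of context reached."""
    seen, stack, kinds = set(), [start_id], []
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        obj = index[oid]
        kinds.append(obj.kind)
        stack.extend(obj.related)
    return kinds


# A linked chain in the order named in point 7 above.
linked = [
    CtiObject("Observable", "obs-1", ["ind-1"]),
    CtiObject("Indicator", "ind-1", ["camp-1"]),
    CtiObject("Campaign", "camp-1", ["ttp-1"]),
    CtiObject("TTP", "ttp-1", ["actor-1"]),
    CtiObject("ThreatActor", "actor-1"),
]
# A flat observable with no links, as in most published feeds.
flat = [CtiObject("Observable", "obs-2")]

INDEX = {o.id: o for o in linked + flat}

if __name__ == "__main__":
    print(reachable_kinds("obs-1", INDEX))
    print(reachable_kinds("obs-2", INDEX))
```

Starting from the linked observable a consumer can reach the campaign, TTP and threat actor; starting from the flat one it gets only the observable itself, and a human has to rebuild the context.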

Mark Clancy

Chief Executive Officer

SOLTRA | An FS-ISAC and DTCC Company

+1.813.470.2400 office | +1.610.659.6671 US mobile | +44 7823 626 535 UK mobile

mclancy@soltra.com| soltra.com

One organization's incident becomes everyone's defense.
