Subject: Re: [cti] Thoughts on STIX and some of the other threads on this list [formats discussion]

From: "Wunder, John A." <jwunder@mitre.org>

To: "cti@lists.oasis-open.org" <cti@lists.oasis-open.org>

Date: Wed, 16 Sep 2015 13:01:47 +0000

Hi Jason/all,

Good points from both you and Bernd. I have a few thoughts:

- To your last question, I had been wondering myself whether there were any widely implemented standards with multiple bindings. I asked Sean (Barnum) about it, and he helpfully pointed me to two OASIS specifications: XACML (which has a JSON profile) and OData. OData in particular does seem to have several bindings on equal footing (notably Atom and JSON). If anyone knows of others, it would be helpful to point us to them so we can see how they approached the binding phase (and, to be honest, the modeling phase).

- I share your concern about automatically generating the bindings, but I think the approach would be not to fully automate the conversion but rather to define a set of rules that you follow to convert the model manually or semi-automatically. This is the approach you’ll see in STIX 1.2.1. For example, if you have a “list” in the high-level model, in XML you might translate it as <ListItems><ListItem/><ListItem/><ListItem/></ListItems>, while in JSON you would use [{item},{item},{item}]. This should allow us to define bindings that correctly follow the design patterns of each binding format (nobody wants XML patterns in JSON or JSON patterns in XML) while still deriving from the same high-level model and therefore staying semantically compatible. To Bernd’s point #1, while this manual process would be time-consuming, ideally we would only need to do it every so often. We could probably also write some ad-hoc scripts to at least semi-automate it, even if we don’t use off-the-shelf support for generating bindings.

- This is my personal opinion, but to me the strongest case for defining an alternative binding is a binary encoding. XML and JSON address similar use cases, but a binary encoding like the one Trey has proposed might be a good alternative for high-scale, high-volume use cases where even JSON serialization is too large or too slow. That said, allowing either XML or JSON as an alternative binding (presuming one of them will be the MTI binding, which may not be the case) shouldn’t have major ecosystem compatibility implications, assuming that the MTI specification is implemented as well (at least on external interfaces).

- I share the concern about creating incompatibilities in the ecosystem, as I’ve expressed before. The intent of the MTI specification is to serve as the lingua franca. While tools could of course choose to implement only an alternative binding, I think we can mitigate those concerns by making the MTI binding literally _mandatory_ for claiming full compliance with STIX (or CybOX or TAXII). Users of alternative bindings would either have to support the MTI specification as well or be very clear that they’re not compatible with it. I believe (and people from OASIS can correct me if I’m wrong) that we can define these rules when we write the conformance sections of the documents.

- To Bernd’s point #3, it’s my opinion that as we simplify and improve the model itself, as well as how it’s expressed in the binding format (even if we stick with XML as the MTI spec, we can use simpler XML), the need for things like python-stix should decrease. While we definitely still want to have them, one of the nice things about many popular formats is that they’re easy enough to use “raw”. When I worked with the Twitter or Facebook APIs (in my previous life at a startup), I didn’t need code-level bindings (what we call APIs), because the JSON and XML formats were easy to work with natively. If we get to that point for STIX/CybOX (without losing support for important use cases, of course), I think we’ll have done a great job. As a side effect, that would also make generating code-level bindings even easier.
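The “list” rule from the second point can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the STIX 1.2.1 rules themselves: the rule functions, the sample records, and the choice to carry item fields as XML attributes are all hypothetical, made up for this example.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical HLM value: a "list" property whose items are simple records.
# Names (ListItems, ListItem, "id") are illustrative only.
items = [{"id": "a"}, {"id": "b"}, {"id": "c"}]

def xml_list_rule(container_tag, item_tag, records):
    """Assumed XML rule: wrap the list in a container element and emit one
    empty child element per item, carrying the item's fields as attributes."""
    container = ET.Element(container_tag)
    for record in records:
        ET.SubElement(container, item_tag, attrib=dict(record))
    return ET.tostring(container, encoding="unicode")

def json_list_rule(records):
    """Assumed JSON rule: a list is a native JSON array of objects,
    with no wrapper object and no per-item tag name."""
    return json.dumps([dict(record) for record in records])

# The same HLM value, rendered idiomatically in each binding.
xml_text = xml_list_rule("ListItems", "ListItem", items)
json_text = json_list_rule(items)
print(xml_text)   # <ListItems><ListItem id="a" /><ListItem id="b" /><ListItem id="c" /></ListItems>
print(json_text)  # [{"id": "a"}, {"id": "b"}, {"id": "c"}]
```

Keeping one small rule per high-level construct is one way ad-hoc scripts could semi-automate the conversion while each binding still follows its own format's idioms.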

John

From: <cti@lists.oasis-open.org> on behalf of Jason Keirstead
Date: Wednesday, September 16, 2015 at 8:06 AM
To: Bernd Grobauer
Cc: "Wunder, John A.", "cti@lists.oasis-open.org"
Subject: RE: [cti] Thoughts on STIX and some of the other threads on this list [formats discussion]

Re: schemas: JSON Schema exists and is codified in an IETF Internet-Draft - http://json-schema.org/.

Re: automated tooling: as I stated previously in this discussion, I do not have much faith that any automated tooling could take a high-level model and produce a language binding of sane enough quality to be codified as a standard.

Such generated bindings are always fraught with problems, because it is extremely difficult, in fact close to impossible, to define a high-level model that translates seamlessly into different data interchange formats, unless you either (a) don't care at all what the binding looks like, or (b) deliberately curate your high-level model so that the binding output looks proper in your MTI implementation... in which case you are not really making a true high-level model at all, but are instead just using it as an instruction set for a code generator.
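What option (a) tends to produce can be illustrated with a deliberately naive, format-agnostic mapping. The converter below is hypothetical, written only for this illustration and not taken from any real tool: it turns XML into JSON mechanically, and the result carries XML patterns such as wrapper objects, "@"-prefixed attribute keys, and a list that collapses into a single object when only one item is present.

```python
import json
import xml.etree.ElementTree as ET

def generic_xml_to_json(elem):
    """Hypothetical naive converter: maps every element to an object keyed by
    its tag name, attributes to "@name" keys, and repeated child tags to lists.
    It knows nothing about the model, so XML structure leaks into the JSON."""
    node = {"@" + key: value for key, value in elem.attrib.items()}
    for child in elem:
        value = generic_xml_to_json(child)[child.tag]
        if child.tag in node:
            # Repeated child tag: promote the existing value to a list.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    if elem.text and elem.text.strip():
        node["#text"] = elem.text.strip()
    return {elem.tag: node}

for xml_text in ('<ListItems><ListItem id="a"/><ListItem id="b"/></ListItems>',
                 '<ListItems><ListItem id="a"/></ListItems>'):
    print(json.dumps(generic_xml_to_json(ET.fromstring(xml_text))))
# {"ListItems": {"ListItem": [{"@id": "a"}, {"@id": "b"}]}}
# {"ListItems": {"ListItem": {"@id": "a"}}}
```

The second output shows why such mechanical output is hard to standardize: the same high-level "list" becomes an array or a bare object depending on how many items happen to be present.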

This is the root problem with using a high-level model to make an MTI binding instead of simply codifying "one true format". It's creating a lot of extra work for what I see as little real gain in practice (are any vendors ever actually going to implement any of these other bindings? Very unlikely, in my opinion.)

I've never seen a widely implemented RFC for a data interchange format that defined multiple bindings (XML, JSON, other). Is there an example of this? I'd like to see how it was approached.

-
Jason Keirstead
Product Architect, Security Intelligence, IBM Security Systems
www.ibm.com/security | www.securityintelligence.com

Without data, all you are is just another person with an opinion - Unknown

From: "Grobauer, Bernd" <Bernd.Grobauer@siemens.com>
To: "jwunder@mitre.org" <jwunder@mitre.org>, "cti@lists.oasis-open.org" <cti@lists.oasis-open.org>
Date: 2015/09/16 06:42 AM
Subject: RE: [cti] Thoughts on STIX and some of the other threads on this list [formats discussion]
Sent by: <cti@lists.oasis-open.org>

Hi,

here is a little picture that captures some of the concepts contained in John's email
(I have added something I call 'translations' to the picture, because I think we may need them):

The questions below have been touched upon here and there in various emails on the various lists.
I feel it is important to restate them in a structured way so that we consider all of them when choosing
the HLM and the MTI binding:

1) What is our requirement on how the 'derive from' relation comes about?
Do we require an automated or semi-automated
process? We are talking about very sizable models, so a completely manual process will pose some challenges.

My current feeling is that if we cannot find an HLM/MTI pair for which at least a semi-automated process exists,
then the HLM is documentation rather than 'ground truth', and we should think twice about spending
effort on an HLM that we then transform by hand into a binding. Without an HLM, the MTI binding
would have to serve as the HLM, as is the case with XSD in the pre-OASIS MITRE standards.

2) What is our requirement on how the MTI binding is actually defined?
I would argue that some kind of schema definition is required and "definition by code" is not sufficient.

So, with XML we have XSD. There appear to be schema-definition languages for JSON, but I do not
know how powerful/stable/universally accepted they are, nor what their status is with regard to item 3) below.

3) How much automated support for deriving an API for the MTI binding do we require?
Obviously, we need APIs to produce/consume content in the binding(s). Especially for the MTI,
we should think about how much automated support we require for producing an API for a given binding.

Currently, MITRE's STIX/CybOX APIs are based on automated mechanisms that turn XSDs into
Python code for parsing and generating XML conforming to the XSD. Given the size of the models
we are contemplating, writing an API from scratch might be quite a bit of effort.

4) What are our requirements regarding translations between the MTI binding and alternative bindings?

Without some kind of executable translation between two bindings, there is a danger that
the standard becomes fragmented, because a system supporting one binding may not be able
to interpret data that is available in some other binding.

Of course, this problem could be solved "by policy", stating that each OASIS-CTI-compatible system
must support the MTI binding (thus forcing implementors who work with an alternative binding
to also implement a translation from/to the MTI binding), but the danger remains that we suddenly
have systems that speak only an alternative binding. So maybe there should be a requirement
that an "official" alternative binding must be supported by a reference implementation that translates
between that binding and the MTI binding?
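For simple constructs, a reference translator of the kind suggested in 4) could be quite small. The sketch below assumes, purely for illustration, that XML is the MTI binding and JSON the "official" alternative, and that both carry a single list property using hypothetical ListItems/ListItem names. The round-trip check is one possible way such a reference implementation could demonstrate that a translation loses nothing.

```python
import json
import xml.etree.ElementTree as ET

# Assumptions for illustration only: XML is the MTI binding, JSON is the
# alternative binding, and element/key names are hypothetical.

def json_to_mti(json_text):
    """Translate the alternative (JSON) binding into the MTI (XML) binding."""
    records = json.loads(json_text)
    container = ET.Element("ListItems")
    for record in records:
        # XML attributes hold text only, so every value is stringified here.
        ET.SubElement(container, "ListItem",
                      attrib={key: str(value) for key, value in record.items()})
    return ET.tostring(container, encoding="unicode")

def mti_to_json(xml_text):
    """Translate the MTI (XML) binding back into the alternative (JSON) binding."""
    container = ET.fromstring(xml_text)
    if container.tag != "ListItems":
        raise ValueError(f"unexpected root element: {container.tag}")
    return json.dumps([dict(item.attrib) for item in container.findall("ListItem")])

def round_trips(json_text):
    """Conformance-style check: alternative -> MTI -> alternative must
    preserve the content (compared as parsed data, not as raw text)."""
    return json.loads(mti_to_json(json_to_mti(json_text))) == json.loads(json_text)

sample = '[{"id": "a"}, {"id": "b"}]'
print(json_to_mti(sample))  # <ListItems><ListItem id="a" /><ListItem id="b" /></ListItems>
print(round_trips(sample))  # True
```

Round-tripping [{"count": 1}] fails, because the XML attribute carries only text and the integer comes back as the string "1". Surfacing that kind of mismatch between bindings is the sort of gap a reference translation could help expose before systems that speak only an alternative binding appear.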