
Subject: RE: [cti] Thoughts on STIX and some of the other threads on this list

From: "Grobauer, Bernd" <Bernd.Grobauer@siemens.com>
To: "cti@lists.oasis-open.org" <cti@lists.oasis-open.org>, "cti-users@lists.oasis-open.org" <cti-users@lists.oasis-open.org>
Date: Wed, 9 Sep 2015 11:49:26 +0000

Hi,

this is a bit late, but there were several requests for
broad feedback on the major issue of what the future
should look like, so here goes ...

First, for the impatient reader, the contents of what
comes below in a nutshell:

I) If we change the binding from XML to something else
   for the next major release, we need to be sure that
   the effort of doing so neither slows us
   down too much nor distracts too much attention from
   what really are the major problems of dealing
   with STIX/CybOX (the fact that we use XML
   is *not* the issue that will decide between success
   or failure). And currently, I must say,
   I am not sure that we can take the time to
   solve the major issues *and* change the
   binding.

II) I agree with the list of problem points that
    Aharon has sent around some days ago: these are
    things we need to solve *first*.

III) To Aharon's list, I would add the problem of
    embedded relationship info rather than describing
    relations between entities/objects separately --
    that is a major design flaw of STIX/CybOX.

IV) 80% (or say 60% or whatever -- at least a substantial percentage)
    of STIX/CybOX has never been used so far. We should consider
    simplifying drastically.

V) Another source of complexity: CybOX tries to be all-encompassing,
   including the expression of what are essentially certain types of
   signatures (that is where the logical operations come in) as well
   as the description of all kinds of observable stuff, going even
   into rather detailed forensic information. In contrast to this, the
   #1 use case most people are after is the sharing of rather simple
   basic indicators ("observable patterns", "things to look
   for").

   So here is a bit of heresy: Maybe we should consider a
   STIX-without-CybOX variant in which basic indicators can be
   expressed in a very simple key/value-list-kind-of-way (where
   mappings to default-representations into CybOX make sure that we
   have well-defined semantics and the link to CybOX is preserved)
   without logical operators.

Now in detail:

I) Regarding the big issue of XML vs. JSON vs. "something else":

I do not think that XML vs. JSON will be decisive regarding the
question of whether STIX/CybOX will be relevant in the future: there
are advantages and disadvantages to both.

What, however, will be decisive, is whether STIX/CybOX really
are helpful in expressing and consuming useful cyber-threat intelligence.

As others have already said on the mailing list: the main issues that
currently make STIX and CybOX hard to deal with have *nothing* to
do with the fact that we currently express things in XML rather than
JSON.

I think the main question going forward is: what should we
spend our (limited) resources on when moving towards the next
major release?

What I am worried about is that switching the representation
from XML to something else will lead to significant delay
and will draw focus away from the really pressing issues
and towards the format issue.

Maybe I am overestimating the required effort of switching
the binding, but let us not be fooled by the idea that
having a piece of code that produces/consumes, say, JSON,
constitutes a language definition. The problem of
defining a JSON binding is definitely harder than "just take
XYZ's current JSON implementation (with XYZ being Mitre,
Intelworks, Bluecoat or whatever) and be done with it." --
especially since none of the existing JSON implementations
is even close to a complete coverage of either STIX
or CybOX.

Right now, the tone on the mailing list suggests that
it is a given that the upcoming major releases of STIX
and CybOX will be based on a non-XML binding. Is that
really so?


II) Now, regarding the problems we should really focus on.
Aharon has made the following points:

1) Complex logical operations

2) Heavily nested objects

3) Object Versioning

4) Relationships that go 50 levels deep backwards and forwards.

5) Making it easy just to share a single evil URL with someone. Reduce verbosity ?

6) XPATH in the Marking Structure. Or the marking object in general.

7) Multiple ways to say the same thing

8) Almost every field being optional

I agree with these points. Let me add to and expand on them below.

III) Relationship information embedded in STIX entity / CybOX Objects

   This, I feel, is the number one design flaw of STIX/CybOX, in more than one
   way:

   - there is no way to communicate a relationship between two things
     without (re)defining at least one of the things (namely the 'thing'
     into which the relationship info has to be embedded)

   - not all relationships that one might want to express are supported

     - why do I have to go through a campaign entity in order to
       associate an indicator with a threat actor?

     - why do I have to go through an incident to associate an incident
       with a threat actor?

   - embedding of relationships leads to more complicated entity definitions
     and expressions (cf. Aharon's 2nd item)
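
   To illustrate the alternative hinted at above, here is a minimal sketch
   of relationships described separately from the entities they connect.
   It is only an illustration under stated assumptions: the class names,
   field names, identifiers and the "attributed-to" relationship kind are
   all hypothetical and are not taken from any STIX/CybOX schema.

```python
from dataclasses import dataclass


# Hypothetical, illustrative types: not taken from any STIX/CybOX schema.
@dataclass(frozen=True)
class Entity:
    id: str
    kind: str  # e.g. "indicator", "threat-actor"


@dataclass(frozen=True)
class Relationship:
    source_id: str
    target_id: str
    kind: str  # e.g. "attributed-to" (an assumed name)


def related(relationships, source_id, kind):
    """Return the target ids linked from source_id by relationships of the given kind."""
    return [
        r.target_id
        for r in relationships
        if r.source_id == source_id and r.kind == kind
    ]


indicator = Entity("indicator-1", "indicator")
actor = Entity("actor-1", "threat-actor")

# The link is stated on its own: neither entity has to be redefined,
# and no intermediate campaign entity is needed.
links = [Relationship(indicator.id, actor.id, "attributed-to")]
```

   In such a layout, a new kind of relationship is just a new value of
   `kind`, and the entity definitions themselves stay flat.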


IV) High complexity, of which 80% (a guess) has never been used so far

   If you look at python-stix/cybox, certainly the most comprehensive of
   all implementations for producing and consuming STIX/CybOX: there is still
   stuff missing. Also, if you look at what CybOX objects, or rather which parts
   of which CybOX objects, are supported by the existing STIX/CybOX-based
   systems, it is hard not to reach the following conclusion:

   We take a sledgehammer to crack a nut (or, as we say in Germany: we are building
   cannons to shoot at sparrows).

   So we have a standard, for which there is no system able to either produce
   or ingest (and make sense of) even close to 100% of the standard.
   That is a problem, because

   - the unused 80% (or 50% or whatever) add complexity at all stages of
     dealing with the standard (defining it, tooling for it, ...)

   - the perceived benefit that we are future-proof in the sense
     that pretty much everything can be expressed, is not really much
     of a benefit: what use is it to be able to express something,
     which nobody is able to process?

   We try to solve part of the problem with profiles that describe how certain
   use-cases are to be encoded ... but if we find that those profiles
   use a 20% subset of the standard, maybe that tells us something?

V) Once more complexity: CybOX: Simple Indicators vs. Signatures vs. Observables/Forensics Information

   The way CybOX is currently used in CTI exchange is, again, taking
   a sledgehammer to a nut:

   - The indicators most of us are currently able to communicate and
     process are rather simple: a hash value, a URI, a domain name,
     an email address.

     So what usually happens is that the simplest of indicators are wrapped into
     a CybOX object, only to be unwrapped by the receiver and stuck into
     one of the six buckets of information he is able to deal with.

     That is fine, I guess, until the producer starts adding information
     to CybOX objects that the receiver's "unwrapping" code
     ignores ... it may then take the receiver some time to realize
     that his automated processes are discarding information. Or the
     importer/unwrapper may even break, interpret things wrongly, ...

     Now take a look at what, e.g., MISP does: an indicator is
     basically a key-value pair, where the key describes the kind of indicator
     and the value the indicator itself:

     - the problem of inadvertently missing information does not occur: either
       I know how to deal with a certain indicator type or I do not

     - adding new indicator types takes as little as adding a new key rather
       than defining a whole new object type.

     There are, of course, also drawbacks to the MISP way of doing things,
     but currently, MISP is a lot closer to what is current practice in
     sharing technical indicators than CybOX.

  - Aharon mentioned the complex logical operations that are troublesome.

    Their genesis, that is at least my understanding, lies in the
    fact that STIX/CybOX owe a lot to OpenIOC.

    However: OpenIOC at its heart is a language for expressing
    signatures/patterns for a certain line of products and geared towards
    the capabilities of these products. If CybOX/STIX had started out, e.g.,
    from a line of thinking closer to a different product line, CybOX/STIX
    might look quite different. Why do we have logical operators, but
    not, say, temporal operators ("first this, then two times that, and
    then finally again this, all within 5 seconds"), as we have
    in SIEMs or network monitoring?

    Do we need/want CybOX/STIX to be an all-encompassing generic
    signature/pattern language? Or is that maybe a case for
    the current test-mechanism feature that allows the embedding
    of SNORT, OpenIOC and what have you?

  - Recently, Sean reminded us on the mailing list that CybOX
    also has its uses in MAEC for malware expressions and in
    the expression of forensics information. It is great that
    CybOX is so powerful and versatile ... but most of its
    power seems to be lost or even counterproductive when it
    comes to getting basic CTI exchange started.
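
  The key-value style of indicator described above for MISP can be
  sketched as follows. This is a rough illustration only, not MISP's
  actual API or attribute list: the keys, the bucket names and the
  `ingest` function are assumptions made up for the example.

```python
# Hypothetical indicator keys and bucket names, chosen for illustration only.
KNOWN_TYPES = {
    "md5": "hashes",
    "url": "urls",
    "domain": "domains",
    "email": "email_addresses",
}


def ingest(pairs):
    """Sort (key, value) indicators into buckets; report unknown keys explicitly."""
    buckets = {name: [] for name in KNOWN_TYPES.values()}
    unsupported = []
    for key, value in pairs:
        bucket = KNOWN_TYPES.get(key)
        if bucket is None:
            # Nothing is silently dropped: an unknown key is visible as such.
            unsupported.append((key, value))
        else:
            buckets[bucket].append(value)
    return buckets, unsupported


buckets, unsupported = ingest([
    ("url", "http://evil.example/x"),
    ("mutex", "abc"),
])
```

  The receiver either knows a key or it does not, so the case of
  partially understood objects does not arise, and supporting a new
  indicator type amounts to adding one entry to the table.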

  Some time ago, Terry alerted us to the fine but important
  distinction between observable pattern (what to look out for)
  and observable instance (what has really been seen). Although
  we have talked about use-cases of communicating observable
  instances (I have seen this and that): the majority, I think,
  is interested in exchanging stuff to look out for.

  I may be committing heresy now, but let us think the unthinkable for
  a moment: How about a profile of STIX that allows communication of
  basic indicators (observable patterns) in a way that is closer to
  MISP's key-value pairs (with a well-defined mapping into CybOX
  proper), leaving full CybOX to cases in which observable instances
  (i.e., something that has been observed) are to be communicated?
  A mapping from such a simplified expression into a standard
  CybOX representation would then provide precise semantics and
  retain the link to CybOX-proper.
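
  As a rough sketch of what such a mapping could look like: each simple
  key expands into a default nested representation. Everything here is
  an assumption for illustration; the keys, the `expand` function and
  the nested dictionary layout are hypothetical and do not reproduce
  the actual CybOX schema.

```python
# Hypothetical expansion functions: each simple key maps to a nested,
# CybOX-like dictionary. The nested layout is illustrative only.
def _uri(value):
    return {"object_type": "URI", "properties": {"value": value}}


def _domain(value):
    return {"object_type": "DomainName", "properties": {"value": value}}


def _md5(value):
    return {
        "object_type": "File",
        "properties": {"hashes": [{"type": "MD5", "value": value}]},
    }


EXPANSIONS = {"url": _uri, "domain": _domain, "md5": _md5}


def expand(key, value):
    """Expand a simple (key, value) indicator into its default nested form."""
    try:
        return EXPANSIONS[key](value)
    except KeyError:
        raise ValueError(
            f"no default representation defined for key {key!r}"
        ) from None


expanded = expand("url", "http://evil.example/")
```

  The simple form stays trivial to produce and consume, while the
  table of expansions is the one place where its semantics are pinned
  down.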


Kind regards,

Bernd

----

Bernd Grobauer, Siemens CERT
