Subject: RE: [cti] Thoughts on STIX and some of the other threads on this list
Hi,

this is a bit late, but there were several requests for broad feedback on the major issue of what the future should look like, so here goes ...

First, for the impatient reader, the contents of what comes below in a nutshell:

I) If we change the binding from XML to something else for the next major release, we need to be sure that the effort of doing so neither slows us down too much nor distracts too much attention from what really are the major problems of dealing with STIX/CybOX (the fact that we use XML is *not* the issue that will decide between success and failure). And currently, I must say, I am not sure that we can take the time to solve the major issues *and* change the binding.

II) I agree with the list of problem points that Aharon sent around some days ago: these are things we need to solve *first*.

III) To Aharon's list, I would add the problem of embedding relationship info in entities/objects rather than describing relations between them separately -- that is a major design flaw of STIX/CybOX.

IV) 80% (or say 60% or whatever -- at least a substantial percentage) of STIX/CybOX has never been used so far. We should consider simplifying drastically.

V) Another source of complexity: CybOX tries to be all-encompassing, including the expression of what are essentially certain types of signatures (that is where the logical operations come in) as well as the description of all kinds of observable stuff, going even into rather detailed forensic information. In contrast to this, the #1 use case most people are after is the sharing of rather simple basic indicators ("observable patterns", "things to look for"). So here is a bit of heresy: maybe we should consider a STIX-without-CybOX variant in which basic indicators can be expressed in a very simple key/value-list kind of way, without logical operators (where mappings to default representations in CybOX make sure that we have well-defined semantics and that the link to CybOX is preserved).
Now in detail:

I) Regarding the big issue of XML vs. JSON vs. "something else":

I do not think that XML vs. JSON will be decisive regarding the question of whether STIX/CybOX will be relevant in the future: there are advantages and disadvantages to both. What will be decisive, however, is whether STIX/CybOX really are helpful in expressing and consuming useful cyber-threat intelligence. As others have already said on the mailing list: the main issues that currently make STIX and CybOX hard to deal with have *nothing* to do with the fact that we currently express things in XML rather than JSON.

I think the main question going forward is: what should we spend our (limited) resources on when moving towards the next major release? What I am worried about is that switching the representation from XML to something else will lead to significant delay and will draw focus away from the really pressing issues and towards the format issue.

Maybe I am overestimating the effort required to switch the binding, but let us not be fooled by the idea that having a piece of code that produces/consumes, say, JSON constitutes a language definition. The problem of defining a JSON binding is definitely harder than "just take XYZ's current JSON implementation (with XYZ being Mitre, Intelworks, Bluecoat or whatever) and be done with it" -- especially since none of the existing JSON implementations comes even close to complete coverage of either STIX or CybOX.

Right now, the tone on the mailing list suggests that it is a given that the upcoming major releases of STIX and CybOX will be based on a non-XML binding. Is that really so?

II) Now, regarding the problems we should really focus on.

Aharon has made the following points:

1) Complex logical operations
2) Heavily nested objects
3) Object Versioning
4) Relationships that go 50 levels deep backwards and forwards.
5) Making it easy just to share a single evil URL with someone. Reduce verbosity?
6) XPATH in the Marking Structure. Or the marking object in general.
7) Multiple ways to say the same thing
8) Almost every field being optional

I agree with these points. Let me add to and expand on them below.

III) Relationship information embedded in STIX entities / CybOX Objects

This, I feel, is the number one design flaw of STIX/CybOX, in more than one way:

- there is no way to communicate a relationship between two things without (re)defining at least one of the things (namely the 'thing' into which the relationship info has to be embedded)
- not all relationships that one might want to express are supported:
  - why do I have to go through a campaign entity in order to associate an indicator with a threat actor?
  - why do I have to go through an incident to associate an incident with a threat actor?
- embedding of relationships leads to more complicated entity definitions and expressions (cf. Aharon's 2nd item)

IV) High complexity, of which 80% (a guess) has never been used so far

If you look at python-stix/cybox, certainly the most comprehensive of all implementations for producing and consuming STIX/CybOX: there is still stuff missing. Also, if you look at which CybOX objects, or rather which parts of which CybOX objects, are supported by the existing STIX/CybOX-based systems, it is hard not to reach the following conclusion: we take a sledgehammer to crack a nut (or, as we say in Germany: we are building cannons to shoot at sparrows).

So we have a standard for which there is no system able to either produce or ingest (and make sense of) anything even close to 100% of the standard. That is a problem, because

- the unused 80% (or 50% or whatever) adds complexity at all stages of dealing with the standard (defining it, tooling for it, ...)
- the perceived benefit that we are future-proof, in the sense that pretty much everything can be expressed, is not really much of a benefit: what use is it to be able to express something that nobody is able to process?
We try to solve part of the problem with profiles that describe how certain use cases are to be encoded ... but if we find that those profiles use a 20% subset of the standard, maybe that tells us something?

V) Once more complexity: CybOX: Simple Indicators vs. Signatures vs. Observables/Forensics Information

The way CybOX is currently used in CTI exchange is, again, taking a sledgehammer to a nut:

- The indicators most of us are currently able to communicate and process are rather simple: a hash value, a URI, a domain name, an email address. So what usually happens is that the simplest of indicators are wrapped into a CybOX object, only to be unwrapped by the receiver and stuck into one of the six buckets of information he is able to deal with. That is fine, I guess. But if the producer starts adding further information to CybOX objects, the receiver's "unwrapping" code will ignore it, and it may take the receiver some time to realize that his automated processes are discarding information. Or the importer/unwrapper may even break, interpret things wrongly, ...

  Now take a look at what, e.g., MISP does: an indicator is basically a key-value pair, where the key describes the kind of indicator and the value is the indicator itself:

  - the problem of inadvertently missing information does not occur: either I know how to deal with a certain indicator type or I do not
  - adding a new indicator type takes as little as adding a new key rather than defining a whole new object type.

  There are, of course, also drawbacks to the MISP way of doing things, but currently, MISP is a lot closer to what is current practice in sharing technical indicators than CybOX.

- Aharon mentioned the complex logical operations that are troublesome. Their genesis, at least in my understanding, lies in the fact that STIX/CybOX owe a lot to OpenIOC.
However: OpenIOC at its heart is a language for expressing signatures/patterns for a certain line of products, and it is geared towards the capabilities of these products. If CybOX/STIX had started out, e.g., from a line of thinking closer to a different product line, CybOX/STIX might look quite different. Why do we have logical operators, but not, say, temporal operators ("first this, then two times that, and then finally again this, all within 5 seconds"), as we have in SIEMs or network monitoring? Do we need/want CybOX/STIX to be an all-encompassing generic signature/pattern language? Or is that maybe a case for the current test-mechanism feature that allows the embedding of SNORT, OpenIOC and what have you?

- Recently, Sean reminded us on the mailing list that CybOX also has its uses in MAEC for malware expressions and in the expression of forensics information. It is great that CybOX is so powerful and versatile ... but most of its power seems to be lost or even counterproductive when it comes to getting basic CTI exchange started.

Some time ago, Terry alerted us to the fine but important distinction between an observable pattern (what to look out for) and an observable instance (what has really been seen). Although we have talked about use cases of communicating observable instances (I have seen this and that), the majority, I think, is interested in exchanging stuff to look out for.

I may be committing heresy now, but let us think the unthinkable for a moment: how about a profile of STIX that allows communication of basic indicators (observable patterns) in a way that is closer to MISP's key-value pairs (with a well-defined mapping into CybOX proper), leaving full CybOX to cases in which observable instances (i.e., something that has been observed) are to be communicated? A mapping from such a simplified expression into a standard CybOX representation would then provide precise semantics and retain the link to CybOX proper.
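To make the idea a little more concrete, here is a minimal sketch of what such a mapping could look like. It is purely illustrative: the indicator keys, the object and property names (only loosely modelled on CybOX object types) and the helper `to_cybox_like` are assumptions made for this example, not part of STIX, CybOX or MISP.

```python
# Hypothetical default mapping: simple indicator key -> (object type, property path).
# The names on the right are loosely modelled on CybOX object types; they are
# illustrative assumptions, not an official binding.
DEFAULT_MAPPING = {
    "md5": ("FileObject", "Hashes.MD5"),
    "url": ("URIObject", "Value"),
    "domain": ("DomainNameObject", "Value"),
    "email-src": ("AddressObject", "Address_Value"),
}


class UnknownIndicatorType(ValueError):
    """Raised for keys without a default mapping, so nothing is silently dropped."""


def to_cybox_like(key, value):
    """Expand one key/value indicator into a nested, CybOX-like dictionary."""
    if key not in DEFAULT_MAPPING:
        raise UnknownIndicatorType(key)
    object_type, path = DEFAULT_MAPPING[key]
    properties = {}
    node = properties
    parts = path.split(".")
    # Walk down the property path, creating intermediate levels as needed.
    for part in parts[:-1]:
        node = node.setdefault(part, {})
    node[parts[-1]] = value
    return {"object_type": object_type, "properties": properties}


# A producer shares a flat list of key/value pairs ...
indicators = [
    ("url", "http://evil.example/payload"),
    ("md5", "d41d8cd98f00b204e9800998ecf8427e"),
]
# ... and the full representation is derived mechanically when it is needed.
expanded = [to_cybox_like(key, value) for key, value in indicators]
```

The point of the sketch is that a receiver either knows a key or rejects it explicitly, while the mapping table alone carries the link to the full object model.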
Kind regards,

Bernd

----
Bernd Grobauer, Siemens CERT