
cti-stix message



Subject: Re: [cti-stix] STIX: Messaging Standard vs. Document Standard


>I would offer that we are unequivocally, unquestionably, incontrovertibly working on a message format. 

Eric, I would have to respectfully disagree.
My disagreement is not just with this statement but with the false dichotomy that I think this thread posits.
While I certainly recognize the tradeoffs and tension between messaging-centric and document-centric perspectives, I disagree that STIX is inherently one or the other.
STIX is not just a messaging standard, and it is not just a document standard. It is not an either-or choice between the two.

STIX is the information model (language) for cyber threat information.

STIX at its core is not targeted at telling you how you must lexically/syntactically structure your messages.
Similarly, at its core it is not targeted at telling you the bits and bytes you must use to store your content.

Whether you are exchanging CTI across messages or storing it within documents, the information involved and its meaning are the same.
Structures and formats for exchange or storage may differ or vary as long as each maps back to the same information model so that everyone can understand what is there.
This is the reason that the specification for STIX is a data model (currently UML) that is separate from specifications binding that data model to any particular serialization format.
If the binding specifications are written so as to offer high assurance in the integrity of the mapping to the underlying information model, then the real nut of this discussion becomes more tractable: messaging-centric vs. document-centric is a serialization issue, not a forced choice on the underlying information model.
Different serialization options (JSON, XML, protocol buffers, etc.), and how each is applied, are relevant not only within messaging (where the battles have gone on for a long while and have resulted in agreement to choose an MTI) but also between messaging and document/storage.
In other words, with well-mapped bindings you can define a binding for messaging (it could be the MTI) that is tuned to the particular needs of messaging, and a different binding for document/storage that is tuned to its needs; or you might not define a standardized one for document/storage at all, leaving that up to each implementer (I think this last part may be part of Eric’s statements below).
This can be done without biasing things either direction and forcing either side to make unnecessary compromises. I could pull content from my repository in one form (document-centric), send it to you in another form (messaging-centric) and you could receive it and store it in your repository in a document-centric form. This is all possible when the information itself is mapped to the same underlying information model.
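To make the layered idea concrete, here is a minimal sketch (the Indicator fields are invented, not real STIX): one in-memory model object, with a messaging-oriented JSON binding and a storage-oriented XML binding that round-trips back to the same model.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass, asdict

# Hypothetical, minimal information-model object (not the real STIX field set).
@dataclass
class Indicator:
    id: str
    title: str
    pattern: str

def to_json(ind: Indicator) -> str:
    """Messaging-oriented binding: compact JSON."""
    return json.dumps(asdict(ind), separators=(",", ":"))

def to_xml(ind: Indicator) -> str:
    """Document/storage-oriented binding: XML carrying the same fields."""
    root = ET.Element("Indicator", id=ind.id)
    for field in ("title", "pattern"):
        ET.SubElement(root, field).text = getattr(ind, field)
    return ET.tostring(root, encoding="unicode")

def from_xml(doc: str) -> Indicator:
    """Round-trip back to the shared model: both bindings carry the same information."""
    root = ET.fromstring(doc)
    return Indicator(id=root.get("id"),
                     title=root.findtext("title"),
                     pattern=root.findtext("pattern"))
```

Because both bindings map back to the same model, content pulled from a repository in one form can be re-emitted in the other without loss.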

I think we need to be very careful to avoid coming down on one side or the other in a false battle between messaging-centric and document-centric camps, and letting that drive attempts to bias the actual underlying information model. The underlying information model should be driven by the information people and systems need to express about cyber threat information, whether that information is being messaged, stored or otherwise handled. For folks concerned with messaging-centric needs, let's focus that tuning primarily at the binding level and not at the information model level, where it can negatively impact the broader set of use cases.

This layered approach should be self-evident in the active work products we currently have in play: a spec for the language itself and separate binding specs. The XML binding spec for STIX v1.2.1 (representing our pre-2.0 status quo) is still in progress but should be finished soon. For v2.0 there will be a similar binding spec, but for the JSON MTI serialization.


sean



From: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, Eric Burger <ewb25@georgetown.edu> on behalf of Eric Burger <Eric.Burger@georgetown.edu>
Date: Sunday, December 6, 2015 at 8:47 AM
To: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>
Subject: Re: [cti-stix] STIX: Messaging Standard vs. Document Standard

Going back to Jason's question that started this thread: are we building a document format or are we building a protocol suite, of which STIX is the message format?

I would offer that we are unequivocally, unquestionably, incontrovertibly working on a message format. To be possibly just slightly controversial, I would offer that if we think STIX is the document format, then we will be moving cyber threat analysis forward about a year or two for a year or three, and then will irrevocably keep cyber threat analysis frozen in the mid-2010s for the next ten years.

Saying STIX is the document format implies either that everyone has the same needs for processing the data or that the document format has to cover everybody’s needs. We seem to be ‘working’ towards that goal, which may be part of why it takes weeks to define what a date is. Worse, since it is so hard to make everyone happy, once we make a decision it is cast in stone. The evil in this result is not that STIX becomes inflexible and brittle. The evil is that if people think they have to store STIX as STIX, then if someone comes up with a better way to look at or analyze threat data, they are S.O.L.

Saying STIX is the message format means we can relax - so long as I can express the transfer of information in STIX, you can store it in whatever way you want. Likewise, so long as I can express information in STIX, I can generate it from whatever format I happen to have it in.

I am not knocking Soltra - it is really cool they can process and store native STIX as STIX. However, I would think that as folks involved just a little in the cyber security environment, that we might shy away from saying that every CTI platform needs to look like Soltra. Said differently, there is nothing inherently wrong with someone saying, “STIX looks cool - I’ll base my world on STIX.” However, there is something majorly wrong with someone saying, “STIX looks cool - everyone must base their world on STIX.”

I would offer one modification to Jason’s observations for a messaging standard: maximum byte efficiency is explicitly not a goal. If it were a goal, all protocols would use something like ASN.1 PER or handcrafted binary. No one would use keyword-value (e.g., SMTP, S/MIME and HTTP), XML (e.g., IODEF, SIMPLE), or even JSON. All of those carry too much overhead. So, why do we use them? Because I can use off-the-shelf parsers and, in the case of XML, have access to tons of tooling.

One last observation: if the goal is for STIX to be a document format, then there is one and only one reasonable encoding: XML. With XML, you can use XQuery to ask about the data and variations on XPath to send updates on the data. Done in one.
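As a small illustration of that point (the element names below are invented, not the actual STIX 1.x schema), even Python's standard library supports a subset of XPath for querying such a document; full XQuery or XPath support would need an XML database or a library like lxml.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; element names are illustrative only.
doc = """
<Package>
  <Indicator id="indicator-1"><Title>Phishing URL</Title></Indicator>
  <Indicator id="indicator-2"><Title>C2 Domain</Title></Indicator>
</Package>
"""

root = ET.fromstring(doc)

# ElementTree implements a limited XPath subset: paths and attribute predicates.
titles = [el.text for el in root.findall("./Indicator/Title")]
second = root.find("./Indicator[@id='indicator-2']")
```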


On Nov 30, 2015, at 7:50 PM, Cory Casanave <cory-c@MODELDRIVEN.COM> wrote:

Bret,

I will answer your questions, below [cbc] , but perhaps we will then agree to disagree and let the process work. I doubt more needs to be said.

-Cory

 

 

 

From: Jordan, Bret [mailto:bret.jordan@bluecoat.com]
Sent: Monday, November 30, 2015 7:13 PM
To: Cory Casanave
Cc: Jason Keirstead; Richard Struse; cti-stix@lists.oasis-open.org; Wunder, John A.
Subject: Re: [cti-stix] STIX: Messaging Standard vs. Document Standard

 

1: Definition method

Bret: The specification is English prose.

Cory: The specification is a machine readable model that includes English prose.

 

How is this an issue?  The claim that there will be problems is vague, and I am not sure how it applies to this specification.

[cbc] Well, it has been a huge issue in my own attempt to understand STIX and map it. Some things still make no sense. It is well documented that there are multiple ways to say the same thing – will all the implementations work together? Perhaps I can find some time to document some of the “WTF” questions that came up as I looked at STIX-1. When prose specifications are interpreted differently, you get very expensive and hard-to-resolve BUGS. You get systems under the same standards that don’t work together. To me, this is a problem.

 

When STIX moves to Cap’n Proto (aka binary) there will be no more English field names,

[cbc] English field names are not required. What is required is a programmatic way to go from instance to specification. This can be done in binary.
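A sketch of what Cory may mean here (the schema table, tags and field names are all invented): a binary encoding can carry only numeric tags on the wire, with the specification supplying the programmatic mapping from tag to meaning.

```python
import struct

# Invented schema: numeric tag -> (field name, type), as a binary spec might define.
SCHEMA = {1: ("id", "str"), 2: ("confidence", "u8")}

def encode(record: dict) -> bytes:
    """Encode each field as tag byte + length byte + payload; no names on the wire."""
    out = b""
    for tag, (name, typ) in SCHEMA.items():
        payload = record[name].encode() if typ == "str" else struct.pack("B", record[name])
        out += struct.pack("BB", tag, len(payload)) + payload
    return out

def decode(data: bytes) -> dict:
    """The schema, not the wire format, supplies the meaning of each tag."""
    record, i = {}, 0
    while i < len(data):
        tag, length = data[i], data[i + 1]
        payload = data[i + 2 : i + 2 + length]
        name, typ = SCHEMA[tag]
        record[name] = payload.decode() if typ == "str" else payload[0]
        i += 2 + length
    return record
```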

so how is this an issue? HTTP and HTML being English-centric seem to have worked well.  A specification is a specification.  Building unit tests to test compliance is a relatively easy thing to do.

[cbc] Wow, you must be really good. Compliance has been hard for most specifications.

This will guarantee interoperability.  

[cbc] Trouble is, it does not. Interoperability is hard.

And if one vendor’s product (Internet Explorer 6) comes out that breaks the ecosystem, then consumers should not buy that product and should force that vendor to change.

 

2: Schema production

Bret: The field names and structure are hand crafted.

Cory: The field names and structure are produced from the model.

 

Organizations and development shops will always produce their own APIs to generate STIX content.  

[cbc] That would be unfortunate for wide-scale adoption.

 

Some may use community-built modules / APIs, depending on the licensing and intellectual property aspects.  It is very easy to build compliance and unit testing to verify that what someone produces will match the specification.

[cbc] So do we have that for STIX 1? Ask Oasis about ease of conformance suites.

 

STIX is not that big.  

[cbc] STIX and all it imports is thousands of terms. What is big to you? Or, are you assuming a much reduced scope? If so, the scope question should be #1!

 

I built an API to do all of the indicators and TTP stuff in a few days.  I would argue that the best thing we could do would be to present a text document from the UML that listed out each field name by idiom.  Then developers can just copy and paste the entire list.  This way there will be no typos.  But once again, a simple unit test will pick up any issues.

[cbc] I think we have it on a key point – “idioms”. Idioms are examples, not specifications. Coding to an idiom would be very fragile and would then not interoperate with others who coded to other idioms that utilize the same or overlapping data.

By the way, since you will copy/paste the field names, I’m not sure why the introduction of a namespace prefix is such an issue; it would have zero development cost and inconsequential runtime overhead.

 

 

3: Namespaces

Bret: The tag names in the data are implicitly mapped to the schema by name

Cory: The tag names are explicitly mapped to their schema and definition by name and explicit namespace

 

I disagree.  In the UML it is very easy to see in the 20 items for each idiom if we have re-used the same name more than once.  

[cbc] Again, Idioms are irrelevant. We need to look at all the terms that could be used in any STIX message. I agree it is easier to see in UML. So it is easier to get agreement on the content.

 

Once again, we are trying to solve a problem that is not there.  Using the same name for a field in a different idiom is not an issue.  Higher-level code will easily handle this, and vendors and developers map those data fields into their own dataset and then do something with them. Namespaces allow people to artificially extend a schema and do things that will BREAK compatibility.

[cbc] Interesting assertion. I don’t see how namespaces allow people to break interoperability. Namespaces provide for interoperability. My guess is you are postulating externally introduced namespaces? It is up to the policy of the specification as to the extensibility of new namespaces. I would suggest that some (controlled) extensibility is required for agility. But that is a choice independent of namespaces. CTI could forbid any new namespaces.
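A small sketch of this point (prefixes and field names are hypothetical): two vocabularies can both define the local name "type" without colliding, and the specification's policy can reject namespaces it has not admitted.

```python
# Two hypothetical vocabularies that both define the local name "type".
record = {"stixCore:type": "indicator", "vendorX:type": "proprietary-feed"}

# Policy: the specification controls which namespaces are admitted.
ALLOWED_NAMESPACES = {"stixCore", "vendorX"}

def resolve(record: dict, namespace: str, local: str):
    """Explicit namespace + local name removes ambiguity about which schema defines a field."""
    return record.get(f"{namespace}:{local}")

def invalid_keys(record: dict) -> list:
    """Reject keys from namespaces the specification has not admitted."""
    return [k for k in record
            if k.partition(":")[0] not in ALLOWED_NAMESPACES or not k.partition(":")[2]]
```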

 

4: Variability

Bret: I am only concerned with a specific and very structured exchange schema.

Cory: There will be multiple patterns of exchange for different use cases based on the same underlying model.

 

Once again I disagree.  It is just as easy for me to fill out every field and send the blob of data as it is to only fill out one to three fields and send it.  I am not only concerned with sending minimal data.  I might send several blobs of data: some TTPs, some ThreatActors, some Indicators.  Receiving code can easily handle this by saying:

if type == "indicator": do_foo()
elif type == "ttp": do_foo1()
elif type == "threatactor": do_foo2()
etc.

One group may only be able to send indicators with certain data, and other vendors may be able to send something else.  Great, my code will consume and do things with all of it.

[cbc] And ignore what it doesn’t need, right? So what you are saying is that there is one large schema, no idioms and everything is optional? You may want to layer some required interaction profiles on top of that. In any case, CTI and the expectations of using it will change over time – better to plan for it.
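Combining Bret's dispatch sketch with Cory's "ignore what it doesn't need" point, a tolerant consumer might look like this (handler names and type strings are illustrative, not from any spec):

```python
def handle_indicator(obj):
    return f"indexed {obj['id']}"

def handle_ttp(obj):
    return f"linked {obj['id']}"

# Map of type string -> handler; types this consumer understands.
HANDLERS = {"indicator": handle_indicator, "ttp": handle_ttp}

def consume(blobs):
    """Dispatch on type; silently skip types this consumer does not understand,
    so senders with richer content still interoperate."""
    results = []
    for obj in blobs:
        handler = HANDLERS.get(obj.get("type"))
        if handler:
            results.append(handler(obj))
    return results
```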

 

5: Development

Bret: All I need is a text editor and I will type in my implementation.

Cory: Reading, writing, mapping and even presenting the data will be heavily assisted with automation. Only special algorithms will be coded.

 

This is a problem that vendors will solve.  This is not a standards-track issue.  Vendors will produce neat and interesting tools that make use of the data.  The vendors that do the best job will make the most money and get the most sales.

[cbc] What you are suggesting is disenfranchising a large set of vendors that do not implement the way you do. It is up to the standard to provide the artifacts that enable a large community, not to presuppose particular implementation styles, idioms and use cases.

 

To answer your question: I am not against a solid UML specification or model or whatever you call it.  In my mind a UML model is a wonderful thing to have.  It makes it so much easier to learn and understand STIX.  When I first started playing with STIX, I built my own UML model as there wasn't one.  I needed to do that to make heads or tails of what was going on.  So yes, we need a UML specification / model.

 

Where I believe we fundamentally disagree is on the idea of code writing itself and being auto-generated.

[cbc] So you work in machine code, no compilers? No virtual machines? No code gen from schema? No visualization tools? No analytics engines? No mapping tools. How are things in the 60’s?

So how about this? We agree on a small subset model and a JSON representation of it. We then see if that can be generated; if so, there should be no issue.
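In miniature, that experiment might look like the following (the model table is invented): a JSON serializer generated from a small model description rather than hand-written per class.

```python
import json

# Invented miniature "model": class name -> list of field names.
MODEL = {"Indicator": ["id", "title"], "ThreatActor": ["id", "name"]}

def generate_serializer(class_name: str):
    """Generate a JSON serializer from the model, instead of hand-writing one per class."""
    fields = MODEL[class_name]
    def serialize(obj: dict) -> str:
        # Emit only the fields the model defines, in a stable order.
        return json.dumps({f: obj[f] for f in fields}, sort_keys=True)
    return serialize

serialize_indicator = generate_serializer("Indicator")
```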

 

Some people may use this, but this is NOT a requirement for the standard, IMHO.  

[cbc] Again, it is for a standard that enables a larger community.

 

A nice and clean UML specification

[cbc] OK, let’s start on that now and stop spending so much time on one of multiple syntaxes.

 

and a super-easy-to-implement binding in JSON is all we need at this point.

[cbc] I really want you to have that as well!

Long-term I see the need for moving to a binary representation in, say, Cap’n Proto, but that will be 3-5 years from now if we are successful.

 

 

Thanks,

 

Bret

 

 


