

Subject: Re: [cti] Thoughts on STIX and some of the other threads on this list


I would be all for having a week-long brainstorming / high-bandwidth work session where we could flesh out ideas and possible options (not make decisions).  If Aharon wants to set something up, maybe in the DC area, for all who want to and can attend, I will show up.  Once we have fleshed out ideas to some level, we can bring them back to the email list to flesh out further and collaborate on.  Email is just a very poor and inefficient way of doing design work.

Doing those 8 items is going to be hard.   But we really need to focus on them, think through them carefully, and then make sure the ideas work.  

If we can get a group of data model designers, practitioners, and implementers around the same table, we can build a data model that is solid, simple, easy, and implementable in code. 

Thanks,

Bret



Bret Jordan CISSP
Director of Security Architecture and Standards | Office of the CTO
Blue Coat Systems
PGP Fingerprint: 63B4 FC53 680A 6B7D 1447  F2C0 74F8 ACAE 7415 0050
"Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg." 

On Sep 1, 2015, at 08:41, Bush, Jonathan <jbush@dtcc.com> wrote:

Being new to the group – How has this sort of thing been tackled in the past?  The 1-8 list below is not going to be ‘easy’ to sort out (or we would have done it already), and I can’t imagine we will get a ton accomplished with long email chains back and forth.  Do we need to organize some sort of an in-person working session?  (I realize I may not even have enough context or experience to participate in such a session, just trying to keep us moving towards solutions)
 
From: cti@lists.oasis-open.org [mailto:cti@lists.oasis-open.org] On Behalf Of Jordan, Bret
Sent: Tuesday, September 01, 2015 10:38 AM
To: Bush, Jonathan
Cc: Wunder, John A.; cti@lists.oasis-open.org
Subject: Re: [cti] Thoughts on STIX and some of the other threads on this list
 
I second John Wunder's proposal to pick JSON and an optional Binary encoding for the data exchange format.
 
I want us to have the following core values:
 
1) Simplicity (that does not mean a limited or non-rich/verbose data model; it means the following 4 things)
2) Ease of use
3) One way of doing things
4) Simple to understand
5) Simple to implement
 
I want to get us working on Aharon's list, and he called it out well.  These parts of STIX make it either difficult or nearly impossible to implement, so let's FIX them.  
 
1) Complex logical operations
2) Heavily nested objects 
3) Object Versioning
4) Relationships that go 50 levels deep backwards and forwards.
5) Making it easy just to share a single evil URL with someone. Reduce verbosity ?
6) XPATH in the Marking Structure. Or the marking object in general.
7) Multiple ways to say the same thing
8) Almost every field being optional
 

We have a real opportunity to be successful and go from just early adopters to mainstream use, but it is going to take a lot of work, and we need to make sure what we design in the Data Model can be easily implemented in the Data Exchange Format. Having the best data model in the world will not help us if you cannot implement it in code.  

Thanks,
 
Bret
 
 
 
 
On Sep 1, 2015, at 08:12, Bush, Jonathan <jbush@dtcc.com> wrote:
 
If we as a standards group focus on the abstract data model, and get that right, we really shouldn’t care about the backend format.  Why?  Because modern languages and architectures (such as REST) simply don’t care.  If vendor 1 creates STIX using JSON and vendor 2 creates it in XML, the two can talk with no problem, if the applications are architected correctly.  Yes, I know this means we have a TAXII issue, but I will ask: what is TAXII providing that can’t be done by other means, really?
 
If we as a standards group stay down this implementation-based way of thinking, we are almost sure to lose this game.  We have to get our data model right, and let technology implementation groups do what they do best.  Right now, our mission is very confused.
 
Most of all, we have to stop wasting time debating this.  Competing standards (standards, not implementations) are not going to let us keep the head-start that we have.  Let’s start solving our issues, the sorts of things that are making implementation painful regardless of the format (optionality, nesting, missing objects, etc…), and then let everyone else get to work implementing them!
 
From: cti@lists.oasis-open.org [mailto:cti@lists.oasis-open.org] On Behalf Of Wunder, John A.
Sent: Tuesday, September 01, 2015 8:58 AM
To: cti@lists.oasis-open.org
Subject: Re: [cti] Thoughts on STIX and some of the other threads on this list
 
Well said, Mark. When it comes to exchange formats, less is more and “one way of doing things” as a philosophy should start with the representation format. My nightmare scenario is that we allow both JSON and XML and two STIX products are incompatible for no reason other than Lauren does Java and likes XML and Bob does Python and likes JSON.
 
That said, I do think Sean’s option to have one “Mandatory to Implement” format is appealing: how about we pick one human-readable representation (JSON or XML) as the MTI representation and then allow other formats as optional if they meet different use cases? In particular, I can imagine having a binary representation might be nice for a sensor (massive volume) use case.
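
To give a rough feel for why a binary representation might suit a high-volume sensor, here is a minimal sketch comparing one hypothetical sighting record encoded as JSON and as a fixed binary layout. The record fields, the layout, and the function names are assumptions made for illustration only; none of them comes from STIX or from any format proposed on this list.

```python
import ipaddress
import json
import struct

# Hypothetical fixed layout for one sighting (an assumption, not a STIX binding):
# IPv4 address (4 bytes), port (unsigned 16-bit), UNIX timestamp (unsigned 32-bit),
# all in network byte order with no padding.
SIGHTING = struct.Struct("!4sHI")


def encode_binary(ip: str, port: int, seen_at: int) -> bytes:
    """Pack one sighting into the fixed 10-byte layout."""
    return SIGHTING.pack(ipaddress.IPv4Address(ip).packed, port, seen_at)


def decode_binary(blob: bytes) -> dict:
    """Unpack the fixed layout back into a plain dictionary."""
    raw_ip, port, seen_at = SIGHTING.unpack(blob)
    return {"ip": str(ipaddress.IPv4Address(raw_ip)), "port": port, "seen_at": seen_at}


def encode_json(ip: str, port: int, seen_at: int) -> bytes:
    """Encode the same sighting as UTF-8 JSON for comparison."""
    return json.dumps({"ip": ip, "port": port, "seen_at": seen_at}).encode("utf-8")
```

The binary record is always `SIGHTING.size` bytes, while the JSON form repeats the field names in every record. The trade-off is that the binary form is unreadable without knowing the layout, which is one reason to keep a human-readable format as the mandatory one.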
 
-- break --
 
I also agree with Aharon: format is, in the end, one of the smaller things we can do to make STIX easier. Pat, Sean, and Mark also made a good point that we need to spend way more time than we are now on actually improving the conceptual model to advance the state of the practice. So to that end, my suggestion is to reach some temporary closure on this topic so we can focus on making STIX better and easier to use.
 
My proposal: let’s go with Sean’s suggestion in #7 below and informally get consensus on a single format that we’ll notionally call the “mandatory to implement” format. My impression is that this would likely be JSON but if it’s XML or RDF so be it. Then, let’s informally get consensus on one binary format that can be notionally the optional one. I know nothing about binary formats so can’t really venture a guess as to a good one. This would also be in line with what Mark suggested yesterday to pick a short list: let’s make it two, a human-readable protocol and a binary protocol.
 
Once we have consensus on these two formats we can move forward with STIX comfortable that when the time comes to formalize these decisions we’ve already reached some agreement. We can also test these formats out via prototyping to understand if we’re missing anything about the decisions we’ve made (maybe JSON is terrible and will lead to incompatibilities). Finally, as we discuss and improve other parts of the STIX data model we can explore, via prototypes and examples, those aspects in these formats.
 
Thoughts?
 
John
 
From: <cti@lists.oasis-open.org> on behalf of Mark Davidson
Date: Tuesday, September 1, 2015 at 8:23 AM
To: Aharon Chernin, Sean Barnum, "Jordan, Bret", Mark Clancy
Cc: "cti@lists.oasis-open.org"
Subject: RE: [cti] Thoughts on STIX and some of the other threads on this list
 
All,
 
I’d like to offer the distinction between ‘STIX the model’ and ‘STIX the exchange format’. In my mind, ‘STIX the model’ is an abstract representation of objects, their properties, and their relationships to each other. ‘STIX the model’ is not meant to be implemented directly. In contrast, ‘STIX the exchange format’ is focused on enabling interoperability between products/vendors.
 
I think we are all making good points and that categorizing those points as “model” or “exchange format” will help frame discussion so that we can successfully deliberate on them.
 
I’ll pick out this statement as an example, because for me the context is very important:
> Any argument posing that STIX must select a single data representation format for all implementations (whether arguing for JSON or for XML or whatever) is a FALSE argument.
 
In the context of ‘STIX the model’, implementation means JSON or XML (NOT running code). And in that context, I agree completely. I see little value in constraining the way people use the model – if they want to implement the model in XML, by all means.
 
In the context of ‘STIX the exchange format’, I disagree with the statement. For interoperability, products need one and only one format that is widely used. Picking a single format, even if it has some drawbacks, is immensely superior to two formats (unless you want developers to focus time on serialization instead of adding functionality).
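
The distinction can be sketched in a few lines of code. In the sketch below, a small dataclass stands in for "the model" and two serializers stand in for two candidate exchange formats. The class `UrlIndicator`, its fields, and the element names are hypothetical and heavily simplified; they are not taken from the STIX schemas.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class UrlIndicator:
    """Hypothetical, simplified model object (not real STIX)."""
    id: str
    url: str
    description: str


def to_json(ind: UrlIndicator) -> str:
    """One possible exchange format: a flat JSON object."""
    return json.dumps(asdict(ind), sort_keys=True)


def from_json(text: str) -> UrlIndicator:
    return UrlIndicator(**json.loads(text))


def to_xml(ind: UrlIndicator) -> str:
    """Another possible exchange format: one child element per field."""
    root = ET.Element("UrlIndicator")
    for name, value in asdict(ind).items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> UrlIndicator:
    root = ET.fromstring(text)
    return UrlIndicator(**{child.tag: child.text or "" for child in root})
```

Both formats carry the same model object, which is the "model" view. A product that ships only `from_json` still cannot read what `to_xml` produces, which is the "exchange format" view: every additional format is more serialization code that each product has to write and test.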
 
My hope is to illustrate that depending on the context (STIX the model or STIX the exchange format) a particular point may seem entirely correct or entirely incorrect. Identifying the context/perspective a point is made from will help facilitate discussion.
 
Thank you.
-Mark
 
From: cti@lists.oasis-open.org [mailto:cti@lists.oasis-open.org] On Behalf Of Aharon Chernin
Sent: Monday, August 31, 2015 7:41 PM
To: Barnum, Sean D. <sbarnum@mitre.org>; Jordan, Bret <bret.jordan@bluecoat.com>; Mark Clancy <mclancy@soltra.com>
Cc: cti@lists.oasis-open.org
Subject: Re: [cti] Thoughts on STIX and some of the other threads on this list
 
This is not a reply to Sean specifically; his just happened to be the last post in the thread. By the way, for the group, what I think we are discussing is the default binding approach for the spec. STIX/TAXII/CybOX can be done in multiple formats. But what is the format we prefer as the default?
 
I am not a fan of XML and I would encourage open debate on alternatives for the next major release. However, I am concerned that people are arguing for format change without actually fixing the complexities that made XML hard in the first place.
 
Let's talk about things that JSON won't fix:
1) Complex logical operations
2) Heavily nested objects 
3) Object Versioning
4) Relationships that go 50 levels deep backwards and forwards.
5) Making it easy just to share a single evil URL with someone. Reduce verbosity ?
6) XPATH in the Marking Structure. Or the marking object in general.
7) Multiple ways to say the same thing
8) Almost every field being optional
 
All of these complexities are built into the language, they are not built into the XML or JSON format.
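
As a small illustration of items 2 and 8, the sketch below walks a deeply nested structure in which every level is optional. The field names are hypothetical, chosen only to mimic that shape, and JSON is used purely for convenience; an XML tree with the same nesting and optionality would need the same level-by-level guard.

```python
import json
from typing import Any, Optional


def dig(data: Any, *path: str) -> Optional[Any]:
    """Follow `path` through nested dicts, returning None if any level is absent."""
    for key in path:
        if not isinstance(data, dict) or key not in data:
            return None
        data = data[key]
    return data


# Hypothetical documents: one fully populated, one where optional levels are missing.
NESTED = json.loads(
    '{"indicator": {"observable": {"object": {"properties": '
    '{"value": "http://evil.example/"}}}}}'
)
SPARSE = json.loads('{"indicator": {"observable": {}}}')

# The consumer has to know, and defensively check, every step of this path.
URL_PATH = ("indicator", "observable", "object", "properties", "value")
```

Calling `dig(NESTED, *URL_PATH)` finds the URL, while `dig(SPARSE, *URL_PATH)` returns `None`. The consumer still has to decide what a missing value means, and that decision does not depend on the serialization.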
 
So, how do we focus on the consumer instead?
1) Design a STIX/TAXII ecosystem that doesn't bifurcate itself (i.e., use a default binding)
2) Fix the 8 things above
3) Market our products in a way where we teach the value of STIX/TAXII and interoperability to our consumers
4) Build consumer demand for interoperability
5) Stop complaining about format and start building products that talk to each other to fulfill consumer demand for interoperability
5.5) Hide the complexities of STIX/TAXII from the non-developer consumer
6) Make it "easier" for the super savvy developer consumer to make content (i.e., fix the things in the 1-8 bullets above).
 
 
Aharon Chernin
CTO
SOLTRA | An FS-ISAC & DTCC Company
18301 Bermuda Green Dr
Tampa, FL 33647
813.470.2173 | achernin@soltra.com

 


From: cti@lists.oasis-open.org <cti@lists.oasis-open.org> on behalf of Barnum, Sean D. <sbarnum@mitre.org>
Sent: Monday, August 31, 2015 6:16 PM
To: Jordan, Bret; Mark Clancy
Cc: cti@lists.oasis-open.org
Subject: Re: [cti] Thoughts on STIX and some of the other threads on this list
 
I would like to once again strongly and hopefully politely make a request.
Can we please stop promulgating and spending cycles on this False Dilemma?
 
How to represent STIX information in the implementation of an information exchange or a repository or an analytic system is NOT an either-or decision.
Since the beginning of the STIX community it has NOT been an either-or decision.
 
When STIX began, when things were very nascent and we knew we didn’t know what we didn’t know,  we as a community decided to use XML Schema to capture and think through ideas because it was widely understood by everyone, gave us an explicit mechanism for defining and validating structure and syntax, and it had a broad ecosystem of tooling available for it. This was in no way a declaration that STIX was only XML or ever would be. To the contrary, it was discussed that we would use it to figure out what we needed to and to test out ideas with real content and once we felt we had reached an appropriate level of maturity and stability that we would abstract out our consensus on structure and semantics to a non-implementation-dependent form and define different implementations against it as appropriate. It was always said that we would likely maintain the XSD implementation as a reference implementation since the work was already done and that there would be a body of implementations already in place using it. However, we also very clearly agreed that other implementations would/could be created as appropriate for particular use cases and technical contexts that required different implementation formats. JSON was explicitly identified as a likely option as were things like protobuf, OWL/RDF, etc. This has always been the plan and is still the plan. It is fundamental to how the charter for the CTI TC was set up and is now only a week or two away from that long-ago envisioned milestone where other implementations like JSON could effectively be defined in such a way that we could have some confidence that they would lead to technology implementations that are actually conformant to the same language standard structure and semantics as any other implementation.
 
It would be inappropriate to assert that XML is the one and only way that STIX should be represented (such an assertion has NEVER been made by the STIX community or the DHS/MITRE teams supporting it).
It would be equally inappropriate to assert that JSON is the one and only way that STIX should be represented.
 
The reality is that different data representation formats exist for a reason. It is not simply that each group decided to “roll their own” rather than reuse what others had developed for their use cases. Such unnecessary redundancy may exist in some cases but for the most part different data representation formats exist because different technical contexts and use cases have different requirements for things like size, speed, rigor, expressivity, flexibility, technical dependencies, etc. And different data representation formats have different advantages and disadvantages related to these sorts of requirements. 
There does not exist any single data representation format that is the “right” answer or even an adequate answer for all technical contexts and use cases. Different situations call for different data representation formats.
Trying to force such a single option, while making natural supporters of that option happy, would almost certainly drive a good portion of potential adopters away from the table. For many of those who chose to stay at the table and compromise, it would deliver a reduced capability compared with whatever format may be appropriate to them. That reduced capability then gets passed on to the users.
 
Recognizing that no single data representation format (syntax and lexicality) is “right” for every context does not mean that we cannot have a single agreed to set of structure and semantics for the domain of information being represented. That is exactly what STIX is intended to be, a single standardized language specifying agreed to structure and semantics for cyber threat information supporting a broad range of technical contexts and use cases. Since the beginning of STIX a foundational principle of STIX has been that STIX is NOT a system or a repository or a sharing program but rather IS a language specifying structure and semantics for cyber threat information that can support any number of systems or repositories or sharing programs implemented by any number of organizations using whatever technologies are appropriate for them. This objective does not require a limitation to one and only one data representation format, and in fact precludes such a limitation.
 
So, you are correct Bret in saying that “format impacts adoption”. If people feel that they have no option to leverage the format that is most appropriate for their situation then they are less likely to adopt. 
You make a first argument that this is the situation that exists today with people who want to use JSON and I certainly agree with you that there are many people who say that they only want to do STIX in JSON.
You then make a second argument that JSON is the only “right” solution and that nobody wants XML and we should remove support for it.
Unfortunately, the factual statements in the first two sentences of this paragraph that support argument #1 completely invalidate argument #2.
We cannot twist logic to support a preferred option.
I am not sure what I can say about your assertion that nobody uses or cares about XML other than that it is very inaccurate.
While there are many players that support and desire JSON, there are also those who support and desire (or may even be required to use due to regulatory or policy issues) XML. It is in no way a unanimous opinion either way.
And that is only looking at two options. For use cases requiring low latency and high speed, neither JSON nor textual XML is really adequate. For situations like those, options like Cap'n Proto, protobuf, Thrift, EXI, etc. are far more appropriate. 
 
For anyone who might read my statements above and interpret me as an XML fan-boy, please know that nothing could be further from the truth. If I were to go write a system today using STIX information, I would choose the appropriate data representation format based on the needs and context of that system. It is very possible that I would choose JSON or some other format based on their advantages and disadvantages. We initially used XSD to model STIX early on as it provided the appropriate advantages to support the sort of exploration we needed to do as a community at that time. That does not mean that it is the best option for someone creating any particular product today. It may be but it may not be.
 
One option that has been discussed to address both the need to support a variety of potential data representation format options and the desire to coalesce toward a single option is to select and specify a Mandatory To Implement (MTI) data representation format within the Conformance section of the STIX language specifications. This would mean that any implementation claiming to be conformant with the STIX language must at least support the MTI data representation format but could also support other additional formats as appropriate to its context. This would enable a minimum bar of interoperability at the format level between implementations but would not prevent people from doing what they need to do other than the MTI. I respectfully suggest that discussion around forming a separate working group to focus on selecting which format should be used for STIX be recentered around investigating and selecting an MTI format rather than an “only” format.
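
The MTI idea can be sketched as a simple negotiation rule: every conformant implementation supports the MTI format, and two parties use another format only when both happen to support it. The choice of `"json"` as the MTI value, the format names, and the function names below are assumptions for illustration; no MTI format has been selected.

```python
from typing import Sequence

# Assumption for illustration only: no MTI format has actually been chosen.
MTI_FORMAT = "json"


def is_conformant(supported: Sequence[str], mti: str = MTI_FORMAT) -> bool:
    """An implementation is conformant only if it supports the MTI format."""
    return mti in supported


def negotiate_format(
    producer_prefs: Sequence[str],
    consumer_supported: Sequence[str],
    mti: str = MTI_FORMAT,
) -> str:
    """Return the producer's most-preferred format that the consumer also supports.

    Because both sides must support the MTI format, a shared format always exists.
    """
    for side in (producer_prefs, consumer_supported):
        if not is_conformant(side, mti):
            raise ValueError(f"non-conformant implementation: missing {mti!r}")
    shared = [fmt for fmt in producer_prefs if fmt in consumer_supported]
    return shared[0]  # never empty: the MTI format is in both lists
```

Under this rule a sensor that prefers a binary format can still talk to any conformant consumer by falling back to the MTI format, which is the "minimum bar of interoperability" described above.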
 
Net-net:
1.      Any argument posing that STIX must select a single data representation format for all implementations (whether arguing for JSON or for XML or whatever) is a FALSE argument.
2.      The ecosystem which STIX is and has always been intended to support requires the flexibility to support multiple potential data representation formats
        a.      No single data representation format is the “right” choice for all contexts
3.      The important thing for STIX is specifying/modeling standardized structure and semantics of cyber threat information
4.      Any specific data representation format binding and reference implementation must conform to the standardized structure and semantics of the STIX language
5.      We are well along the path to support #2,  #3 & #4 above. We should reach that point in the next couple weeks with the release of the STIX 1.2.1 language specs and the STIX 1.2.1 XML Binding Spec (with accompanying reference implementation)
6.      Any specific data representation format binding and reference implementation should be capable of supporting automated and lossless transformation between itself and other conformant data representation format binding and reference implementations, including round-tripping. If the bindings/implementations are truly conformant to the STIX language/model then this should be possible.
7.      One option is to specify a Mandatory To Implement (MTI) data representation format for STIX-conformant implementations that they must support at a minimum but does not preclude others.
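
The round-tripping property in point 6 can be expressed as a testable check: convert content from one binding to another and back, then compare with the original. The sketch below does this for a toy structure of nested objects with string values. The conversion rules and the root tag name are assumptions for illustration and are far simpler than any real STIX binding would be.

```python
import json
import xml.etree.ElementTree as ET


def dict_to_xml(tag: str, data: dict) -> ET.Element:
    """Map each key to a child element; nested dicts become nested elements."""
    elem = ET.Element(tag)
    for key, value in data.items():
        if isinstance(value, dict):
            elem.append(dict_to_xml(key, value))
        else:
            ET.SubElement(elem, key).text = value
    return elem


def xml_to_dict(elem: ET.Element) -> dict:
    """Inverse of dict_to_xml for elements that hold either children or text."""
    return {
        child.tag: xml_to_dict(child) if len(child) else (child.text or "")
        for child in elem
    }


def json_to_xml(text: str, root_tag: str = "object") -> str:
    return ET.tostring(dict_to_xml(root_tag, json.loads(text)), encoding="unicode")


def xml_to_json(text: str) -> str:
    # sort_keys gives a canonical form so two documents can be compared as strings
    return json.dumps(xml_to_dict(ET.fromstring(text)), sort_keys=True)
```

Even this toy mapping is lossless only under stated limits: values must be strings, keys must be valid XML names, and an empty nested object would come back as an empty string. Those are the kinds of gaps a binding specification would have to close before round-tripping could be claimed.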
 
I truly hope we can stop spending cycles arguing this False Dilemma and as Mark and Pat suggest focus on the fundamental structure and semantic issues that will make STIX more effective at supporting cyber threat information use cases and the users that use them. ;-)
 
Sorry for the length of this message. I started out attempting brevity but found that some factual explanation was required rather than just making brief statements of hyperbole.
 
Thanks for spending the time to consider this. I wish I had an Easter egg to hide down here for your effort. :-)
 
 
Sean

DTCC DISCLAIMER: This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error, please notify us immediately and delete the email and any attachments from your system. The recipient should check this email and any attachments for the presence of viruses.  The company accepts no liability for any damage caused by any virus transmitted by this email.
 




