cti-stix message



Subject: Re: [cti-users] [cti-stix] [cti-users] MTI Binding


I just talked to Sean on the phone (as you may have noticed we’re coming from different sides here) and wanted to bring up a few points here:

- We all agree there will be a high-level model
- We all agree that there will be at least one “binding” of that model to a format on the wire

There’s a middle layer there: how do you map fields between that high-level model and the format on the wire? Let’s imagine three approaches:

a) The JSON-LD / RDF approach has explicit linkages directly between the high-level model and the on-the-wire format.
b) JSON-Schema would be similar to what we’ve done with XML schema…defining some rules for how that works and a binding specification that describes it.
c) “Raw” JSON, without a schema, does not have that binding. That said, I don’t think Bret (or myself, or the other people in favor of JSON) are saying we should do schema-less JSON. We’re saying we should develop a JSON-Schema binding similar to how we have an XML-Schema binding now.
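
To make the three approaches concrete, here is a minimal sketch in Python. The field names, the example.org URIs, and the bind_with_* helpers are all hypothetical, invented for illustration; none of them come from STIX or from any JSON-LD or JSON-Schema library. The point is only where the mapping from a field to its definition lives: inside the message for (a), in an out-of-band binding for (b), and nowhere for (c).

```python
# Hypothetical indicator record; the field names are illustrative, not STIX.
raw = {"type": "indicator", "title": "C2 address", "value": "198.51.100.7"}

# (a) JSON-LD style: the message itself carries a context that maps
# each field to a definition URI (the URIs here are made up).
json_ld = {
    "@context": {
        "title": "http://example.org/model#title",
        "value": "http://example.org/model#value",
    },
    "@type": "http://example.org/model#Indicator",
    "title": "C2 address",
    "value": "198.51.100.7",
}

# (b) JSON-Schema style: the mapping lives in a binding agreed on out of
# band, keyed by the declared type. This dict stands in for that binding.
BINDING = {
    "indicator": {
        "title": "http://example.org/model#title",
        "value": "http://example.org/model#value",
    }
}

def bind_with_context(doc):
    """Resolve fields using the context embedded in the message (a)."""
    ctx = doc["@context"]
    return {ctx[k]: v for k, v in doc.items() if not k.startswith("@")}

def bind_with_binding(doc, binding=BINDING):
    """Resolve fields using the out-of-band binding for the type (b)."""
    fields = binding[doc["type"]]
    return {fields[k]: v for k, v in doc.items() if k != "type"}

# Both routes reach the same definitions; raw JSON with neither a
# context nor an agreed binding (c) has nothing to resolve against.
assert bind_with_context(json_ld) == bind_with_binding(raw)
```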

Looking at the requirements:
#1. Satisfied by “a” and “b”.
#2. This requirement specifies an approach (“exchange data reference its definitions”) plus a requirement (“every tag may be deterministically bound to its definition”). JSON-Schema would not require that exchange data reference definitions (IMO not a requirement) but does allow you to deterministically bind data to its definition. JSON-LD does both; XML schema (depending on whether you use schemaLocation) can do either one or both. Raw JSON does not do this.
#3. I’m not sure of the basis for saying this doesn’t work in STIX 1.2. STIX 1.2 tools are doing this now, so I don’t think you can assert that it doesn’t meet this requirement. STIX defines @id and @idref and how they should work to meet this requirement, so although it doesn’t use something like XLink it does work. Similarly, you could do the same thing in JSON-Schema. Heck, you could even use the same exact field names and URIs as you do in JSON-LD. So, both meet this requirement.
#4. I think this is a balancing act…leveraging every single existing standard might not always be the best approach. If existing standards are overly complex (maybe they do more than we need) then we should invent our own simpler way of doing things. JSON-LD is more “standard”, but if it adds a lot of complexity that makes STIX harder to use then it might not be worth it.
#5. I agree with this as a general statement and I would say it really comes down more to model definition than specific bindings. That said, to me, JSON-LD seems to make it more complex to do simple things (extra fields, need to understand JSON-LD in addition to JSON, etc), but perhaps as we do more complex things that complexity becomes useful.
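
As an illustration of the point under #3, the sketch below carries an id/idref convention over to plain JSON. The "id" and "idref" field names mirror the STIX XML attributes mentioned above, but the document shape, the identifier values, and the build_index and dereference helpers are assumptions made for this example, not part of any STIX specification.

```python
# Hypothetical package: one TTP defined once, one indicator pointing at it.
package = {
    "ttps": [{"id": "example:ttp-1", "title": "Phishing"}],
    "indicators": [
        {"id": "example:indicator-1", "indicated_ttp": {"idref": "example:ttp-1"}}
    ],
}

def build_index(node, index=None):
    """Collect every object that declares an "id", keyed by that id."""
    index = {} if index is None else index
    if isinstance(node, dict):
        if "id" in node:
            index[node["id"]] = node
        for value in node.values():
            build_index(value, index)
    elif isinstance(node, list):
        for item in node:
            build_index(item, index)
    return index

def dereference(ref, index):
    """Look up the object an {"idref": ...} stub points to."""
    return index[ref["idref"]]

index = build_index(package)
ttp = dereference(package["indicators"][0]["indicated_ttp"], index)
assert ttp["title"] == "Phishing"
```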

Given this, I would like to cut out choice (c). Let’s agree that we need to either leverage JSON-LD or define JSONSchemas.

I looked into JSON-LD a bit and see the following advantages (as compared to JSON-Schema / JSON):

- Standard approach to ID, Type, and links might be nice to leverage
- If we get to an approach where things are defined in RDF it’s easier to generate JSON-LD-compatible schemas than JSONSchema schemas
- Perhaps better integration with other data models, though those data models are of varying quality (many use XSD datatypes, which would suck to do in JSON)

I also see some disadvantages:

- Larger conceptual burden to understand than pure JSON…JSON people can understand JSONSchema because they understand JSON, but hunting down and understanding the JSON-LD schemas from schema.org has been challenging for me.
- I’m concerned about how open the types seem to be and how it relies on producers telling consumers what to expect. As a consumer, I want to know what I’m getting and I worry that some of the things you can do in JSON-LD make that difficult. Essentially, I worry that consumers will have to do a lot of work “revalidating” data because JSON-LD lets you be so open about type definitions and extensions. For example, I’m a STIX consumer and get some data: how do I validate that it actually is vanilla STIX and not some weird extension that the producer created? Producers can even send type definitions inline, which means they could potentially redefine things at that level. This flexibility could be a bad thing for scenarios where we want to lock down the exchanges.
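
One way a consumer might guard against the concern in the last bullet is to refuse anything beyond an agreed context before processing. This is a sketch under assumptions: the context URL, the allowed field set, and the is_vanilla helper are hypothetical, and it checks only top-level keys rather than performing real JSON-LD processing.

```python
AGREED_CONTEXT = "http://example.org/stix/context.jsonld"  # hypothetical URL
ALLOWED_FIELDS = {"@context", "@id", "@type", "title", "value"}  # hypothetical

def is_vanilla(doc):
    """Accept only documents that reference the agreed context by URL
    and use no fields outside the agreed set."""
    # An inline context (a dict) could redefine terms, so reject it.
    if doc.get("@context") != AGREED_CONTEXT:
        return False
    return set(doc) <= ALLOWED_FIELDS

ok = {"@context": AGREED_CONTEXT, "@type": "Indicator", "title": "C2 address"}
inline = {"@context": {"title": "http://example.org/other#title"}, "title": "x"}
extended = {"@context": AGREED_CONTEXT, "title": "x", "vendor_score": 7}

assert is_vanilla(ok)
assert not is_vanilla(inline)
assert not is_vanilla(extended)
```

The cost is that every consumer has to write and maintain this kind of check itself, which is the “revalidating” work described above.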

I think a few things would help:
- Examples of STIX content encoded in JSON-LD. I was actually going to put this together but couldn’t figure out how to define the schemas so gave up.
- Examples of another community that collaborated on defining these schemas and now uses it for exchange. I saw a lot of examples of one-offs (Google using it for Gmail, where Google is the only consumer) but not a lot where you have validated P2P exchanges based on schemas the community agreed to.

I hope this helps focus the discussion.

John

On Oct 6, 2015, at 1:37 PM, Jordan, Bret <bret.jordan@BLUECOAT.COM> wrote:

I will completely and utterly disagree with these requirements.  See Jason's post for further understanding.  And for the record I completely agree with Jason from IBM.   


Thanks,

Bret



Bret Jordan CISSP
Director of Security Architecture and Standards | Office of the CTO
Blue Coat Systems
PGP Fingerprint: 63B4 FC53 680A 6B7D 1447  F2C0 74F8 ACAE 7415 0050
"Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg." 

On Oct 6, 2015, at 11:21, Cory Casanave <cory-c@modeldriven.com> wrote:

Yes.
This is the reason for requirement 2: that all exchange data reference its definition(s) such that every tag used may be deterministically bound to its definition.
 
Which current STIX-XML satisfies.
 
 
From: Barnum, Sean D. [mailto:sbarnum@mitre.org] 
Sent: Tuesday, October 06, 2015 12:53 PM
To: Cory Casanave; Jason Keirstead
Cc: Terry MacDonald; Jordan, Bret; cti-users@lists.oasis-open.org; cti-stix@lists.oasis-open.org; Wunder, John A.
Subject: Re: [cti-users] Re: [cti-stix] [cti-users] MTI Binding
 
I would agree with Cory’s characterizations and assertions here.
Jason, I think you may have misinterpreted what Cory was trying to say.
He was not saying that given fields have multiple meanings. He was saying that differing use cases focused on different purposes may leverage different sets of fields and there will likely be overlap between the fields leveraged by different use cases. And that different use cases may care about a given field for different reasons and do different things with its content. That does not mean that the field has multiple meanings just that its one meaning may serve multiple purposes.
 
Cory, please feel free to point out if I am mischaracterizing your intent.
 
sean
 
 
From: Cory Casanave
Date: Tuesday, October 6, 2015 at 10:37 AM
To: Jason Keirstead
Cc: "Barnum, Sean D.", Terry MacDonald, "Jordan, Bret", "cti-users@lists.oasis-open.org", "cti-stix@lists.oasis-open.org", John Wunder
Subject: RE: [cti-users] Re: [cti-stix] [cti-users] MTI Binding
 
Jason,
Re: This premise is untrue. Or at least, at the release of STIX 2.0, this has to be untrue - otherwise we have fundamentally failed in creating a data interchange standard. And I believe that this incongruency is at the heart of this whole discussion.
 
The count of terms is easily verified, so I assume you think this is untrue: may be used for very different use cases that use different viewpoints of the data with different root structures
 
Consider STIX may be used by one application to produce or consume a list of suspect IP addresses, and is hard-coded to that purpose and structure. It has been independently suggested that all uses of STIX will be coded.
 
Another is coded for “Mitigation Strategies - Coordinated Action Plans - Courses of Action - Understanding of Achievable Mitigation Effects”.
 
Other than having a common STIX envelope, I would consider these different viewpoints of the data with different root structures.
 
You could identify dozens of such essentially different exchanges. I’m not suggesting this as a failure, only a reality of the domain and scope. Of course, there are different approaches to handling such diversity – which is part of this conversation.
 
-Cory
 
From: Jason Keirstead [mailto:Jason.Keirstead@ca.ibm.com] 
Sent: Tuesday, October 06, 2015 8:26 AM
To: Cory Casanave
Cc: Barnum, Sean D.; Terry MacDonald; Jordan, Bret; cti-users@lists.oasis-open.org; cti-stix@lists.oasis-open.org; Wunder, John A.
Subject: RE: [cti-users] Re: [cti-stix] [cti-users] MTI Binding
 

"STIX is several thousand terms and may be used for very different use cases that use different viewpoints of the data with different root structures."

This premise is untrue. Or at least, at the release of STIX 2.0, this has to be untrue - otherwise we have fundamentally failed in creating a data interchange standard. And I believe that this incongruency is at the heart of this whole discussion.

The whole point of data interchange standards is to explicitly avoid this premise. It is so that when I create a message such as "<foofarah><name>foo</name><id>bar</id></foofarah>", **I can send that message without any other context to any recipient on the planet** - and the recipient will be able to understand it, because they do not have to guess as to what "name", or "id" mean - because they know that I am following the "Fooferah 1.0" standard, which explicitly defines what is present in those fields.
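
A minimal sketch of that idea in Python, assuming a recipient that has the standard's field definitions built in. The FIELD_DEFINITIONS table and the read_message helper are hypothetical stand-ins for whatever a published specification would define; only the message itself is taken from the paragraph above.

```python
import xml.etree.ElementTree as ET

# Stand-in for the definitions a published specification would fix.
FIELD_DEFINITIONS = {
    "name": "human-readable name of the item",
    "id": "identifier of the item",
}

def read_message(xml_text):
    """Parse a message and pair each field with its agreed definition."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text, FIELD_DEFINITIONS[child.tag]) for child in root}

# The message needs no extra context: the definitions come from the standard.
message = "<foofarah><name>foo</name><id>bar</id></foofarah>"
fields = read_message(message)
assert fields["name"][0] == "foo"
assert fields["id"][0] == "bar"
```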


-
Jason Keirstead
Product Architect, Security Intelligence, IBM Security Systems
www.ibm.com/security | www.securityintelligence.com

Without data, all you are is just another person with an opinion - Unknown 



From: Cory Casanave <cory-c@modeldriven.com>
To: "Barnum, Sean D." <sbarnum@mitre.org>, Terry MacDonald <terry.macdonald@gmail.com>, "Jordan, Bret" <bret.jordan@bluecoat.com>
Cc: "cti-users@lists.oasis-open.org" <cti-users@lists.oasis-open.org>, "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, "Wunder, John A." <jwunder@mitre.org>
Date: 2015/10/05 09:04 PM
Subject: RE: [cti-users] Re: [cti-stix] [cti-users] MTI Binding
Sent by: <cti-users@lists.oasis-open.org>





Sean,
I very much agree. A lot of the “it’s simple” view of JSON or even early XML is based on its use for single and highly structured interactions between endpoints controlled by the same authority (my server talking to my Android application).

STIX is several thousand terms and may be used for very different use cases that use different viewpoints of the data with different root structures. On top of this is the need for extensibility and flexibility. This is simply the reality of the domain and the scope of STIX. The bad news is that regardless of the serialization format, schema language, model, language, etc., it is somewhat complex; that is the real and necessary complexity. So I am concerned that the “Pure JSON will be simple” view will end in some disappointment. Note that the same concerns of complexity are levied against NIEM, another large XML schema based data sharing standard.

The good news is we can make it BETTER and as simple as is practical! When some of these requirements are folded into XML schema, it adds complexity – so perhaps some of these other choices REDUCE complexity even if they require some new learning. Where we can add semantic precision software can handle some of the load. If we have a way to define fine-tuned “profiles”, these may be much simpler for their more limited purpose. We can also make the models easier to understand for us humans with graphical models linked to semantic definitions.

I am copying the following list from Shawn Riley to show the variety of information formats and viewpoints that we are trying to fit together under one, many-faceted, schema:

Below is some of the typical cybersecurity data and information users/analysts/scientists have to organize into some type of body of knowledge so they understand their cybersecurity ecosystem. If the technology can’t understand the meaning of the data then it’s the humans who have to understand it and “connect the dots”.

Configuration/Anomaly Reporting - Infrastructure Information - Risk Posture - Anomalies

Knowledge of Threat Actors - Threat Actor Infrastructure - Threat Actor Personas - Collected Threat Actor Indicators - Threat Actor Attribution - Trend Analysis - Victim Information

Incident Awareness - Incident Information - Incident Data - Infrastructure Impact and Effects - Investigations/cases - Alerting Indicators - Victim Information

Indications and Warnings - Events and Alerts - Tipping and Cueing - Warnings - Impact assessments - Potential Indicators

Vulnerability Knowledge - Vulnerabilities - Exploits - Potential Victim Information

Mitigation Strategies - Coordinated Action Plans - Courses of Action - Understanding of Achievable Mitigation Effects

Mitigation Actions and Responses - Computer Network Defense Situational Awareness - Action Tasking and Status - Effectiveness Reporting - After Action Reporting and Lessons Learned


From: cti-stix@lists.oasis-open.org [mailto:cti-stix@lists.oasis-open.org] On Behalf Of Barnum, Sean D.
Sent: Monday, October 05, 2015 9:18 AM
To: Terry MacDonald; Jordan, Bret
Cc: cti-users@lists.oasis-open.org; cti-stix@lists.oasis-open.org; Wunder, John A.
Subject: [cti-stix] Re: [cti-users] Re: [cti-stix] [cti-users] MTI Binding


I think that using these simple idioms would be great for folks to see roughly what the different forms look like but I do not think they would be sufficient for comparing size and complexity as a whole.
These are VERY simple example structures. More complex examples would likely differ from these simple ones in how each representation tackles size and complexity.

sean

From: <cti-users@lists.oasis-open.org> on behalf of Terry MacDonald
Date: Friday, October 2, 2015 at 5:51 PM
To: "Jordan, Bret"
Cc: "cti-users@lists.oasis-open.org", "cti-stix@lists.oasis-open.org", John Wunder
Subject: Re: [cti-users] Re: [cti-stix] [cti-users] MTI Binding

+1. Is a nice idea as we can see a size and complexity comparison. Is there any chance each person can document the process that the generation took? I'm thinking it could be useful to see how complicated the toolchain for developing each type of output is.

Cheers
Terry MacDonald

On 3 Oct 2015 6:33 am, "Jordan, Bret" <bret.jordan@bluecoat.com> wrote:
I think this is a great idea.. 

Thanks,

Bret



Bret Jordan CISSP
Director of Security Architecture and Standards | Office of the CTO
Blue Coat Systems
PGP Fingerprint: 63B4 FC53 680A 6B7D 1447 F2C0 74F8 ACAE 7415 0050
"Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg."

On Oct 2, 2015, at 14:08, Wunder, John A. <jwunder@mitre.org> wrote:

How about we take two of the idioms on the stixproject.github.io site?

- http://stixproject.github.io/documentation/idioms/c2-indicator/
- http://stixproject.github.io/documentation/idioms/simple-incident/

Thanks for helping out. I think it would be nice to see these as:

- Current STIX XML (Done already)
- Simplified XML (TBD, maybe if the JSON one is quick I’ll do this too)
- JSON/JSON-Schema (Wunder)
- JSON-LD (Casanave)
- Any others people are interested in (PMML, Thrift, ProtoBuf, etc.)

John
On Oct 2, 2015, at 11:50 AM, Cory Casanave <cory-c@modeldriven.com> wrote:

Re: Examples.
Pick your examples, I can help out. Would prefer to baseline off of the same schema subset & example data, current STIX is fine to define the examples. I suggest at least one that is very simple “pure hierarchical data” and at least one with some related entities.
-Cory



