cti-stix message



Subject: Re: [cti-stix] [cti-users] MTI Binding


Comments inline

From: "cti-stix@lists.oasis-open.org" on behalf of "Jordan, Bret"
Date: Monday, October 5, 2015 at 1:41 PM
To: "Bush, Jonathan"
Cc: "Barnum, Sean D.", Jane Ginn, John Wunder, "cti-users@lists.oasis-open.org", "cti-stix@lists.oasis-open.org"
Subject: Re: [cti-stix] [cti-users] MTI Binding

I have been reading a lot about JSON-LD, and I get how and why it might be interesting in a website context when you are sharing unknown data back and forth.  Meaning there is no standard for the data you are sharing.  Think user profile between Google, Twitter, Facebook etc.  But, unless I am mistaken, the purpose of STIX is to define a standard for CTI so that we all share the same data.  

Can someone explain why JSON-LD is needed in the CTI context? I just do not see why anyone who is building an application to use CTI would care, since all of the data that will be shared between them is KNOWN and in a standard, well-known form, aka STIX... Please help me understand this use case.

[sean]The only way that STIX can be KNOWN as a standard for cyber threat information (CTI) is for the lexicality, syntax and semantics (i.e. the meaning) of CTI to be explicitly codified. This is the purpose of the ontology/data-model for STIX. It defines the concepts, meanings, properties, relationships, etc. for CTI as a knowledge domain irrespective of any particular technologies chosen to implement any particular use case. In this way the “language” of STIX is portable between any parties, vendors, systems, etc.
When it comes time to implement a specific use case using some particular appropriate technology, you will need a way to serialize particular content to a given format (e.g. JSON) so that the implementation can read it in, write it out or exchange it. Any chosen serialization format allows the implementation to use tooling that conforms to that format's standard to parse the content. Such a serialization format choice does not, however, tell you what that content means. To understand what the content means you need a layer of mapping between the end serialization and the overall ontology/data-model that provides meaning to the content. This middle layer involves some formalized constraint on the end serialization format for use within the knowledge domain in question (e.g. XSD or JSON Schema) so that implementations can verify whether serialized content is conformant. But it also requires some sort of mapping assurance that the formalized schematic constraint itself actually conforms to the higher-level ontology/data-model.
This is the stack we have been working towards:
  • Ontology/data-model: 
Defines the various concepts for the CTI space, as well as their properties, structures and relationships with each other. This also defines how we can deal with uncertainty, diversity and change over time through things like extension points (recognizing that we do not nor ever likely will have the full picture), vocabularies, etc.
  • Binding specifications:
Define a mapping from the Ontology/data-model to a particular representation format (JSON, XML, etc.). These allow a given format to be used by those who support it to represent content according to the ontology/data-model. If these binding specifications are accurate and fully cover the ontology/data-model with explicit mappings then it should be possible to losslessly translate from one bound serialization format to another.
  • Representation format implementation:
An explicit schematic specification (e.g. XSD) for representing CTI content according to the Ontology/data-model as bound by the corresponding binding specification. This will allow implementations that only care about the end serialized content and not the domain meaning of the content to parse and output CTI content in a validatable and interoperable way.
  • Actual instance CTI content expressed in a particular representation format:
Actual instance CTI data.
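
The lossless-translation claim for the binding layer can be sketched roughly as follows. This is only an illustration under stated assumptions: the property names (title, confidence), the Indicator element name and the two binding tables are hypothetical and are not taken from any STIX specification.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical bindings: model-level property name -> field/element name
# in each serialization format. These names are illustrative only.
JSON_BINDING = {"title": "title", "confidence": "confidence"}
XML_BINDING = {"title": "Title", "confidence": "Confidence"}


def from_json(text):
    """Parse JSON text into a format-neutral model dictionary."""
    data = json.loads(text)
    return {prop: data[field] for prop, field in JSON_BINDING.items() if field in data}


def to_json(model):
    """Serialize the model dictionary to JSON text using the JSON binding."""
    return json.dumps({JSON_BINDING[p]: v for p, v in model.items()}, sort_keys=True)


def to_xml(model):
    """Serialize the model dictionary to XML text using the XML binding."""
    root = ET.Element("Indicator")  # hypothetical element name
    for prop, value in model.items():
        ET.SubElement(root, XML_BINDING[prop]).text = value
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Parse XML text back into the same format-neutral model dictionary."""
    reverse = {tag: prop for prop, tag in XML_BINDING.items()}
    return {reverse[child.tag]: child.text for child in ET.fromstring(text)}


original = '{"confidence": "High", "title": "Malicious IP"}'
model = from_json(original)
round_tripped = from_xml(to_xml(model))
```

Because both bindings cover every property of this toy model, round_tripped equals model, and to_json(round_tripped) reproduces the original JSON text. A binding that omitted a property would lose it in translation.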

JSON-LD would basically fit into this stack at the binding specification and representation format levels.
The “context” structure of JSON-LD lets you do the sort of mappings from the ontology/data-model to a particular representation that are the purpose of the binding specifications. In this case the “context” (which can be expressed in a separate referenceable file rather than only inline with the content) would capture the binding specification rules for a JSON format implementation and the “context” file(s) itself would form the JSON representation format implementation specification.
At that point instance CTI content could be expressed in JSON with the referenced JSON-LD “context” providing the mechanism for interpreting it.
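
As a rough illustration of that idea, the sketch below shows a context mapping short JSON keys to ontology identifiers, and a plain JSON instance being interpreted through it. Everything here is an assumption made for illustration: the example.org IRIs, the key names and the expand helper are hypothetical, and the helper is a drastically simplified stand-in for the real JSON-LD expansion algorithm, which does considerably more.

```python
import json

# Hypothetical context document: short JSON keys -> ontology IRIs.
# The example.org IRIs are placeholders, not real STIX identifiers.
CONTEXT = {
    "@context": {
        "title": "http://example.org/cti#title",
        "indicates": "http://example.org/cti#indicatesTTP",
    }
}

# Instance content stays plain JSON; it only references the context by URL.
INSTANCE = json.loads(
    '{"@context": "http://example.org/cti/context.jsonld",'
    ' "title": "Malicious IP", "indicates": "ttp-1234"}'
)


def expand(instance, context):
    """Replace each mapped key with its ontology IRI.

    Simplified stand-in for JSON-LD expansion: keys that the context
    does not define (including "@context" itself) are dropped.
    """
    terms = context["@context"]
    return {terms[key]: value for key, value in instance.items() if key in terms}


expanded = expand(INSTANCE, CONTEXT)
```

The instance remains ordinary JSON that any JSON parser can read; the context is what ties each key back to a concept in the ontology/data-model.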
I have not personally worked directly with JSON-LD nor done any sort of very detailed analysis of its capabilities. It is unclear whether or not JSON-LD has adequate expressivity to fully map our domain or the capability to provide automated validation. It may. It may not. That would be one dimension we would need to explore if we wish to consider JSON-LD as an option (which I would personally support).

In other words, JSON-LD would not simply be something pursued in addition to STIX. It would/could be HOW STIX is defined (KNOWN) for use within a JSON technology stack. It does not replace the need for the data-model and it does not replace the end serialization as pure JSON. Rather it provides a way to explicitly define the two middle layers of the stack.


Thanks,

Bret



Bret Jordan CISSP
Director of Security Architecture and Standards | Office of the CTO
Blue Coat Systems
PGP Fingerprint: 63B4 FC53 680A 6B7D 1447  F2C0 74F8 ACAE 7415 0050
"Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg." 

On Oct 5, 2015, at 11:20, Bush, Jonathan <jbush@dtcc.com> wrote:

I would agree that some of the technologies involved with the Semantic web scare me a little bit (very complex, many seem pretty academic), but at least if we go with a structure that sets us up for this sort of “linked” data thinking now, we leave that door open for the future. 
 
From: cti-stix@lists.oasis-open.org [mailto:cti-stix@lists.oasis-open.org] On Behalf Of Jordan, Bret
Sent: Monday, October 05, 2015 1:11 PM
To: Bush, Jonathan
Cc: Sean D. Barnum; Jane Ginn; Wunder, John A.; cti-users@lists.oasis-open.org; cti-stix@lists.oasis-open.org
Subject: Re: [cti-stix] [cti-users] MTI Binding
 

 

 
On Oct 5, 2015, at 10:34, Bush, Jonathan <jbush@dtcc.com> wrote:
 
Great information here, thank you Sean.
 
It sounds like we are talking about
Use-Cases -> UML (or OWL/RDF) -> JSON-LD (context portion mapping model to JSON constructs to use)  -> JSON (actual instances of content)
 
I could get behind that.
 
From: Barnum, Sean D. [mailto:sbarnum@mitre.org] 
Sent: Monday, October 05, 2015 10:14 AM
To: Bush, Jonathan; 'Jane Ginn'; Wunder, John A.
Cc: cti-users@lists.oasis-open.org; cti-stix@lists.oasis-open.org
Subject: Re: [cti-stix] Re: [cti-users] MTI Binding
 
I don’t think I would concur with Jane’s characterization that this represents a “significant shift in the level at which we are approaching the problem set”. Rather, I think it fits very well into how we have been approaching the problem but may represent an acceleration along our path. We began the STIX efforts leveraging XSD to specify the language because we as a community did not yet know or agree on what CTI was and XSD provided a good structured mechanism everyone could deal with to collaboratively define and experiment in an explicit way. It was recognized since the beginning that this was only a temporary approach and that we would eventually need a richer, more semantic specification with a derived abstraction stack like I described in my earlier post (ontology/data-model, binding specs, implementations, instance data). We have been working along this path as a community, gradually separating out and establishing these abstractions, for quite a while now. As we have evolved the technology along the planned path we have tried to be mindful not only of the raw technical capabilities of given technologies to support targeted use cases and represent necessary information but also of the community’s ability and willingness to understand, accept and adopt them. I and several others in the community have long posited that a full semantic model (likely using something like OWL/RDF) would probably be the best long-term form for the ontology/data-model layer, as that is exactly what such technologies are designed for and would provide excellent flexibility but also excellent consistency and potential for automated transformation. To date this has been viewed as a goal to work towards but not something to be pushed too soon, as the community may not be familiar enough with it yet. This led to the interim step of leveraging UML to specify the normative model for the language.
The UML-based specifications we have today, while not fully semantic, provide us the practical ability to fully instantiate the desired abstraction stack for STIX. I continue to hold the opinion (as do several others in the community like Pat, Paul, Shawn, etc.) that we should continue to evolve towards a full explicit semantic form of specification for the ontology/data-model but it is still unclear how fast that evolution should occur. The plan discussed several times for STIX 2.0 was to stick with the UML + text docs form but begin to work in some semantic modeling snippets as part of discussions on a few of the refactoring issues (e.g. separating out relationships) that are semantic in their roots and likely include a few OWL-style diagrams in the spec text docs. This could then be a good introduction to these approaches for the community and an initial basis for future evolution to fully semantic models.
 
While Jonathan is correct that most XML-based implementations would require some major retooling to support a JSON-based serialization form, I don’t think that JSON-LD inherently brings any further burden than that. To be clear, instance STIX data using a JSON-LD approach would still be “pure JSON”. It is just that the structure of that JSON would be aligned to the ontology/data-model higher in the abstraction stack. To use a simple analogy, think of “pure JSON” not as a human language like English or French but rather as something far lower level such as human cursive handwriting. It can be used to convey all sorts of different semantics and structure (English, French, etc.). The LD/context portions of JSON-LD are what allow someone reading the cursive to recognize whether they are reading English or French and to understand the actual meaning of what is being written. This mapping of meaning from the low-level serialization format to the higher-level ontology/data-model is what the two middle layers of the abstraction stack are all about. These layers are required to be there no matter which approach we take. Without these layers, expressed content, regardless of whether it is JSON, XML, protobuf or whatever, would not be interpretable or interoperable. JSON-LD provides one option for defining these layers for a targeted “pure JSON” end serialization for instance data (which is what I believe Bret and several others really want). Another option would be to write the JSON binding spec as some other form of rules (including potentially human language) and then specify the implementation using something like JSON Schema. JSON-LD simply provides an explicit structured way to tackle these two layers.
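
To make the JSON Schema option a little more concrete, here is a minimal sketch of the kind of structural check such a schema enables. It is an assumption-laden illustration: the schema content and property names are hypothetical, and the validate function is a tiny hand-rolled subset (type, required, properties) written for this example, not the API of any real JSON Schema library.

```python
# Hypothetical schema for an indicator-like object; illustrative only.
SCHEMA = {
    "type": "object",
    "required": ["title"],
    "properties": {
        "title": {"type": "string"},
        "confidence": {"type": "string"},
    },
}

# Only the two JSON Schema types used above are supported in this sketch.
TYPES = {"object": dict, "string": str}


def validate(instance, schema):
    """Return a list of conformance errors (empty if the instance conforms)."""
    if not isinstance(instance, TYPES[schema["type"]]):
        return [f"expected {schema['type']}"]
    errors = []
    for name in schema.get("required", []):
        if name not in instance:
            errors.append(f"missing required property: {name}")
    for name, sub in schema.get("properties", {}).items():
        if name in instance and not isinstance(instance[name], TYPES[sub["type"]]):
            errors.append(f"{name}: expected {sub['type']}")
    return errors


good = validate({"title": "Malicious IP", "confidence": "High"}, SCHEMA)
bad = validate({"confidence": 5}, SCHEMA)
```

A check like this says whether content is structurally conformant, but nothing about what title or confidence mean; that meaning would still have to come from the separately written binding rules.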
 
Does that make sense?
Again, anyone knowledgeable on these topics should feel free to point out where they believe my characterizations are incorrect or unclear.
 
Sean
 
From: "cti-stix@lists.oasis-open.org" on behalf of "Bush, Jonathan"
Date: Saturday, October 3, 2015 at 8:15 AM
To: 'Jane Ginn', John Wunder
Cc: "cti-users@lists.oasis-open.org", "cti-stix@lists.oasis-open.org"
Subject: RE: [cti-stix] Re: [cti-users] MTI Binding
 
I would agree.  JSON-LD could be an incredibly powerful way to represent intelligence data, but it represents a fundamental shift that will require a major retooling for most implementations to really take advantage of it.  The good news is that tools (such as Soltra products to be all about “me” for a second) could ease into that implementation by thinning the implementation down to pure JSON at first (I believe, someone correct me if I’m wrong here).  The real question is, will we as implementers get to the point where we really jump all in and represent data using the “LD” portion of the concept?
 
Again, looks promising (after all, if Google and Facebook are using it to represent complex data, why shouldn’t we be paying attention), but do we all know what we would be buying in to?
 
From: cti-stix@lists.oasis-open.org [mailto:cti-stix@lists.oasis-open.org] On Behalf Of Jane Ginn
Sent: Friday, October 02, 2015 8:45 PM
To: Wunder, John A.
Cc: cti-users@lists.oasis-open.org; cti-stix@lists.oasis-open.org
Subject: [cti-stix] Re: [cti-users] MTI Binding
 
Hi All:
While reading through this thread it occurred to me that the JSON-LD suggestion represents a significant shift in the level at which we are approaching the problem set. Cory has long been arguing for us to shift our focus to a semantic model that can serve as a language-agnostic approach to solving the CTI sharing problem. Bret has been pushing for JSON as a tool to help us achieve more widespread adoption. We currently have bindings in XML and Python... but no MTI for moving forward with STIX 2.0.
JSON-LD appears to address several of our issues at a higher level of abstraction.
I'm also intrigued by the potential, from the POV of STIX consumers, of how PMML can be deployed seamlessly to use wire-speed data on attacks for predictive modelling... or at least deploying the myriad of tools for predictive modelling. I expect this is an area of white space in the market that will be picked up by a vendor and developed as an enterprise solution. We just need to get the front end right for the integration.
Jane Ginn
Cyber Threat Intelligence Network 

DTCC DISCLAIMER: This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error, please notify us immediately and delete the email and any attachments from your system. The recipient should check this email and any attachments for the presence of viruses.  The company accepts no liability for any damage caused by any virus transmitted by this email.



