Subject: Re: [cti-stix] [cti-users] MTI Binding


I get what you are saying. Let me rephrase and ask the question differently.

1) Yes, we need a specification for what STIX is. You call it a bunch of terms dealing with lexicality, syntax, semantics, ontology, and data model. I agree we need this, regardless of what it is called. Right now it is in UML; maybe in the future it will be OWL, maybe further out it will be StandAndSit Ontology Markup, whatever. The fact of the matter is we need this, and it needs to be rock solid, well understood, and easy to comprehend.

2) The specification will be represented in a serialization format that products, devices, and software will actually use. Hopefully that is JSON; today it is XML.

3) When a user gets a blob of JSON-based STIX 2.0 data and uses some tooling to parse it or dump it into MongoDB, they will know what it is they are parsing, as the content is fixed by the specification / data model / ontology etc. If a developer does not understand what, say, an InformationSourceType is, they can look it up in the UML or OWL model. Once they have it figured out, meaning once they know what an InformationSourceType is, they can work with it.

JSON-LD and other things like it are for telling remote software how to guess at what the data actually is, because there is no standard form. For example, say you have two organizations that have data they want to share:

// Twitter as an example
{
  "name": "Barney",
  "color": "Purple"
}


vs
// Facebook as an example
{
  "name": "bny00989",
  "color": "Purple",
  "size": "3 meters"
}


Now the "name" value in both blobs is related but does not contain the same information.  One appears to contain a real name while the other appears to contain a username.  What JSON-LD and other things like this allow you to do is put some context around the "name" so that some software can understand it correctly.  The reason this is needed is because there is no standard to define what "name" should be.  But we do not have that problem in STIX because we have a specification that tells you what everything is.  

So in JSON-LD you would have something like:

{
  "@context": "http://schema.org/",
  "@type": "Person",
  "name": "Barney",
  "jobTitle": "Professor",
  "telephone": "(425) 123-4567",
  "url": "http://www.janedoe.com",
  "size": "3 meters"
}


Thus you have the ability to assign a TYPE and CONTEXT, aka a schema location, for the blob. This way software can hopefully better understand arbitrary data that is coming across. Now this seems like it would be cool to do with STIX. But given that we HAVE a data model and a specification that everyone will be using, why do we need to tell software what something is when it is already defined in the specification?

I could see that if we were trying to build a model that allowed anyone to create any type of schema blob, then yes, this would get us there. Then STIX could become a standard for how you define other CTI standards. Then everyone could create their own STIX Lite, publish their own schema, and use JSON-LD to tell someone else what their schema is and how to interpret it. I could see JSON-LD being used to "overload" the "InformationSourceType": say, for example, Company FOO does not like the OASIS version of "InformationSourceType", so they define a new "InformationSourceType", and in order to understand their data blob you need to go to their schema location programmatically and figure it out.
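As a sketch of that overloading, a hypothetical Company FOO context might look like this (the URL and term mapping are invented for illustration):

```json
{
  "@context": {
    "InformationSourceType": "https://cti.foo.example/schema#InformationSourceType"
  },
  "@type": "InformationSourceType",
  "name": "FOO Global Threat Feed"
}
```

Here the "@context" remaps the term to Company FOO's own definition, so consuming software would have to dereference that URL to learn what the term means in their blobs.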

If we want to allow organizations to overload our STIX specification, then yes, let's do JSON-LD. Otherwise the data that will be shared will be known, because it will be in the specification.


Thanks,

Bret



Bret Jordan CISSP
Director of Security Architecture and Standards | Office of the CTO
Blue Coat Systems
PGP Fingerprint: 63B4 FC53 680A 6B7D 1447  F2C0 74F8 ACAE 7415 0050
"Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg." 

On Oct 5, 2015, at 15:07, Barnum, Sean D. <sbarnum@mitre.org> wrote:

Comments inline

From: "cti-stix@lists.oasis-open.org" on behalf of "Jordan, Bret"
Date: Monday, October 5, 2015 at 1:41 PM
To: "Bush, Jonathan"
Cc: "Barnum, Sean D.", Jane Ginn, John Wunder, "cti-users@lists.oasis-open.org", "cti-stix@lists.oasis-open.org"
Subject: Re: [cti-stix] [cti-users] MTI Binding

I have been reading a lot about JSON-LD, and I get how and why it might be interesting in a website context when you are sharing unknown data back and forth.  Meaning there is no standard for the data you are sharing.  Think user profile between Google, Twitter, Facebook etc.  But, unless I am mistaken, the purpose of STIX is to define a standard for CTI so that we all share the same data.  

Can someone explain why JSON-LD is needed in the CTI context? I just do not see why anyone building an application to use CTI would care, since all of the data that will be shared between them is KNOWN and in a standard, well-known form, aka STIX. Please help me understand this use case.

[sean] The only way that STIX can be KNOWN as a standard for cyber threat information (CTI) is for the lexicality, syntax and semantics (i.e. the meaning) of CTI to be explicitly codified. This is the purpose of the ontology/data-model for STIX. It defines the concepts, meanings, properties, relationships, etc. for CTI as a knowledge domain irrespective of any particular technologies chosen to implement any particular use case. In this way the “language” of STIX is portable between any parties, vendors, systems, etc.
When it comes time to implement a specific use case using an appropriate technology, you will need a way to serialize particular content to a given format (e.g. JSON) so that the implementation can read it in, write it out, or exchange it. A chosen serialization format lets the implementation use standards-conformant tooling (for that format) to parse the content. The serialization format choice does not, however, tell you what that content means. To understand what the content means you need a layer of mapping between the end serialization and the overall ontology/data-model that provides meaning to the content. This middle layer involves a formalized constraint on the end serialization format for use within the knowledge domain in question (e.g. XSD or JSON Schema), so that implementations can verify whether serialized content is conformant. But it also requires some assurance that the formalized schematic constraint itself actually conforms to the higher-level ontology/data-model.
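As a sketch of what that middle-layer constraint could look like for a JSON serialization, here is a JSON Schema fragment for a hypothetical STIX structure (the property names are invented for illustration, not taken from any STIX binding):

```json
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "title": "InformationSourceType",
  "type": "object",
  "properties": {
    "identity": { "type": "string" },
    "time": { "type": "string", "format": "date-time" }
  },
  "required": ["identity"]
}
```

A schema like this lets tooling validate that serialized content is well-formed for the format, but it still says nothing about what "identity" means; that meaning has to come from the ontology/data-model above it.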
This is the stack we have been working towards:
  • Ontology/data-model: 
Defines the various concepts for the CTI space, as well as their properties, structures and relationships with each other. This also defines how we can deal with uncertainty, diversity and change over time through things like extension points (recognizing that we do not nor ever likely will have the full picture), vocabularies, etc.
  • Binding specifications:
Define a mapping from the Ontology/data-model to a particular representation format (JSON, XML, etc.). These allow a given format to be used by those who support it to represent content according to the ontology/data-model. If these binding specifications are accurate and fully cover the ontology/data-model with explicit mappings then it should be possible to losslessly translate from one bound serialization format to another.
  • Representation format implementation:
An explicit schematic specification (e.g. XSD) for representing CTI content according to the Ontology/data-model as bound by the corresponding binding specification. This will allow implementations that only care about the end serialized content and not the domain meaning of the content to parse and output CTI content in a validatable and interoperable way.
  • Actual instance CTI content expressed in a particular representation format:
Actual instance CTI data.

JSON-LD would basically fit into this stack at the binding specification and representation format levels.
The “context” structure of JSON-LD lets you do the sort of mappings from the ontology/data-model to a particular representation that are the purpose of the binding specifications. In this case the “context” (which can be expressed in a separate referenceable file rather than only inline with the content) would capture the binding specification rules for a JSON format implementation, and the “context” file(s) would themselves form the JSON representation format implementation specification.
At that point instance CTI content could be expressed in JSON with the referenced JSON-LD “context” providing the mechanism for interpreting it.
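For example, instance content could reference such an external context file by URL (the URL and field names below are hypothetical, purely to illustrate the shape):

```json
{
  "@context": "https://docs.oasis-open.org/cti/stix-context.jsonld",
  "@type": "Indicator",
  "title": "Suspicious domain observed in phishing campaign",
  "producer": "Example Org"
}
```

The referenced context file would carry the term-to-ontology mappings, so the instance content itself stays plain JSON while still being interpretable against the data model.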
I have not personally worked directly with JSON-LD nor done any sort of very detailed analysis of its capabilities. It is unclear whether or not JSON-LD has adequate expressivity to fully map our domain or the capability to provide automated validation. It may. It may not. That would be one dimension we would need to explore if we wish to consider JSON-LD as an option (which I would personally support).

In other words, JSON-LD would not simply be something pursued in addition to STIX. It would/could be HOW STIX is defined (KNOWN) for use within a JSON technology stack. It does not replace the need for the data-model, and it does not replace the end serialization as pure JSON. Rather, it provides a way to explicitly define the two middle layers of the stack.




