Re: [cti-users] Towards a better understanding of JSON-LD (Was: MTI Bind

I sent a separate e-mail to a few of you with some thoughts on problems to solve and restrictions we’d encounter to implement the JSON-LD approach. I don’t want to exclude anybody, but I also don’t want to flood the list with in-depth discussion on a particular approach.

More broadly, in my mind the next steps to moving forward on the format discussion are:

1. Put a table together with the sort of requirements that we’ve identified on the wiki page and reference how each of these formats (XML Schema, JSON Schema, JSON-LD) meets those requirements. I can mock something up and fill in what I know for XML Schema and JSON Schema.

2. Have someone familiar with JSON-LD put together a sample context and JSON schema to see how much up-front work it will be and whether we can get to the point where pure JSON developers just see JSON. I think Cory will be stepping up here, though it may take a few weeks (which is totally reasonable).

3. At some point (maybe once we go to actually formalize this decision in a work product) we need to hear from more voices. This decision can’t be made based on 10 of us arguing about whether people will want RDF: we need to hear from those people building capabilities (analysts hoping to use native RDF capabilities, vendors building products) to see where we really stand as a community. My proposal on this one is to wait, minimally, until #1 and #2 are done and then consider more actively soliciting opinions from the TC (google form, poll, etc).

John

[moved this to the CTI TC list]

On Oct 10, 2015, at 3:59 PM, Shawn Riley <shawn.p.riley@GMAIL.COM> wrote:
Hi folks-

There was a very informative webinar video from August 2015 on JSON-LD that should be of interest to those looking to better understand JSON-LD and how it can be used with existing JSON systems.

http://www.dataversity.net/smartdata-webinar-json-ld/

Slides from webinar also available: http://www.dataversity.net/smartdata-webinar-slides-json-ld/

Really good information from Brian Sletten.

Shawn
On Fri, Oct 9, 2015 at 6:23 PM, Paul Patrick <paul.patrick.sr@gmail.com> wrote:
+1. Mixing would be very bad

Sent from my iPhone
On Oct 9, 2015, at 5:46 PM, Cory Casanave <cory-c@modeldriven.com> wrote:
Re: Decisions around the format of STIX 2.0 don't automatically propagate to the CybOX level - those decisions would have to be made in the CybOX TC.

Yuck.

IMHO: Having mixed formats would be worse than any one choice; this should be a CTI-wide decision. The choice of a wire format for CTI does not preclude the same information (or a subset of it) in other formats, as we have discussed.

From: Jason Keirstead [mailto:Jason.Keirstead@ca.ibm.com]
Sent: Friday, October 09, 2015 3:56 PM
To: Cory Casanave
Cc: Barnum, Sean D.; Wunder, John A.; Jerome Athias; Jordan, Bret; Shawn Riley; Jacobsen, Jasen W.; cti-users@lists.oasis-open.org
Subject: RE: [cti-users] Towards a better understanding of JSON-LD (Was: MTI Binding)
RE "I think the idea of putting current XML inside of RDF or JSON would be a mess and add a lot of complexity and constraints, go there with caution."
 
This is going to be an issue no matter what format is decided upon, unless we abandon the effort and keep the existing XML format as-is - because of CybOX. 
 
Decisions around the format of STIX 2.0 don't automatically propagate to the CybOX level - those decisions would have to be made in the CybOX TC. And since CybOX has many more consumers than simply STIX, it could (and probably will) become an even more difficult task to change formats. Therefore I think it has to be assumed that no matter what format STIX 2.0 ends up in, that format needs to assume it will have embedded XML.
 
The alternative would be for STIX to abandon CybOX as the CTI data matching method for STIX, and create a new language. That would be an undertaking.
 
Sent from IBM Verse
Cory Casanave --- RE: [cti-users] Towards a better understanding of JSON-LD (Was: MTI Binding) ---

From: "Cory Casanave" <cory-c@modeldriven.com>
To: "Barnum, Sean D." <sbarnum@mitre.org>, "Wunder, John A." <jwunder@mitre.org>, "Jerome Athias" <athiasjerome@GMAIL.COM>
Cc: "Jordan, Bret" <bret.jordan@bluecoat.com>, "Shawn Riley" <shawn.p.riley@gmail.com>, "Jacobsen, Jasen W." <jasenj1@mitre.org>, cti-users@lists.oasis-open.org
Date: Fri, Oct 9, 2015 3:19 PM
Subject: RE: [cti-users] Towards a better understanding of JSON-LD (Was: MTI Binding)

Re: How will the JSON (or RDF) serialization work with the XML-based extensions we use now?

In terms of RDF, including the JSON-LD serialization, extensibility is a strong point. Both types of “things” and types of properties can have “subtypes & supertypes”. Since a subtype (or sub-property) can have any number of parents (multiple inheritance), we don’t get into the issues we find in XML Schema relative to a single tree of specialization; you can specialize across and combine multiple viewpoints. Being able to specialize the “verbs” as well as the “nouns” across multiple viewpoints has proven very useful. If necessary, an “instance” can also have multiple types (multiple classification), so the same individual can be both a “Person” and a “Victim”.
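To make the two ideas above concrete, here is a minimal sketch in plain Python, with triples held as tuples rather than in a real RDF store. Every URI and class name in it (InsiderThreatActor, Employee, alice, and so on) is invented for illustration and is not part of any STIX or CybOX vocabulary. It shows one class with two parents and one instance with two types. It covers only the "nouns"; sub-properties would follow the same pattern with rdfs:subPropertyOf.

```python
# Hypothetical namespace and terms: none of these URIs come from STIX/CybOX.
EX = "http://example.org/cti#"
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

triples = {
    # One class with two parents (multiple inheritance).
    (EX + "InsiderThreatActor", SUBCLASS_OF, EX + "ThreatActor"),
    (EX + "InsiderThreatActor", SUBCLASS_OF, EX + "Employee"),
    (EX + "ThreatActor", SUBCLASS_OF, EX + "Agent"),
    (EX + "Employee", SUBCLASS_OF, EX + "Person"),
    # One instance with two types (multiple classification).
    (EX + "alice", RDF_TYPE, EX + "Person"),
    (EX + "alice", RDF_TYPE, EX + "Victim"),
}


def superclasses(cls, graph):
    """Return every ancestor of cls, following rdfs:subClassOf transitively."""
    found = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in graph:
            if s == current and p == SUBCLASS_OF and o not in found:
                found.add(o)
                stack.append(o)
    return found


def types_of(instance, graph):
    """Return the set of classes directly asserted for an instance."""
    return {o for s, p, o in graph if s == instance and p == RDF_TYPE}


ancestors = superclasses(EX + "InsiderThreatActor", triples)
alice_types = types_of(EX + "alice", triples)
```

Here the ancestor set spans both branches of the hierarchy, which is the case a single tree of specialization cannot express directly.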

Remember that everything in RDF has a URI – this is true of “types” as well as “instances”. Therefore a “controlled vocabulary” is just another set of RDF URIs that you reference, and these can be added to and extended as new URIs are defined. A sub-community can also define their own, if your policies allow.
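A rough sketch of that idea, assuming a vocabulary is nothing more than a set of URIs: the base vocabulary, the sub-community extension, and the `allow_extensions` policy switch are all hypothetical names made up for this example.

```python
# Hypothetical vocabularies: these URIs are placeholders, not real STIX terms.
BASE_VOCAB = frozenset({
    "http://example.org/vocab/threat-actor-type#hacktivist",
    "http://example.org/vocab/threat-actor-type#insider",
})

# A sub-community defines its own terms simply by minting new URIs.
COMMUNITY_VOCAB = frozenset({
    "http://example.net/sector-x/actor-type#payment-fraud-crew",
})


def allowed_terms(allow_extensions):
    """Return the usable terms; the extension set is included only if policy allows."""
    if allow_extensions:
        return BASE_VOCAB | COMMUNITY_VOCAB
    return BASE_VOCAB


def is_valid(term, allow_extensions=False):
    """Check a value against the controlled vocabulary by URI membership."""
    return term in allowed_terms(allow_extensions)
```

Extending the vocabulary is then a matter of adding URIs to a set, with no change to the schema that references it.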

A lot of the complexity of extensible XML schema comes from inventing mechanisms to solve these 3 problems, which are just a given in RDF. Note that UML fully supports the above concepts directly so can model the RDF semantics without bending the paradigm.

Data markings are just another OWL vocabulary; there are a few to choose from, including one specific to provenance. I have not used this vocabulary but respect the leaders very much.

Field level referencing is done with a URI. Queries are done with SPARQL.
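As a simplified illustration of both points, the sketch below treats a field as a (subject URI, property URI) pair and answers a query with a basic triple-pattern match. It is a stand-in for what a SPARQL engine does with a single pattern, not a SPARQL implementation, and all URIs and values are invented for the example.

```python
# Hypothetical data: the URIs and values are placeholders for illustration.
graph = [
    ("http://example.org/indicator/1", "http://example.org/cti#title", "Suspicious domain"),
    ("http://example.org/indicator/1", "http://example.org/cti#confidence", "high"),
    ("http://example.org/indicator/2", "http://example.org/cti#confidence", "low"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None plays the role of a SPARQL variable."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Roughly equivalent SPARQL for the first lookup:
#   SELECT ?o WHERE {
#     <http://example.org/indicator/1> <http://example.org/cti#confidence> ?o
#   }
field = match(graph,
              s="http://example.org/indicator/1",
              p="http://example.org/cti#confidence")

# Query by property across all subjects: "which things have a confidence value?"
with_confidence = [t[0] for t in match(graph, p="http://example.org/cti#confidence")]
```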

I think the idea of putting current XML inside of RDF or JSON would be a mess and add a lot of complexity and constraints, go there with caution.

While I think RDF/JSON-LD would make a great serialization – I would not use it for my high level models. All schema have limits (as required by their technology stack – in the case of OWL it is tableau reasoners & triples) and we will want to support other technologies, so I would still use UML for the logical model and produce the RDF. That way we don’t bind our concepts into a single runtime stack which we know will change over time.

-Cory

From: cti-users@lists.oasis-open.org [mailto:cti-users@lists.oasis-open.org] On Behalf Of Barnum, Sean D.
Sent: Friday, October 09, 2015 12:52 PM
To: Wunder, John A.; Jerome Athias
Cc: Jordan, Bret; Shawn Riley; Jacobsen, Jasen W.; cti-users@lists.oasis-open.org
Subject: Re: [cti-users] Towards a better understanding of JSON-LD (Was: MTI Binding)

>How will the JSON (or RDF) serialization work with the XML-based extensions we use now?

[sean] I would generalize this a bit to something like "How will the JSON (or RDF) serialization support the concept of extension points (the things implemented in the XSD using xsi:type) for areas where there is known variation or lack of current community consensus?"

The issue is not how would JSON support xsi:type. It is how it would support extensibility. The current spec (UML) has already abstracted this issue away from a question specifically about xsi:type. That is just the way it would typically be implemented in an XML serialization.

>How will JSON (and RDF) handle digital signatures?

[sean] I would add things like:

How will JSON (or RDF) handle data markings?

How will JSON (or RDF) handle controlled vocabularies?

How will JSON (or RDF) handle Type constraining/conversion for properties?

How will JSON (or RDF) handle field-level referencing and querying?

I make no assertions that these are not possible in JSON but I have yet to see anyone demonstrate how they would be done. I think we need to avoid assumptions and make sure that any given serialization choice can effectively support all the capabilities needed.

I also don’t think this is necessarily a complete list of the questions we need to answer. Let’s work to flesh out this list and answer these questions for any serialization formats we consider for MTI, not just for JSON.

sean

On 10/9/15, 12:32 PM, "cti-users@lists.oasis-open.org on behalf of Wunder, John A." <cti-users@lists.oasis-open.org on behalf of jwunder@mitre.org> wrote:

These are great questions!

How will the JSON (or RDF) serialization work with the XML-based extensions we use now?

How will JSON (and RDF) handle digital signatures?

In terms of time to produce the schemas, it took me about an hour to put together the examples (schema + content) that I posted a few days ago. They were against a small part of STIX. I would estimate it’s like 60 hours to produce the STIX/CybOX schemas plus (since this is a manual mapping) 60 hours to review. Assuming we develop an RDF-based high-level model you could almost certainly automate that, though. And of course if you go with RDF serialization this process is essentially free.

John

On Oct 9, 2015, at 12:21 PM, Jerome Athias <athiasjerome@GMAIL.COM> wrote:

Could someone provide a cost/time estimate for the translation of STIX (and the other schemas used, including extensions, e.g. CybOX, MAEC, or CVE, CAPEC, CIQ...) into JSON schemas?

Could JSON easily transport XML chunks?

Is there a JSON mechanism to cover a requirement related to signature/encryption for data objects, somewhat like XMLDSIG/XAdES?
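On the narrow transport question only, a minimal sketch using the Python standard library: an XML chunk can ride inside a JSON document as an escaped string and be parsed again on the other side. The element names, namespace URI, and the `embedded_xml` key are made up for the example, and the sketch says nothing about signatures or about whether embedding XML this way is a good design.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical XML fragment: the namespace and element names are placeholders.
xml_chunk = (
    '<cybox:Observable xmlns:cybox="http://example.org/cybox" id="example:obs-1">'
    '<cybox:Title>Test "quoted" title</cybox:Title>'
    '</cybox:Observable>'
)

# The JSON envelope carries the XML as an ordinary string value.
envelope = {"id": "example:package-1", "embedded_xml": xml_chunk}
wire = json.dumps(envelope)  # quotes inside the XML are escaped on the wire

# The receiver decodes the JSON, then hands the string to an XML parser.
decoded = json.loads(wire)
root = ET.fromstring(decoded["embedded_xml"])
title = root[0].text
```

The round trip is lossless, but the consumer needs both a JSON parser and an XML parser, and the XML is opaque to any JSON schema validation.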

2015-10-09 18:58 GMT+03:00 Jordan, Bret <bret.jordan@bluecoat.com>:

There needs to be one and only one on-the-wire serialization for the default case, which is probably 90+%, maybe as high as 95+%, of the market. There will also need to be an option for an additional on-the-wire serialization to support super high bandwidth conditions, where something like Protobuf or Cap'n Proto would be the logical choice. If we do NOT have a default serialization that everyone can just use and it just works (think DLNA for security tools), then all of this is for naught and we might as well go back to our day jobs.

To be clear:

1) We need a high level format like UML to represent the data model. I personally like UML as it is something that data modelers can live with and developers / implementers can still use and understand. It also does not require massively expensive modeling tools to look at or understand.

2) We need a very expressive and yet intuitive data model that is easy to understand but allows rich documentation of threats, their relationships, and sightings.

3) I personally do not believe we need a strict serialization binding from the model to the on-the-wire format. A binding between UML and JSON+JSONSchema is where we need to go.

4) My proposal is and has been: UML Data Model with JSON+JSONSchema serialization with the option of Protobuf/Cap'n Proto as a secondary serialization.

Thanks,

Bret

Bret Jordan CISSP

Director of Security Architecture and Standards | Office of the CTO

Blue Coat Systems

PGP Fingerprint: 63B4 FC53 680A 6B7D 1447  F2C0 74F8 ACAE 7415 0050

"Without cryptography vihv vivc ce xhrnrw, however, the only thing that can

not be unscrambled is an egg."

On Oct 9, 2015, at 09:30, Shawn Riley <shawn.p.riley@GMAIL.COM> wrote:

I'm confused then. If the community is perfectly happy with an RDF/OWL w/UML data model then that is all that is needed to use the RDF serializations. It seems then the argument is about creating an additional JSON / JSONSchema on-the-wire format in addition to the RDF serializations? Or is the community saying we are ok with having an RDF/OWL w/ UML data model but you will prohibit the community from using any of the existing RDF serializations designed to be used with the data model?

On Fri, Oct 9, 2015 at 11:14 AM, Wunder, John A. <jwunder@mitre.org> wrote:

I haven’t seen anybody suggest not having an abstract data model (either via RDF, UML, or something else). Bret in particular has been careful to maintain that we will base any serialization on a high-level model.

The question we’re tackling now is whether the on-the-wire MTI format should be something tied directly to RDF, like JSON-LD, or something that's indirectly tied to the high-level model via a binding specification, like JSON with JSONSchema. Both approaches allow for an RDF-based analysis; it’s just a question of whether an RDF-based serialization format is the best approach for sharing data between tools when not all of them will want to do RDF.
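To illustrate what "tied directly to RDF" can look like in practice, here is a small sketch with an invented document and invented property URIs. A JSON-LD document is still ordinary JSON: a tool that does not care about RDF can skip the `@`-prefixed keys, while an RDF-aware tool can use `@context` to turn the same keys into URIs. The `to_triples` function below is a deliberately naive stand-in for real JSON-LD expansion, which handles far more cases.

```python
import json

# Hypothetical JSON-LD document: the property URIs are placeholders.
doc = json.loads("""
{
  "@context": {
    "title": "http://example.org/cti#title",
    "confidence": "http://example.org/cti#confidence"
  },
  "@id": "http://example.org/indicator/1",
  "title": "Suspicious domain",
  "confidence": "high"
}
""")


def plain_view(d):
    """What a pure-JSON consumer sees if it ignores the JSON-LD keywords."""
    return {k: v for k, v in d.items() if not k.startswith("@")}


def to_triples(d):
    """Naive expansion: map each key through @context to a property URI."""
    context = d["@context"]
    subject = d["@id"]
    return sorted((subject, context[k], v) for k, v in d.items() if k in context)


as_json = plain_view(doc)
as_rdf = to_triples(doc)
```

The binding-specification approach would produce the same plain view without the `@context` block, leaving the mapping to URIs in a separate document.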

FWIW I’m waiting to see what Cory’s examples look like.

John

On Oct 9, 2015, at 10:55 AM, Shawn Riley <shawn.p.riley@GMAIL.COM> wrote:

I just don't see why some here are moving away from the original plan of moving from XML to an abstract data model like RDF. We had face to face discussions on the topic and it's been discussed repeatedly since STIX launched. The whole reason some have been promoting STIX internationally and across the community was because this was the future direction. I certainly don't want to throw away the last 4 years of work on CTI in RDF and the significant advancement in analytic tradecraft it brings. I don’t see why this should be positioned as an either-or decision. The desires of those wanting simple JSON serializations should be fully possible within an RDF-based modeling approach while still enabling us to support moving forward the state of the practice for cyber threat analysts. Please help me understand why after more than 4 years of discussing this transition from XML to an RDF-based modeling approach that we now have people pushing to move the CTI effort in another direction?

On Thu, Oct 8, 2015 at 12:14 PM, Jacobsen, Jasen W. <jasenj1@mitre.org> wrote:

Note that the JSON they provide is JSON-LD.

http://docs.publishmydata.com/developers/105_resource_formats.html

And they provide a JavaScript example of accessing the JSON-LD:

http://docs.publishmydata.com/developers/121_example_javascript_filtered_resources.html

Good resource. Thanks for sharing.

- Jasen.

From: <cti-users@lists.oasis-open.org> on behalf of Shawn Riley <shawn.p.riley@gmail.com>
Date: Thursday, October 8, 2015 at 11:28 AM
To: "cti-users@lists.oasis-open.org" <cti-users@lists.oasis-open.org>
Subject: Re: [cti-users] Towards a better understanding of JSON-LD (Was: MTI Binding)

I wanted to share a link (below) to a blog which talks about RDF serialization formats, and while this isn't STIX specific it does use real world data from http://opendatacommunities.org/ which is the UK Department for Communities and Local Government's official Linked Open Data website. As I'm sure everyone is aware, both the USA and UK governments have been champions of RDF for several years now and continue to push for open data to be made available in RDF.

http://blog.swirrl.com/articles/rdf-serialisation-formats/   <--NOTE this is from 2012 before the JSON-LD development but it should help those looking for more RDF data than the US Government's 7000+ RDF open data sets.
