cti-stix message


Subject: Re: [cti-users] Re: [cti-stix] [cti-users] MTI Binding


Great, simple (more simple than any of my attempts :-) ) explanation.

Thanks Shawn.

sean

From: <cti-users@lists.oasis-open.org> on behalf of Shawn Riley
Date: Saturday, October 3, 2015 at 1:33 PM
To: "Jordan, Bret"
Cc: Jane Ginn, John Wunder, "cti-users@lists.oasis-open.org", "cti-stix@lists.oasis-open.org"
Subject: Re: [cti-users] Re: [cti-stix] [cti-users] MTI Binding

Some of the folks I've had side conversations with on RDF have found the following information to be of value so I wanted to share it with the wider group. 

Today's Lesson

RDF and XML both attempt to address the problem of enabling different programs and different computers to communicate effectively with each other. In its own way, each takes an important step towards a universal lingua franca for data.

This similar goal of creating a means for any system to communicate with any other is the basis for the confusion. However, there is more to it than that.

A Serialization Format vs. a Data Model

There are, broadly speaking, two problems when parsing a file or data sent over a network. The first is simply being able to read the data in, that is, to translate the series of bytes on disk into logical data. The second is to do something intelligent with that data, such as display it on the screen. XML solves the first of those problems, and RDF solves the second of them.

This brings us to the major foundational difference between XML and RDF. XML is primarily a serialization format (we'll define this in a little more detail in a minute), while RDF is primarily a data model. From the beginning they are meant to serve two distinct purposes.

RDF vs. XML: An Analogy

Consider the book A Christmas Carol (which, by the way, really is excellent and absolutely deserves to have 100 different film adaptations).

You can purchase it in paperback or in hardcover. You can purchase it as part of a Dickens collection or on its own. I read it by having chunks emailed to me every single day, and you might read it on your Kindle or iPad.

Yet every one of those formats is still somehow the same book. The fact that it can be paper or electricity doesn't fundamentally change the book itself.

For example, let's say that I have two copies of A Christmas Carol: one in braille and one in regular print. Are they the same book?

From the point of view of RDF they absolutely are the same book. The book's meaning is what matters in RDF. The information represented by RDF retains the same meaning regardless of its underlying format. If you save an RDF file in Turtle or RDF/XML, it's still the same information. Braille or print: it's the same book.

From the point of view of XML they are not the same book. A person who cannot read braille cannot consume one of the two. The representation is what matters in the XML world.

In this analogy, RDF represents the informational content of the book; XML is a choice of delivery mechanism (Braille or print). Both parts matter, for sure, but they are two different things.
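
To make the analogy concrete, here is a minimal sketch in Python. It is not real RDF tooling: the two "formats" (tab-separated lines and a JSON array) are toy stand-ins for Turtle and RDF/XML, and the example.org identifier is hypothetical. It only shows that the bytes can differ while the statements read back are the same.

```python
import json

# Hypothetical identifier, used only for illustration.
BOOK = "http://example.org/book/a-christmas-carol"
DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace

# The data model: a set of (subject, predicate, object) statements.
triples = {
    (BOOK, DC + "title", "A Christmas Carol"),
    (BOOK, DC + "creator", "Charles Dickens"),
}

def to_lines(ts):
    """Serialize as one tab-separated statement per line (a toy format)."""
    return "\n".join("\t".join(t) for t in sorted(ts))

def from_lines(text):
    """Read the toy line format back into a set of statements."""
    return {tuple(line.split("\t")) for line in text.splitlines() if line}

def to_json(ts):
    """Serialize the same statements as a JSON array of objects."""
    return json.dumps([{"s": s, "p": p, "o": o} for s, p, o in sorted(ts)])

def from_json(text):
    """Read the JSON form back into a set of statements."""
    return {(d["s"], d["p"], d["o"]) for d in json.loads(text)}

as_lines = to_lines(triples)
as_json = to_json(triples)

# The serialized bytes differ, but the information read back is identical.
same_bytes = as_lines == as_json
same_info = from_lines(as_lines) == from_json(as_json) == triples
```

Here `same_bytes` is false and `same_info` is true: braille or print, it is the same book.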

XML: Meant for Serialization

A serialization format is a way to encode information so that when it's passed between machines it can be parsed. In fact, the popularity of XML is due to its addressing the problem of too many file formats. For years, the first thing any programmer would do when creating a new program (for image editing, word processing, data storage…anything at all!) would be to create a way to save its data to disk.

The challenge was that any other program that wanted to read the file would need special code for reading just that file format. Remember back before Word was the absolute dominant word processor? There were dozens of programs, and not all of them could read each other's files. Even different versions of Microsoft Word couldn't read each other's files!

Aside: Now, the technically astute know that XML itself has multiple serializations. Microsoft Word, for example, serializes XML in a binary format, whereas most XML is serialized as text. For our purposes we're ignoring this nuance since it doesn't affect the overall point.

RDF: A Data Model

RDF, in contrast, is a data model, which is an abstract set of rules for representing information. That, unfortunately, is not a great definition, so let's make it clearer by making some more analogies!

  • As in the book example, the serialization is the physical format of data, while the data model is the way to represent the book's inherent meaning.
  • The serialization is like the grammar of a language, while the data model is the informational content behind words.
  • The serialization is the word "green" spoken aloud in English, while the data model is a way to define the concept of "Green" such that it is unambiguous whether you say "Green" or "Verde" or think about the color of a leaf.
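
The "Green" / "Verde" point can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the concept identifier under example.org is hypothetical, and labels are modeled as plain (text, language) pairs rather than real RDF literals. The predicate is the standard rdfs:label IRI.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
GREEN = "http://example.org/color/Green"  # hypothetical concept identifier

# Each statement is (subject, predicate, (label text, language tag)).
triples = [
    (GREEN, RDFS_LABEL, ("green", "en")),
    (GREEN, RDFS_LABEL, ("verde", "es")),
]

def concept_for(word):
    """Return the concept a word labels, whatever language the word is in."""
    word = word.lower()
    for subject, predicate, (text, _lang) in triples:
        if predicate == RDFS_LABEL and text == word:
            return subject
    return None  # no known concept carries this label
```

Both `concept_for("Green")` and `concept_for("Verde")` resolve to the same identifier: the words differ, the concept does not.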

In RDF 101 you saw how RDF is used to define objects and concepts and relationships between them, so hopefully this makes sense. If not, it might be worth re-skimming that lesson.

Simply put, in the RDF world, it doesn't matter how you send the data over the wire. Popular RDF serializations include:

  • Turtle
  • N3
  • RDFa (RDF embedded in HTML)
  • RDF/XML

See the last one? There is a way to represent RDF in XML! That is, if you have a parser that can read XML, it can read RDF/XML! There is no better proof that the two are not competing ideas than the existence of RDF/XML in the first place.
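
As a rough illustration of that point, the sketch below uses Python's standard-library XML parser, which knows nothing about RDF, to pull statements out of a small RDF/XML document. It is a sketch under assumptions, not a real RDF/XML parser: it handles only rdf:Description elements with plain-text property children, the example.org subject is hypothetical, and the helper name extract_triples is made up for this example. A dedicated RDF library would be the right tool for real data.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# A tiny RDF/XML document; the subject IRI is hypothetical.
rdf_xml = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/book/a-christmas-carol">
    <dc:title>A Christmas Carol</dc:title>
    <dc:creator>Charles Dickens</dc:creator>
  </rdf:Description>
</rdf:RDF>
"""

def extract_triples(xml_text):
    """Extract (subject, predicate, object) from a small subset of RDF/XML."""
    root = ET.fromstring(xml_text)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree writes tags as "{namespace}local"; join the two
            # parts back into a single predicate IRI.
            predicate = prop.tag[1:].replace("}", "")
            triples.append((subject, predicate, prop.text))
    return triples

result = extract_triples(rdf_xml)
```

The XML parser does the reading; interpreting the elements as subject, predicate, and object is where the RDF data model comes in.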

Comparing the XML Technology Stack to the Semantic Web Technology Stack

There is more to a data model than the model itself. How you interact with it (the query language) and how to describe it (the schema language) are incredibly important aspects in terms of practical usage.

The Semantic Web is a set of technologies for representing, storing, and querying data. XML too has a family of related technologies for representing, storing, and querying data.

This lesson is specifically focusing on RDF vs. XML since that is a specific topic that seems to come up very often. Other lessons will compare SPARQL to XQuery, and OWL to XML Schema.

Conclusion

To sum up:

  • XML is concerned with serialization
  • RDF is concerned with informational content

Thus the two technologies, though related, address two distinct problems. The existence of RDF/XML itself proves that they are not meant to compete.

Source: http://www.cambridgesemantics.com/semantic-university/rdf-vs-xml


On Sat, Oct 3, 2015 at 11:33 AM, Jordan, Bret <bret.jordan@bluecoat.com> wrote:
While I no longer consider myself a true academic, one thing is for sure, I am not a professional data modeler.  Debating the values of OWL or UML is not interesting to me, sorry.  What I care about, and the whole reason I started the JSON debate 18+ months ago and have been pushing so hard for JSON, is market adoption.  

If we do not gain widespread adoption / get across the chasm and go mainstream, then it really does not matter how neat and cool our data model is.  Yes, there will also be some people / groups that will use it (there are people still using IODEF, OpenIOC, VERIS, CIF, MILE, OTX, etc.), but we run the risk of some YACS gaining massive market share and becoming the de facto standard.  

My vision for STIX and TAXII is:

1) One way of doing things

2) Simple to understand and easy to use
a) Reduce the cost of entry for organizations to implement, use, and work with STIX and TAXII

3) Very fluid and easy transport of CTI between users, groups, orgs, devices, and products.
a) A transport that allows tools to be written that mimic and enhance the workflow of security analysts today

4) Powerful model that can allow very expressive capture of threats

Thanks,

Bret



Bret Jordan CISSP
Director of Security Architecture and Standards | Office of the CTO
Blue Coat Systems
PGP Fingerprint: 63B4 FC53 680A 6B7D 1447  F2C0 74F8 ACAE 7415 0050
"Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg." 

On Oct 2, 2015, at 18:45, Jane Ginn <jane.ginn@gmail.com> wrote:

Hi All:

While reading through this thread it occurred to me that the JSON-LD suggestion represents a significant shift in the level at which we are approaching the problem set. Cory has long been arguing for us to shift our focus to a semantic model that can serve as a language-agnostic approach to solving the CTI sharing problem. Bret has been pushing for JSON as a tool to help us achieve more widespread adoption. We currently have bindings in XML and Python... but no MTI for moving forward with STIX 2.0.

JSON-LD appears to address several of our issues at a higher level of abstraction.

I'm also intrigued by the potential, from the POV of STIX consumers, for PMML to be deployed seamlessly to use wire-speed data on attacks for predictive modelling... or at least for deploying the myriad of tools for predictive modelling. I expect this is an area of white space in the market that will be picked up by a vendor and developed as an enterprise solution. We just need to get the front end right for the integration.

Jane Ginn
Cyber Threat Intelligence Network




