Re: [cti-users] Re: Research Paper

Cory, thanks for the perspective.

My comments below are based on the last 4 years of building OWL/RDF-based ontologies that leverage the concepts found in STIX/MAEC/CybOX et al. (or similar models) and that made it into operational environments. We chose OWL/RDF not for the inferencing, at least not immediately, but because of its graph-based model and the loosely coupled nature of cyber intelligence data. The research paper Shawn Riley provided is based on the experience of building this type of solution, not once but twice (it's amazing what you learn when you do it again).

I first wanted to chime in to make sure that this group doesn’t get into the same kind of discussion as is going on with “XML vs JSON”. IMHO, it isn’t “UML vs OWL/RDF” but rather a more powerful story of “UML AND OWL/RDF”. As Cory correctly stated, you can take a model in UML and convert it to OWL/RDF. For that matter, you can take OWL/RDF and represent it as UML (I built a tool to do exactly that for generating ontology documentation). I will admit that the UML->OWL/RDF converters I’ve attempted to use lack robustness and flexibility and generally need a lot of “hand crafting” after the fact, and any attempt at round-tripping is a bust IMHO. In general, I’d agree that I would rather model using UML notation but export the conceptual data model in OWL/RDF; the textual representations of OWL/RDF (RDF/XML, Turtle, N-Triples, JSON-LD) aren’t the easiest formats to write by hand. It would be like writing the UML model using XMI.
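To make the "model in UML, export as OWL/RDF" idea concrete, here is a minimal sketch in Python. It is not the converter or documentation tool mentioned above: `UmlClass`, `to_turtle`, the `ex:` namespace, and the example class names are all hypothetical, and a real UML model carries far more than a name, one parent, and a few typed attributes.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class UmlClass:
    """Hypothetical, heavily simplified stand-in for a UML class."""
    name: str
    parent: Optional[str] = None
    # attribute name -> XSD datatype local name (e.g. "string")
    attributes: Dict[str, str] = field(default_factory=dict)


# Namespace declarations; the ex: namespace is an assumption for illustration.
PREFIXES = (
    "@prefix ex: <http://example.org/cti#> .\n"
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
)


def to_turtle(cls: UmlClass) -> str:
    """Emit Turtle statements for one class and its attributes."""
    lines = [f"ex:{cls.name} a owl:Class"]
    if cls.parent:
        # Generalization in UML becomes rdfs:subClassOf.
        lines[-1] += " ;"
        lines.append(f"    rdfs:subClassOf ex:{cls.parent}")
    lines[-1] += " ."
    for attr, datatype in cls.attributes.items():
        # Each UML attribute becomes a datatype property.
        lines.append(f"ex:{attr} a owl:DatatypeProperty ;")
        lines.append(f"    rdfs:domain ex:{cls.name} ;")
        lines.append(f"    rdfs:range xsd:{datatype} .")
    return "\n".join(lines)


if __name__ == "__main__":
    indicator = UmlClass("Indicator", parent="Observable",
                         attributes={"confidence": "string"})
    print(PREFIXES + to_turtle(indicator))
```

Even this toy shows where the "hand crafting" comes in: a UML attribute reads as a constraint on a class, while `rdfs:domain` and `rdfs:range` are statements a reasoner draws conclusions from, so a mechanical mapping like this one usually needs review.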

I generally agree with most of the points that Cory outlined below, especially that Linked Open Data and inferencing are very different, BUT not mutually exclusive. OWL/RDF is very well suited for graph representations and can be easily “flattened”. In addition, it doesn’t dictate the physical storage model, which could be a relational DB, NoSQL, a graph store, etc. And depending upon the serialization format chosen, it can be built on top of technology that permits you to have both a graph form and a more traditional tabular form. I also agree that the graph is a solid foundation on which to build. Where I differ is on the recommendation to avoid OWL.
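As a rough illustration of "flattening", the sketch below holds a graph as plain (subject, predicate, object) tuples in Python and pivots it into table rows for one class of thing. The triples, the `ex:` names, and the `flatten` helper are made up for this example; nothing here is tied to a particular triple store or serialization.

```python
from collections import defaultdict

# Illustrative triples (subject, predicate, object); all names are made up.
TRIPLES = [
    ("ex:indicator-1", "rdf:type", "ex:Indicator"),
    ("ex:indicator-1", "ex:value", "198.51.100.7"),
    ("ex:indicator-1", "ex:indicates", "ex:campaign-9"),
    ("ex:indicator-2", "rdf:type", "ex:Indicator"),
    ("ex:indicator-2", "ex:value", "evil.example.com"),
    ("ex:campaign-9", "rdf:type", "ex:Campaign"),
]


def flatten(triples, rdf_type, columns):
    """Pivot the graph into one row per subject of the given type."""
    by_subject = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        by_subject[s][p].append(o)

    rows = []
    for subject, props in sorted(by_subject.items()):
        if rdf_type not in props.get("rdf:type", []):
            continue
        row = {"id": subject}
        for col in columns:
            # Multi-valued properties are joined; missing ones become "".
            row[col] = "|".join(props.get(col, []))
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in flatten(TRIPLES, "ex:Indicator", ["ex:value", "ex:indicates"]):
        print(row)
```

The same triples could be flattened differently for another audience just by choosing a different type and column list, which is the flexibility the graph buys you.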

That OWL permits inferencing which may today require in-memory techniques isn’t a good reason, IMHO, to avoid it. One can easily use OWL ‘Lite’ and not immediately leverage some of the inferencing possibilities, without any harm. OWL uses inferencing to handle a number of the relationship semantics WITHOUT having to load all of the instances into memory. By following some good design techniques, many of the pitfalls seen with early implementations of inferencing can be avoided. Because of its extensibility, it is easy to extend an OWL/RDF model with additional capabilities without changing the fundamental data model in most cases. The key here isn’t to try to leverage every feature in OWL/RDF initially but to lay an architectural foundation on which more can be built later without having to start over. Additionally, OWL/RDF doesn’t require the web or networking to be utilized, nor is it limited to the initial concept of the Semantic Web.
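One way to picture inferencing that does not need every instance in memory: keep only the (small) class hierarchy loaded and test instances one at a time as they stream past. The Python sketch below does this for subclass reasoning only, which is a tiny fraction of what OWL offers, and it assumes single inheritance to stay short. The class names and helper functions are hypothetical.

```python
# Schema only: child class -> parent class (single inheritance assumed
# to keep the sketch short; OWL itself has no such restriction).
SUBCLASS_OF = {
    "ex:IPv4Address": "ex:NetworkAddress",
    "ex:NetworkAddress": "ex:Observable",
    "ex:DomainName": "ex:Observable",
}


def superclasses(cls, subclass_of):
    """Return every ancestor of cls, nearest first (cycle-safe)."""
    found = []
    while cls in subclass_of and subclass_of[cls] not in found:
        cls = subclass_of[cls]
        found.append(cls)
    return found


def instances_of(target, typed_instances, subclass_of):
    """Yield instances whose asserted class is target or a subclass of it.

    typed_instances may be any iterable of (instance, asserted_class)
    pairs, e.g. a generator reading from disk, so only the schema has
    to be held in memory.
    """
    for instance, asserted in typed_instances:
        if asserted == target or target in superclasses(asserted, subclass_of):
            yield instance


if __name__ == "__main__":
    stream = iter([
        ("ex:obs-1", "ex:IPv4Address"),
        ("ex:obs-2", "ex:DomainName"),
        ("ex:camp-1", "ex:Campaign"),
    ])
    print(list(instances_of("ex:Observable", stream, SUBCLASS_OF)))
```

Adding a new subclass later only touches `SUBCLASS_OF`; the instance data and the query stay as they were, which is the kind of extension without remodeling described above.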

As Cory mentioned, the use of SPARQL instead of, say, SQL enables but doesn’t require federated query (which isn’t an initial goal for this community), and again lays the foundation for its use going forward. That said, SPARQL is another form of access that doesn’t need to be exposed, given the kind of sharing models this community is discussing. While the concepts of Linked Open Data and inferencing are very different, they are NOT mutually exclusive. Most of today’s use of Linked Open Data is focused on helping humans find other relevant web pages.
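For anyone who hasn't used SPARQL: at its core a query is a set of triple patterns with variables that are matched against the graph. The Python sketch below imitates only that core idea over in-memory tuples. It is not a SPARQL engine, and the predicates (`ex:indicates`, `ex:attributedTo`) and data are invented for illustration.

```python
# Roughly the query this sketch imitates (predicates are made up):
#   SELECT ?i ?a WHERE { ?i ex:indicates ?c . ?c ex:attributedTo ?a . }

TRIPLES = [
    ("ex:indicator-1", "ex:indicates", "ex:campaign-9"),
    ("ex:campaign-9", "ex:attributedTo", "ex:actor-3"),
    ("ex:indicator-2", "ex:indicates", "ex:campaign-4"),
]


def match(pattern, triple, binding):
    """Try to extend binding so that pattern equals triple; None on failure."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def query(patterns, triples):
    """Return all variable bindings that satisfy every pattern."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


if __name__ == "__main__":
    patterns = [("?i", "ex:indicates", "?c"), ("?c", "ex:attributedTo", "?a")]
    for result in query(patterns, TRIPLES):
        print(result["?i"], "->", result["?a"])
```

Note that the join between the two patterns happens through the shared variable `?c`, with no table names or foreign keys declared up front; that is what makes this style of query a good fit for loosely coupled data.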

As I stated in the beginning, I’m in general agreement with most of Cory’s points. IMHO we get a better model with stronger semantics by combining UML and OWL/RDF, and in neither case have we dictated a particular implementation strategy. I believe the combination provides a solid foundation from which implementations can more easily be built.

Just my 2 cents

Paul Patrick

From: <cti-users@lists.oasis-open.org> on behalf of Cory Casanave
Date: Monday, September 28, 2015 at 2:59 PM
To: "Bush, Jonathan", Shawn Riley, "cti-users@lists.oasis-open.org"
Subject: RE: [cti-users] Re: Research Paper

In the “Semweb” community there is a difference between “linked open data” (mostly RDF) and inferencing (mostly OWL, which can be represented in RDF). Both are graphs. OWL adds formal first-order logic capabilities. The graph query language defined for them is “SPARQL”, and it works across both. RDF is a web-based graph data model (there are others), with multiple representations including XML and JSON.

Graph data representations and queries like this work well when you want a lot of flexibility to subset and structure the data in various ways for different purposes, viewpoints and communities. If you have very fixed requirements you may be able to “flatten” it into a table or hierarchical structure. So different use cases may point you to a graph, flat or hierarchical structure for the same information. IMHO what you want is all of the above; you don’t want to get locked into any single perspective and representation. If one is foundational, the graph is the way to go.

OWL sounds great in that some of the inferences can infer new data from others. BUT, OWL inference as it is implemented and used today is mostly an expensive in-memory computation. RDF assumes loosely coupled data in a very distributed model (linked open data). LOD allows reference to information on the web without the need to copy everything. So I would think a LOD graph of CTI data would be very interesting. I’m going to draw flies with this, but I would stay away from OWL (or perhaps use a very reduced subset of it, sometimes called RDF++). The reason is that OWL puts limits on your graph representation (so the OWL inferences can work), and to really take advantage of it you need all data in memory. Of course, tools could use OWL internally to process CTI data.

If you look at your underlying model as a graph, you can always flatten it for specific use cases.

To define this graph model I would still use UML; it can be mapped to OWL or RDF and is much more understandable than the text-based OWL/RDF tools.

-Cory Casanave

From: cti-users@lists.oasis-open.org [mailto:cti-users@lists.oasis-open.org] On Behalf Of Bush, Jonathan
Sent: Monday, September 28, 2015 11:21 AM
To: 'Shawn Riley'; cti-users@lists.oasis-open.org
Subject: RE: [cti-users] Re: Research Paper

I do like this because it separates conceptual design from implementation, but I have a bit of a knowledge gap in the OWL area. Question – How would we get from this sort of thinking to implementation (which is of course still important)?

From: cti-users@lists.oasis-open.org [mailto:cti-users@lists.oasis-open.org] On Behalf Of Shawn Riley
Sent: Monday, September 28, 2015 9:57 AM
To: cti-users@lists.oasis-open.org
Subject: [cti-users] Re: Research Paper

Hi folks, I've had a few people follow up asking where they could learn more about knowledge engineering and semantic web technologies, since that was a key focus of the science of cybersecurity research paper. I just learned of a new Massive Open Online Course (MOOC) on Knowledge Engineering and Semantic Web Technologies that is FREE and open to everyone! I posted more information in a couple of blog posts for those interested. I know there are a number of us in the STIX, CYBOX, MSM community focused on the knowledge engineering side using semantic web technologies, so while this is slightly off topic, I hope you find this information of value.

https://www.linkedin.com/pulse/free-mooc-knowledge-engineering-semantic-web-2015-shawn-riley

PeerLyst

https://www.peerlyst.com/blog-post/free-mooc-knowledge-engineering-with-semantic-web-technologies-2015

On Sun, Aug 23, 2015 at 7:43 AM, Shawn Riley <shawn.p.riley@gmail.com> wrote:

Hi Folks,

Back in 2012 when STIX was released, the white paper included a mention of other possible implementations such as semantic web (OWL/RDF) and JSON-centric. We've had Bret share details on his JSON efforts with the list so I wanted to take the opportunity to share our research paper on using semantic web technologies (OWL/RDF) with the group. This isn't marketing as there is no company or product mentioned. Rather, we're just sharing knowledge based on our international research efforts. Given the ongoing debate over XML vs JSON, we thought it would also be good to show how both of these could be supported using the semantic technology approach.

In many ways our research "connected the dots" between existing cybersecurity efforts across different government agencies that have been identified by the government as the way forward. These range from the Science of Security championed by the National Security Agency and the National Science Foundation, to the Cybersecurity Measurement and Management Architecture championed by the Department of Homeland Security and the Department of Defense, to the revolutionary intelligence methodologies (object-based production / activity-based intelligence) championed by the Intelligence Community and particularly the National Geospatial-Intelligence Agency.

While each of these cybersecurity and intelligence efforts can stand on its own and provide great benefit, bringing them together demonstrates how each agency's investments can see a greater return by working together to develop scientific foundations for the operational cybersecurity ecosystem. Just as we need infrastructure in areas like meteorology to understand and predict the weather, we need operational cybersecurity science infrastructure to understand and predict events in our cyber ecosystem of the future.

The paper can be downloaded from LinkedIn's Slideshare.

http://www.slideshare.net/shawnriley2/cscss-science-of-security-developing-scientific-foundations-for-the-operational-cybersecurity-ecosystem

Best,

Shawn

Shawn Riley

Executive Vice President

Centre for Strategic Cyberspace + Security Science

London, England, UK - Washington, DC, USA

