RE: [cti-users] Vote NO on JSON - Vote YES on JSON-LD and here is why...

With all due respect, I’ll wait until I can make an informed decision on the pluses and minuses of the JSON-LD approach once it’s been presented.

Any idea when that will be? Is it within the next two weeks or so? It will help me understand when we can finish this discussion, decide one way or the other, and move on.

Cheers

Terry MacDonald

Senior STIX Subject Matter Expert

SOLTRA | An FS-ISAC and DTCC Company

+61 (407) 203 206 | terry@soltra.com

From: Shawn Riley [mailto:shawn.p.riley@gmail.com]
Sent: Thursday, 26 November 2015 6:51 AM
To: Terry MacDonald <terry@soltra.com>
Cc: cti-stix@lists.oasis-open.org; cti-users@lists.oasis-open.org; Jonathan Bush (DTCC) <jbush@dtcc.com>
Subject: RE: [cti-users] Vote NO on JSON - Vote YES on JSON-LD and here is why...

With all due respect, OBP cannot be done with JSON and JSON Schema without the semantic ontologies. If it can, please provide concrete examples and demonstrate this to the community so we can all understand how you can do what Google, Facebook, DoD, etc. couldn't with traditional approaches such as JSON.

On Nov 25, 2015 2:45 PM, "Terry MacDonald" <terry@soltra.com> wrote:

“We've been trying to apply this to cyber threat intelligence using STIX / CYBOX and it was the plan for this to happen with STIX 2.0 until the whole JSON direction sidetracked it to keep it on the legacy path instead of the new OBP path enabled by semantic ontologies and serializations.”

I take umbrage with this point. There is no guarantee that developing the model using OWL/RDF will result in better objects being created or something more attuned to the ‘OBP path’. The same OBP ideology can be applied at the JSON level, and it was the direction we were heading. The new top-level relationship object that has been discussed is a step in the right direction.

I do hope that the JSON-LD presentation is a good, clear, practical and pragmatic one, as I do believe that deriving serialization from a model is useful, as long as the serialization output is something that people are actually going to use.

Any idea on timeframe for when the JSON-LD-based approach will be presentable?

Cheers

Terry MacDonald

Senior STIX Subject Matter Expert

SOLTRA | An FS-ISAC and DTCC Company

+61 (407) 203 206 | terry@soltra.com

From: cti-users@lists.oasis-open.org [mailto:cti-users@lists.oasis-open.org] On Behalf Of Shawn Riley
Sent: Thursday, 26 November 2015 1:38 AM
To: Jonathan Bush (DTCC) <jbush@dtcc.com>
Cc: cti-users@lists.oasis-open.org
Subject: Re: [cti-users] Vote NO on JSON - Vote YES on JSON-LD and here is why...

Hi Jonathan,

I'm not sure if you read another article I shared in the LinkedIn post at the start of this thread or in a follow up knowledge sharing article on OBP & ABI but it's worth reading since it's vendors discussing OBP. I thought the CTI vendors might relate better to hearing other vendors discuss OBP rather than hearing it from an old CTI analyst like myself.

Organizing the Knowns - http://www.kmimediagroup.com/gif/424-articles-gif/organizing-the-knowns/6361-organizing-the-knowns

I wrote another article on LinkedIn earlier this year to share knowledge about OBP & ABI as applied to cyber. Below is a section that I believe answers some of your questions. I'll put a link to the full article at the end of the couple paragraphs.

Object-Based Production of Knowledge

In terms of Semantic eScience of Security, one might think of ontologies in OWL/RDF as acting as an Object Description Framework (ODF) that enables Object-Based Production of knowledge from each of the common languages used in the Cybersecurity Measurement and Management Architecture. These objects are then mapped to the ontology that defines the conceptual semantic object model. The individual semantic object models for each data set (STIX, MAEC, CVE, etc.) are interconnected by a unifying object model.

Object-Based Production (OBP) works by representing these various pieces of data as ‘objects’ in order to gain greater insights about the nature of the object, the object’s attributes, the relationships or associations amongst objects, and observed activity. Through the modeling of data as objects, attributes, associations, and activities it becomes dramatically easier to understand and categorize objects, often through just an examination of their attributes. It also helps in the identification of behaviors normally attributed to an object. In short, it represents and organizes the data in a manner similar to the way humans think about objects in the natural world. This then enables a focus on the creation and organization of knowledge about what is known so organizations can do a better job of discovering the unknown through a methodology known as Activity-Based Intelligence (ABI).

In order to get a complete description of an object, it may require multiple statements to be made where each statement describes a specific attribute or association of the object. This collection of statements provides a description of an object and can be easily added to by just adding additional statements as new knowledge about the object is discovered.
Source: https://www.linkedin.com/pulse/object-based-production-activity-based-intelligence-shawn-riley
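
As a rough illustration of the "collection of statements" idea in the excerpt above, here is a minimal Python sketch. It is not taken from the article or from any STIX tooling; the identifiers, predicate names, and the `describe` helper are all invented for this example. Each statement is a (subject, predicate, object) triple, and an object's description is simply every statement made about it.

```python
from collections import defaultdict

# Each statement is a (subject, predicate, object) triple.
# All identifiers and predicate names below are invented for illustration.
statements = [
    ("malware:example-1", "type", "Malware"),
    ("malware:example-1", "name", "ExampleBot"),
    ("malware:example-1", "exploits", "vuln:example-vuln"),
]


def describe(subject, statements):
    """Collect every statement about one subject into a single description."""
    description = defaultdict(list)
    for s, p, o in statements:
        if s == subject:
            description[p].append(o)
    return dict(description)


before = describe("malware:example-1", statements)

# New knowledge is added by appending one more statement;
# none of the existing statements has to change.
statements.append(("malware:example-1", "communicates_with", "ipv4:192.0.2.10"))
after = describe("malware:example-1", statements)
```

The point of the sketch is only that the description grows by addition: `after` contains everything in `before` plus the newly asserted association.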

It's also worth pointing out that OBP is exactly what Google has done to create the Google Knowledge Graph and what Facebook has done to create their Social Graph. It all requires semantic ontologies and the serialization formats to encode that semantic information when knowledge is shared. That is why Google has incorporated JSON-LD into its baseline and it is used when connecting things to the Google Knowledge Graph.
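
To make the difference between plain JSON and JSON-LD concrete, here is a small Python sketch. The vocabulary URL under example.org is a placeholder and not a real STIX ontology, and `expand_terms` is a deliberately simplified stand-in for the expansion step defined in the JSON-LD specification (it only handles a flat `@context` that maps short terms to IRIs). It shows how the `@context` ties each short key to a globally defined term, which a plain JSON document has no way to express.

```python
import json

# Hypothetical vocabulary: example.org is a placeholder, not a real STIX ontology.
VOCAB = "http://example.org/cti#"

# Plain JSON: the keys only mean something by private agreement.
plain_json = {"type": "ThreatActor", "name": "Example Group"}

# JSON-LD: the @context maps each short term to an IRI in a shared vocabulary.
json_ld = {
    "@context": {
        "name": VOCAB + "name",
        "ThreatActor": VOCAB + "ThreatActor",
    },
    "@type": "ThreatActor",
    "name": "Example Group",
}


def expand_terms(doc):
    """Simplified expansion: replace short terms with IRIs from a flat @context.

    Real JSON-LD expansion handles many more cases (nested contexts,
    typed values, lists, and so on); this covers only the flat case.
    """
    context = doc.get("@context", {})
    expanded = {}
    for key, value in doc.items():
        if key == "@context":
            continue
        new_key = context.get(key, key)
        # Only the value of @type is itself a term that needs expanding.
        new_value = context.get(value, value) if key == "@type" else value
        expanded[new_key] = new_value
    return expanded


expanded = expand_terms(json_ld)
print(json.dumps(expanded, indent=2))
```

Running the same function on `plain_json` returns it unchanged, since there is no context to say what `type` or `name` refer to.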

We've been trying to apply this to cyber threat intelligence using STIX / CYBOX and it was the plan for this to happen with STIX 2.0 until the whole JSON direction sidetracked it to keep it on the legacy path instead of the new OBP path enabled by semantic ontologies and serializations.

I don't want to overload anyone with information so have a look at the information I provided and please feel free to reach out with more questions.

Best,
Shawn

On Wed, Nov 25, 2015 at 9:19 AM, Bush, Jonathan <jbush@dtcc.com> wrote:

Seems like a powerful concept Shawn.

This might be obvious, but is your hypothesis that JSON-LD is the enabling technology to make OBP happen? Is that the ONLY way to implement OBP?

Also, I see that the concept started in the DoD. How has the implementation of OBP gone there? Has it been attempted (from an implementation perspective) outside of the DoD, in commercial land anywhere?

Sorry, probably too many questions at once…

From: cti-users@lists.oasis-open.org [mailto:cti-users@lists.oasis-open.org] On Behalf Of Shawn Riley
Sent: Wednesday, November 25, 2015 6:51 AM
To: cti-users@lists.oasis-open.org
Subject: Re: [cti-users] Vote NO on JSON - Vote YES on JSON-LD and here is why...

Hi Folks,

There was another really good article discussing Object-Based Production (OBP) in the news. As some of you might be aware, I've been focused on applying Object-Based Production to cyber security / cyber threat intelligence since some of us discussed this methodology at BlackHat 2010 in Vegas. It's the heart of what the Semantic eScience of Security paper was about and how it supports the science of security core themes while modernizing analytic tradecraft. Here is a snippet of information about OBP and a link to the article.

Shawn

Object-based production is a concept being implemented as a whole-of-community initiative that fundamentally changes the way the IC organizes information and intelligence. Reduced to its simplest terms, OBP creates a conceptual “object” for people, places, and things and then uses that object as a “bucket” to store all information and intelligence produced about those people, places, and things. The object becomes the single point of convergence for all information and intelligence produced about a topic of interest to intelligence professionals. By extension, the objects also become the launching point to discover information and intelligence. Hence, OBP is not a tool or a technology, but a deliberate way of doing business.

While simple, OBP constitutes a revolutionary change in how the IC and the Department of Defense (DOD) organize information, particularly as it relates to discovery and analysis of information and intelligence. Historically, the IC and DOD organized and disseminated information and intelligence based on the organization that produced it. So retrieving all available information about a person, place, or thing was primarily performed by going to the individual repository of each data producer and/or understanding the sometimes unique naming conventions used by the different data producers to retrieve that organization’s information or intelligence about the same person, place, or thing. Consequently, analysts could conceivably omit or miss important information or erroneously assume gaps existed.

OBP aims to remedy this problem and increase information integration across the IC and DOD by creating a common landing zone for data that cross organizational and functional boundaries. Furthermore, this business model introduces analytic efficiency; it reduces the amount of time analysts spend organizing, structuring, and discovering information and intelligence across the enterprise. By extension, OBP can afford analysts more time for higher orders of analysis while reducing how long it takes to understand how new data relate to existing knowledge. A central premise of OBP is that when information is organized, its usefulness increases.

A concrete example best illustrates the organizing principle of OBP and how it would apply to the IC and DOD. Consider a professional baseball team and how OBP would create objects and organize information for all known people, places, and things associated with the team. At a minimum, “person” objects would be created for each individual directly associated with the team, including coaches, players, the general manager, executives, and so forth. As an example of person-object data, these objects would include characteristics such as a picture, height, weight, sex, position played, college attended, and so forth. The purpose is to create, whenever possible, objects distinguishable from other objects. This list of person-objects can be enduring over time and include current and/or past people objects or family or previous team relationships.

In a similar fashion, objects could be created for the physical locations associated with the team, including the stadium, training facility, parking lots, and players’ homes. The same could be done for “thing” objects associated with the team, such as baseballs, bats, uniforms, training equipment, team cars/buses/planes, and so forth.

With the baseball team’s objects established, producers could report information to the objects (for example, games, statistics, news for players, or stadium upgrades), which would serve as a centralized location to learn about activity or information related to the team. Also, relationships could be established between the objects to create groupings of objects that represent issues or topics. For example, a grouping of people-objects could be created to stand for the infield or outfield, coaching staff, or team executives. Tangential topics/issues such as “professional baseball players involved in charity” could be established as well. Events or activities (such as games) and the objects associated with them could also be described in this object-centric data construct. Moreover, the concept could expand to cover all teams in a professional baseball league or other professional sports or abstract concepts that include people, places, or things.

Similar to the example above, the IC and DOD will create objects for the people, places, things, and concepts that are the focus of intelligence and military operations. Topics could include South China Sea territorial disputes, transnational criminal organizations, Afghan elections, and illicit trade. Much like the sports example, IC and DOD issues have associated people, places, and concepts that could be objects for knowledge management.

Read the whole article here: https://www.govtechworks.com/transforming-defense-analysis/#gs.MnGchY0
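
The baseball example in the excerpt above can be sketched in a few lines of Python. This is only an illustration of the organizing principle, not how the IC or DoD implement OBP; the class, function, and identifier names (`IntelObject`, `create`, `report`, `group`, `player-1`, and so on) are all made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class IntelObject:
    """One conceptual object: the 'bucket' for everything reported about it."""
    object_id: str
    kind: str  # "person", "place", or "thing"
    attributes: dict = field(default_factory=dict)
    reports: list = field(default_factory=list)


registry = {}   # object id -> IntelObject
groupings = {}  # group name -> set of object ids


def create(object_id, kind, **attributes):
    """Register a new object with its distinguishing attributes."""
    registry[object_id] = IntelObject(object_id, kind, attributes)
    return registry[object_id]


def report(object_id, producer, text):
    """Any producer files information against the shared object."""
    registry[object_id].reports.append((producer, text))


def group(name, *object_ids):
    """Relate objects to each other by placing them in a named grouping."""
    groupings.setdefault(name, set()).update(object_ids)


create("player-1", "person", position="shortstop")
create("player-2", "person", position="first base")
create("stadium-1", "place", role="home stadium")

# Two different producers report against the same object.
report("player-1", "stats-desk", "Two hits in last game")
report("player-1", "news-desk", "Signed a contract extension")
group("infield", "player-1", "player-2")

# One lookup returns what every producer has said about the player.
everything_about_player_1 = registry["player-1"].reports
```

The design choice worth noticing is that reports are keyed by the object, not by the producer, so an analyst does not need to know which organization holds which piece of information.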

On Tue, Nov 24, 2015 at 3:49 PM, Terry MacDonald <terry@soltra.com> wrote:

" Doing it upside down will not, IMHO, lead to a usable result or widespread adoption."

This comment is where I have a slight problem. The upside down development process may not be perfect, but it has worked 'well enough' up to this point. OASIS CTI is the largest standards group that OASIS has had so far as I understand it, so STIX/TAXII/CybOX must be fairly useful and have reached a reasonable adoption even in its current bohemian state to have generated such interest.

STIX itself is IMHO empirical evidence that sometimes good enough is good enough.

That said, if there is a way that we can improve the model and the way we derive serializations without impacting implementers onerously, then I am very keen to see it.

Cheers

Terry MacDonald

Senior STIX Subject Matter Expert

SOLTRA | An FS-ISAC and DTCC Company

+61 (407) 203 206 | terry@soltra.com

-----Original Message-----
From: cti-users@lists.oasis-open.org [mailto:cti-users@lists.oasis-open.org] On Behalf Of Cory Casanave
Sent: Wednesday, 25 November 2015 2:03 AM
To: Kirillov, Ivan A. <ikirillov@mitre.org>; Trey Darley <trey@soltra.com>; Shawn Riley <shawn.p.riley@gmail.com>
Cc: cti-users@lists.oasis-open.org
Subject: RE: [cti-users] Vote NO on JSON - Vote YES on JSON-LD and here is why...

Ivan,

This is a discussion we should have. I am not opposed to well-formed OWL either; what is important is that we have a semantic description. What we have found in threat/risk is that conceptual UML models give us 90% of the expressiveness of OWL, in addition to being able to assert some things OWL is very bad at, such as the time, context and provenance of statements. Keep in mind we are using UML based on a specific profile for this purpose.

What concerns me more is statements like "we should refactor first and then look at the models". Valid refactoring at a syntax level has gone wrong every time I have seen it, as what your syntax means gets confused and inconsistent. This becomes a barrier for implementation and interoperability. Whatever the language of expression, the model should be where the concepts and their relationships are figured out; we can then come up with one or more syntax representations for it. Doing it upside down will not, IMHO, lead to a usable result or widespread adoption.

-Cory

-----Original Message-----
From: Kirillov, Ivan A. [mailto:ikirillov@mitre.org]
Sent: Tuesday, November 24, 2015 8:21 AM
To: Cory Casanave; Trey Darley; Shawn Riley
Cc: cti-users@lists.oasis-open.org
Subject: Re: [cti-users] Vote NO on JSON - Vote YES on JSON-LD and here is why...

That doesn’t answer my question. You’re still not getting a true ontology, just various auto-generated schemas based on UML, which I have yet to see proven useful. My inclination is that we really need to rebuild STIX/CybOX from the ground up in RDF/OWL, including making sure that we have the right set of instances, datatype properties, object properties, etc., if we want JSON-LD or another ontology-based exchange to be useful. Otherwise, I feel that JSON schema offers the best value in the interim and will help drive adoption. Again, we can always revisit the JSON-LD question when we are ready.

Regards,

Ivan

On 11/23/15, 4:32 PM, "Cory Casanave" <cory-c@modeldriven.com> wrote:

>Re: Given that, what is the value of JSON-LD in a UML-driven, XSD-derived representation?
>
>JSON-LD, JSON-Schema, RDF Schema and XML Schema can all be produced, in a consistent form, from a well-structured UML model.
>
>-Cory
>
>-----Original Message-----
>From: cti-users@lists.oasis-open.org [mailto:cti-users@lists.oasis-open.org] On Behalf Of Kirillov, Ivan A.
>Sent: Monday, November 23, 2015 2:50 PM
>To: Trey Darley; Shawn Riley
>Cc: cti-users@lists.oasis-open.org
>Subject: Re: [cti-users] Vote NO on JSON - Vote YES on JSON-LD and here is why...
>
>To add to Trey’s point below, JSON-LD would be a much more logical choice if STIX and CybOX had native ontological (RDF/OWL) representations. While this is likely a direction we’re heading in, it’s not where we are at today. Given that, what is the value of JSON-LD in a UML-driven, XSD-derived representation?
>
>Regards,
>Ivan
>

>On 11/23/15, 4:06 AM, "Trey Darley" <cti-users@lists.oasis-open.org on behalf of trey@soltra.com> wrote:
>
>>*Nor* is it the case that we are ruling out standardizing a JSON-LD
>>CTI serialization schema *in future*. From the mail that went out
>>Friday:
>>
>><snip>
>>Likewise, the co-chairs recognize that there will be communities of
>>interest requiring alternative serialization formats (XML, protobufs,
>>JSON-LD, OWL, etc). The OASIS TC has a role to play in helping to
>>standardize these alternative representations to ensure
>>interoperability. However, that work effort lies in the future.
>>First we must complete the task at hand.
>></snip>

