RE: [cti-stix] Proposal to establish Sightings (#306) and Relationships

>I think like with anything various parties may wish to age out sightings information as they may do with other information/intelligence at a point when they decide it is at reduced relevance but I don’t think we can throw up our hands, declare it impossible and just leave it out of the model.

>I would propose that from a language model perspective we leave the decisions of how many reported sightings to keep up to the users.

> If a user’s capabilities cannot keep up with massive volumes of sightings, they will filter them. If they are getting massive volumes of basically identical sightings reports from the same organization, they are very likely to capture metadata and filter the individual sightings. There are different ways to deal with the volume. I believe we should leave those questions to the users not the model. The model should support the capabilities needed for appropriate analysis and sharing.

I completely agree with this sentiment. It should most definitely be the implementation that sets the particular limits for how long the individual sightings are kept.

I do agree that the model should support the capabilities needed for appropriate analysis and sharing, but at the same time it does need the ability to cope with the realities of processing large volumes of data.

It is the responsibility of the data model to enable the capture of that raw detailed sightings data, and it is the responsibility of the implementations to determine the best way to age that data out so that the implementation runs smoothly. In the same way that a Snort IDS needs either tuning to reduce the number of signatures it looks for or bigger hardware, a TAXII/STIX system will need tuning or resourcing to cope with the amount of data being thrown at it.

Have a small box to use as a TAXII server? Guess what – you’re only going to be able to have a few Threat Sharing sources enabled. Want more threat intel sources? Well, you’re now going to have a shorter data lifetime and a faster age-out process.

This sort of data is absolutely necessary to enable the deeper level of analysis and investigation required for us to REALLY start tracking bad guys. At present we are all focused on indicators and sightings because that is where the most activity is right now.

I want a data model that lets me do:

- Clustering analysis

- Statistical analysis

- Machine learning

I want insight into the underlying goals of the threat actors that target my organization. I want to know why. Is it because of what I do, or who I do it for?

I want to do Threat Intelligence, not Threat Data.

Cheers

Terry MacDonald

Senior STIX Subject Matter Expert

SOLTRA | An FS-ISAC and DTCC Company

+61 (407) 203 206 | terry@soltra.com

If a user’s capabilities cannot keep up with massive volumes of sightings, they will filter them. If they are getting massive volumes of basically identical sightings reports from the same organization, they are very likely to capture metadata and filter the individual sightings. There are different ways to deal with the volume. I believe we should leave those questions to the users not the model. The model should support the capabilities needed for appropriate analysis and sharing.

sean

From: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org> on behalf of Jason Keirstead <Jason.Keirstead@ca.ibm.com>
Date: Tuesday, November 3, 2015 at 11:10 AM
To: John Wunder <jwunder@mitre.org>
Cc: Jerome Athias <athiasjerome@gmail.com>, "Jordan, Bret" <bret.jordan@bluecoat.com>, "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, "Taylor, Marlon" <Marlon.Taylor@hq.dhs.gov>, Mark Davidson <mdavidson@mitre.org>, "Barnum, Sean D." <sbarnum@mitre.org>, Terry MacDonald <terry@soltra.com>
Subject: Re: [cti-stix] Proposal to establish Sightings (#306) and Relationships (#291) as our official issue topics under active consideration for STIX v2.0

In general, if you design anything with the requirement to scale large, then said system can easily scale down. But the inverse is rarely true.

Let's take this back to root principles. The debate seems to be around what a sighting is:

a) Is it an edge between an indicator and an observer; or

b) Is it a vertex itself, with one edge to the indicator and another to the observer
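
To make the two options concrete, here is a minimal Python sketch. The class names, field names, and the `graph_objects_needed` helper are hypothetical illustrations and are not taken from any STIX draft.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional


@dataclass
class SightingEdge:
    """Option (a): one edge between indicator and observer, carrying aggregates."""
    indicator_id: str
    observer_id: str
    count: int = 0
    last_seen: Optional[datetime] = None


@dataclass
class SightingVertex:
    """Option (b): every sighting is its own node with two edges."""
    sighting_id: str
    indicator_id: str   # edge: sighting -> indicator
    observer_id: str    # edge: sighting -> observer
    seen_at: datetime


def graph_objects_needed(n_sightings: int) -> Dict[str, Dict[str, int]]:
    """Extra graph objects (beyond the indicator and observer vertices)
    needed to represent n sightings of one indicator by one observer."""
    return {
        "edge_model": {"vertices": 0, "edges": 1 if n_sightings else 0},
        "vertex_model": {"vertices": n_sightings, "edges": 2 * n_sightings},
    }
```

Under these assumptions, 2,000 sightings cost one edge in option (a) and 2,000 vertices plus 4,000 edges in option (b), but only option (b) keeps the per-sighting detail.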

-
Jason Keirstead
Product Architect, Security Intelligence, IBM Security Systems
www.ibm.com/security | www.securityintelligence.com

Without data, all you are is just another person with an opinion - Unknown

From: "Wunder, John A." <jwunder@mitre.org>
To: Jason Keirstead/CanEast/IBM@IBMCA
Cc: "Barnum, Sean D." <sbarnum@mitre.org>, Jerome Athias <athiasjerome@gmail.com>, "Jordan, Bret" <bret.jordan@bluecoat.com>, "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, "Taylor, Marlon" <Marlon.Taylor@hq.dhs.gov>, "Davidson II, Mark S" <mdavidson@mitre.org>, Terry MacDonald <terry@soltra.com>
Date: 2015/11/03 12:01 PM
Subject: Re: [cti-stix] Proposal to establish Sightings (#306) and Relationships (#291) as our official issue topics under active consideration for STIX v2.0

Is that true in all scenarios? Sure, a lot of commodity indicators will probably have zillions of hits. But what about targeted indicators of APT activity that we want to carefully track?

I feel like we’re designing for this one scenario of a ton of sightings when in practice the more valuable activity might be less volume and more specificity. (Not to say we don’t care about the volume use case, just that it’s not the only one).

John

On Nov 3, 2015, at 10:49 AM, Jason Keirstead <Jason.Keirstead@CA.IBM.COM> wrote:

I understand the theoretical usefulness, but I still stand by the fact that once you get into large scale, its usefulness as raw data becomes inconsequential... In terms of the graph - I believe sightings is a metric on the edge between an observer and an indicator, and that edge has attributes such as "count" and "last seen". It is not a vertex in and of itself; that would not scale in real-world scenarios. You also don't need to store the raw instances of sightings to do the most useful analysis of those metrics (including temporal). I can have a time series database tied to the edge that is storing sighting counts over time, without storing the actual raw sighting instances.
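
A rough sketch of what such an edge could look like, with an in-memory counter standing in for the time series database. The `EdgeMetrics` name and the hourly bucket size are assumptions made for illustration only.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class EdgeMetrics:
    """Aggregate sighting metrics kept on one observer-indicator edge."""
    count: int = 0
    last_seen: Optional[datetime] = None
    # Hour bucket -> number of sightings; stands in for a time series store.
    hourly: Counter = field(default_factory=Counter)

    def ingest(self, seen_at: datetime) -> None:
        """Fold one sighting into the aggregates; the raw sighting is not kept."""
        self.count += 1
        if self.last_seen is None or seen_at > self.last_seen:
            self.last_seen = seen_at
        bucket = seen_at.replace(minute=0, second=0, microsecond=0)
        self.hourly[bucket] += 1
```

This keeps storage proportional to the number of time buckets rather than the number of sightings, at the cost of losing everything about a sighting except when it happened.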

> Think of sightings like case reporting from doctors to the CDC.

The problem is we have to deal with much larger scale than this, and always keep the demands of that scale in mind.

If we only had to deal with ~ 10 billion possible sighting records for an indicator then I would be a happy camper, but that is far from the case.

-
Jason Keirstead

From: "Barnum, Sean D." <sbarnum@mitre.org>
To: Jason Keirstead/CanEast/IBM@IBMCA, Terry MacDonald <terry@soltra.com>
Cc: Jerome Athias <athiasjerome@gmail.com>, "Jordan, Bret" <bret.jordan@bluecoat.com>, "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, "Wunder, John A." <jwunder@mitre.org>, "Taylor, Marlon" <Marlon.Taylor@hq.dhs.gov>, "Davidson II, Mark S" <mdavidson@mitre.org>
Date: 2015/11/03 11:38 AM
Subject: Re: [cti-stix] Proposal to establish Sightings (#306) and Relationships (#291) as our official issue topics under active consideration for STIX v2.0

> Why would I want unique records of all of those sightings, to what purpose is it serving? What people care about in a sighting is a count of indicators, so that they can give increased significance to those that are currently "live".

The count is like a heartbeat. It tells you if that TTP is still “alive” but that is really all it does.
It is the actual sighting details that give you deeper insight (“intelligence”) into what is happening and how you might prevent or mitigate it.
There are many forms of analysis that can be done on the sighting information but the most obvious and prevalent have to do with “when” and “who”.
Temporal analysis across the actual sightings can yield all sorts of insight beyond just “alive” or “dead”.
Similarly, analysis of who is sighting the indicator and when can give very valuable insight into victim targeting, who is being affected that might not know it yet and who will likely be affected next.
If the sightings include details of what was actually observed rather than just a “matched pattern” count this information can also be very valuable in understanding the nature of the TTP and how variations of it may be being applied to different subsets of the victim targeting pool.

Think of sightings like case reporting from doctors to the CDC. If you want to know if a potential contagion is something to worry about then counts give you the first measure but if you want to actually study the epidemiology, know how fast and how far it is spreading, know where it will likely spread next, know what sort of victims are most susceptible, know which methods are successful in slowing or stopping it and want to get ahead of it, you will need the actual “sightings”.
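
As a small illustration of the "when" and "who" analysis described above, the following Python sketch derives the order in which reporters first saw an indicator. The `Sighting` tuple and its fields are hypothetical, and this is only one of many analyses that need the individual sightings rather than a count.

```python
from datetime import datetime
from typing import Dict, Iterable, List, NamedTuple


class Sighting(NamedTuple):
    indicator_id: str
    reporter: str
    seen_at: datetime


def first_seen_by_reporter(sightings: Iterable[Sighting]) -> Dict[str, datetime]:
    """Earliest sighting time per reporting organization."""
    first: Dict[str, datetime] = {}
    for s in sightings:
        if s.reporter not in first or s.seen_at < first[s.reporter]:
            first[s.reporter] = s.seen_at
    return first


def spread_order(sightings: Iterable[Sighting]) -> List[str]:
    """Reporters ordered by when they first saw the indicator."""
    first = first_seen_by_reporter(sightings)
    return sorted(first, key=first.get)
```

A plain count would report the same total regardless of whether one organization or twenty were hit, or in which order.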

sean

From: Jason Keirstead <Jason.Keirstead@ca.ibm.com>
Date: Tuesday, November 3, 2015 at 8:48 AM
To: Terry MacDonald <terry@soltra.com>
Cc: Jerome Athias <athiasjerome@gmail.com>, "Jordan, Bret" <bret.jordan@bluecoat.com>, "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, John Wunder <jwunder@mitre.org>, "Taylor, Marlon" <Marlon.Taylor@hq.dhs.gov>, Mark Davidson <mdavidson@mitre.org>, "Barnum, Sean D." <sbarnum@mitre.org>
Subject: RE: [cti-stix] Proposal to establish Sightings (#306) and Relationships (#291) as our official issue topics under active consideration for STIX v2.0

> Um, why do we want the same ID? If the attacker has sent our Org the same pdf 2000 times then don’t we want to record that fact, and link the Sighting objects (with Observable Instances) to the Incident object with 2000 relationship objects? Then don’t we want to also send that group of Incident, Sightings, Observables and Relationships to others in our Threat Sharing group so that they are aware of them? That is accurate.

I am going to humbly suggest that there is no way any large organization has the resources to do what you are suggesting (record unique sighting objects in the graph for every occurrence of an indicator). I could easily have tens of thousands or more sightings of a single indicator in one day alone in real-world situations... extrapolate that out to millions of indicators monitored and months of data and you will see how this would impact your graph. Why would I want unique records of all of those sightings, to what purpose is it serving? What people care about in a sighting is a count of indicators, so that they can give increased significance to those that are currently "live". I.e., the way I see things happening in almost all implementations is that the sighting will be ingested, it will be used to increment some counts, and it will then be discarded. I can't possibly see any implementation storing raw sightings, at least not at enterprise scale, unless it has some arbitrary cap like "store raw sightings for 24 hours and then discard".
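
A minimal sketch of that "ingest, increment, discard" pattern with a retention cap, in Python. The `RawSightingBuffer` class is hypothetical, the 24-hour default simply mirrors the example cap above, and sightings are assumed to arrive in time order.

```python
from collections import deque
from datetime import datetime, timedelta
from typing import Any, Deque, Tuple


class RawSightingBuffer:
    """Keeps a running count forever, but raw sightings only for `retention`."""

    def __init__(self, retention: timedelta = timedelta(hours=24)) -> None:
        self.retention = retention
        self.count = 0
        # (seen_at, payload) pairs, oldest first; assumes in-order arrival.
        self._raw: Deque[Tuple[datetime, Any]] = deque()

    def ingest(self, seen_at: datetime, payload: Any) -> None:
        self.count += 1
        self._raw.append((seen_at, payload))
        self.expire(now=seen_at)

    def expire(self, now: datetime) -> None:
        """Drop raw sightings older than the retention window."""
        cutoff = now - self.retention
        while self._raw and self._raw[0][0] < cutoff:
            self._raw.popleft()

    def raw_count(self) -> int:
        return len(self._raw)
```

The retention window is an implementation setting here, so a deployment with more storage could raise it without any change to what is exchanged.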

-
Jason Keirstead

From: Terry MacDonald <terry@soltra.com>
To: "Jordan, Bret" <bret.jordan@bluecoat.com>, Jason Keirstead/CanEast/IBM@IBMCA
Cc: "Wunder, John A." <jwunder@mitre.org>, Mark Davidson <mdavidson@mitre.org>, "Sean D. Barnum" <sbarnum@mitre.org>, Jerome Athias <athiasjerome@gmail.com>, "Taylor, Marlon" <Marlon.Taylor@hq.dhs.gov>, "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>
Date: 2015/10/30 07:32 PM
Subject: RE: [cti-stix] Proposal to establish Sightings (#306) and Relationships (#291) as our official issue topics under active consideration for STIX v2.0

Um, why do we want the same ID? If the attacker has sent our Org the same pdf 2000 times then don’t we want to record that fact, and link the Sighting objects (with Observable Instances) to the Incident object with 2000 relationship objects? Then don’t we want to also send that group of Incident, Sightings, Observables and Relationships to others in our Threat Sharing group so that they are aware of them? That is accurate.

If we are worried about the size of storing the PDF multiple times, then it is up to the implementation to recognize that the MD5 of the attachment item is the same and then actually only store it once (just like MS Exchange servers have been doing since the mid-2000s).
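
A minimal sketch of that kind of store-once behaviour, in Python. The `AttachmentStore` class is hypothetical, and it uses SHA-256 from the standard library in place of the MD5 mentioned above; the approach is the same for any content hash.

```python
import hashlib
from typing import Dict


class AttachmentStore:
    """Stores each distinct attachment once, keyed by its content hash."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        """Store the content if unseen and return the digest to reference it by."""
        digest = hashlib.sha256(content).hexdigest()
        self._blobs.setdefault(digest, content)
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]

    def blob_count(self) -> int:
        return len(self._blobs)
```

Each of the 2000 sightings would then carry only the digest, while the PDF bytes are held once.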

How do we identify to others that the above data came from us?

If the ID of the object is just generated from the <HashofContent>, then there is no easy way to do this. If the ID of the object is generated from namespace.<HashofContent>, then we have more of a chance.

But what happens if we decide to update the Incident? The ID is now namespace.<NewHashofContent>. Now how do we de-duplicate? Do we now have to publish a relationship object explicitly stating that this is a replacement for the namespace.<HashofContent> object?

And what about relationship objects? Part of the power of separate top-level objects is that we can now just tell people about the relationship, but we can keep the actual data node it refers to a secret. Therefore in some implementations the only link to tie relationships together is the fact both relationships share an ID:

e.g.
RelationshipA (src: CampaignA -> ThreatActorA)
RelationshipB (src: IndicatorA -> CampaignA)
RelationshipC (src: IndicatorB -> CampaignA)

The recipient may not have the CampaignA data or ThreatActorA data, but they will still know that IndicatorA and IndicatorB are related to the same campaign because the relationships contain the same IDs. This completely breaks if the IDs change over time.
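
To illustrate, a recipient holding only the three relationship objects could group them by the IDs they share. This is a Python sketch; the `Relationship` tuple and the `sources_by_target` helper are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set


class Relationship(NamedTuple):
    rel_id: str
    source_id: str
    target_id: str


def sources_by_target(rels: Iterable[Relationship]) -> Dict[str, Set[str]]:
    """Group source IDs by the target ID they point at.

    Works even when the recipient has no data for the target object,
    as long as the target's ID is stable across relationships."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for r in rels:
        groups[r.target_id].add(r.source_id)
    return dict(groups)


RECEIVED = [
    Relationship("RelationshipA", "CampaignA", "ThreatActorA"),
    Relationship("RelationshipB", "IndicatorA", "CampaignA"),
    Relationship("RelationshipC", "IndicatorB", "CampaignA"),
]
```

If CampaignA were re-issued under a new ID between RelationshipB and RelationshipC, the two indicators would land in different groups and the link would be lost.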

We need an ID solution that:

- Includes the domain namespace in the ID so that recipients know where to ask for more information.
- Keeps the ID the same over the lifetime of the object, even if it is updated and the content changes.
- Recognizes that IDs will be coming from many different companies and many different sources, and that we need a way of easily understanding who produced the data.
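
One possible shape for an ID that meets these three points, as a Python sketch. Everything here is an assumption for illustration: the `SharedObject` class, the `namespace:uuid` layout (a colon is used because domain namespaces themselves contain dots), and the separate `version` counter.

```python
import uuid
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class SharedObject:
    namespace: str                      # producer's domain, e.g. "example.com"
    content: dict
    local_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    version: int = 1

    @property
    def object_id(self) -> str:
        """Namespaced ID that does not depend on the content."""
        return f"{self.namespace}:{self.local_id}"

    def updated(self, new_content: dict) -> "SharedObject":
        """New revision with the same ID and an incremented version."""
        return replace(self, content=new_content, version=self.version + 1)


def producer_of(object_id: str) -> str:
    """Recover the producing namespace from an ID."""
    return object_id.split(":", 1)[0]
```

Because the ID is generated once and carried through updates, relationships that reference it keep working after the content changes.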

To go over the FW use case again:

1. FW 1 sees a series of weaponized PDFs come down through email. For each weaponized PDF email it detects, it creates a detection alert and sends that to its FW mgmt server.
2. The FW mgmt. server has STIX/TAXII capabilities. For the first detection alert that the FW MGMT receives, it creates a STIX v2 Sighting object, and a corresponding STIX Observable containing a CybOX EmailMessage Object and a related File object, and two relationship objects to join the STIX Sighting to the Observables. It stores a mapping of the Observable SHA256 / file ID in a local internal data table for the EmailMessage and the File. It sends these out on the TAXII channel that it was configured to use.
3. The main TAXII repository receives this STIX v2 Sighting object and the corresponding STIX Observable containing a CybOX EmailMessage Object and a related File object, and adds them to its repository.
4. For the second detection alert that the FW MGMT receives, it does a SHA256 hash of the Email contents and the attached File independently to see if it’s seen them before. It hasn’t seen the EmailMessage before, but it has seen the attached PDF.
5. It creates a STIX v2 Sighting object, and a corresponding STIX Observable containing a new CybOX EmailMessage Object (email address was different). The EmailMessage contains the idref of the previously generated File object. It also adds two relationship objects to join the new STIX Sighting to the Observables. It sends these out on the TAXII channel that it was configured to use.
6. The main TAXII repository receives this second detection's STIX v2 content, and adds it to its repository.
7. Each subsequent detection alert will create a new Sighting object and a new EmailMessage object, but will refer to the same File object. Relationships will be created between these objects as well.

At this point, the main TAXII repo knows that the File objects are all related.
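
The FW mgmt server side of these steps can be sketched roughly as follows, in Python. The `FirewallManager` class, the ID formats, and the dictionary layout of each bundle are hypothetical; the `outbox` list stands in for the TAXII channel, and no real STIX or CybOX serialization is attempted.

```python
import hashlib
import uuid
from typing import Dict, List, Tuple


class FirewallManager:
    def __init__(self) -> None:
        self._seen: Dict[str, str] = {}   # SHA-256 hex digest -> object ID
        self.outbox: List[dict] = []      # stands in for the TAXII channel

    def _object_for(self, kind: str, content: bytes) -> Tuple[str, bool]:
        """Return (object ID, is_new), reusing the ID for content seen before."""
        digest = hashlib.sha256(content).hexdigest()
        if digest in self._seen:
            return self._seen[digest], False
        object_id = f"{kind}-{uuid.uuid4()}"
        self._seen[digest] = object_id
        return object_id, True

    def handle_alert(self, email: bytes, attachment: bytes) -> dict:
        """Create a new Sighting per alert; hash email and file independently."""
        file_id, file_is_new = self._object_for("file", attachment)
        email_id, email_is_new = self._object_for("email-message", email)
        sighting_id = f"sighting-{uuid.uuid4()}"
        candidates = ((email_id, email_is_new), (file_id, file_is_new))
        bundle = {
            "sighting": sighting_id,
            "new_objects": [oid for oid, is_new in candidates if is_new],
            "relationships": [(sighting_id, email_id), (sighting_id, file_id)],
            "email_refs_file": (email_id, file_id),
        }
        self.outbox.append(bundle)
        return bundle
```

Every alert yields a distinct Sighting, while the repeated PDF resolves to one File object ID that later EmailMessage objects reference.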

Cheers

Terry MacDonald

From: Jordan, Bret [mailto:bret.jordan@bluecoat.com]
Sent: Saturday, 31 October 2015 4:54 AM
To: Jason Keirstead <Jason.Keirstead@ca.ibm.com>
Cc: Wunder, John A. <jwunder@mitre.org>; Terry MacDonald <terry@soltra.com>; Mark Davidson <mdavidson@mitre.org>; Sean D. Barnum <sbarnum@mitre.org>; Jerome Athias <athiasjerome@gmail.com>; Taylor, Marlon <Marlon.Taylor@hq.dhs.gov>; cti-stix@lists.oasis-open.org
Subject: Re: [cti-stix] Proposal to establish Sightings (#306) and Relationships (#291) as our official issue topics under active consideration for STIX v2.0

Let's run the FW use case into the ground, since most everyone should understand it...

FW 1 sees a series of weaponized PDFs come down. Say it sees the same weaponized PDF 2,000 times over a period of 3 days. A large phishing attack with a lot of click-happy users.

1) Now it is highly unlikely with the current model that the FW will remember and use the same ID value (UUID) for each Indicator+Observable+MAEC data blob it issues for this Weaponized PDF. In fact, it will probably have 2,000 different UUID IDs for the same Indicator.

2) Now when you compound this by 60,000 clients in the network issuing Sightings, this begins to blow up quickly.

Maybe... Just maybe.... The FW could take the JSON Indicator that it is going to issue and hash the data blob and use that hash as the ID. Then at least each FW that is running the same code and is seeing basically the same thing with the same amount of data-enrichment, will issue the same ID value.
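
A minimal sketch of that idea in Python, assuming the Indicator is a JSON-serializable dictionary. The `content_id` function and the canonicalization choice (sorted keys, compact separators) are assumptions; any two producers would have to agree on the same canonical form for their IDs to match.

```python
import hashlib
import json


def content_id(indicator: dict) -> str:
    """Derive an ID from the SHA-256 of a canonical JSON form of the indicator."""
    canonical = json.dumps(
        indicator,
        sort_keys=True,            # key order must not affect the ID
        separators=(",", ":"),     # no whitespace differences
        ensure_ascii=True,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two firewalls emitting the same fields get the same ID regardless of key order, but any difference in enrichment, or any later update to the content, produces a different ID.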

We will have a totally different problem in TAXII Land in the Query REST API. Because you will probably want to do something like:

/t2/query/indicator/file_name/FreeFood.pdf
or
/t2/query/indicator/file_hash/<some file hash of the PDF>

Thanks,

Bret

Bret Jordan CISSP
Director of Security Architecture and Standards | Office of the CTO
Blue Coat Systems
PGP Fingerprint: 63B4 FC53 680A 6B7D 1447 F2C0 74F8 ACAE 7415 0050
"Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg."

On Oct 30, 2015, at 11:42, Jason Keirstead <Jason.Keirstead@ca.ibm.com> wrote:

So then the question becomes - if the consumers are not using the IDs, then why are they required...

"That said, that doesn’t mean sightings shouldn’t have IDs. If I autogen IDs for sightings then you can delete them, revoke them, ask for more info about them, etc. Maybe not everyone will do or support that (a firewall generating millions of sightings won’t persist the ID, but the threat intel tool working human-to-human sightings might) but by having an ID we can at least support it."

I am against a mandatory 32 or 64 or whatever bytes in every sighting message if usually the bytes don't have any meaning behind them.

And to again re-iterate - this problem is beyond sightings... it certainly exists for many classes of observables, and sometimes even indicators.

-
Jason Keirstead