Re: [EXT] [cti-stix] IDs with uuidv5 too

Terry,

How is this going to be easier on the client or threat provider? The ID method you are proposing, IMHO, does not help them, and here is why. Let's walk through this example in reverse. You get a STIX object, and you want to know which ID in your internal system it maps back to... This is a very common thing that I hear all the time. But UUIDv5 is not reversible. So that means you need to keep a lookup table of the computed values of your "seed" + internal ID = new UUIDv5 ID anyway. There will be no way around it. Threat vendors have three real choices: use a custom property to store their internal ID, use the external references to store their internal ID, or keep a database table of mappings. And I would argue that since you might be worried about CTI proxies stripping off some of your content and republishing it, you will probably want and need to keep a record of every ID you issue.
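A minimal sketch of the irreversibility point, assuming a made-up seed UUID and made-up internal IDs (none of these values come from a real vendor). UUIDv5 is built on a one-way SHA-1 hash, so going from a STIX ID back to an internal ID needs a stored mapping:

```python
import uuid

# Hypothetical seed and internal IDs, for illustration only.
SEED = uuid.UUID("00000000-0000-4000-8000-000000000001")
internal_ids = ["id-1001", "id-1002", "id-1003"]

def to_stix_id(internal_id):
    """Forward direction: deterministic and cheap to compute."""
    return "indicator--" + str(uuid.uuid5(SEED, internal_id))

# Reverse direction: there is no inverse function, so keep a lookup table.
reverse_lookup = {to_stix_id(i): i for i in internal_ids}

incoming = to_stix_id("id-1002")
print(reverse_lookup.get(incoming))  # prints id-1002
```

The table can be rebuilt at any time from the internal keys, but it still has to exist somewhere for the reverse lookup to work.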

I have these discussions all the time with my internal teams. We as a company would love to use deterministic IDs, as we want to do the ID generation on the end-point. But after you walk through all of the failing points, you begin to see that UUIDv5 does not really work out so well.

Bret

Hi Bret,

It's incorrect to say that the STIX IDs will collide. Each vendor would have their own 'starter' uuid for each object type. UUIDv5 would then mash the unique starter uuid with the internal primary key ID, to create a new UUID that is unique to that vendor. In other words, it is repeatably creatable, but only for that vendor.
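A small sketch of that claim, assuming two hypothetical vendors whose starter UUIDs were picked arbitrarily for this example. The same primary key yields a different UUID per vendor, and each vendor can regenerate its own value at will:

```python
import uuid

# Hypothetical per-vendor starter UUIDs (arbitrary values for this sketch).
VENDOR_X_INDICATOR_SEED = uuid.UUID("6f1c2a9e-3b4d-4c5e-8f70-112233445566")
VENDOR_Y_INDICATOR_SEED = uuid.UUID("0d9e8c7b-6a5f-4e3d-9c2b-aabbccddeeff")

shared_internal_id = "42"  # both vendors happen to use the same primary key

id_x = uuid.uuid5(VENDOR_X_INDICATOR_SEED, shared_internal_id)
id_y = uuid.uuid5(VENDOR_Y_INDICATOR_SEED, shared_internal_id)

print(id_x != id_y)  # True: different seeds give different UUIDs
print(id_x == uuid.uuid5(VENDOR_X_INDICATOR_SEED, shared_internal_id))  # True: repeatable
```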

Our past discussion on UUIDs involved passing the entire object, effectively making the entire object immutable. That is not the same as this. They are different use cases.

Here we are effectively programmatically allowing the producer to tie their internal ID to their own uuid, but do it in a way that greatly simplifies implementation. There are no downsides that I can see for allowing this method of uuid generation, as long as each vendor uses its own unique starter uuid, and as long as only the database record's primary identifier is used.

I cannot state this more clearly... This is a real problem that threat vendors are experiencing now, and we need to find an official solution. I've had multiple vendors tell me they have had the same problem since posting this message.

We cannot just dismiss this issue, Bret. You've talked before about removing impediments to adoption. Why is this real issue, which is affecting vendors right now, any different?

Cheers

Terry MacDonald

Cosive

On 22/05/2017 3:43 AM, "Bret Jordan" <Bret_Jordan@symantec.com> wrote:

Terry,

The problem with this is the IDs are then no longer guaranteed to be unique. If Vendor X and Vendor Y have the same numbering system, or one that has some level of overlap, then the IDs created in STIX will collide. The whole reason we went with UUIDv4 is to make sure we did not have collisions. Using just the internal ID is not enough data to seed the deterministic generator.
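The collision scenario can be sketched as follows, assuming (purely for illustration) that two vendors both seed the generator with the same well-known namespace and have overlapping primary keys:

```python
import uuid

# Assumption for illustration: both vendors use the same seed namespace.
SHARED_SEED = uuid.NAMESPACE_DNS

# Two unrelated records that happen to share the primary key "1001".
vendor_x_stix_id = "indicator--" + str(uuid.uuid5(SHARED_SEED, "1001"))
vendor_y_stix_id = "indicator--" + str(uuid.uuid5(SHARED_SEED, "1001"))

print(vendor_x_stix_id == vendor_y_stix_id)  # True: two different records, one STIX ID
```

The internal ID alone carries no information about who issued it, so uniqueness depends entirely on the seed being different per producer.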

We discussed all of this, and this use case, in the past. Deterministic IDs do not work for us. Not without a massive amount of caveats and seed data.

We really need to build a FAQ about IDs and timestamps and precision so that when these questions come up again, we can point people to them.

Bret

From: cti-stix@lists.oasis-open.org <cti-stix@lists.oasis-open.org> on behalf of Terry MacDonald <terry.macdonald@cosive.com>

Sent: Friday, May 19, 2017 3:23:42 PM
To: cti-stix@lists.oasis-open.org; CTI TC Discussion List
Subject: [EXT] [cti-stix] IDs with uuidv5 too

Hi all,

As part of recent integration work we've been doing with a Threat Intel vendor, it's become apparent that the current restriction to uuidv4 generation of STIX IDs causes extra work for vendors. I firmly believe that we need to add the ability to generate uuidv5-based STIX IDs as well in order to make implementation simpler.

The problem:

Most Threat Intel vendors at present publish their threat intelligence through their own web-based platform. Most vendors also support a JSON-based REST API through which their authenticated users can access the same Threat Intel. Many vendors are now adding STIX and TAXII support alongside their previous JSON API, and it's this coexistence that's causing the issue.

All Threat Intel vendors I've spoken to use a proprietary identifier as their primary key. Any relationships that link their internal Intel together are made using that proprietary primary key.

This causes massive problems for the vendors when it comes time to translate their data into STIX. They need to somehow maintain a relationship between their internal primary id and the STIX id that is mandated by the standards.

This effectively forces the vendor to either:

1. maintain an ID translation table that records all STIX IDs sent out and the corresponding internal primary key it relates to, or

2. change their internal intelligence database to include STIX IDs generated when the data is first added.
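The first option can be sketched as below. This is a minimal illustration only: the in-memory dictionary stands in for whatever persistent table a vendor would really use, and the function name is hypothetical.

```python
import uuid

# In-memory stand-in for a persistent ID translation table (hypothetical).
translation_table = {}

def stix_id_for(internal_id, object_type="indicator"):
    """Return the STIX ID already issued for this key, or issue a new UUIDv4 one."""
    if internal_id not in translation_table:
        translation_table[internal_id] = f"{object_type}--{uuid.uuid4()}"
    return translation_table[internal_id]

first = stix_id_for("id-1.1.1.1")
second = stix_id_for("id-1.1.1.1")
print(first == second)         # True, but only because the table remembers the mapping
print(len(translation_table))  # 1: one row per published object
```

Because UUIDv4 values are random, losing the table means losing the mapping, and the table grows by one row for every object ever published.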

The first suggestion becomes unwieldy very quickly, as the amount of data to track would grow huge very speedily. The second suggestion is the right longer-term solution, but it requires changing the vendor's main data source, which they are often very reluctant to do. Most Threat Intel vendors are just experimenting with STIX right now.

The solution:

We've found the quick and easy solution is to tell the Threat Intel vendor to generate uuidv5 STIX IDs based solely on the primary key. UUIDv5 takes a 'seed' uuid and combines it with a known value (the primary key) to produce a uuid that is derived from the primary key.

This makes the vendor's job much easier, as they can now easily bolt on STIX IDs during the STIXification of their Threat Intel data, and they don't need to make any big changes to their main Intel systems. This makes it much more likely that the vendor will look at supporting STIX and TAXII, as it's not so impactful on their current operations.
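As a rough sketch of the "bolt-on" idea, the references in a relationship can be computed straight from the internal primary keys on both ends, with no stored mapping. The seeds, primary keys, and helper name below are all made up for illustration, and the dictionary holds only the reference fields, not a complete STIX Relationship object.

```python
import uuid

# Hypothetical per-type seeds; a real deployment would choose its own and keep them fixed.
INDICATOR_SEED = uuid.UUID("1b4e28ba-2fa1-41d2-883f-0016d3cca427")
MALWARE_SEED = uuid.UUID("3d813cbb-47fb-42ba-91df-831e1593ac29")

def stix_id(object_type, seed, internal_id):
    """Derive a STIX ID from a seed and an internal primary key using UUIDv5."""
    return f"{object_type}--{uuid.uuid5(seed, internal_id)}"

# An internal link expressed with proprietary primary keys (made-up values).
internal_link = {"indicator_pk": "ind-77", "malware_pk": "mal-12"}

relationship_refs = {
    "relationship_type": "indicates",
    "source_ref": stix_id("indicator", INDICATOR_SEED, internal_link["indicator_pk"]),
    "target_ref": stix_id("malware", MALWARE_SEED, internal_link["malware_pk"]),
}
print(relationship_refs)
```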

Recommendation:

We recommend that UUIDv5 is added to the STIX 2.1 standard alongside UUIDv4 as an acceptable way to generate STIX IDs. Vendors who choose to use the UUIDv5 method of uuid creation must use their internal threat intelligence proprietary identifier (primary key) as the value that is passed into the UUIDv5 generation process. We also recommend that the seed uuid value is different for each STIX object type to minimize the chance of uuid collisions.

import uuid

# randomly selected starter UUID for Indicators
INDICATOR_NAMESPACE_SEED = uuid.UUID('a288ef91-8db3-46de-22ae-8c13fe286599')

# Vendor's internal ID for the record we are turning into a STIX Indicator
VENDOR_INTEL_INTERNAL_ID = "id-1.1.1.1"

# create Indicator STIX ID from the vendor's internal ID using UUIDv5
indicator_stix_id = "indicator--" + str(uuid.uuid5(INDICATOR_NAMESPACE_SEED, VENDOR_INTEL_INTERNAL_ID))

# print the Indicator STIX ID
print(indicator_stix_id)

# The STIX ID for record 'id-1.1.1.1' will always be
# indicator--f5704f38-a888-58ed-9fc3-f6659c658847
# using the UUIDv5 method
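The same pattern extends to a different seed per object type, as recommended. In the sketch below, the vendor root UUID is hypothetical, and deriving the per-type seeds from it is only a convenience assumed for this example; the proposal asks only that the seeds differ per type.

```python
import uuid

# Hypothetical vendor root namespace (arbitrary value for this sketch).
VENDOR_ROOT = uuid.UUID("9c2d8a34-5e6f-4b1a-a7c8-d9e0f1a2b3c4")

# Assumption: derive one seed per STIX object type from the vendor root.
TYPE_SEEDS = {t: uuid.uuid5(VENDOR_ROOT, t) for t in ("indicator", "malware", "campaign")}

def make_stix_id(object_type, internal_id):
    """Build a STIX ID from the object type's seed and the internal primary key."""
    return f"{object_type}--{uuid.uuid5(TYPE_SEEDS[object_type], internal_id)}"

# The same primary key maps to a different UUID for each object type.
for object_type in TYPE_SEEDS:
    print(make_stix_id(object_type, "id-1.1.1.1"))
```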

By restricting the UUIDv5 so it's based on the primary key we will make it far easier for vendors to generate STIX from data they already have. This in turn will make it far more likely that they will.

Could this be discussed and agreed at the Face-2-face?

Cheers

Terry MacDonald

Cosive
