OASIS Mailing List Archives



cti-taxii message



Subject: More questions about tracking objects in collections


All,


I brought up a tangential question a while ago, but I am still struggling with how best to address this. I think the conclusion we came to a few weeks back might in fact be wrong. I will do my best to explain the problem, at least the way I see it.


1) A TAXII Server will probably want to have a repository / database of STIX content.

2) A TAXII Collection will contain some subset of the total objects in that repository.

3) A STIX object can be in multiple collections.

4) An object in a collection probably means all versions of that object. That is, you would probably not want to individually track every version of an object in each collection. It also means that if you update an object, you probably want the update to be available in every collection it is found in. You can see the problem at scale: take 1 billion objects and 1,000 collections with 20% overlap of data.
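A minimal Python sketch of the model in points 1-4, assuming a single shared repository and collections that store only object IDs. The names `repo`, `collection_members`, `add_version` and `versions_in` are hypothetical and not part of TAXII; the point is that an update is written once and shows up in every collection that contains the object.

```python
from collections import defaultdict

# Hypothetical model: one shared repository holds every version of every
# object; collections store only object IDs (membership covers all versions).
repo: dict[str, list[int]] = defaultdict(list)  # object id -> versions
collection_members: dict[str, set[str]] = {
    "collection-a": {"indicator--1", "indicator--2"},
    "collection-b": {"indicator--1"},  # overlaps with collection-a
}


def add_version(obj_id: str, version: int) -> None:
    """Store a new version once, in the shared repository only."""
    repo[obj_id].append(version)


def versions_in(collection: str) -> list[tuple[str, int]]:
    """Every version of every object whose ID is in the collection."""
    return [(obj_id, v)
            for obj_id in sorted(collection_members[collection])
            for v in repo[obj_id]]


add_version("indicator--1", 1)
add_version("indicator--2", 1)
add_version("indicator--1", 2)  # an update, written once

print(versions_in("collection-a"))
print(versions_in("collection-b"))
```

Both collections see version 2 of indicator--1 without any per-collection write, which is the property point 4 asks for.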


Now let's look at the following small data set (yes, the ID is not a valid UUIDv4 and the ver is not a timestamp, but this is done to illustrate things):


STIX Object Repo

indicator--1 ver 1, date_added 1999
indicator--1 ver 2, date_added 2000
indicator--1 ver 3, date_added 2001
indicator--1 ver 4, date_added 2002
indicator--1 ver 5, date_added 2003
indicator--1 ver 6, date_added 2004
indicator--1 ver 7, date_added 2005
indicator--1 ver 8, date_added 2006
indicator--1 ver 9, date_added 2007
indicator--1 ver 10, date_added 2008
indicator--1 ver 11, date_added 2009
indicator--1 ver 12, date_added 2010
indicator--2 ver 1, date_added 2011
indicator--3 ver 1, date_added 2012
indicator--4 ver 1, date_added 2013
indicator--5 ver 1, date_added 2014
indicator--6 ver 1, date_added 2015
indicator--7 ver 1, date_added 2016


Collection 1 Repo

indicator--1, date_added 1999
indicator--2, date_added 2011
indicator--3, date_added 2012
indicator--4, date_added 2013
indicator--5, date_added 2014
indicator--6, date_added 2015
indicator--7, date_added 2016
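Encoding the two tables above in Python makes the collision visible. This is a sketch: years stand in for timestamps, and `repo`, `collection_1` and `view` are hypothetical names. If the collection records only one date_added per object ID, all 12 versions of indicator--1 inherit the same value, 1999.

```python
# STIX Object Repo: (object id, version, date_added in the repo).
repo = [("indicator--1", v, 1998 + v) for v in range(1, 13)]
repo += [(f"indicator--{n}", 1, 2009 + n) for n in range(2, 8)]

# Collection 1 Repo: a single date_added per object ID.
collection_1 = {"indicator--1": 1999}
collection_1.update({f"indicator--{n}": 2009 + n for n in range(2, 8)})

# What a client of Collection 1 sees: every version of each member object,
# stamped with the collection's per-ID date_added, in date order.
view = sorted(
    (collection_1[obj_id], obj_id, ver)
    for obj_id, ver, _ in repo
    if obj_id in collection_1
)
stamps = [added for added, _, _ in view]
print(len(view), stamps.count(1999))
```

It prints 18 records, 12 of which carry the same 1999 stamp, so a page of results can begin and end on the same date.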


5) Now let's assume that the TAXII server is configured to send at most 5 objects per response. (This is done for illustration purposes.)

6) When the client makes its first request with an added_after URL parameter of 1998, it will get the records indicator--1 ver 1 through indicator--1 ver 5, and both date_added X-Headers (first and last) will be 1999.

7) The second request is where things get weird: if you ask for an added_after value of 1999, you will either get the same records again or skip the remaining versions of indicator--1.
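The following sketch replays steps 5-7 against a hypothetical server that filters and orders by the collection's per-ID date_added. It assumes added_after means strictly greater than, a page size of 5, and integer years in place of timestamps; `fetch` and `PAGE_SIZE` are illustrative names, not TAXII API.

```python
PAGE_SIZE = 5  # illustrative server page limit from step 5

# Same illustrative data: (object id, version, repo date_added).
repo = [("indicator--1", v, 1998 + v) for v in range(1, 13)]
repo += [(f"indicator--{n}", 1, 2009 + n) for n in range(2, 8)]
# Collection 1 keeps a single date_added per object ID.
collection_1 = {"indicator--1": 1999}
collection_1.update({f"indicator--{n}": 2009 + n for n in range(2, 8)})


def fetch(added_after: int):
    """Hypothetical server: filter and sort on the collection's per-ID date_added.

    Returns (page, first, last); first/last stand in for the date_added
    X-Headers. added_after is treated as strictly greater-than.
    """
    rows = sorted(
        (collection_1[obj_id], obj_id, ver)
        for obj_id, ver, _ in repo
        if obj_id in collection_1 and collection_1[obj_id] > added_after
    )
    page = rows[:PAGE_SIZE]
    first = page[0][0] if page else None
    last = page[-1][0] if page else None
    return [(obj_id, ver) for _, obj_id, ver in page], first, last


print(fetch(1998))  # first request (step 6)
print(fetch(1999))  # second request (step 7)
```

The first call returns versions 1 through 5 of indicator--1 with both header values at 1999. Calling `fetch(1999)` jumps straight to indicator--2, silently dropping versions 6 through 12, while calling `fetch(1998)` again simply repeats the first page.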


You could obviously try to record the version (modified timestamp) in the collection table as well, but that would be an enormous amount of bookkeeping at scale, and error-prone. The only way I can really see to solve this is to track date_added by the time each version of the object comes in (its date_added in the object repo), not by the time the object was actually added to the collection.
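A sketch of that alternative, under the same illustrative assumptions (integer years, strict added_after, page size 5, hypothetical `fetch` and `fetch_all` helpers): collection membership is still tracked by object ID only, but filtering and ordering use each version's own date_added from the repository. A client loop that feeds the last date_added of each page back in as added_after then retrieves every version exactly once.

```python
PAGE_SIZE = 5  # illustrative server page limit

# Same illustrative data; here date_added is the per-version repo value.
repo = [("indicator--1", v, 1998 + v) for v in range(1, 13)]
repo += [(f"indicator--{n}", 1, 2009 + n) for n in range(2, 8)]
collection_1 = {f"indicator--{n}" for n in range(1, 8)}  # membership by ID only


def fetch(added_after: int):
    """Hypothetical server: filter and sort on each version's own date_added."""
    rows = sorted(
        (added, obj_id, ver)
        for obj_id, ver, added in repo
        if obj_id in collection_1 and added > added_after
    )
    page = rows[:PAGE_SIZE]
    last = page[-1][0] if page else None  # stands in for the "last" X-Header
    return [(obj_id, ver) for _, obj_id, ver in page], last


def fetch_all(start: int) -> list[tuple[str, int]]:
    """Hypothetical client: feed each page's last date_added back as added_after."""
    seen: list[tuple[str, int]] = []
    added_after = start
    while True:
        page, last = fetch(added_after)
        if not page:
            return seen
        seen.extend(page)
        added_after = last


everything = fetch_all(1998)
print(len(everything), len(set(everything)))
```

It prints `18 18`: all 18 versions, with no duplicates. The sketch relies on every version in this data set having a distinct date_added; if several versions shared a timestamp across a page boundary, a strict added_after filter could still skip some of them.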


So, while we talked a few weeks ago about changing the text to say that date_added is when an object was added to a collection, I think that might be in error. With that definition, it gets ugly.


From a scale standpoint, think of collections with 100 million or more records, where some objects may be revised 10,000 times.


Bret





