

Subject: Re: [cti-taxii] Pagination in TAXII 2.0


Bret,

 

Has consideration been given to using the Link header, per RFC 5988, instead of the Range/Content-Range header approach?

For example:

Link: <https://api-root/collections/investigations/objects?last_id=indicator--c410e480-e42b-47d1-9476-85307c12bcbf&length=20>; rel="next", <https://api-root/collections/investigations/objects>; rel="first"

 

You might look at how GitHub uses this approach for pagination at https://developer.github.com/v3/guides/traversing-with-pagination/
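
For illustration, a client could pick the "next" and "first" URLs out of a Link header like the one above roughly as follows. This is only a minimal sketch: the function name parse_link_header is hypothetical, and the parser is deliberately simplified (it does not handle commas or semicolons inside quoted parameter values, which RFC 5988 allows).

```python
import re

# One link-value: <target> followed by zero or more ";param" pieces.
_LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')
# The rel parameter, with or without surrounding quotes.
_REL_RE = re.compile(r'\brel\s*=\s*"?([^";,]+)"?', re.IGNORECASE)


def parse_link_header(value):
    """Map each rel value (e.g. "next", "first") to its target URL.

    Simplified on purpose: assumes no commas or semicolons appear
    inside quoted parameter values.
    """
    links = {}
    for target, params in _LINK_RE.findall(value):
        match = _REL_RE.search(params)
        if match:
            # A single rel may carry several space-separated relation types.
            for rel in match.group(1).split():
                links[rel] = target
    return links
```

A client would then follow links["next"] until the server stops sending a "next" relation, without ever computing offsets itself.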

 

 

Paul Patrick

 

 

From: <cti-taxii@lists.oasis-open.org> on behalf of Bret Jordan <Bret_Jordan@symantec.com>
Date: Thursday, December 7, 2017 at 1:27 AM
To: "cti-taxii@lists.oasis-open.org" <cti-taxii@lists.oasis-open.org>
Subject: [cti-taxii] Pagination in TAXII 2.0

 

All,

 

I have some concerns with how we designed pagination in TAXII 2.0.  The whole range aspect, which is basically a limit/offset design, only works well if your collections are small.  Once your collections get to a certain size, regardless of the database you use (MySQL, Postgres, Oracle, or even a NoSQL store), the performance impact is huge and the wheels come off the bus. Apparently this is a very well-known issue with REST APIs, and large Web 2.0 companies basically say never to use range (limit/offset) for REST APIs.
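
To make the difference concrete, here is a rough sketch comparing the two query shapes, using an in-memory SQLite database. The objects table, its columns, and the function names are assumptions made up for this illustration and are not taken from any TAXII implementation. Both queries return the same page, but the offset form typically has to step over every skipped row first, while the second form can seek directly through the index on date_added.

```python
import sqlite3

# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objects (id INTEGER PRIMARY KEY, date_added TEXT NOT NULL)"
)
conn.execute("CREATE INDEX idx_date_added ON objects (date_added)")
conn.executemany(
    "INSERT INTO objects (id, date_added) VALUES (?, ?)",
    [
        # Fixed-width timestamps, so string order equals time order.
        (i, f"2016-01-01T00:00:{i // 1000:02d}.{i % 1000:03d}Z")
        for i in range(1, 10001)
    ],
)


def page_by_offset(offset, limit):
    # Range-style paging: the engine walks past `offset` rows, then
    # returns `limit` rows. Cost grows with the offset.
    return conn.execute(
        "SELECT id, date_added FROM objects "
        "ORDER BY date_added LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()


def page_after(last_date_added, limit):
    # Seek-style paging: start right after the last value the client saw.
    return conn.execute(
        "SELECT id, date_added FROM objects "
        "WHERE date_added > ? ORDER BY date_added LIMIT ?",
        (last_date_added, limit),
    ).fetchall()


by_offset = page_by_offset(9000, 100)
last_seen = page_by_offset(8999, 1)[0][1]  # date_added of row 9000
by_seek = page_after(last_seen, 100)
```

Here by_offset and by_seek hold the same 100 rows; only the amount of work the database does to reach them differs.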

 

Now there are several ways that you can try to get around this problem (like caching data, using pages, using cursors, using result tables and doing in-memory ranges), but none of them are very good, and they represent a lot of unnecessary hackery.  After doing a lot more reading and trying to implement this in various ways with various database backends, I am thinking that there is a better way. This solution would be pretty simple to implement and would greatly improve performance.

 

I propose that we drop our pagination design and just use added_after with some limit value.  This would represent very little change to the overall architecture, other than dropping some sections and rewording a few normative statements. From a code standpoint, it would be MUCH easier to implement.

 

A client could then just say, "Server, give me all records after 2016".  The server could say, "Hey client, you can have 100 records from 2016-01-01, and the last record I am sending you is 2016-03-01".  The server could also optionally tell the client that there are 20,000,000 records in the collection.

 

If the client did not give an added_after filter, the server could just give the client the latest records that were written to the collection, up to the limit size that the client and server both support. 
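
The exchange described above could be sketched roughly like this. Everything here is an assumption for the sake of illustration: the names Record, Page, get_objects, and fetch_all are hypothetical, the collection is just an in-memory list, and nothing below is taken from the TAXII specification. It also assumes date_added values are unique; records sharing a timestamp across a page boundary would need extra handling.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    id: str
    date_added: str  # fixed-width ISO 8601 UTC, so strings sort by time


@dataclass
class Page:
    records: List[Record]
    last_date_added: Optional[str]  # date_added of the last record sent
    total: int                      # optional hint: size of the collection


def get_objects(collection, added_after=None, limit=100, server_max=100):
    """Server side: return at most `limit` records added after `added_after`."""
    # Use the smaller of what the client asked for and what the server allows.
    limit = max(1, min(limit, server_max))
    ordered = sorted(collection, key=lambda r: r.date_added)
    if added_after is None:
        # No filter given: hand back the most recently added records.
        selected = ordered[-limit:]
    else:
        # Assumes unique date_added values (see the note above).
        selected = [r for r in ordered if r.date_added > added_after][:limit]
    last = selected[-1].date_added if selected else None
    return Page(selected, last, len(collection))


def fetch_all(collection, added_after, limit=100):
    """Client side: keep asking from the last timestamp the server reported."""
    results = []
    while True:
        page = get_objects(collection, added_after=added_after, limit=limit)
        if not page.records:
            break
        results.extend(page.records)
        added_after = page.last_date_added
    return results
```

The client never tracks an offset; it only remembers the last date_added value the server reported and sends that back as the next added_after.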

 

From a performance standpoint, this works a lot better and results in a lot less latency.

 

Bret

 

 

 

 

 
