
Subject: Re: [cti-taxii] Pagination in TAXII 2.0

From: John-Mark Gurney <jmg@newcontext.com>
To: Bret Jordan <Bret_Jordan@symantec.com>
Date: Mon, 11 Dec 2017 16:22:31 -0800

Bret Jordan wrote this message on Thu, Dec 07, 2017 at 06:27 +0000:
> I have some concerns with how we designed pagination in TAXII 2.0.  The whole range aspect, which is basically a limit / offset design, only works well if your collections are small.  Once your collections get to a certain size, regardless of the database you use (MySQL, Postgres, Oracle or even a NoSQL), the performance impact is huge and the wheels come off the bus. Apparently this is a very well-known issue with REST APIs, and large Web 2.0 companies basically say, never use range (limit/offset) for REST APIs.
> 
> 
> Now there are several ways that you can try and get around this problem (like caching data, using pages, using cursors, using result tables and doing in-memory ranges), but none of them are very good and they represent a lot of unnecessary hackery.  After doing a lot more reading and trying to implement this in various ways with various database backends, I am thinking that there is a better way. This solution would be pretty simple to implement and would greatly improve performance.
> 
> 
> I propose that we drop our pagination design and just use added_after with some limit value.  This would represent very little change to the overall architecture, other than dropping some sections and rewording a few normative statements. From a code standpoint, it would be MUCH easier to implement.

This sounds similar to how Twitter handles their paging:
https://developer.twitter.com/en/docs/tweets/timelines/guides/working-with-timelines

The biggest issue I see with added_after is: what do you consider
"added"?  Does a new version of an existing object count as added
after?  If so, then this is likely a good solution...

As someone said in another email, as long as the returned items are all
before the earliest unreturned item, and no NEW item can be added
before the last returned item, then it really doesn't matter.
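
To make that condition concrete, here is a minimal sketch in Python.
It is not from the spec: the integer "cursor" key, fetch_page and
fetch_all are hypothetical names, and the list stands in for whatever
store a server really uses. It assumes every new item gets a cursor
larger than any cursor already handed out.

```python
from typing import List, Tuple

# (cursor, object id). The cursor is a hypothetical server-assigned
# ordering key; it is an assumption, not a TAXII field.
Item = Tuple[int, str]


def fetch_page(store: List[Item], after: int, limit: int) -> List[Item]:
    """Return up to `limit` items with cursor > `after`, oldest first."""
    newer = sorted(item for item in store if item[0] > after)
    return newer[:limit]


def fetch_all(store: List[Item], limit: int) -> List[Item]:
    """Page through the store, resuming from the last cursor seen."""
    seen: List[Item] = []
    after = 0  # assumes real cursors start above 0
    while True:
        page = fetch_page(store, after, limit)
        if not page:
            return seen
        seen.extend(page)
        after = page[-1][0]
```

Because each request resumes from the last cursor it saw, an item that
is appended between two requests lands on a later page; nothing
already returned is repeated and nothing is skipped.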

I just had a thought: do we need to enshrine this in the spec?  Can we
just provide guidance on how it should behave?  If an implementor's db
already uses an incrementing integer ID, that can easily be used, but
if it doesn't, the implementor may need to add additional data to the
database to handle this case.
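
As a rough illustration of the incrementing-ID case, here is a sketch
using Python's sqlite3 module. The table layout, the column names
(seq, stix_id, version) and the helper functions are all made up for
this example and are not anything TAXII defines. It also takes one
possible answer to the "what counts as added" question: each new
version is stored as a new row, so it gets a new, higher seq.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory table whose seq column only ever increases."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE objects ("
        " seq INTEGER PRIMARY KEY AUTOINCREMENT,"
        " stix_id TEXT NOT NULL,"
        " version TEXT NOT NULL)"
    )
    return conn


def add_object(conn: sqlite3.Connection, stix_id: str, version: str) -> int:
    """Store one object version as a new row and return its seq."""
    cur = conn.execute(
        "INSERT INTO objects (stix_id, version) VALUES (?, ?)",
        (stix_id, version),
    )
    conn.commit()
    return cur.lastrowid


def get_page(conn: sqlite3.Connection, after_seq: int, limit: int):
    """Return (rows, next_after) for rows with seq > after_seq."""
    rows = conn.execute(
        "SELECT seq, stix_id, version FROM objects"
        " WHERE seq > ? ORDER BY seq LIMIT ?",
        (after_seq, limit),
    ).fetchall()
    # If the page is empty, hand back the same position to retry later.
    next_after = rows[-1][0] if rows else after_seq
    return rows, next_after
```

The query seeks on the primary key rather than counting past an
offset, so the cost of a page should not grow with how deep into the
collection the client is. A db without such a column would need
something equivalent added, which is the extra data mentioned above.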

-- 
John-Mark

References:
- Pagination in TAXII 2.0
  - From: Bret Jordan <Bret_Jordan@symantec.com>