I thought a bit more about this and, after talking with Allan, I think the server-side limit will probably vary with the size of the content. So we may need to tweak the text and make sure that max_content_length works in both directions.
For example, say a server normally limits a client to 500 objects at a time, but some of those objects are really big, say a gigabyte each. The server may then need to lower the limit dynamically based on the size of the objects. So a client would always need to check the envelope to see if there are more records.
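To make the size-dependent limit concrete, here is a minimal Python sketch of a server filling a page until it hits either its usual object count or a byte budget. The constants MAX_OBJECTS and MAX_BYTES, the StoredObject type, and the build_page helper are all hypothetical and only illustrate the idea; none of them come from the TAXII specification.

```python
from dataclasses import dataclass

# Hypothetical server-side limits, chosen only for illustration.
MAX_OBJECTS = 500          # usual per-page object limit
MAX_BYTES = 10 * 1024**3   # per-response byte budget (10 GiB)


@dataclass
class StoredObject:
    obj_id: str
    size_bytes: int


def build_page(objects, start, max_objects=MAX_OBJECTS, max_bytes=MAX_BYTES):
    """Collect objects from `start` until the object limit or byte budget is hit.

    Returns (page, more), where `more` says whether objects remain after the page.
    """
    page = []
    used = 0
    i = start
    while i < len(objects) and len(page) < max_objects:
        size = objects[i].size_bytes
        # Stop before exceeding the budget, but always return at least one
        # object so the client keeps making progress.
        if page and used + size > max_bytes:
            break
        page.append(objects[i])
        used += size
        i += 1
    return page, i < len(objects)


big = [StoredObject(f"indicator--{n}", 1024**3) for n in range(500)]
page, more = build_page(big, 0)
print(len(page), more)  # 10 True
```

With 500 objects of 1 GiB each, the page holds only 10 objects while "more" is true, so a client that expected 500 and treated a short page as the end would miss data.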
As I was reading this proposed solution for TAXII pagination, it occurred to me that a TAXII server currently has no way of advertising its self-imposed limit for pagination requests. If it could, a client would know that limit ahead of time via the server's api_root resource. This is a somewhat different problem from the one originally raised in this thread, but a related one.
What I propose is adding a new property called max_limit
and you can read the details
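A client could use an advertised max_limit roughly as follows. This is a sketch only: it assumes max_limit is an optional integer on the API Root resource, alongside the existing title, versions, and max_content_length properties, and the effective_limit helper is hypothetical.

```python
from typing import Optional

# Hypothetical API Root response; "max_limit" is the proposed new property,
# the other fields are existing TAXII 2.1 API Root properties.
api_root = {
    "title": "Example API Root",
    "versions": ["application/taxii+json;version=2.1"],
    "max_content_length": 104857600,
    "max_limit": 1000,
}


def effective_limit(api_root: dict, requested: int) -> int:
    """Clamp the client's desired page size to the advertised max_limit, if any."""
    max_limit: Optional[int] = api_root.get("max_limit")
    if max_limit is None:
        return requested
    return min(requested, max_limit)


print(effective_limit(api_root, 5000))  # 1000
print(effective_limit(api_root, 50))    # 50
```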
I like this approach and believe it's a small, flexible change that lets a client consume data in whatever page size it wants, regardless of the backend the target server uses.
Looking forward to feedback from others.
In TAXII 2.1 we have a pretty good pagination solution, but it suffers from a known issue when multiple records have the same date_added value. We originally tried to address this by saying that the date_added value MUST have microsecond precision. But that is not sufficient for some implementations, since multiple records can still share the same microsecond-level timestamp.
As such, I have been working with Looking Glass on a potential solution that requires the least amount of changes to make this work. After many back-and-forth versions, I think we have something that might work. Please
TAXII Pagination Proposal
To keep things simple and easy to visualize, we define the scenarios below using small numbers. Keep in mind that in production, these numbers will be many orders of magnitude larger.
1 Fundamental Design Goals
Completely stateless for the server in the true RESTful sense
Simple way for clients to start synchronization after some point in time, without having to sync the entire collection.
Example: A collection may have billions of records in it going back 10 years. But a client really only cares about syncing or getting data from the past 6 months.
Need ability to paginate records where every record has its own date_added value
Need ability to paginate records where many records may have the same date_added value
2 Proposed Solution Summary
Add a single optional property called "next" (type: string) to the TAXII Envelope
Add a URL parameter called "next"
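From the client's side, these two additions amount to a simple loop: keep requesting while "more" is true, passing back whatever "next" value the server returned. The Python sketch below rests on stated assumptions: fetch_envelope stands in for an HTTP GET that returns the parsed envelope JSON, the page_url and fetch_all helpers are hypothetical, and the client is assumed to resend its original filters (limit, added_after) alongside next, which the proposal does not spell out.

```python
from urllib.parse import urlencode


def page_url(objects_url, limit, next_token=None, added_after=None):
    """Build a hypothetical objects request URL with optional added_after and next."""
    params = {"limit": limit}
    if added_after is not None:
        params["added_after"] = added_after
    if next_token:
        params["next"] = next_token
    return f"{objects_url}?{urlencode(params)}"


def fetch_all(fetch_envelope, objects_url, limit=20, added_after=None):
    """Follow "next" tokens until the server reports more == false.

    fetch_envelope(url) must return the parsed TAXII envelope as a dict.
    """
    results = []
    next_token = None
    while True:
        envelope = fetch_envelope(page_url(objects_url, limit, next_token, added_after))
        results.extend(envelope.get("objects", []))
        if not envelope.get("more", False):
            return results
        next_token = envelope.get("next")
        if not next_token:
            raise ValueError('server set "more" to true but sent no "next" value')
```

The client never interprets the next value; it only echoes it back, which is what lets each server choose its own token format.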
The collection has 200 indicator records; however, the first 100 records all have the same date_added timestamp.
Our current method breaks if and only if the client requests a limit of fewer than 100 or the server artificially limits responses to fewer than 100 records. Under this condition the client either will not get all of the records or will have an inconsistent experience.
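The failure can be reproduced with a small simulation of date-based paging, in which the client advances added_after to the date_added of the last record it received. The records list, query_added_after, and sync_by_date are a simplified, hypothetical model of a server and client, not the normative TAXII algorithm.

```python
from datetime import datetime, timedelta, timezone

T0 = datetime(2020, 1, 1, tzinfo=timezone.utc)

# 200 indicator records; the first 100 share exactly the same date_added value.
records = [
    {"id": f"indicator--{i:03d}",
     "date_added": T0 if i < 100 else T0 + timedelta(microseconds=i)}
    for i in range(200)
]


def query_added_after(records, added_after, limit):
    """Return up to `limit` records newer than added_after, oldest first."""
    matches = sorted(
        (r for r in records if added_after is None or r["date_added"] > added_after),
        key=lambda r: r["date_added"],
    )
    return matches[:limit], len(matches) > limit


def sync_by_date(records, limit):
    """Page through the collection by moving added_after to the last date seen."""
    seen, added_after = [], None
    while True:
        page, more = query_added_after(records, added_after, limit)
        seen.extend(page)
        if not more or not page:
            return seen
        added_after = page[-1]["date_added"]


print(len(sync_by_date(records, 20)))   # 120: 80 records silently skipped
print(len(sync_by_date(records, 100)))  # 200
```

After the first page of 20, added_after equals the shared timestamp, so the strict "newer than" comparison skips the other 80 records with that timestamp. Any limit of 100 or more gets all 200.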
3.2 Example Initial Request From Client
3.3 Server Processes Query Request
The server queries the datastore for records that match the rest of the request, with a record limit of 21 (the client-provided or server-imposed limit value + 1)
The server checks whether 21 records were returned.
If NO, there are no more records that match the query, and the TAXII server can send the results to the client in a TAXII envelope with:
TAXII Envelope "more" property set to "false"
TAXII Envelope "next" property is left empty
If YES, there are more records, so the server returns only the first 20 records (dropping the extra 21st) with the following:
TAXII Envelope "more" property set to "true"
TAXII Envelope "next" property set to a string value. For a relational database this could be the index autoID, for Elasticsearch the scroll ID, and for other systems a cursor ID; it could be any string (or an int represented as a string), depending on the requirements of the server and the black magic it does in the background. The key is that the value is something the server knows how to process, while the client only needs to send it back to the server in the next request to get more data.
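For the relational case, the processing described in 3.3 can be sketched in Python with an in-memory list standing in for a table keyed by an auto-increment ID. The handle_objects_request function, its server_max cap of 20, and the use of the last returned autoID as the next token are illustrative assumptions; a real server would also apply the request's other filters and might encode the token differently.

```python
from typing import List, Optional, Tuple

Row = Tuple[int, dict]  # (auto_id, STIX object), standing in for a table row


def handle_objects_request(rows: List[Row], limit: Optional[int] = None,
                           next_token: Optional[str] = None,
                           server_max: int = 20) -> dict:
    """Stateless pagination: fetch limit + 1 rows after the auto_id in next_token."""
    page_size = server_max if limit is None else max(1, min(limit, server_max))
    after_id = int(next_token) if next_token else 0
    # Equivalent to: ... WHERE auto_id > :after_id ORDER BY auto_id LIMIT :page_size + 1
    fetched = sorted((r for r in rows if r[0] > after_id), key=lambda r: r[0])[: page_size + 1]
    page = fetched[:page_size]
    envelope = {"objects": [obj for _, obj in page], "more": len(fetched) > page_size}
    if envelope["more"]:
        # The extra row only signals that more data exists; it is not returned.
        envelope["next"] = str(page[-1][0])
    return envelope


rows = [(i, {"id": f"indicator--{i:03d}"}) for i in range(1, 201)]
first = handle_objects_request(rows, limit=20)
print(len(first["objects"]), first["more"], first["next"])  # 20 True 20
```

Because the autoID is unique, resuming with auto_id > next never skips or repeats records, even when all 200 records share one date_added value, and the server keeps no per-client state between requests.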
3.4 Example Follow On Request From Client
If we can verify that this solves the issue and is still easy to implement (I believe it is), this is something we could do for TAXII 2.1, if the TC agrees. Yes, it would require another CSD and public review, but it would allow us to address this last known issue.