Subject: RE: binary formats
For efficient representation of tabular data with CORBA GIOP, in the past we used column-major order rather than row-major order. Fast, simple, and efficient, and we’ve been using it successfully for nearly 20 years.
See in the attached ZIP, a Column includes an attribute of type Data, which uses a discriminated union selecting one of a set of possible sequence types. The basic idea is that column metadata (like name, type) occurs once per column, rather than once per cell. And the actual column data is just a sequence of primitive values.
Anyway, just the other day I was considering mapping this to OData CSDL, where rather than a union type, Data could be an abstract complex type, with various subtypes each of which contains a field whose type is a collection of primitives.
Anyway, that’s just some background, if we want something like Java ResultSet / .NET DataTable encoded in CSDL.
Now, if rather than defining a new set of data types we just want a more efficient (JSON) encoding for tabular data, we could consider allowing column-major order as an alternative encoding for a collection of entity values (or a collection of complex values).
So where now we may have (omitting metadata) a list of entities:
We could instead have a list of entities encoded like this:
If we need column-specific metadata, just add a field in the object, with a collection of objects each of which contains column metadata, e.g.
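The inline examples did not survive in this copy of the thread, so here is a minimal sketch of the row-major versus column-major idea described above. The property names (ID, Name, Price), the "@columns" key, and the helper names are all assumptions made for illustration; none of them come from the attached IDL or from any OData specification.

```python
import json

def to_column_major(rows):
    """Pivot a list of flat objects into one list of values per property."""
    names = []
    for row in rows:
        for name in row:
            if name not in names:
                names.append(name)
    # A property missing from a row becomes None so all columns stay aligned.
    return {name: [row.get(name) for row in rows] for name in names}

def to_row_major(columns):
    """Inverse of to_column_major for columns of equal length."""
    length = len(next(iter(columns.values()), []))
    return [{name: values[i] for name, values in columns.items()}
            for i in range(length)]

def with_column_metadata(columns, types):
    """Add a hypothetical "@columns" field carrying per-column metadata."""
    result = dict(columns)
    result["@columns"] = [{"name": n, "type": types[n]} for n in columns]
    return result

# Row-major: every property name is repeated in every entity.
rows = [
    {"ID": 1, "Name": "Alpha", "Price": 9.5},
    {"ID": 2, "Name": "Beta", "Price": 12.0},
]

# Column-major: every property name appears once.
columns = to_column_major(rows)
print(json.dumps(columns))
# {"ID": [1, 2], "Name": ["Alpha", "Beta"], "Price": [9.5, 12.0]}

annotated = with_column_metadata(
    columns, {"ID": "Edm.Int32", "Name": "Edm.String", "Price": "Edm.Double"})
print(json.dumps(annotated["@columns"]))
```

The pivot is lossless for flat entities with a uniform set of properties. Nested ($expand-style) content and dynamic properties would need extra rules that this sketch does not attempt.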
I am sure this could be the starting point for a workable proposal.
All relevant browsers have built-in GZIP (de)compression and JSON.parse(), so gzipped JSON is the baseline. A binary format that wants to compete has to be faster end-to-end, i.e. the time saved by sending less over the wire has to outweigh the time invested to pack and unpack the data. These considerations obviously went into the choice of GZIP as the predominant compression method, see http://tukaani.org/lzma/benchmarks.html for a comparison of compression algorithms in terms of speed and memory footprint.
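To check a candidate encoding against that baseline, a rough measurement along these lines can help. It is only a sketch: the sample data is synthetic, the property names are made up, and it compares sizes only, not the pack/unpack time that the end-to-end argument also depends on.

```python
import gzip
import json

def sizes(value):
    """Return (raw, gzipped) byte counts for the compact JSON form of value."""
    raw = json.dumps(value, separators=(",", ":")).encode("utf-8")
    return len(raw), len(gzip.compress(raw))

# Synthetic sample data; real payloads will compress differently.
rows = [
    {"CustomerID": i, "CustomerName": f"Customer {i}", "IsActive": i % 2 == 0}
    for i in range(1000)
]
columns = {name: [row[name] for row in rows] for name in rows[0]}

row_raw, row_gz = sizes(rows)
col_raw, col_gz = sizes(columns)
print(f"row-major:    {row_raw} bytes raw, {row_gz} bytes gzipped")
print(f"column-major: {col_raw} bytes raw, {col_gz} bytes gzipped")
```

Since GZIP already removes most of the cost of repeated property names, the gzipped figures are the ones worth comparing, and they should be measured on representative data rather than assumed.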
A more compact JSON format for “tabular” data as initially proposed in http://www.odata.org/blog/an-efficient-format-for-odata/ might help, but OData’s flexibility with $expand, dynamic properties, and inheritance doesn’t make that as straightforward as it might seem. We’ve experimented with “Recursive JSON”, http://www.cliws.com/e/06pogA9VwXylo_GknPEeFA/, which addresses the problem of long property names and deals with the flexible structure. Combined with omitting properties with default value, https://issues.oasis-open.org/browse/ODATA-818, this might take us far.
Thanks in advance!
I am not so sure of the shift to binary formats that you predict.
I did some “real data” comparison of SAP enterprise data using JSON vs. BSON format, and found BSON to take up more space. Binary formats that encode lengths can chew up more space than text-based formats. (Similar to CORBA GIOP, where you have a bunch of fixed-length (32-bit) length fields, as well as padding, in the binary encoding.)
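The effect is easy to reproduce with a toy layout. The sketch below uses a made-up encoding (a 32-bit length prefix plus padding to a 4-byte boundary); it is not real BSON or GIOP CDR, only an illustration of why short values can come out larger in binary than in JSON text.

```python
import json
import struct

def length_prefixed(text):
    """Hypothetical binary string: uint32 length, UTF-8 bytes, pad to 4 bytes."""
    data = text.encode("utf-8")
    padding = (-len(data)) % 4
    return struct.pack("<I", len(data)) + data + b"\x00" * padding

def json_text(value):
    """Compact JSON text for a single value, as UTF-8 bytes."""
    return json.dumps(value, separators=(",", ":")).encode("utf-8")

for value in ["ab", "EUR", "Open"]:
    print(value, len(json_text(value)), "bytes as JSON,",
          len(length_prefixed(value)), "bytes length-prefixed")

# A small integer: one character in JSON, four bytes as a fixed-width int32.
print(7, len(json_text(7)), "byte as JSON,", len(struct.pack("<i", 7)), "bytes as int32")
```

For long strings the fixed overhead becomes negligible, so the outcome depends on how short the typical values in the data are.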
Now for some OData services I am toying with, HTML is the default format. Not a standard format for OData, but very useful for system administration tasks like system monitoring (logs, metrics, etc).
I think if the client doesn’t send an Accept header or specify $format in the URL, then all bets should be off. I suppose I am saying let the server decide a default, if it even has one, rather than insisting on finding a format in the URL or the Accept header.
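A minimal sketch of that behaviour, assuming a hypothetical choose_format helper: $format is checked first, then the Accept header, and only when neither is present does the server fall back to whatever default it likes (possibly none). The Accept parsing is deliberately simplified and ignores q-values, and the short-name table is an illustrative subset.

```python
# Short $format names mapped to media types (illustrative subset).
SHORT_NAMES = {"json": "application/json", "xml": "application/xml"}

def choose_format(query, headers, supported, server_default=None):
    """Pick a response media type; returns None when nothing acceptable exists."""
    fmt = query.get("$format")
    if fmt is not None:
        candidates = [SHORT_NAMES.get(fmt, fmt)]
    elif "Accept" in headers:
        # Simplification: drop parameters such as q-values, keep listed order.
        candidates = [part.split(";")[0].strip()
                      for part in headers["Accept"].split(",")]
    else:
        # Client expressed no preference: the server's own default applies.
        return server_default
    for media_type in candidates:
        if media_type == "*/*":
            return server_default
        if media_type in supported:
            return media_type
    return None  # a caller would typically answer 406 Not Acceptable

supported = ["application/json", "text/html"]
print(choose_format({}, {}, supported, server_default="text/html"))
print(choose_format({"$format": "json"}, {"Accept": "text/html"}, supported))
print(choose_format({}, {"Accept": "application/xml"}, supported))
```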
Typically we say that the default is up to the server. The server only needs to support one of the standardized serialization formats – but since JSON is the only format that is fully standardized at this point, we typically expect JSON to be the default response.
I can also say that I *hope* it’s not put down somewhere in the standard as the default. I personally believe we’re at the height of the maturity curve for JSON, and I think as HTTP debugging tools continue to rapidly improve, that we will see a shift in the default serialization format from JSON to a binary format, similar to what we’re seeing happen with HTTP2. So in my personal ideal future state, we would see something like Avro take over as the default serialization format for OData payloads – but of course that depends upon the ability of the server to choose the right default for the API.
I know that the default format for services in OData v4 changed to be JSON, but I am having difficulty finding where the spec explicitly states that. I would have expected to find something in the definition of the Accept header and the $format query parameter, something along the lines of: if the Accept header and the $format query parameter are not present, then the JSON format is used.