


oiic-formation-discuss message


Subject: Reference implementation strikes gold

[Reference that may be useful context for the discussion below of writing "twists" into documents, "cheating", and related points: http://lists.oasis-open.org/archives/oiic-formation-discuss/200806/msg00265.html ]

Rob & co., I've managed to read a large number of the archived emails, and after following the references I have a much better feel for what is going on here and what is being attempted. I only wish I had started reading earlier in the week.

[Many useful links have been posted, but one particularly nice one, in conjunction with all the discussions, was the talk you gave not too long ago titled "ODF Interoperability: The Price of Success". I forget which email offered the link, but the linked PDF gives a nice high-level view with an optimistic bent.]

There is one thing I haven't seen mentioned yet in what I have read. The reference implementation (e.g., OO.o, as you stated, but whichever it is) is not only there so that people can study its source, or so that it provides some error redundancy for the spec and test suites. Very importantly, the reference implementation serves as THE tool everyone should have in order to display (double-check) the ODF documents made with their vendor's tools. This process should eventually be automated (through contributed scripts) to increase the likelihood that users make testing of their documents virtually automatic (Save + Test). [This important point continues below.]

I am very concerned about all the possible, and likely, present and future abuses by the company (or any future company) that has a lock on the very important desktop office suite market. That is what motivated me to start participating on this mailing list. It's very bad to certify a poison as good for your health: at that point guards are let down and consumption drastically increases (vs. when people were unsure). In the same way, I'd hate for the public to be under the impression that some particular closed-source product was certified as safe for consumption (let your guard down) because it passed some test or other. This threat is especially likely when the product belongs to the widely distributed MSO family of products.

Anyway, this workgroup seems to be on track to continue improving ODF and the whole standards situation; that much is clear from all the reading I've been doing. I very much agree that it's important to make testing an ongoing process and as comprehensive as possible, since this increases the chances that selective cheating will become expensive to maintain and will constantly tarnish the cheater's brand (at least if users pay attention). This is a good practical way forward. [Yes, I know the interop goal is not just about catching the cheater.]

Now, back to OO.o as the reference implementation. It's in every vendor's interest to always be able to read all other formats. There may be strategic exceptions, but those would not be the norm. If everyone else follows the standard accurately, it becomes easier for a single vendor or a small number of vendors to cheat: they would be able to read others' documents but would not want others to read the secret twists in their own format. To be more accurate, this behavior would be expected from a player that already "owned" a significant share of the market, because that player could get its flavor disseminated quickly so that, no matter the violation of the standard, users would soon find that this was the dominant flavor/standard and that it appeared to interoperate with "everything" (since "everything" would be owned by that vendor).

So reading is likely to be done faithfully (or at least attempted) most of the time by any vendor, dominant or not. A random series of tests, which will very possibly cover only a tiny portion of the whole test space, and something like an acid test document (such as the smiley face example) are thus likely to pass when run against the reader side of an application whose vendor is trying to cheat.

Meanwhile, any special "twists" to the standard will be incorporated into the documents output/written by that same application. Test documents run on the offending app won't catch this. Conformance tests on these documents can also be beaten rather easily, and likely will be, at least within some "bug" factor [see the link above]. They won't catch what is by far the most likely source of abuse: strictly conforming documents that pass all (superficial) conformance tests but encode the vendor's special twists. If we knew the secrets behind the twists, we could construct test documents to expose them (and I do expect the open source world to look for them and help build such tests). But initially we don't know the secrets, and random test documents covering a small part of the test space will run into them only very rarely if the secrets are well designed/hidden.
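A toy Python illustration of "strictly conforming yet twisted" may help. The "schema" check below is invented for this sketch (real ODF validation is against the spec's RelaxNG schema), and the smuggled attribute value is hypothetical; the point is only that a conformance check on structure passes both documents, while the second hides opaque vendor meaning in a perfectly legal place.

```python
# Toy stand-in for schema validation: only constrain which element
# names may appear (the real ODF schema is far richer, but the same
# principle applies -- valid structure says nothing about hidden intent).
import xml.etree.ElementTree as ET

ALLOWED = {"doc", "p", "span"}  # illustrative vocabulary, not ODF's

def conforms(xml_text):
    """True if every element uses an allowed name."""
    return all(el.tag in ALLOWED for el in ET.fromstring(xml_text).iter())

honest = "<doc><p>hello</p></doc>"
# Same legal structure, but an opaque vendor payload rides along in an
# ordinary attribute value (hypothetical example of a "twist"):
twisted = '<doc><p><span data="v2;K9;0xE1F3">hello</span></p></doc>'
```

Both `conforms(honest)` and `conforms(twisted)` succeed, which is exactly why conformance testing alone won't expose this class of abuse.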

There is hope.

What does work, and very well, is to use the reference implementation to test all documents produced by vendor applications. We all know this would be a good idea anyway, but I want to emphasize that, as a defense against the likely form of abuse described above, the ref impl is by far the key player (vs. acid test documents of the type described, schema tests, etc., which help genuinely cooperative vendors improve their products). Any document that "looks" AND HANDLES differently when opened within the reference implementation than when opened within the vendor's product signals that the document skirts the methods ODF provides for creating its various parts (again, read the linked email above for an example of this scenario; do note that such bad documents could easily be strictly conforming).

In short, given a (hopefully) consistent and solid standard, the test documents and schema tests help those who cooperate, while the reference implementation is particularly useful for catching the likely abuses.

Tests can and should be made to run from within the reference implementation, to automate the process and provide (e.g.) extensive HANDLING coverage. This can be achieved, for example, through macros and "testing" plugins incorporated into the reference implementation (as well as through external driver scripts) and then applied to any/all documents from vendor X (to test X's app).

The acid test proof-of-concept recently submitted (great work, Sam) uses, I believe, a spreadsheet with cells full of conditional formatting. The other side of the coin is to build the logic into macros (vs. cells) to be applied to arbitrary documents, to catch those written with proprietary extensions used gratuitously in lieu of the mechanisms defined within the ODF standard itself. [The link at top helps explain this scenario.]
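One concrete form such producer-side logic could take (shown here as a standalone Python sketch rather than a macro) is to scan a produced package's content.xml for elements living in namespaces outside the ODF-defined set. The "known" prefixes below are an illustrative subset I chose, not the spec's full namespace list.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Illustrative subset only; a real checker would enumerate every
# namespace the ODF specification actually defines.
KNOWN_PREFIXES = (
    "urn:oasis:names:tc:opendocument:xmlns:",
    "http://www.w3.org/",
)

def foreign_namespaces(odf_bytes):
    """Return the element namespaces used in content.xml that fall
    outside the known set -- a rough proxy for proprietary extensions."""
    foreign = set()
    # An ODF file is a ZIP package containing content.xml among others.
    with zipfile.ZipFile(io.BytesIO(odf_bytes)) as package:
        root = ET.fromstring(package.read("content.xml"))
    for element in root.iter():
        if element.tag.startswith("{"):  # ElementTree's "{ns}local" form
            ns = element.tag[1:].split("}", 1)[0]
            if not ns.startswith(KNOWN_PREFIXES):
                foreign.add(ns)
    return foreign
```

Note this only catches the blunt case (foreign markup); the subtler abuse discussed above, where twists hide inside strictly conforming markup, still needs the reference implementation's look-and-handling comparison.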

[So through automated, extensive testing from within the reference implementation on all externally created documents, we help prevent the existence of documents that do all the useful stuff only when opened in a single vendor's product. We do this by identifying the offending producer. Further, by studying the offending documents, we may be able to find the secret protocols used to consume/produce them, and hence derive not-so-random test documents that could then be used directly to identify the misbehaving application's errors (or "errors") with higher frequency, perhaps giving us a better feel for the extent of the problem: was it isolated, or does it appear to be a dedicated effort to bypass ODF with documents that to the outside world appear trivially and uselessly strictly conforming? If the cases appear isolated, we can chalk this up to a bug on the vendor's part. And in any case, the ref impl helps gather data points useful for reverse-engineering compatibility with the offending (potentially dominant) product.]

PS: In case the macro and plugin testing concept from within the ref impl wasn't clear: what I mean (or one possibility for a round of tests) is that the macros would run so that the document is exposed in various ways, and the user then decides whether that behavior is correct (as expected). So this could definitely involve the user's subjective call, but it can possibly be made as simple as the smiley acid test.

PS2: Let me also quickly contrast my understanding of the smiley test with this approach. The smiley test exercises app X as a reader and renderer: it checks whether the app under the microscope reads and renders accurately what is known to be an accurate document. The approach focused on in this email instead uses the reference implementation as the control and the documents written (produced) by the third-party app as the variable. So the roles of control and variable are switched (doc vs. app): app X is tested as a producer, indirectly, via the quality of the document it produced. In both cases, however, the control (whether the smiley doc or the ref impl) is where the test logic lives. And of course we expect the control (doc or app) to be accurate, or else we must deal with confidence levels below 100%.

I believe all the coding being done for the smiley acid test (see http://lists.oasis-open.org/archives/oiic-formation-discuss/200806/msg00323.html ) is ultimately to test the accuracy of the rendering of the control document. Someone please correct me (and this posting) if I have misunderstood.

