Subject: XSDs--Generating Them Or Not--Some Background

From: Eliot Kimber <ekimber@contrext.com>
To: DITA TC <dita@lists.oasis-open.org>
Date: Mon, 25 Jun 2018 10:50:35 -0500

I certainly agree that not shipping XSDs of any form in DITA 2.0 is a good thing.

Some background on XSDs, DITA, and the current state of generation technology:

1. XSD 1.0 doesn't work with DITA modularity

The issue here is the need to use the XSD redefine feature in order to implement extensions from specializations and constraints. There are two problems with the redefine feature:

A. It is ambiguously defined in the XSD 1.0 spec, and different processors implement it differently. Of the two possible interpretations of how redefine works, only one works for DITA. This happens to be the way Xerces implements its XSD processing, but there may be other XSD processors that use the other interpretation (I want to say that the Microsoft implementation uses the other interpretation, but I don't know that that's actually the case). So basically, the XSDs work *if* you use Xerces as your parser, or a parser that behaves the same way. Since Xerces is the parser used more or less exclusively in Java, this means 95% of all DITA users are fine, but....

B. The rules for constraints using redefine require a particularly convoluted approach to extension for some types of content models (i.e., constraints applied to sequence models where you want to disallow some items in the model--strict task is a good example). Creating these multi-stage extensions requires creativity that was certainly beyond my ability to automate, at least with the amount of time I had to devote to the problem. In particular, there is not a simple one-to-one mapping from the RNG to the equivalent XSD as there is for DTDs. This means that there is a class of constraints where XSD generation cannot be automated (at least not with the current RNG-to-XSD code). For the OASIS grammars I simply hard-coded the generation of the XSD constraint modules, but that's obviously not a general solution for non-OASIS constraints and specializations.
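
For readers who have not used redefine, here is a minimal sketch of the mechanism in its simplest form. It is not taken from the OASIS grammars: the file name base-task.xsd and the group name taskbody.content are hypothetical, and the base group is assumed to be the sequence (prereq?, context?, steps, result?). In XSD 1.0, a redefined group that does not reference itself must be a valid restriction of the original group, which is where the harder DITA cases run into trouble.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical constraint module; all file and group names are illustrative only. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Pull in the (assumed) base module and redefine one of its named groups. -->
  <xs:redefine schemaLocation="base-task.xsd">
    <xs:group name="taskbody.content">
      <!-- The optional context element is dropped. Every remaining particle
           still maps, in order, onto a particle of the base sequence, so this
           counts as a valid restriction. -->
      <xs:sequence>
        <xs:element ref="prereq" minOccurs="0"/>
        <xs:element ref="steps"/>
        <xs:element ref="result" minOccurs="0"/>
      </xs:sequence>
    </xs:group>
  </xs:redefine>
</xs:schema>
```

This simple case goes through because the removed item is optional and the order of the rest is unchanged. Constraints that do not fit the restriction rules this neatly are the ones that need the multi-stage approach described above.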

2. The XSD 1.1 override feature allows direct override of content models in a way that is directly analogous to RNG and DTD overrides. This feature was driven in part by requirements from DITA. Xerces and Saxon both implement XSD 1.1 features (I know Saxon does--would need to verify Xerces support). I haven't tried it but I'm pretty confident that we could generate XSD 1.1 overrides for constraints and specializations without too much work. However, since support for XSD 1.1 is not universal, I've never suggested it.
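
As a rough sketch of what a generated XSD 1.1 constraint module might look like (untested, and not output from the current generator), the fragment below replaces a named group wholesale using xs:override. The file name base-task.xsd and the group name taskbody.content are hypothetical. Unlike an XSD 1.0 redefine, the replacement does not have to be a restriction or extension of the original definition, much as a RELAX NG include can carry replacement define elements.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical XSD 1.1 constraint module; all file and group names are illustrative only. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Components inside xs:override replace the same-named components
       of the schema document at schemaLocation. -->
  <xs:override schemaLocation="base-task.xsd">
    <xs:group name="taskbody.content">
      <xs:sequence>
        <xs:element ref="prereq" minOccurs="0"/>
        <xs:element ref="steps"/>
        <xs:element ref="result" minOccurs="0"/>
      </xs:sequence>
    </xs:group>
  </xs:override>
</xs:schema>
```

A module like this only validates with a processor running in XSD 1.1 mode; an XSD 1.0 processor will reject the xs:override element.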

3. The current RNG-to-XSD code works as far as it goes (that is, I haven't broken it or removed it from the current RNG-to-* generator code), so people who want modular XSDs for DITA 2.0 grammars can always generate them themselves. More likely, we (I) can generate them and put them somewhere as a convenience to the community (e.g., in the DITA Community project on GitHub or in a non-normative OASIS repository).

Generating monolithic XSDs may be easy or hard--I'm not sure because I haven't tried it or thought through the problem. If someone wanted to pursue that with the current RNG-to-XSD code as a base I'd be happy to support them but I don't think it's something I'm likely to pursue on my own.

For editors like Xeditor and Fonto that currently use XSD as their grammar source, I would think it might be easier for them to simply move to using RNG, but I don't really know. I've certainly suggested strongly to both companies that moving to RNG would be a good thing for them to do.

Cheers,

Eliot

--
Eliot Kimber
http://contrext.com