dita message

Subject: Re: [dita] Testing of RNG / DTD / XSD

From: Eric Sirois <esirois@ca.ibm.com>
To: Eliot Kimber <ekimber@contrext.com>
Date: Wed, 16 Apr 2014 11:41:17 -0400

I'll run the XSDs through some extra parsers tomorrow. There are some differences between Java-based and C-based parsers. Xerces-C is stricter in some cases than its Java-based version. MSXML does not like redefine, but it sometimes finds some other issues.

Eric

Eric A. Sirois
Staff Software Developer
DB2 Universal Database - Information Development
DITA XML Schema Architect and DITA Open Toolkit Developer
IBM Canada Ltd. - Toronto Software Lab
Email: esirois@ca.ibm.com
Phone:(905) 413-2841
Blue Pages (Internal)

"Transparency and accessibility requirements dictate that public information and government
transactions avoid depending on technologies that imply or impose a specific product or
platform on businesses or citizens" - EU on XML-based office document formats.


From: Eliot Kimber <ekimber@contrext.com>
To: DITA TC <dita@lists.oasis-open.org>
Date: 04/15/2014 12:10 PM
Subject: Re: [dita] Testing of RNG / DTD / XSD
Sent by: <dita@lists.oasis-open.org>

The testing I've set up so far is in SVN under doctypes/test/tools. It currently consists of the following components:

1. A set of valid map and topic base documents, at least one per shell, that should be valid against their corresponding shells. These are intended primarily to simply test that the shell document types are parseable (and by extension the modules they integrate) but also contain element and attribute instances that test edge cases or markup new to DITA 1.3 or cases that were incorrect in earlier versions that I wanted to regression test.

2. A set of invalid map and topic base documents that should not be valid against their corresponding shells. This set is not yet complete--I've been adding to it as I've found valid cases were in fact not valid. I also don't yet have a way to automatically verify that the documents are correctly flagged as invalid (see below).

3. The Ant script build.xml in the doctypes/test/tools directory that does the following:

   A. Uses the base valid docs to generate grammar-type-specific instances for each grammar type (DTD, XSD URL-based, XSD URN-based, RELAX NG URL-based, RELAX NG URN-based, RNC URL-based and RNC URN-based).

   B. DTD validation using the Ant <xmlvalidate> task.

   C. DTD validation using Saxon.

   I need both forms of DTD validation because the <xmlvalidate> task fails completely if any DTD is not parseable, while Saxon does not. So between the two I seem to get complete coverage of parsing issues.

   D. XSD validation using the Ant <schemavalidate> task. This seems to accurately report issues with the XSDs, meaning I don't seem to be getting any false negatives with the currently-generated XSDs.

For RNG/RNC the Jing tool, which is the only available Java-based RELAX NG validator that I know of, does not support catalogs, so it's not currently possible to validate the URN-based RNG/RNC documents, shells, and modules.
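As an illustration of step 3A above, here is a minimal Python sketch of how one base document could be bound to different grammar types. It is not the build.xml logic from the message, which is not shown in the thread. The function name `make_instance`, the `GRAMMAR_REFS` table, and every public ID and file name in it are hypothetical placeholders. Only the binding mechanisms are standard: a DOCTYPE declaration for DTDs, `xsi:noNamespaceSchemaLocation` for XSDs, and the `xml-model` processing instruction for RELAX NG. The URN-based variants would presumably differ only in the identifier used, which a catalog then resolves.

```python
import re

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
RNG_NS = "http://relaxng.org/ns/structure/1.0"

# Hypothetical identifiers, for illustration only. The real public IDs,
# URNs, and file names come from the shells and catalogs under test.
GRAMMAR_REFS = {
    "dtd": ("-//EXAMPLE//DTD Sample Topic//EN", "sample-topic.dtd"),
    "xsd-url": "sample-topic.xsd",
    "rng-url": "sample-topic.rng",
}


def make_instance(base_doc: str, root: str, grammar: str) -> str:
    """Bind base_doc to one grammar type.

    Assumes base_doc has no XML declaration or DOCTYPE of its own and
    that `root` is the name of its document element.
    """
    if grammar == "dtd":
        public_id, system_id = GRAMMAR_REFS["dtd"]
        doctype = f'<!DOCTYPE {root} PUBLIC "{public_id}" "{system_id}">'
        return f"{doctype}\n{base_doc}"
    if grammar == "xsd-url":
        location = GRAMMAR_REFS["xsd-url"]
        attrs = f' xmlns:xsi="{XSI_NS}" xsi:noNamespaceSchemaLocation="{location}"'
        # Add the schema reference to the first start tag named `root`.
        pattern = rf"<{re.escape(root)}\b"
        return re.sub(pattern, lambda m: m.group(0) + attrs, base_doc, count=1)
    if grammar == "rng-url":
        href = GRAMMAR_REFS["rng-url"]
        pi = f'<?xml-model href="{href}" schematypens="{RNG_NS}"?>'
        return f"{pi}\n{base_doc}"
    raise ValueError(f"unknown grammar type: {grammar}")
```

Generating the instances from a single base document keeps the test content identical across grammar types, so a failure in only one variant points at that grammar rather than at the test document.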
I'm actively working on getting URL-based RNG/RNC validation set up through Ant but it's been a lower priority than XSD generation. It should just be a matter of calling jing from Ant using the Ant <java> task.

I've also been doing "unit testing" via direct validation in Oxygen as I work through things, but I've been depending on my automated test suite to validate my work as I do it.

I've also set up a Jenkins server on CloudBees that runs these same tests automatically any time anything new is committed under the doctypes/ directory in SVN. This automated regression testing is intended to be a check once the RNG-to-DTD/XSD/RNC generation is generally proven to be correct so that we can make adjustments to the vocabulary or tools and have them automatically verified in a publicly-visible place. I've set it up to send me email for any failure and it can easily be configured to send email to anyone, including the DITA TC mailing list if appropriate. That is, once we have all the tests passing for 1.3, I think it makes sense to have the automation inform the TC if anything breaks since any breakage would be unexpected and bad.

Right now the test does a simple log analysis looking for any error or warning message and considers the test to have failed if it finds any error or warning. This analysis could be more sophisticated but this is good enough for now.

Additional testing that needs to be implemented includes:

- Testing correct detection of invalid documents. Unfortunately, with simple parsing plus log analysis for "error" there's no obvious way to treat invalidation of documents as success rather than failure. The approach I've been thinking of is to implement a Java class that essentially inverts the messages from a parser, reporting failure as success and success as failure. This should be easy enough to do. It might also be possible to do it directly in Ant by capturing a log using an Ant log recorder and then applying a different regular expression to the log or transforming the log via text replacement and injecting the result into the final log analyzed.

- Adding additional test case documents that exercise more cases and therefore provide more detailed checks of specific content models. These documents are tedious to author.

- Implement schematrons that validate the RNGs themselves--George Bina implemented some at the start of his work on DITA RNG but the RNG details have changed since then and I haven't been able to update the schematrons to match. But the schematrons can check many details, including specialization requirements (correct specialization hierarchy, @domains values, etc.).

- Use the OT preprocessing as an additional check where the schematrons are not sufficient (not sure what that might be).

Cheers,

Eliot

-----
Eliot Kimber, Owner
Contrext, LLC
http://contrext.com

On 4/15/14, 9:43 AM, "Kristen James Eberlein" <kris@eberleinconsulting.com> wrote:

> Subject: Testing of RNG / DTD / XSD
> Date: Tue, 15 Apr 2014 08:25:53 -0500
> From: Robert D Anderson <robander@us.ibm.com>
> To: Eliot Kimber <ekimber@rsicms.com>, chris.nitchie@oberontech.com, dhelfinstine@ptc.com, Scott Hudson <scott.hudson@schneider-electric.com>
> CC: Kristen James Eberlein <kris@eberleinconsulting.com>, Eric Sirois <esirois@ca.ibm.com>
>
> I was supposed to start this thread last week but fell down on that. Better late than never.
>
> We need to discuss how we want to handle testing of the RNG -- what policy do we want to have in place, who is doing the work, and (if needed) how can that work be repeated by others.
> I think it's important that whatever our Official Process becomes, anybody should be able to set it up and repeat it with minimal work. That was not the case with DITA 1.2, where everything relied on a long series of tools and scripts on my own system.
>
> High points - here are the things I did while testing with 1.2:
> * Kept an XML rendering of each doctype
> * For each new feature:
> ** Integrate the new change
> ** Verify that the DTD still parsed (run through my generally very picky Omnimark parser, open in a validating editor)
> ** Verify that the desired new markup was there
> ** Regenerate the XML version of the DTD, and do a diff to ensure no unintended changes
> * Repeat for each new feature
>
> For 1.3 I think Eliot has already been doing some of this - validating with parsers, and ensuring that the new markup is available.
>
> Do we want to keep up the "make sure no unintended consequences" test, and if so, how? I think this is much more difficult with doctypes that are already essentially complete (it's easiest when checking as each feature is added).
>
> Do we have tools that can do other DITA based validation -- ensure that the specialization is correct, maybe catch a Learning and Training element that has an incorrectly constructed class attribute, etc?
>
> Who here wants to sign up for testing of RNG, DTD, or XSD? We don't want to be wasting time, but it might be a good thing if we're doing some of this testing with different parsers, for example -- I've found things in the past that opened OK in Arbortext, while Omnimark threw out an error, or vice versa.
>
> Thanks,
>
> Robert D Anderson
> IBM Authoring Tools Development
> Chief Architect, DITA Open Toolkit
> (http://dita-ot.sourceforge.net/)
>
> ---------------------------------------------------------------------
> To unsubscribe from this mail list, you must leave the OASIS TC that
> generates this mail. Follow this link to all your TCs in OASIS at:
> https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php
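
The log analysis described in Eliot's message, and the inverted check he proposes for the invalid documents, can be illustrated with a short Python sketch. This is not his implementation (he suggests a Java class or an Ant log recorder). The names `log_has_problems` and `check_passed` and the exact regular expression are assumptions made for the example.

```python
import re

# Assumed pattern: any line containing a word that starts with "error" or
# "warning" counts as a problem. The message only says the analysis looks
# for "any error or warning message", so the real expression may differ.
# A summary line such as "0 errors" would also match this simple pattern.
PROBLEM = re.compile(r"\b(error|warning)", re.IGNORECASE)


def log_has_problems(log_text: str) -> bool:
    """Return True if any line of the log mentions an error or warning."""
    return any(PROBLEM.search(line) for line in log_text.splitlines())


def check_passed(log_text: str, expect_valid: bool = True) -> bool:
    """Decide whether a validation run counts as a passing test.

    For documents that should be valid, a clean log is a pass.
    For documents that should be invalid, the result is inverted:
    a reported problem is a pass and a clean log is a failure.
    """
    problems = log_has_problems(log_text)
    return (not problems) if expect_valid else problems
```

Keeping the inversion in one place means the same parser run and log capture can serve both the valid and the invalid document sets, with only the `expect_valid` flag changing between them.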

References:
- Testing of RNG / DTD / XSD
  - From: Kristen James Eberlein <kris@eberleinconsulting.com>
- Re: [dita] Testing of RNG / DTD / XSD
  - From: Eliot Kimber <ekimber@contrext.com>