OASIS Mailing List Archives

office-comment message



Subject: Public Comment


Comment from: kindlund@mitre.org

Hello,

I'm looking at how to properly validate an OpenDocument XML file (written by OpenOffice), using the office-schema-1.0-cd-3.rng schema as a basis. I'm running into schema validation errors that differ depending on the Relax-NG validator used, and I would appreciate any comments or suggestions. I'm not enrolled in the corresponding mailing list, so please CC me directly on any responses.


Here are the process steps I followed:

- Downloaded OpenDocument schema from URL:
http://www.oasis-open.org/committees/download.php/11680/office-schema-1.0-cd-3.rng

- Downloaded the Sun Multi-Schema XML Validator (MSV), which supports Relax-NG schema validation:
http://www.sun.com/software/xml/developers/multischema/
It is listed as a Relax-NG validator on this page:
http://relaxng.org/#validators

- Created a simple OpenDocument file using OpenOffice.org 2.0 Beta; extracted and aggregated the XML contents into a single XML file (Testing.xml).

- Ran the following command:
java -jar msv.jar -warning -strict office-schema-1.0-cd-3.rng Testing.xml

Output is as follows:
start parsing a grammar.

cannot set parameter pattern to this datatype: specified pattern is invalid: Unexpected meta character.
  3838:31@file:///C:/Temp/office-schema-1.0-cd-3.rng

invalid parameter setting: specified pattern is invalid: Unexpected meta character.
  3837:25@file:///C:/Temp/office-schema-1.0-cd-3.rng

cannot set parameter pattern to this datatype: specified pattern is invalid: Unexpected meta character.
  3843:31@file:///C:/Temp/office-schema-1.0-cd-3.rng

invalid parameter setting: specified pattern is invalid: Unexpected meta character.
  3842:25@file:///C:/Temp/office-schema-1.0-cd-3.rng

failed to load a grammar.

The X:Y numbers correspond to the row:column within the office-schema-1.0-cd-3.rng file. In this case, they point to the "cellAddress" and "cellRangeAddress" definitions, which are explained on pages 189 and 190, in Section 8.3.1 of the corresponding documentation.
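For anyone who wants to jump straight to the offending schema lines, here is a minimal Python sketch that splits such a location string into its parts. The "row:column@URI" layout is inferred from the MSV output above, and the names `Location`, `parse_location`, and `source_line` are made up for this illustration; treating the row as 1-based is also an assumption.

```python
import re
from typing import NamedTuple


class Location(NamedTuple):
    line: int
    column: int
    uri: str


def parse_location(text):
    """Parse a 'row:column@uri' string as printed in the output above."""
    match = re.fullmatch(r"\s*(\d+):(\d+)@(\S+)\s*", text)
    if match is None:
        raise ValueError("not a row:column@uri location: %r" % text)
    return Location(int(match.group(1)), int(match.group(2)), match.group(3))


def source_line(lines, location):
    """Return the schema line a location points to (row assumed 1-based)."""
    return lines[location.line - 1]


loc = parse_location("  3838:31@file:///C:/Temp/office-schema-1.0-cd-3.rng")
print(loc.line, loc.column, loc.uri)
```

With the schema read into a list of lines, `source_line(lines, loc)` would then return the line the validator complained about.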

This isn't a bug in the OpenDocument schema or in MSV itself; it's a bug in Apache Xerces (v2.6.2), which MSV leverages. Xerces treats the "$" and "^" characters in regular expressions as metacharacters (as Perl does), whereas the W3C XML Schema datatype specification says they are not. Details of this Xerces bug are listed here:
http://issues.apache.org/jira/browse/XERCESJ-1061
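To illustrate the difference, here is a small Python sketch. Python's re module is Perl-style, so "$" is an anchor there unless it is escaped. The pattern below is a simplified, hypothetical stand-in for a cell address written in XML Schema regex syntax, not the schema's actual regex, and `escape_dollar`, `compiles`, and `matches_whole` are helper names invented for this example. The escaping helper is deliberately naive (it does not handle an escaped backslash directly followed by "$").

```python
import re

# Hypothetical pattern in XML Schema syntax, where "$" is an ordinary character.
XSD_PATTERN = "$?[A-Z]+$?[0-9]+"


def escape_dollar(xsd_pattern):
    """Escape each unescaped "$" so a Perl-style engine reads it as a literal."""
    return re.sub(r"(?<!\\)\$", r"\\$", xsd_pattern)


def compiles(pattern):
    """Report whether Python's Perl-style engine accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def matches_whole(pattern, value):
    """XML Schema patterns are implicitly anchored, so match the whole value."""
    return re.fullmatch(pattern, value) is not None


escaped = escape_dollar(XSD_PATTERN)
print("unescaped pattern compiles:", compiles(XSD_PATTERN))
print("escaped pattern:", escaped)
print("matches $A$1:", matches_whole(escaped, "$A$1"))
print("matches A1:", matches_whole(escaped, "A1"))
```

Escaping keeps the pattern meaningful for both kinds of engine, since "\$" also denotes a literal dollar sign in XML Schema regular expressions.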

As a workaround, I modified the two regexes within the OpenDocument schema by escaping the "$" that was the culprit ("$" -> "\$"); however, the Relax-NG schema validator still yields nonsensical error messages.

In fact, after trying two other independently developed RNG validators (Jing and oXygen), I'm getting inconsistent errors from each. Based on this evidence, I'm reluctant to conclude that the OpenDocument schema itself is flawed. The only common denominator I can see is that the "schema error" may revolve around the validators' inability to handle the following two recursive definitions:

<define name="mathMarkup">
    <zeroOrMore>
        <choice>
            <attribute>
                <anyName/>
            </attribute>
            <text/>
            <element>
                <anyName/>
                <ref name="mathMarkup"/>
            </element>
        </choice>
    </zeroOrMore>
</define>

-AND-

<define name="anyAttListOrElements">
    <zeroOrMore>
        <attribute>
            <anyName/>
            <text/>
        </attribute>
    </zeroOrMore>
    <ref name="anyElements"/>
</define>
<define name="anyElements">
    <zeroOrMore>
        <element>
            <anyName/>
            <mixed>
                <ref name="anyAttListOrElements"/>
            </mixed>
        </element>
    </zeroOrMore>
</define>
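One property of these definitions can be checked mechanically. As I understand the RELAX NG specification, a definition may only refer back to itself (directly or indirectly) if the reference passes through an element pattern. The following Python sketch tests that property for the definitions quoted above. The surrounding grammar wrapper and namespace declaration are added for the sketch, the markup is compacted, and `unguarded_refs` and `has_unguarded_cycle` are hypothetical helper names; this is not a substitute for a real validator.

```python
import xml.etree.ElementTree as ET

RNG_NS = "{http://relaxng.org/ns/structure/1.0}"

# The definitions quoted above, compacted and wrapped in a <grammar> element.
GRAMMAR = """\
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <define name="mathMarkup">
    <zeroOrMore><choice>
      <attribute><anyName/></attribute>
      <text/>
      <element><anyName/><ref name="mathMarkup"/></element>
    </choice></zeroOrMore>
  </define>
  <define name="anyAttListOrElements">
    <zeroOrMore><attribute><anyName/><text/></attribute></zeroOrMore>
    <ref name="anyElements"/>
  </define>
  <define name="anyElements">
    <zeroOrMore><element><anyName/>
      <mixed><ref name="anyAttListOrElements"/></mixed>
    </element></zeroOrMore>
  </define>
</grammar>
"""


def unguarded_refs(node):
    """Names of <ref> patterns reachable without entering an <element>."""
    names = set()
    for child in node:
        if child.tag == RNG_NS + "element":
            continue  # anything below an <element> counts as guarded
        if child.tag == RNG_NS + "ref":
            names.add(child.get("name"))
        names |= unguarded_refs(child)
    return names


def has_unguarded_cycle(grammar_xml):
    """True if some definition reaches itself through unguarded refs only."""
    root = ET.fromstring(grammar_xml)
    graph = {d.get("name"): unguarded_refs(d)
             for d in root.iter(RNG_NS + "define")}

    def reaches(start, current, seen):
        for nxt in graph.get(current, ()):
            if nxt == start:
                return True
            if nxt not in seen:
                seen.add(nxt)
                if reaches(start, nxt, seen):
                    return True
        return False

    return any(reaches(name, name, set()) for name in graph)


print(has_unguarded_cycle(GRAMMAR))  # prints False
```

In both definitions every recursive reference sits inside an element pattern, so under that reading the recursion by itself should not be what the validators object to.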

I would appreciate any feedback/suggestions regarding this issue. If you happen to know of a better RNG validator for OpenDocument files, I could try and replicate the issue with that validator as well. I'm not an expert in RNG syntax in general, so if the problem is reproducible and there's an evident error within the schema, I would appreciate any explanations/clarifications.

Regards,
-- 
Darien Kindlund
The MITRE Corporation
InfoSec Engr / Scientist, Sr.

kindlund@mitre.org

