

Subject: Re: [dita] Conref range: Is constraint on last member of range necessary or useful?


Based on Tuesday's discussion and the general feeling that the current
constraint is justified in order to help ensure validity and what I'll call
"sensibility", I've drafted this paragraph for my book's discussion of
conref range:

Note that one could construct specialized or constrained content models that
would allow referenced ranges to be invalid. It would be a challenge to do
so but it could happen. For example, if the content model were (a, b, a, c,
a, d), meaning a sequence of required elements that alternated the element
type "a" with other types, you could reference the sequence "a", "b", "a",
from the second "a" in the referencing topic, resulting in the effective
element sequence "a", "b", "a", "b", "a", which would not be schema valid.
There are no content models of this form in the base DITA vocabulary but
nothing prevents someone from defining one in a specialization or constraint
module. But it's such an odd pattern that it is highly unlikely anyone would
define one except for the most specialized and task-specific applications.
Remember too that validity doesn't just mean what the DTD or schema
enforces, it means what the processors that operate on the data do or don't
accept. If trapping this case and reporting it is important you can always
use separate validation applications like Schematron or custom code to check
for this case. The intent of the DITA specification is clear even if DTDs
and XSDs cannot fully express the constraints in a form that
general-purpose XML processors will enforce.
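
To make the example concrete, here is a minimal Python sketch of the case
described above. It treats the content model (a, b, a, c, a, d) as a regular
expression over element names and resolves a range by splicing it in place of
the referencing element. The helper names (is_valid, resolve_range) are
invented for illustration and are not part of any DITA processor.

```python
import re

# Content model (a, b, a, c, a, d) as a regex over space-separated names.
MODEL = re.compile(r"a b a c a d")


def is_valid(children):
    """Return True if the child sequence satisfies the content model."""
    return MODEL.fullmatch(" ".join(children)) is not None


def resolve_range(context, index, referenced):
    """Replace the referencing element at `index` with the referenced range."""
    return context[:index] + referenced + context[index + 1:]


referencing = ["a", "b", "a", "c", "a", "d"]
referenced_range = ["a", "b", "a"]  # first and last members have the same type

# Reference the range from the second "a" (index 2) of the referencing topic.
resolved = resolve_range(referencing, 2, referenced_range)

print(is_valid(referencing))  # the unresolved context satisfies the model
print(resolved)               # the effective sequence after resolution
print(is_valid(resolved))     # the resolved sequence does not
```

The range meets the matching start/end rule, yet the resolved sequence begins
"a", "b", "a", "b", "a" and fails the content model.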

I think this statement is in line with what we said we might need to post as
one of our clarify statements.

Cheers,

E.

On 2/11/11 5:54 AM, "Eliot Kimber" <ekimber@reallysi.com> wrote:

> I think you have misunderstood when a content model is or is not ambiguous.
> 
> In the case of the sequence (a,b,a,c,a,d), given an initial <a> element
> there is no question that it must match the first "a" token in the content
> model, then you must have a b, then another a, and so on. Because there are
> no choices here, the content model is deterministic (not ambiguous).
> 
> But, in any case, XML does not *require* reporting of non-deterministic
> content models: "XML processors built using SGML systems *may* flag
> non-deterministic content models as errors." (Emphasis mine.)
> 
> In point of fact, I don't believe any XML parsers in common use report
> non-deterministic models, at least by default, simply because there is no
> need to do so because they can all validate against such models just fine.
> 
> I tried the experiment of making my sample ambiguous, e.g., by changing it
> to:
> 
> <!ELEMENT foo ((a, b) | (a, c)) >
> 
> And Xerces (through Oxygen) correctly reported my document as invalid
> (because it no longer satisfied the content model) but it did not complain
> about the content model itself.
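
A rough sketch of why validation still works against an ambiguous model:
below, Python's re module stands in for a validator (an assumption for
illustration only; it says nothing about how Xerces or Arbortext Editor work
internally). Both branches of ((a, b) | (a, c)) start with "a", which is the
source of the ambiguity, yet matching by trying each branch still gives a
definite answer for every instance.

```python
import re

# ((a, b) | (a, c)) as a regex over space-separated element names.
AMBIGUOUS = re.compile(r"(?:a b)|(?:a c)")
BRANCHES = (("a", "b"), ("a", "c"))


def is_valid(children):
    """Return True if the child sequence satisfies the ambiguous model."""
    return AMBIGUOUS.fullmatch(" ".join(children)) is not None


# Two branches share a first token, so a matcher limited to one token of
# lookahead cannot tell which "a" in the model an initial <a> corresponds to.
first_tokens = [branch[0] for branch in BRANCHES]
has_overlap = len(set(first_tokens)) < len(first_tokens)

print(has_overlap)           # the model is non-deterministic
print(is_valid(["a", "b"]))  # matches the first branch
print(is_valid(["a", "c"]))  # matches the second branch
print(is_valid(["a", "d"]))  # matches neither branch
```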
> 
> I tried editing the document with the ambiguous content model in Arbortext
> Editor 5.4 because if there's any processor that would report ambiguous
> content models I would think Editor would, given its SGML heritage, but it
> did not.
> 
> So even if a dependence on non-determinism would help in this case (which I
> don't see that it actually does) you couldn't rely on parsers reporting it.
> 
> But in any case, my original example is correct as written in that the
> content model is not non-deterministic, so I think my argument stands:
> requiring matching end types in ranges can't prevent non-DTD-valid results
> following resolution and therefore there's no point in making the
> requirement.
> 
> Cheers,
> 
> Eliot
> 
> On 2/10/11 9:21 AM, "Michael Priestley" <mpriestl@ca.ibm.com> wrote:
> 
>> 
>> Hi Eliot,
>> 
>>> Determinism only applies when there is optionality.
>> 
>> Optionality is merely one way to create indeterminacy.
>> 
>> From the URL I sent:
>>> given an initial b the XML processor cannot know which b in the model is
>>> being matched without looking ahead to see which element follows the b.
>> 
>> That certainly describes your example below.
>> 
>> Michael Priestley, Senior Technical Staff Member (STSM)
>> Lead IBM DITA Architect
>> mpriestl@ca.ibm.com
>> http://dita.xml.org/blog/25
>> 
>> 
>> From: Eliot Kimber <ekimber@reallysi.com>
>> To: Michael Priestley/Toronto/IBM@IBMCA
>> Cc: dita <dita@lists.oasis-open.org>
>> Date: 02/10/2011 10:06 AM
>> Subject: Re: [dita] Conref range: Is constraint on last member of range
>> necessary or useful?
>> 
>> 
>> 
>> 
>> Determinism only applies when there is optionality. Consider this DTD:
>> 
>> <!ELEMENT root (foo) >
>> <!ELEMENT foo (a, b, a, c, a, d) >
>> <!ELEMENT a (#PCDATA)* >
>> <!ATTLIST a id NMTOKEN #IMPLIED >
>> <!ELEMENT b (#PCDATA)* >
>> <!ATTLIST b id NMTOKEN #IMPLIED >
>> <!ELEMENT c (#PCDATA)* >
>> <!ATTLIST c id NMTOKEN #IMPLIED >
>> <!ELEMENT d (#PCDATA)* >
>> <!ATTLIST d id NMTOKEN #IMPLIED >
>> 
>> 
>> And this valid instance:
>> 
>> <!DOCTYPE root SYSTEM "sequence-test.dtd">
>> <root>
>>   <foo>
>>     <a id="a1"></a>
>>     <b id="b1"></b>
>>     <a id="a2"></a>
>>     <c id="c1"></c>
>>     <a id="a3"></a>
>>     <d id="d1"></d>
>>   </foo>
>> </root>
>> 
>> The element type <a> is allowed in three places, once followed by <b>, once
>> followed by <c>, once by <d>. From a referencing topic I could do this:
>> 
>> <foo>
>>   <a conkeyref="sequence-test.dtd/a2"
>>      conrefend="x#x/a3"
>>  />
>>  <b/><a/><c/><a/><d/>
>> </foo>
>> 
>> The referenced range is not DTD valid in the referencing context (and in
>> fact in this example there is no possible referenced range that would be DTD
>> valid since the content model has no optional members).
>> 
>> Since creating invalid reference results is not (and cannot be) disallowed,
>> it must be allowed. Requiring that the last member be <a> in this case
>> doesn't make the result any *more* valid nor does it make it less valid.
>> 
>> But the reference is correct per the conref constraints.
>> 
>> Again, in this example, why should I be disallowed from referencing the
>> sequence <a/><b/>?
>> 
>> So while this may not be a likely case it is a possible case and it
>> demonstrates that imposing a requirement on the last member of the sequence
>> doesn't help ensure sensibility or DTD validity of the result.
>> 
>> In practice you would expect to only use conref range in the context of
>> parent elements with repeating OR groups but that itself is not a stated
>> requirement of the facility in DITA 1.2. But even in that case, requiring a
>> specific sequence end doesn't make much sense since in the case of repeating
>> OR groups all valid members of the group will always be valid wherever they
>> occur, so again, requiring a specific end element doesn't appear to help
>> (because validity is guaranteed in the repeating OR group case).
>> 
>> We can break the possibilities down as follows:
>> 
>> 1. Parent content model is a repeating OR group: no possible sequence of
>> siblings that satisfy general conref constraints can be invalid. No need to
>> constrain any node of sequence. Validity of referenced result is ensured by
>> normal same-or-more-specialized type requirements on referenced elements
>> (including parent of referenced elements).
>> 
>> 2. Parent content model is a sequence where first item in the referenced
>> sequence is required wherever it occurs (example shown above). In this case,
>> validity of the referenced result cannot be guaranteed in any case.
>> Constraining the last node of the sequence cannot help, as shown above.
>> 
>> 3. Parent content model is a sequence of elements where first member of the
>> referenced sequence is optional in some cases. Determinism rules disallow
>> construction of naïve content models but any non-deterministic content model
>> can be rewritten as a sequence of sequences that reflect all possible
>> combinations of elements. Therefore this case resolves to case 2. Again,
>> constraint of last member cannot help.
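
Case 1 in the breakdown above can be checked mechanically. The sketch below
assumes a parent whose content model is the repeating OR group
(a | b | c | d)* and tries every range of one to three allowed members at
every position, with no constraint on the type of the last member. The names
ALLOWED, is_valid and resolve_range are hypothetical helpers, not DITA
machinery.

```python
from itertools import product

# Repeating OR group (a | b | c | d)*: any member is valid at any position.
ALLOWED = {"a", "b", "c", "d"}


def is_valid(children):
    """Return True if every child is a member of the OR group."""
    return all(child in ALLOWED for child in children)


def resolve_range(context, index, referenced):
    """Replace the referencing element at `index` with the referenced range."""
    return context[:index] + referenced + context[index + 1:]


context = ["a", "b", "c"]

# Every range of length 1 to 3, spliced in at every position of the context.
# The first and last members of a range are not required to match.
all_valid = all(
    is_valid(resolve_range(context, i, list(members)))
    for length in range(1, 4)
    for members in product(sorted(ALLOWED), repeat=length)
    for i in range(len(context))
)

print(all_valid)
```

Every resolved sequence stays valid whether or not the range ends with the
type it started with, which is the point of case 1.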
>> 
>> So I don't see how the last member constraint can ever help and I can think
>> of cases where it gets in the way. Thus it appears to be an unnecessary
>> requirement that requires content model design that wouldn't otherwise be
>> required and that does not satisfy the intended goal, namely ensuring
>> sensibility of the conref result.
>> 
>> Cheers,
>> 
>> E.
>> 
>> 
>> On 2/10/11 7:25 AM, "Michael Priestley" <mpriestl@ca.ibm.com> wrote:
>> 
>>> 
>>> Hi Eliot,
>>> 
>>> Re:
>>>> 2. It is possible to define sequence content models that allow a given type
>>>> to occur in multiple places within the sequence but that allow different
>>>> following siblings.
>>> 
>>> I don't believe that's true. See:
>>> http://www.w3.org/TR/2000/REC-xml-20001006#determinism
>>> 
>>> The current design requires matching start/end elements explicitly to
>>> leverage determinism.
>>> 
>>> Michael Priestley, Senior Technical Staff Member (STSM)
>>> Lead IBM DITA Architect
>>> mpriestl@ca.ibm.com
>>> http://dita.xml.org/blog/25
>>> 
>>> 
>>> From: Eliot Kimber <ekimber@reallysi.com>
>>> To: dita <dita@lists.oasis-open.org>
>>> Date: 02/10/2011 08:03 AM
>>> Subject: [dita] Conref range: Is constraint on last member of range
>>> necessary or useful?
>>> 
>>> 
>>> 
>>> 
>>> I'm writing up my explanation of conref range for my book and in explaining
>>> the rule that the first and last elements of the range must be the same type
>>> but intermediate members need not be, it occurs to me that there's
>>> really no point in having the constraint on the last member of the range.
>>> 
>>> Since I obviously didn't think about this too much at the time the mechanism
>>> was proposed, I'm wondering if there was more thinking behind the constraint
>>> than is evident from the language of the spec itself.
>>> 
>>> My questioning of the value of the constraint comes from this analysis:
>>> 
>>> 1. The requirement that the referencing and referenced elements have
>>> compatible parent elements ensures that the start element of the range is
>>> valid in the referencing context.
>>> 
>>> 2. It is possible to define sequence content models that allow a given type
>>> to occur in multiple places within the sequence but that allow different
>>> following siblings. This means that the referencing element could refer to a
>>> range that is inconsistent with the sequence rules in the referencing
>>> context. Since this case is not explicitly disallowed, it must not be a
>>> concern. This means that strict DTD validity of the conref result cannot be
>>> ensured in the general case and there is no general requirement to ensure
>>> it.
>>> 
>>> Likewise, since there are no constraints on the intermediate members beyond
>>> common parentage, there must be no general concern about DTD validity of the
>>> resolved result.
>>> 
>>> 3. Given (2) it can't possibly help to require the last member of a sequence
>>> to be the same as the start since it cannot make the result more valid.
>>> 
>>> 4. Requiring that the start and end of the range be the same disallows use
>>> of conref range for referencing sequences where the content model does not
>>> allow the initial type to occur at the end of the range.
>>> 
>>> For example, say you have a specialized topic type that defines a set of
>>> distinct specializations of <section> and puts them in a specific order. It
>>> would be impossible to use conref range to re-use the sequence of sections
>>> from another topic of the same type even though the result must be DTD
>>> valid.
>>> 
>>> Thus, the requirement seems to be both unnecessary (it doesn't help ensure
>>> correctness or sensibility of the conref result) and it disallows legitimate
>>> cases.
>>> 
>>> Perhaps for 1.3 we should consider removing this constraint.
>>> 
>>> Cheers,
>>> 
>>> E.
>>> 
> 
> --
> Eliot Kimber
> Senior Solutions Architect
> "Bringing Strategy, Content, and Technology Together"
> Main: 512.554.9368
> www.reallysi.com
> www.rsuitecms.com
> 
> 
> 

-- 
Eliot Kimber
Senior Solutions Architect
"Bringing Strategy, Content, and Technology Together"
Main: 512.554.9368
www.reallysi.com
www.rsuitecms.com


