dita-adoption message

Subject: Re: DITA 1.2 Language Specification Comments
From: David J. B. Hollis <dhollis@AandOConsultancy.ltd.uk>
To: Robert D Anderson <robander@us.ibm.com>
Date: Sun, 8 Feb 2009 14:26:30 +0000

Hi, Rob

> Thanks for the additional comments.

My pleasure! Thank you for such a detailed response.

JoAnn has asked that I keep the Adoption TC posted.


>>> OASIS has an official reporting mechanism that we don't get a lot of
>>> control over; it will be listed on the cover page of the final
>>> standard...
>>
>> How effective is this reporting mechanism? Have you had any comments
>> from it?
>>
>
> Comments on that list are not frequent, but they do come in.

Do you feel this is sufficient, or would this be something for the  
Adoption TC to look into further?


>>> About the inheritance section...
>>
>> I'm coming at it more from a tech. author's perspective. For me, the
>> information seemed almost to clash with the preceding Contains &
>> Contained By sections.
>
> JoAnn suggested today that we should have a topic about reading the
> architectural spec - I think that it should explain what the
> inheritance section is doing. Alternatively, or in addition, we could
> change that section title to "Specialization ancestry" or something
> similar? I'll ask the TC about this one.

I like JoAnn's idea of an explanatory topic. I think I'd leave the  
section title as 'Inheritance', as this is the 'Darwin' of DITA. Will  
it be subject to otherprops="inheritance", as detailed below?


>> There almost needs to be an 'Introduction to the DITA Language Spec.
>> for Technical Authors'. It's not feasible to include explanations
>> within each element.
>
> Agreed - JoAnn mentioned writing something like that which would also
> explain how to read the attribute tables.

That'd be very useful.


>> An (ideal!) alternative would be to use @audience round certain
>> sections, with 'architect' and 'author' values, say. The confusing
>> architectural stuff could then be kept away from authors, and perhaps
>> some explanations could be provided for authors, and kept away from
>> architects.
>
> I did have a thought along those lines when I set up all of the
> topics for 1.2 - they now have otherprops="inheritance" on that
> section for all of the elements. So, if you want to reformat without
> including those sections, you can exclude otherprops="inheritance".
> The same is true for the contains and contained-by sections, which
> have their own properties.

That's great! See above.
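
As a rough illustration of the filtering Rob describes, here is a minimal Python sketch that drops any element flagged with a given otherprops value. The sample topic, the "contains" property value, and the function names are made up for the example; only otherprops="inheritance" comes from the message above. In a real build this would more likely be done with a DITAVAL filter than with a script.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of a language-spec topic; real topics are larger.
# Only the otherprops="inheritance" value is taken from the discussion;
# "contains" is an assumed value for the Contains section.
TOPIC = """<reference id="note">
  <title>note</title>
  <refbody>
    <section><title>Purpose</title><p>Calls out a note.</p></section>
    <section otherprops="contains"><title>Contains</title></section>
    <section otherprops="inheritance"><title>Inheritance</title></section>
  </refbody>
</reference>"""


def exclude_otherprops(root, values):
    """Remove every element whose otherprops tokens overlap `values`."""
    for parent in list(root.iter()):
        for child in list(parent):
            tokens = set(child.get("otherprops", "").split())
            if tokens & values:
                parent.remove(child)
    return root


def section_titles(root):
    """Return the titles of the sections that survived filtering."""
    return [s.findtext("title") for s in root.iter("section")]


filtered = exclude_otherprops(ET.fromstring(TOPIC), {"inheritance"})
print(section_titles(filtered))
```

Passing {"inheritance", "contains"} instead would leave only the descriptive section, which is roughly the author-oriented view discussed in this thread.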


>> If, as a technical author, I'm looking at the Map area of the
>> Language Spec., I would ideally like to know that Maps can contain a
>> title, and to have this presented to me in some way within the Map
>> area of the Language Spec. Is this not possible via one of the
>> linking mechanisms: links, conrefs, reltables?
>>
>> As a technical author, I don't want to have to remember that Maps can
>> contain titles, but titles inherit from topics, so I have to look at
>> the topic area of the Language Spec.
>
> I'm really not sure how to handle that, given that so many elements
> are present in both maps and topics. If you read the spec "by type",
> the Topic section should only contain elements that are available
> exclusively to topics. The body elements, meta elements, and others
> may appear in both.
>
> I don't think that we can include links from the <map> topic to all
> of the elements it can contain ... aside from in the
> contains/contained-by section. I'd be happy to go to the TC to ask
> for other suggestions on how to list this sort of thing, but I'm
> having a bit of trouble formulating the question in a general way.
> Something like - "When looking in one section of the spec (such as
> <map>), how can I more easily identify the elements that are
> available but are listed elsewhere in the spec (such as <title>)?"

I do feel strongly that authors need to understand what is available  
under maps.

This links with other debates in this email.

For the time being, perhaps, a 'Typical Map Elements' topic? (I'm sure
there's a better title, though!) This would list the elements which
have been reused from topics: a 'sensible' list of elements which you
might reasonably expect to see in a map, leaving out elements which
would lead to embedded content, e.g. tables. I'm sure this would spark
some interesting debate, but it would eventually lead to defining
those elements which probably need to be defined for maps, rather than
reused.


>>> I'm working on getting some flags to highlight elements that were
>>> new in 1.1 or 1.2.
>>
>> That's great! Will these include deprecated elements/attributes?
>>
>> 1.1 flag - added in 1.1
>> 1.1d flag - deprecated in 1.1
>> 1.2 flag - added in 1.2
>> 1.2d flag - deprecated in 1.2
>
> I hadn't set it up for deprecated flags - it should be possible to do
> that though, given that we have so few deprecated items at this
> point. If I get time, I will do that.

It would make it easier to track changes to the spec. Yes, please!


>> Does lcDescription exist? It seems to be included in some Contained
>> By listings. I'm looking at langspectocjs20090108/alpha and it's
>> noticeable that lcDescription doesn't have a link whereas everything
>> else seems to be linked.
>>
>> lcDescription has an entry in Complete Content Model Definitions, but
>> it says: Contained by: This element is not contained by any other
>> elements.
>>
>> Really?!
>
> It does not exist. It was originally part of the DTDs, but was
> removed from any content model; however, the definition was left in.
> I actually noticed this in the contains/contained-by table, and asked
> the learning group about it; they said the element should not be part
> of the DTD; I updated the DTD and posted it, but forgot to
> re-generate the "contains/contained-by" topic from the new version.
> It will be fixed in the next posting.

That's great!


> For CDATA and NMTOKEN:
>>
>> I appreciate tool vendors, developers and architects will know the
>> XML standards, and recognise these as such. But, quite frankly, CDATA
>> means nothing to a technical author!
>
> I asked the TC about this yesterday. JoAnn Hackos suggested defining
> CDATA and the other values in the new "Reading the spec" topic.

That'd be very useful.


>>>> Default value - #implied Where is #implied defined?
>>>
>>> That one is also an XML term. I can't say I understand why that is
>>> the term, but in DTD terms, it means that the value is not
>>> defaulted. I'm not sure how to clarify that without leaving behind
>>> the standard XML terminology.
>>
>> I don't understand what is meant by, "The Default Value is 'not
>> defaulted'." I'm simply replacing #implied with 'not defaulted'.
>>
>> Surely, either there is or there isn't a Default Value? If there is,
>> why can't the value be specified?
>>
>> W3Schools states this:
>>
>> Use the #IMPLIED keyword if you don't want to force the author to
>> include an attribute, and you don't have an option for a default
>> value.
>>
>> The issue I'd have is that sometimes there ARE obvious contenders
>> for default values. Or, at least, it seems so to me ...
>>
>
> The distinction here is between defaults enforced by the standard
> (those specified in a DTD or Schema) and those enforced by a tool.
>
> The specification is listing only those defaults which are actually
> defined as part of the standard - that is, defaults that a processing
> tool cannot choose on its own, because when it reads a document it
> picks up the default value. One example that is core to DITA is the
> class attribute - these are defaulted in the DTD or Schema, so any
> time a parser reads a DITA topic, it picks up the class attribute
> from the DTD or Schema. A parser cannot tell if an attribute was
> actually specified in a document, vs. defaulted in the DTD or Schema.
> Another example is placement="inline", which is the default for
> images - to get standalone display, an author *must* specify the
> attribute placement="break" on the image.
>
> As you say, there are frequently obvious contenders for a default
> value - but those are typically defaults enforced by a tool. For
> example, the compact attribute on a list has no default in the DTD or
> Schema. Tools are free to set the default, or to modify it to fit
> local style rules. For example, IBM's style generally forces lists to
> "not compact" when there are block elements in the list, but leaves
> it as compact when the list only contains phrases.

I'm sorry, Rob, I fundamentally disagree with this! IMO, any time
there's a list of attribute values, there ought to be a default value.  
For me, this is simply inherent in defining any interface, and  
removing the remotest hint of ambiguity.

If you allow the processors to define the default, then that is the  
first step on the slippery road to vendor lock-in, and authors simply  
won't know where they'll stand - they'll be in No Man's Land!

There is only one structure, but a multiplicity of processors - open  
source, in-house, proprietary. Potentially, different processors per  
format. That is why the structure MUST be pre-eminent, MUST be  
protected at all costs, and MUST NOT kow-tow to any processor or vendor.

Using the Note example from below, what is the author wanting to  
define? Not necessarily the specific rendition, but the message in the  
content. It's a 'Note', it's not that 'Important', and it's certainly  
not a 'Warning'. It just needs a bit of highlighting. When choosing  
'Note', an author is not saying they expect to see a Post-It Note  
icon, or a pen, nor that Warnings must be in triangles, even though  
these may be obvious renditions.

As to rendition, sure the author is interested, but it has to be  
secondary. The content could be about Adam & Eve in the Garden of  
Eden. Interpreting 'Note' as a fig leaf would be highly appropriate,  
and the precise job of the processor. Important could be two fig
leaves, and Warning three red fig leaves! (OK, I'm getting carried away,
now ... !) Hopefully, tools would allow easy access to these icons for  
in-house requirements.

Without sensible defaults, the author's task becomes a nightmare!  
They'll be forced to define every single attribute. They must be able  
to rely on the structure.

In a reuse and single sourcing scenario, content HAS to be written to  
the structure, NOT the format/processor. Otherwise, it all just falls  
apart. There'd be no scope for experimenting with tools, no scope for
collaboration or localisation, and so on.

My understanding is that DITA 1.2 will allow some architectural  
tweaking. I presume this would allow any default value to be changed?  
Which actually increases the argument for setting proper defaults, so  
the architects have a solid base to work with.

(I apologise for the diatribe ...)



> The "Reading the spec" topic should make clear what the default
> column is intended to do - that is, it lists the standard-enforced
> default, rather than the generally anticipated default. As a side
> note, I've asked the TC for advice on removing #IMPLIED ... I do
> agree that it's confusing and I'd like to get rid of it.

That would be useful. I'm glad there are structure-defined defaults!
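
To make the two kinds of default discussed above concrete, here is a small Python sketch. It assumes simplified stand-in ATTLIST declarations in an internal DTD subset (not copies of the DITA DTDs), hypothetical function names, and a made-up "house default". It relies on the standard library's expat-based parser applying attribute defaults declared in the internal subset. The DITA DTDs spell the non-inline placement value "break".

```python
import xml.etree.ElementTree as ET

# Simplified stand-in declarations: placement has a DTD-level default,
# compact is #IMPLIED, i.e. the DTD supplies no value at all.
DOC = """<!DOCTYPE topic [
  <!ATTLIST image placement (break | inline) "inline">
  <!ATTLIST ul compact (yes | no) #IMPLIED>
]>
<topic><image href="a.png"/><ul><li>one</li></ul></topic>"""


def attribute_seen_by_tool(doc, element, attribute):
    """Return the attribute value the parser hands to a tool, or None."""
    root = ET.fromstring(doc)
    return root.find(element).get(attribute)


def tool_compact(parsed_value, house_default):
    """A tool-side default: only used when the parser supplied nothing."""
    return house_default if parsed_value is None else parsed_value


placement = attribute_seen_by_tool(DOC, "image", "placement")
compact = attribute_seen_by_tool(DOC, "ul", "compact")
print("placement as parsed:", placement)
print("compact as parsed:", compact)
print("compact after a house default of 'yes':", tool_compact(compact, "yes"))
```

The first value arrives with the document whichever tool reads it; the second is whatever each tool decides, which is the inconsistency being argued about here.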


> About spectitle:
>> Is it being deprecated in 1.2, then?
>
> I don't think the TC has made a ruling on it, so I haven't listed it
> as deprecated. As you can probably tell, I would prefer that it be
> deprecated. I'll check with the TC about it.

Great!


>> Contains seems to often, but not always, include 'text data'. Is this
>> another architectural/XML thing?
>
> Actually the XML thing in this case is #PCDATA, which means
> user-entered text (as opposed to elements). We put in "text data"
> because it seemed to make more general sense. Any other suggestions?

Hmm, 'content', or 'text content' spring to mind. Any good? 'Data' is  
too reminiscent of programming, IMO.


>> So, you seem to be saying that the DTD allows certain things which
>> would be stupid to implement?!
>>
>> I suppose that's one of the penalties of inheritance?
>>
>> Are processors obliged to handle this?
>
> Well, yes. It's really unrelated to inheritance, more related to
> sharing elements between different document types. That is - we want
> to define something once and use it twice. But, to do that, we end up
> with situations where the element allows things in one location that
> don't make as much sense in the other. Most of these come in through
> the re-use of the <desc> element, in which case the many elements
> that come in are technically just as valid - even though it does
> rather blow the mind when you step back and say "I can allow tables
> in my map???"
>
> I would say that processors are obliged to handle it in some manner,
> because it is valid. That is, you can't crash just because somebody
> puts a table in their map. As with any element, how it is processed
> is generally up to the processor.

Yikes!

You're right, it's not inheritance, per se. Is this really such a good  
idea? Is it really necessary to reuse elements between topics and  
maps? It seems to open up a hornet's nest, and lead to Frankenstein's  
Monster! (I love mixing metaphors! ;-)  )

I think I'd rather see title used by topics, and anything which  
genuinely inherits from topic, and then define maptitle so that that  
element can be properly defined within the context of a map, and so on.

How many elements are we realistically talking about, do you think?

Presumably maps foster their own inheritance, to book-maps, etc.? This  
could potentially exacerbate the problem.


>>> So far the namespace has remained the same between versions - I
>>> wouldn't be able to update that without input from the TC (some
>>> tools may depend on it). However, I've updated the spec so that it
>>> lists the version value as:
>>> "1.2" (version dependent; will increase)
>>
>> I suppose it's also OASIS dependent?
>
> The namespace is up to OASIS, and the version is related to the
> release of DITA (with 1.2 coming up, we up the version setting).

Do OASIS allow version numbers to be jumped? I'm thinking of the  
thorny issue of the toolkit version number being out of step with the  
Language Spec. version number.


>>> I've added a reltable based link between the related links topic
>>> and the linking section - hopefully that helps.
>>
>> I don't seem to be able to see that in langspectocjs20090108/alpha,
>> but sounds like it would help.
>
> Sorry - I meant to say that I was adding. It will be in the next
> posted version.

OK.


>>>> note
>>>>
>>>> Surely there should be a default value for @type? 'note' seems
>>>> obvious to me!
>>>>
>>>> I like the fact that the result from processing the example is
>>>> shown. I'd like to see more of this, if at all possible! ;-)
>>>
>>> While most processors probably treat 'note' as the default, the
>>> document type does not enforce a default, and at this point I think
>>> it would not want to change that in case some application treats it
>>> differently.
>>
>> Hmmmm.... Just seems like an obvious gap to plug.
>
> I think this will be covered by the "Reading the spec" topic,
> explaining that the spec can only list enforced defaults in the
> default column. I think <note> would make a good example for that
> topic - "There are other generally recognized defaults, such as
> type="note" on the note element, where..."

See above, #IMPLIED & defaults.


>> ... I was
>> looking at the Complete Content Model Definitions for base, and
>> thought a load of links had been left out - then realised why! It
>> looks odd to have links to term and ph, but not apiname, IMO.
>
> Yes, it does look odd - but when we don't have the apiname topic
> available in that package, we can't really link to it. The 'official'
> version of the standard will be the full "by type" version, which has
> all topics included and all of those as links.

OK, I appreciate that. May I suggest some changes to the <keyword> text?

Current:
<snip>
Specific markup recommendations:
	• Use <apiname> for API names and <cmdname> for command names.
	• <term> should be used for inline paragraph definitions; to indicate
	  what you're defining.
	• <ph> should be used for general phrases; when you think that
	  keyword is not appropriate.
	• Inside syntax diagrams and syntax phrases, use <kwd> to indicate a
	  programming keyword.
<snip/>

Proposed:
<snip>
Specific markup recommendations:
	• <term, with link> should be used for inline paragraph definitions;
	  to indicate what you're defining.
	• <ph, with link> should be used for general phrases; when you think
	  that keyword is not appropriate.
	• If using Programming Elements, use <apiname, no link> for API names
	  and when inside syntax diagrams & syntax phrases, use <kwd, no link>
	  to indicate a programming keyword.
	• If using Software Elements, use <cmdname, no link> for command names.
<snip/>

Two bullets for the Programming Elements might be preferable?


>>> Hmm. I've updated the title of "Prolog elements" to "Prolog
>>> (metadata) elements" ... think that will help any? I've also added
>>> a "metadata" index term to that topic, for situations where we
>>> generate an index.
>>
>> I can't see that in langspectocjs20090108/alpha, but it sounds better
>> to me.
>
> It will be in the next upload.

It's there in the type version, and looks fine. Thank you.


> Re: .xml versus .dita:
>> I just remember trying to pick up a 4GL database application, once.
>> The examples had been created by different folk, with different
>> methodologies, and at different times. I think I spent more time
>> trying to understand why one example did things one way, whilst other
>> examples did seemingly the same things another way, than actually
>> learning how it worked.
>>
>> At the end of the day, it simply adds unnecessary confusion for
>> someone trying to learn DITA.
>>
>> Just be brave and consistent, and use filename.dita throughout! Then
>> blame the Adoption TC when the flak comes! ;-))
>
> Oh, sure, you've won me over. I'll update it - in the next upload
> they will all use .dita.

Great! Thanks! We had a useful debate on the Adoption TC.


>>>> sl
>>>>
>>>> Surely the default value for compact is 'no', not #implied?
>>>
>>> Covered above, I think - no default, which in DTD terms means
>>> #implied. Same for <dl>.
>>
>> Surely it can only be one of 3 things: yes, no, or
>> dita-use-conref-target
>>
>> So, which is it?!
>>
>> If it's not required, and doesn't have a default value, then the
>> processor/toolkit has to decide. This could lead to a lack of
>> consistency between one transform and another.
>
> Yes, that's true. In this case though, I'm with the tool folks that
> say the spec should not list a default. As described above - there
> are valid reasons that a company style or an output format might
> choose to go one way or the other. If the spec says there is a
> default, then any output processor that does the opposite is no
> longer DITA-compliant. I do think that the specification needs to be
> very strict about making sure that content is consistent - that is,
> if <xref> is supposed to pull content from the target, then all
> applications should do it. However, how to render an xref (or a list,
> or a figure) is always left up to the rendering tool.
>
> Anyway - my two cents on this one. As with the others, I think this
> should be explained in the "Reading the spec" topic, and removing the
> #IMPLIED listing for a default value will also help.

No, I'm sorry. IMO, structure has to be King!

'Compact spacing' and 'expanded spacing' are relative terms which can  
still be open to interpretation by the processor and/or format.

The important thing for an author is to know where they stand when  
they're writing content. 'Is this list going to be tight or loose,  
what's the default?'

1.2 should allow an architect to override the default for the purposes  
of an in-house preferred style. They can also tweak stylesheets to  
define the relative amounts of 'tightness' or 'looseness' as  
appropriate.

The processor must do as it's told, and it's up to the vendor to
interpret the relativism of 'compact' vs. 'expanded'.

There is, however, a problem here, and it's related to mixing
dita-use-conref-target in with a boolean. I can understand a
processor/vendor preferring to allow the conref target to define the
type of list. But if there's no conref target, then it must be either
yes or no - and there really ought to be a default. But you can't have
2 default values, nor an if statement for a default!

The better way out of this, and any similar attribute paradoxes, is to
introduce a 'conref-target-override' boolean attribute, and remove
dita-use-conref-target from the compact attribute options list. This
would mean that compact is simply a boolean, and can get a sensible
default value. But it will be overridden by default with
conref-target-override.
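
One possible reading of that suggestion, sketched in Python. The attribute name conref-target-override comes from the paragraph above; the function name, the parameter names, the override being on by default, and 'no' as the fallback value are assumptions for illustration only, not anything the spec defines.

```python
from typing import Optional


def resolve_compact(
    local: Optional[str],
    target: Optional[str],
    conref_target_override: bool = True,
    default: str = "no",
) -> str:
    """Return the effective compact value ('yes' or 'no') for a list.

    local  -- compact as written on the referencing list, or None
    target -- compact on the conref target, or None when there is no
              conref target (or the target does not set compact)
    """
    # Assumed rule 1: the conref target wins while the override is on.
    if conref_target_override and target is not None:
        return target
    # Assumed rule 2: otherwise the author's own value is used.
    if local is not None:
        return local
    # Assumed rule 3: with nothing specified, fall back to one default.
    return default


print(resolve_compact(None, None))
print(resolve_compact("yes", "no"))
print(resolve_compact("yes", "no", conref_target_override=False))
```

With this shape, compact stays a plain yes/no attribute with a single default, and the conref behaviour lives in its own switch.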

My 2 penn'orth! I really, really, don't want to be at the mercy of any
tool/processor/vendor!


Re. <fig> using %display-atts, and incorporating an <image>, which has  
@scale, etc.

>> There's surely a potential clash? E.G., scale could be defined in 2
>> places. Would one scale multiply with the other? What if scalefit is
>> used in the image, and the fig uses a scale from %display-atts?
>>
>> I'd have thought it might be worth a discussion, but I'm ready to
>> concede that my argument is somewhat theoretical. Maybe the two don't
>> really clash, in practice?
>
> If you'd like, I can go to the TC to ask about how scale interacts...
> otherwise I will leave it up to implementations.

I think I'd like to see a discussion, please. I just don't like the  
potential for ambiguity.

We could discuss it on the Adoption TC, first, then Gershon could  
bring it to the TC, if you'd like?


>>>> cite
>>>>
>>>> What is the expectation, if any, from the processing?
>>>
>>> This is one where I'd prefer to stay away from formatting
>>> descriptions. While the pre and lines elements have formatting
>>> (line breaks) as part of their core purpose, this one is meant to
>>> define a citation. There are enough style guidelines out there with
>>> different ideas about how citations should be formatted that I
>>> would rather keep the DITA spec out of the debate. FWIW, I just
>>> checked how the HTML specification defines the cite element, and it
>>> says "The presentation of phrase elements depends on the user
>>> agent."
>>
>> Sounds like a cop-out! ;-)
>>
>> OK, I do understand the difficulties, but as a tech. author I don't
>> think it's unreasonable to know how something 'is likely' to be
>> rendered. Bold? Italic? Or is it more for semantic purposes, and not
>> to be rendered any differently?
>
> It is a bit of a cop-out, but it's one that standards like this are
> generally forced to go with. For citations in particular - there are
> many conflicting style guides about how citations should be
> presented, whether that is with quotes, in italics, or in plain text.
> I know that IBM for one has pretty complex rules governing the
> presentation of citations. As to whether this is for semantic
> purposes - with only a few exceptions, all of the DITA elements are
> there for semantic reasons rather than for rendering purposes. The
> highlighting domain elements are of course an exception, as are some
> of the core elements like <bodydiv> that are present as building
> blocks rather than for any semantic purpose.
>
> I will add this comment to the <cite> topic: "Though citations will
> often be set apart from surrounding text, such as through italics,
> rendering of the <cite> element is left up to implementations."

OK, I see the complications and difficulties.


>> It's certainly an interesting one! Should q correspond to single or
>> double quotes, for instance? Do all languages recognise their own
>> versions of single and double quotes?
>
> This one is an open topic for discussion at the TC right now. HTML 4
> and XHTML 1 both said that quotes should be rendered automatically
> with <q>. The working drafts for HTML 5 and XHTML 2 both reverse
> course and say authors must add them. I'm trying to track down their
> reasoning.

What a minefield! What's the point of q/lq if authors have to add  
quotes, which then need localising?!

> The spec will have to address this, even if only to lay out the
> concerns for both tools and authors. In this case the rendering will
> explicitly affect what authors enter - if tools are supposed to add
> quotes, authors should not, and vice versa. So authors need to know
> what the expectation or concerns are.

I'm glad to hear it's being discussed, and that it will be clearly  
defined.
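
The reason the expectation has to be pinned down can be shown with a toy Python sketch. The render_q function is hypothetical and uses plain double quotes; it is not how any particular processor renders <q>.

```python
def render_q(content, tool_adds_quotes):
    """Render the text content of a <q> element as plain text."""
    return f'"{content}"' if tool_adds_quotes else content


# Each pairing of author habit and tool behaviour gives a different result.
for authored in ("so-called", '"so-called"'):
    for tool_adds in (True, False):
        print(repr(authored), tool_adds, "->", render_q(authored, tool_adds))
```

Only two of the four combinations give exactly one pair of quotes; the other two give either none or a doubled pair, which is why authors and tools need the same rule.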


>> If at all possible, I would really like to see a version of the
>> Language Spec. tailored to an authoring audience.
>
> It is certainly possible, in fact trivial, to reformat the guide
> without the inheritance section. It's possible, but less trivial, to
> filter out the attribute columns so that it contains only the name
> and description of each attribute (and drops the data type / default
> / required columns entirely) - I know some groups have done that for
> their authors.
>
> When starting work on DITA 1.3, I think we should think from the
> start if there is any way to set up the guide to make this
> multi-audience aspect easier.

That'd certainly be really helpful.


>> I don't know whether PIs would be appropriate for inclusion in a
>> Language Spec., but that seems a way out of the 'toolkit as main
>> processor' debacle.
>
> They are not technically part of the language, so they cannot really
> be included at this point. PIs are also a bit of a religious debate
> in the XML community - some love them, some hate them with a deep and
> abiding passion. There were long and fierce debates over whether they
> should even be in XML in the first place. I'm in the middle - I
> prefer to avoid them, and I've seen them used in terrible ways, but I
> feel they do have a valid role. It seems to be rather lonely in the
> middle though.

I'm sorry - that's my fault, not understanding what PIs are. I saw a  
discussion on the Users' Group, and put two & two together. Fatal  
mistake! ;-)

I was interpreting PIs as being suggestions for how an element is  
rendered. Not promoting the toolkit over other processors. Not telling  
vendors exactly what to do. Providing some kind of guidance, or best  
practice.

The Adoption TC has been asked to look at tool conformance issues. The  
first step on this road is probably to have some DITA conforming  
content which incorporates all elements and will fully exercise a tool.

However, this then comes back to the map elements debate. Should such  
content include tables and embedded content inside a map, because DITA  
allows it?! Strictly speaking, it probably should. But that would fly  
in the face of 'Best Practice', which is another hot topic for the  
Adoption TC! Ideally, the answer would be to sort out maps!


> Thanks again for the review -

My pleasure, indeed! Thank you for taking the trouble with such  
detailed replies.

David J. B. Hollis