Subject: Re: [dita] History Question: Why does <data> not include <cite>?
From: ekimber <ekimber@reallysi.com>
To: Erik Hennum <ehennum@us.ibm.com>,"Bruce Nevin (bnevin)" <bnevin@cisco.com>
Date: Tue, 08 Sep 2009 14:30:58 -0500

On 9/4/09 3:41 PM, "Erik Hennum" <ehennum@us.ibm.com> wrote:

> The distinction would be academic without the fallback to general
> processing.  If the disclaimer specializes a footnote on the body, it will
> get formatted by default.  By contrast, if it specializes <data> in the
> prolog, it will get ignored silently by default (because there's no known
> processing for a property with no semantics).
> 
> Anyway, that's my understanding of the TC's original consensus around the
> <data> element, which of course may have evolved (toward the fittest).

I think the problem I have presented has several aspects that bear both on
how the DITA architects have so far modeled or thought about metadata in
general and on what it means for something to be metadata.

Erik's comment about the disclaimer being a footnote on the body is really
to the point: the disclaimer is, conceptually, a footnote on the *topic* as
a whole, rather than being a footnote on the body or any particular part of
the body.

I have seen this requirement expressed in other contexts by using footnotes
in the *title*, where the title may reflect some cited source or it is
simply the only available representation of the topic as a whole in the
original source format (where there is no useful notion of metadata as we
are talking about it here).

That is, what I have in this case is really footnote-type content: an
explanation of a related source. As a footnote, it really needs to allow
any content. [And maybe the real solution is to allow <fn> within
<metadata>, just as we allow <indexterm>.]

Putting a footnote in the title would not be a good solution, because it's
not expressing the true relationship between the topic and the disclaimer,
and putting it in the body would not be a good solution, for the same
reason. And in my specific case, the default presentation in both of those
cases is not the desired result.

In fact, there is *no* default presentation provided by the current DITA
spec or existing processors that can give me what I want because the
presentation rules for disclaimer in this case are that it follow all
content of the topic (including nested topics), not just the body.

But as it happens, I have to implement my own processing anyway because I'm
generating InCopy articles from the DITA XML and they have to reflect a
specific set of editorial and organizational rules. I already have custom
processing to synthesize both an "Author Bio" and "byline" from the author
metadata in the article, so doing something similar for disclaimer (and by
extension, HTML) is not a big deal in this case.
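
A minimal sketch of that kind of custom processing, for illustration only. It
assumes the disclaimer is captured as <data name="disclaimer"> in the root
topic's prolog; that name, the sample document, and the helper functions are
hypothetical, and plain-text output stands in for the real InCopy or HTML
generation.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the disclaimer is held as <data name="disclaimer">
# in the prolog of the root topic.
SAMPLE = """<topic id="article">
  <title>Article</title>
  <prolog>
    <metadata>
      <data name="disclaimer">Adapted from a related source.</data>
    </metadata>
  </prolog>
  <body><p>Article text.</p></body>
  <topic id="sidebar">
    <title>Sidebar</title>
    <body><p>Sidebar text.</p></body>
  </topic>
</topic>"""


def render(topic, out):
    """Emit a topic's title and body paragraphs, then recurse into nested topics."""
    title = topic.find("title")
    if title is not None:
        out.append("".join(title.itertext()).strip())
    for para in topic.findall("body/p"):
        out.append("".join(para.itertext()).strip())
    for child in topic.findall("topic"):
        render(child, out)


def render_with_disclaimer(xml_text):
    """Render the whole topic tree, then place the disclaimer after everything."""
    root = ET.fromstring(xml_text)
    out = []
    render(root, out)
    # Only the root topic's prolog is consulted; the disclaimer follows all
    # content of the topic, including nested topics, not just the body.
    for data in root.findall("prolog/metadata/data[@name='disclaimer']"):
        out.append("Disclaimer: " + "".join(data.itertext()).strip())
    return out


if __name__ == "__main__":
    for line in render_with_disclaimer(SAMPLE):
        print(line)
```

The point of the sketch is only the ordering: the disclaimer is pulled from
metadata and emitted after the nested topics, which no default presentation
of a footnote in the title or body would do.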

Another aspect is the details of how the disclaimer is captured as metadata:
Erik says, correctly, that metadata *tends* to be discrete data elements,
and in this case I *could* capture the variant parts of the disclaimer as
discrete values. However, I didn't in this case because there is no other
business justification for doing so for the disclaimer (as compared to the
author information, where it's clearly worth capturing more atomically) and
requiring it would add unnecessary complication and another point of failure
to a system that already depends on correct use of hard-to-validate word
processing styles.

Or said another way, if I had stepped up to designing the metadata markup to
capture the salient bits of the disclaimer I would not have had the issue I
ran into (no <cite> as a child of <data>). But I am contending that the DITA
standard, at its most general, shouldn't *require* me to go to that level of
effort as a cost of entry.

That is, at its most general, it is inappropriate for the DITA spec to make
a policy decision about what is and isn't appropriate for identification as
metadata, even when an unconstrained use of that design might leave room for
authors to color outside the lines.

I think some of this tension is a side effect of DITA 1.1 not having the
constraint mechanism introduced in DITA 1.2, which largely allows us to eat
our cake and have it, by allowing the most general design to be completely
unconstrained but making it both possible and easy for specializations to
apply constraints as they see fit.

At least in the context of what I'm doing here, which is an application of
the topic types I've designed for the DITA For Publishers project, I would
certainly not object to the concept, task, and reference topic types
continuing to impose the current constraints on <data> and even going
further to document that certain element types should not be used as
descendants of <data> even though they are allowed (and those rules could be
formalized as Schematron rules or XSD 1.1 assertions).
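
As a rough illustration of the kind of rule meant here, the following is a
sketch in Python rather than actual Schematron or XSD 1.1. The set of
discouraged element types is a hypothetical policy chosen for the example,
not something taken from the DITA spec, and the function name is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical policy: element types that are discouraged as descendants of
# <data> even where the content model allows them.
DISCOURAGED = {"image", "object"}

SAMPLE = """<topic id="t">
  <title>T</title>
  <prolog><metadata>
    <data name="disclaimer">Based on <ph>a related source</ph> <image href="logo.png"/></data>
  </metadata></prolog>
  <body><p>Text.</p></body>
</topic>"""


def discouraged_in_data(xml_text):
    """Return (data name, element type) pairs for discouraged descendants of <data>."""
    root = ET.fromstring(xml_text)
    findings = []
    for data in root.iter("data"):
        for desc in data.iter():
            # iter() yields the <data> element itself first, so skip it.
            if desc is not data and desc.tag in DISCOURAGED:
                findings.append((data.get("name"), desc.tag))
    return findings


if __name__ == "__main__":
    for name, tag in discouraged_in_data(SAMPLE):
        print(f"<{tag}> should not be used inside <data name='{name}'>")
```

A real Schematron rule would express the same check declaratively; the sketch
only shows that the rule is a usage policy layered on top of the grammar, not
a change to the content model.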

But I would want <data> within unconstrained <topic> to at least allow all
phrases, if not all body content (although I could certainly buy the
argument that if you're at the point of putting complex body elements in a
<data> context, you should really have a separate topic related by
<data-about> or a relationship table).

Also, if you see metadata as mostly about *retrieval*, rather than about
processing, it's hard to see any harm in having more markup, rather than
less, in metadata values, since they serve to optimize searching.

Cheers,

E.  

----
Eliot Kimber | Senior Solutions Architect | Really Strategies, Inc.
email:  ekimber@reallysi.com <mailto:ekimber@reallysi.com>
office: 610.631.6770 | cell: 512.554.9368
2570 Boulevard of the Generals | Suite 213 | Audubon, PA 19403
www.reallysi.com <http://www.reallysi.com>  | http://blog.reallysi.com
<http://blog.reallysi.com> | www.rsuitecms.com <http://www.rsuitecms.com>
References:
- RE: [dita] History Question: Why does <data> not include <cite>?
  - From: Erik Hennum <ehennum@us.ibm.com>