After our discussion today, I'm starting to agree
that for 1.1 (and maybe beyond) this enhancement can remain restricted
to filtering attributes - so long as customizers can continue to use
props/otherprops as a grab-bag as well, writing their own serialization
code to run their documents through any uncustomized tools.
The types of CDATA attributes I've been describing are very application
specific - often to a legacy application - and as such are perhaps best
dealt with in a DITA customization, not an extension.
Chris Wong wrote:
I sympathize with your desire not to turn
specialization into a map-anything-to-DITA exercise. But I wonder if
the attributes restriction is appropriate given the freedom DITA
already gives to specializing elements. Specifically, I'm not sure that
the argument for generalization is strong.
The way I see it, generalization support is
for supporting legacy processing. Existing transforms that only know
certain elements can process specialized content after generalization.
But with DITA being so new, my question is: is support for legacy
processing useful if there is no legacy processing to support? Any new
DITA-aware code today should be written to look at the class attribute,
not the element name. Generalization would do nothing for them. There
is the danger that this theoretically useful DITA feature may forever
remain theoretically useful. When would it ever become useful?
That said, we are where we are, and I would
be wary of dropping or changing a major DITA feature in a 1.1 release.
While people want a more general attribute addition scheme, people also
really, really want this extensible metadata attribute capability. If
adding the former will derail the 1.1 release or drastically delay it,
I would go for the latter just so we can quickly eliminate a major
obstacle to DITA adoption.
If we make it wide open, then it will
be very rough going figuring out how to unambiguously generalize and
respecialize. The harm then is that you run your content through some
process that requires generalization (let's say it's a trademarking
tool), then respecialize, and all your attribute values are hosed.
We should certainly be looking at
ways to make specialization and generalization more robust and open in
the 2.0 timeframe - and I think XSLT 2.0 capabilities will be a
necessary part of that solution. But for 1.1 I thought we were looking
just at a tactical solution for metadata attributes only.
Now it seems different people
understood different things by the term "metadata attributes". I'm
sympathetic, and I want to make everyone happy, but if this turns into
a general "add any attribute for any purpose" feature then it
completely exceeds its mandate.
I think there may be a philosophical
difference here. Specialization is not supposed to support arbitrary
mapping to any possible DTD. For that kind of flexibility, you need
something like architectural forms. DITA is designed to deliver less
flexibility than architectural forms, thus more predictability and
interoperability at lower processing cost. I'm wary of slipping down a
slope that ends up allowing any possible design expression, with a
resulting explosion in the cost of managing the differences.
Right now I think the minimum
requirement is to reserve some special characters from use in
specializable attributes, in order to allow generalization
roundtripping. I think the safest way to accomplish this is to define
two new attributes as the base for specializing attributes: props for
profiling attributes, meta for other kinds of attributes. Anything
beyond that looks to me like a 2.0 thought.
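As a rough sketch of how the two bases might be declared in a DTD
(entity and attribute names here are illustrative, not a worked-out
design):

   <!-- illustrative only: the two proposed base attributes -->
   <!ENTITY % base-spec-atts
     "props CDATA #IMPLIED
      meta  CDATA #IMPLIED"
   >
   <!-- a specialization module could then declare, say, a new
        profiling attribute whose generalization target is props -->
   <!ENTITY % deliveryAtt
     "delivery CDATA #IMPLIED"
   >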
the type of thing I'm thinking about is
typically some associated data we need to preserve to maintain
compatibility with a legacy application
but I think there are probably lots of other types of attributes I and
others might want to be able to conveniently add
what's the harm?
Michael Priestley wrote:
I'm not sure how attributes being less constrained means they should be
easier to add.
Can you give me examples of CDATA attributes you want to add?
Yes, but as I said, attributes are less constraining than elements, and
should be even easier to add.
And I can tell you that I for sure need to add CDATA attributes - and I
suppose if I do, many others do too - and will continue to push for a
customization hook in the dtd/schema if we can't find a way to parse
them into a generalizable form.
If we can find a way - then the addition of an authorized customization
hook is a dead issue.
Michael Priestley wrote:
The key statement below is "like you can with elements". In fact you
cannot add arbitrary elements in DITA, they must be based on existing
elements, which is exactly what we're proposing for attributes. So it's
not completely arbitrary, just arbitrary enough :-)
The one place we've seen a consistent requirement for new attributes,
by the way, is for new profiling attributes (what get called metadata
attributes in the spec). I'm betting that's what the author below was
referring to.
Okay, I understand (theoretically) the ambition to make specialization
something more than just an easy way to customize.
But what's wrong with providing both? Why shouldn't DITA be easy to
customize, where customization is application specific and willing to
be ignored everywhere else?
And I think making DITA easier to customize with respect to attributes
is a big deal to the community of potential users - the following from
another list I'm on where someone asked people to summarize their take:
* What is the worst thing about using DITA?
-- You have to break DITA (or add to it) to do anything useful with
attributes. The DITA committee is developing a solution to this right now.
The problem is that you can't add arbitrary attributes like you can with
elements.
Michael Priestley wrote:
It would be accomplished using an entity redefinition, same as with
elements.
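As a sketch (entity and file names hypothetical), a shell DTD would
redefine the extension entity before the base module is parsed:

   <!-- the redefinition wins because it comes first -->
   <!ENTITY % local-attr-extensions
     "lastchanged CDATA #IMPLIED"
   >
   <!ENTITY % topic-mod SYSTEM "topic.mod">
   %topic-mod;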
I think the theoretical advantages are substantial. Under the current
model, specialization modules are plug and play: we can determine by
inspecting the class and domains attributes what modules are needed for
a document type, compare constraints/modules across document types,
automatically determine lowest common denominators between different
document types, and generally make information exchange possible on an
automatic level across document type boundaries.
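For instance (illustrative values), everything a tool needs is right
there in the instance, independent of the shell DTD:

   <!-- class records each element's specialization ancestry;
        domains records which modules the document type uses -->
   <topic class="- topic/topic " domains="(topic hi-d) (topic ui-d)">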
If we allow attributes to be part of a specific custom DTD, not part of
a specialization, then we lose the ability to move that information up
the framework or identify which attributes have been added. Instead of
having a document type that follows consistent rules that can be
automatically compared and ultimately automatically assembled, we have
a custom DTD that must be built by hand, customized by hand, and
migrated by hand. In other words it no longer operates as part of a
framework, because something outside the framework has its hooks in it.
If you view specialization as just a way to make customization easier,
what you're saying makes sense. But specialization is a lot more than
that. The advantages of specialization, as described above, are based
on thought experiments rather than experience simply because DITA is in
its early days and there are only a few dozen specializations floating
around, at different levels of completeness and at different companies.
But if you follow the thought experiment forward, and think of what
happens when we have hundreds of specializations across industries and
authoring communities and want to manage the differences and
commonalities in a consistent and scalable way, specialization delivers
what customization cannot: a way to automatically inspect, compare, and
reconcile those differences without loss of information.
That works for me too - though given that the utility of roundtripping
is so far only theoretical, I'd prefer a more easily parsed option as
well.
Again, why not an empty parameter entity (dtd) and attribute group
(schema) that you could put anything at all in, and which would be
discarded on generalization?
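Roughly (names illustrative; the schema mechanism is the standard W3C
attribute group):

   <!-- DTD: an empty hook that a customization may redefine and
        that generalization would simply discard -->
   <!ENTITY % local-atts "">

   <!-- Schema: an empty attribute group playing the same role -->
   <xs:attributeGroup name="local-atts"/>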
Attributes should actually be easier to specialize than elements, not
harder: there's no content model to enforce. So why not throw them wide
open? Generic processing can simply ignore the additions.
Michael Priestley wrote:
I was thinking roughly the same thing, although perhaps with "meta" as
the generic ancestor, parallel with "props". If we are willing to
restrict the normal content of "meta" to be simple tokens (ie simply
don't allow parentheses except in the generalized form) then we could
use the exact same model for generalizing/roundtripping both
attributes. Effectively we'd have one generic ancestor attribute for
conditional processing attributes, and one for anything else. They
could also share the same XSLT library for unpacking the conditions if
processing is desired in the generalized form (any process that can't
handle the generalized form would be considered specialization-unaware).
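A minimal sketch of what that shared library might look like, in XSLT
1.0, assuming generalized values of the form name(value) name(value)
and ignoring the escaping question for the moment (template and
parameter names are hypothetical):

   <xsl:template name="get-generalized-value">
     <!-- pull one name(value) pair out of a generalized attribute -->
     <xsl:param name="attvalue"/>
     <xsl:param name="attname"/>
     <xsl:if test="contains($attvalue, concat($attname, '('))">
       <xsl:value-of select="substring-before(
         substring-after($attvalue, concat($attname, '(')), ')')"/>
     </xsl:if>
   </xsl:template>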
I propose the following:
a) We make a new attribute called "otherattrs" (like otherprops but not
just for selection/filtering)
b) We make a new issue for specializing the "otherattrs" attribute
c) We synchronize the generalization/specialization mechanism for
"otherattrs" and "props"
In thinking about this, it seems not too difficult at a first
approximation. The main two issues are:
1. Escaping paren characters that would otherwise be confused for
generalization delimiters (see the worked example after this list).
   * This can be solved by having an escaping mechanism like: two
paren characters resolve to one, three resolve to two, etc. A paren
character alone represents an end-of-attribute marker.
2. Keeping track of which attribute values have ALREADY been generalized
so that we don't end up escaping the value over and over again (or
unescaping it wrongly).
   * This can be solved with an architectural attribute that lists
the attributes that are already generalized.
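A worked example of the escaping rule in item 1 (attribute names and
values are hypothetical):

   authored, specialized:       remark="note(a)"
   value escaped:               note((a))
   generalized:                 otherattrs="remark(note((a)))"
   respecialized:               remark="note(a)"

On respecialization the lone closing paren at the end marks the end of
the remark value, and the doubled parens collapse back to literals.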
So, for example, I could specialize "otherattrs" with an attribute value
that represents the last-changed-date for an element.
Generalized, that might look like:
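Something along these lines, say (element name, attribute name, and
date all illustrative):

   <!-- generalizedprops is the architectural attribute from issue 2,
        recording which attributes are already generalized -->
   <p otherattrs="lastchanged(2005-06-15)"
      generalizedprops="lastchanged">...</p>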
Still to think through:
a) does this handle multiple levels of specialization well?
b) is there a requirement to handle multiple levels of specialization?
c) what does the processing (e.g. XSLT or CSS) look like to handle the
generalized form?
d) is there a more elegant solution than "generalizedprops"? Perhaps by
looking at the domains in scope after generalization?
For me, the answers to questions a-c are also not clear yet for
Michael's current proposal.