OASIS Mailing List Archives

 




Subject: RE: [dita] Specialization of Attributes


This material is deep enough that I feel uneasy wading into it, but I thought I'd share the following intuitions. For those of you who have read Dorothy Parker's pleas for the return of Robert Benchley, you know how I feel.
 
Regarding formal foundations for DITA, there is reason to be optimistic, although it seems to me that a substantial amount of time and feedback is still required before the ideas gel into a form that is fully comprehensible.
 
To summarize: As a language, DITA seems to be doing something that is both original and logically sound. My inclination is to treat logical soundness as an extremely valuable criterion in language design, since it permits innovation without fear of getting stuck with language features that ultimately do not permit further extension, or that simply do not fit. (OK, enough intro. Time for the deep waders.)
 
Ignoring attributes seems to me like a recipe for incompatibility. If we have features for extensibility that can be accounted for within the framework, that permits us to relate dialects that otherwise would have only ad hoc relationships. "Oh, my dialect is just like yours, except ...". Treating all attributes as part of the framework permits us to say "Yes, you have values there that I don't process, but I can recognize that when I encounter them. I'm not ignoring your language construct, I'm just ignoring some of your data." This matters because attributes (especially conditional attributes) are supposed to be keys that trigger functionality, not just data.
 
Both an ad hoc approach to attributes and ad hoc structural modifications constitute logically risky forms of informality. Adopting new attributes or new structure using an unstructured mechanism such as the "is-like-a" relationships in object-oriented programming would pose a severe challenge to formalization, since both are like edits to the logic. Unless you provide notations that explain the edits in a way that is recoverable, you lose transparency. Keeping such records is what gives the generalization/specialization round-tripping principle its compelling power. The ability to round-trip is a criterion that demands completeness of the formalism, and, as an operational criterion, it is one step easier to comprehend and justify than completeness of the formalism itself.
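As a rough illustration of the round-trip criterion, here is a minimal Python sketch. It assumes a hypothetical notation in which a specialized attribute, once generalized, is recorded inside its ancestor attribute as `name(values)`. The `ANCESTOR` table, the function names, and the sample values are assumptions made for the example, not something defined in this thread.

```python
import re

# Hypothetical ancestry table: specialized attribute -> attribute it specializes.
ANCESTOR = {"jobrole": "audience", "hardware": "platform"}

# Matches a recorded specialization such as "jobrole(admin programmer)".
RECORD = re.compile(r"(\w+)\(([^)]*)\)")

def generalize(attrs):
    """Fold each specialized attribute into its ancestor, keeping a record of its name."""
    out = {}
    for name, value in attrs.items():
        base = ANCESTOR.get(name)
        piece = value if base is None else f"{name}({value})"
        target = name if base is None else base
        out[target] = f"{out[target]} {piece}" if target in out else piece
    return out

def specialize(attrs):
    """Recover the specialized attributes from the records left by generalize()."""
    out = {}
    for name, value in attrs.items():
        for special, inner in RECORD.findall(value):
            out[special] = inner
        # Whatever is left after removing the records belongs to the base attribute.
        plain = " ".join(RECORD.sub(" ", value).split())
        if plain:
            out[name] = plain
    return out

original = {"audience": "novice", "jobrole": "admin programmer", "hardware": "laptop"}
general = generalize(original)
restored = specialize(general)
```

Because the generalized form keeps a record of every edit, `restored` equals `original`. If `generalize` simply merged the values without the `name(...)` record, the inverse step would have nothing to work from, which is the loss of transparency described above.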
 
Looking at the inheritance hierarchy in object-oriented programming tells only part of the story, because the type system is limited to objects and methods. This is in part because the logic that was in current use when object-oriented programming was formalized could only handle ground-level data and functions.
 
We now have two additional logical mechanisms. The first step is to add relationships. Once objects can have relationships, and relationships can be typed, you can more explicitly model heterogeneous taxonomies.
 
The second step is to add higher-order constructs. Subject classification is in this direction. We connect from the basic DITA type system out to a subject classification type system, and then notice that we can still use the same type mechanisms for both. This permits instances to be self-describing, or to be described by external descriptors that are still within the language. (Aside on logical terminology: The conventional sense of "higher-order" is "adding collections of things to the logic, not just things". In this paragraph, I'm saying "higher-order" about these self-descriptive constructs, but I think that's OK. Since collections are defined through description, a willingness to talk about descriptions has a similar effect (the same effect?) formally as the willingness to talk about collections.)
 
Regarding attributes, sometimes they are descriptive and sometimes they contain content. If we have a single logic that comprehends both, then we are safe. From that point of view, it makes perfect sense to apply the same principles to attributes as to elements. For some attributes, our interpretation of the value is content-like, and in those cases, we use the ground-level justification, saying that the attribute is element-like. For other attributes, our interpretation of the value is description-like, and in those cases we use the description-based justification, saying that the attribute carries information that enables us to know better how to handle the topic, either because we know meta-information about it in the old metadata sense, or because we know how to relate it more directly to subject classification.
 
Now about attributes that carry structural information, like links. I think structural information tells you something about the structure of the instances, and doesn't challenge the type system.
 
Best wishes,
 
Bruce Esrig
 
=============
 
Reference on Benchley: http://www.compedit.com/benchley.htm
 
-----Original Message-----
From: Michael Priestley [mailto:mpriestl@ca.ibm.com]
Sent: Tuesday, April 18, 2006 4:13 PM
To: Dana Spradley
Cc: dita@lists.oasis-open.org; Paul Prescod
Subject: Re: [dita] Specialization of Attributes
1) I think it's perfectly reasonable to think of jobrole as a specialization of audience, and hardware as a specialization of platform. So I don't think specialization of attributes is some crazy weirdness. There are more and less semantically descriptive attributes, just like there are more and less semantically descriptive elements. It's the job of the DITA specialization hierarchy to relate the specialized model elements to their ancestors for the sake of shared processing.

2) I don't think it's a kludge to add attributes the way we are doing - at least, no more a kludge than domains in general, or specialization in general. We are not opening the door to random attributes on a per-element basis, but we are opening up the door to random universal attributes, per the second use case in issue #20. These do exactly what you want (i.e., they are ignored by other processes), while still allowing processes to compare the content models of two different documents and identify the differences (e.g., "topic A and topic B are both tasks, but topic A is about software and has conditional processing for hardware platforms, and topic B is about hardware and has conditional processing for power supply").

3) If you are not sharing content between groups with different doctypes, then this ability to compare and understand differences at the processing level is understandably not very compelling. But it is a fairly basic promise of DITA that sets it apart from other doctype architectures.
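The comparison described in point 2 might be sketched as follows in Python. The dictionaries are a hypothetical way of declaring which universal attributes each doctype adds and what each one specializes; the attribute names `hardware` and `powersupply`, their base attributes, and the function name are assumptions for illustration only.

```python
# Hypothetical declarations: each doctype names its topic type and the universal
# attributes it adds, mapped to the base attribute each one specializes.
TOPIC_A = {"type": "task", "added": {"hardware": "platform"}}
TOPIC_B = {"type": "task", "added": {"powersupply": "props"}}

def describe_difference(a, b):
    """Report the shared topic type and the added attributes unique to each side."""
    shared_type = a["type"] if a["type"] == b["type"] else None
    only_a = sorted(set(a["added"]) - set(b["added"]))
    only_b = sorted(set(b["added"]) - set(a["added"]))
    return {"shared_type": shared_type, "only_in_a": only_a, "only_in_b": only_b}

report = describe_difference(TOPIC_A, TOPIC_B)
```

Here `report` says that both topics are tasks and that each carries one conditional attribute the other lacks. A process can only produce such a report if the added attributes are declared within the framework rather than silently ignored.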

Re metaphors: I don't care about the metaphors. Specialization is restricted because the restrictions are what allow us to interchange content across doctypes. If we drop that benefit, we drop about half the business value, from my point of view.

I expect that we will want to support adding element-specific attributes in DITA 2.0 at the latest. I expect it will be even harder than what we're doing now (this is the low-hanging fruit, after all). So I do have the same goal as you, and am willing to work towards it, but not at the cost of DITA's basic reuse promises, which we've been making for several years.

With respect to the choice of paradigm:  I think you are clearly willing to accept some restrictions on how you extend the language, or you wouldn't be using XML at all. I think there's really a continuum of choice, weighing extensibility/customizability versus reuse of content/sharing of infrastructure.  You can get maximum reuse/sharing by agreeing on a common base language with no extensibility; you can get maximum flexibility by agreeing only on the use of XML, with no additional architectural layer.

DITA lies somewhere in between, with an unprecedented degree of sharing for a customized solution, or an unprecedented degree of flexibility for a standard solution. Are we really on completely different pages here?

Michael Priestley
IBM DITA Architect and Classification Schema PDT Lead
mpriestl@ca.ibm.com
http://dita.xml.org/blog/25



From: Dana Spradley <dana.spradley@oracle.com>
Sent: 04/18/2006 02:33 PM
To: Paul Prescod <paul.prescod@blastradius.com>
Cc: dita@lists.oasis-open.org
Subject: Re: [dita] Specialization of Attributes

I agree with Paul: "I'm not even sure whether it is meaningful to talk about properties as having an "is-a" relationship to other properties."

In theory, DITA treats elements like programming languages treat objects. In this metaphor, it seems to me that attributes should be thought of like Java variables: not objects themselves, but just part of the object's interface.

Unfortunately, because the DITA object hierarchy is an "is-a" hierarchy, and doesn't support "is-like-a" relationships, you can't extend this interface by adding a new attribute/variable to the element/object.

The solution we are considering, which adds attributes while preserving an exact identity between the element before and the element after the attribute is added, should therefore be considered not some other avenue of inheritance peculiar to attributes, but basically nothing more than a kludge to make up for the fact that we don't allow arbitrary attributes to be added.

The question I have is: do we really need such a kludge - especially when it comes to general, non-conditional attributes?

Couldn't the DITA toolkit merely make it a design principle that any unrecognized attribute should just be ignored?

I also think the reason this issue provokes so much discussion, and has prompted Michael to jump through so many hoops in the design, is that some people (myself included) came to DITA hoping to find not an architecture that would allow round-tripping of documents defined according to slightly different DTDs, but rather an architecture that made customization easier to manage through some kind of "extension" mechanism, one in which today's customizations might become part of the base language in the future, the way, for example, regular expression libraries eventually found their way into the Java base.

It seems that DITA has already made the decision between these two paradigms - and that maybe it should change its name to, say, IDITA, the "Intelligent Design Information Typing Architecture" - the base is so well designed that we don't need to extend it, just find our specialized place within it.

Maybe another group will come along with a truly extensible DTD/toolkit someday. If so, perhaps "evolution" would be a fitting name.

--Dana


Paul Prescod wrote:

Here's what I think that Erik wants:

Given:

programmertype specializes role
role specializes audience

A filter on "audience='javaprogrammer'" should match content marked
"programmertype='javaprogrammer'"

But consider the complete implications of this. If audience specializes
props then "props='javaprogrammer'" will also match. This implies that
ALL values live in a single namespace. direction="left" and
politics="left" would collapse down to the same thing. Nor is it just a
namespace issue. Remember that DITA treats two values in the same
attribute (OR) differently than it does two values in different
attributes (AND). With attribute specialization we have some kind of
in-between world where attributes are kind-of in the same attribute and
kind-of in different attributes.
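The implications above can be sketched in Python with a simplified, include-style matcher. This is not the DITA toolkit's algorithm. The `SPECIALIZES` table encodes the chain from the example, and the entries for `direction` and `politics` are an added assumption that both specialize `props`; all function names are hypothetical.

```python
# Hypothetical ancestry: attribute -> the attribute it specializes.
SPECIALIZES = {
    "programmertype": "role",
    "role": "audience",
    "audience": "props",
    "direction": "props",  # assumption, used to show the namespace collapse
    "politics": "props",   # assumption, used to show the namespace collapse
}

def ancestors(name):
    """Return the attribute name followed by everything it specializes."""
    chain = [name]
    while chain[-1] in SPECIALIZES:
        chain.append(SPECIALIZES[chain[-1]])
    return chain

def attribute_matches(filter_attr, filter_value, content):
    """True if a content attribute that is, or specializes, filter_attr holds the value."""
    return any(
        filter_attr in ancestors(name) and filter_value in value.split()
        for name, value in content.items()
    )

def passes(filters, content):
    """Values within one attribute are alternatives (OR); all attributes must match (AND)."""
    return all(
        any(attribute_matches(attr, value, content) for value in values)
        for attr, values in filters.items()
    )

content = {"programmertype": "javaprogrammer"}
hit_audience = attribute_matches("audience", "javaprogrammer", content)
hit_props = attribute_matches("props", "javaprogrammer", content)

# Under hierarchy-aware matching, two unrelated meanings of "left" both
# satisfy the same filter on props.
collapsed = (
    attribute_matches("props", "left", {"direction": "left"})
    and attribute_matches("props", "left", {"politics": "left"})
)
```

With this matcher, a filter on `audience` or on `props` both match content marked with `programmertype`, and a `props` filter cannot tell `direction="left"` from `politics="left"`. The `passes` helper shows the usual OR-within, AND-across rule; the sketch leaves open how that rule should apply once two attributes share an ancestor.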

Michael's proposal is that from a matching point of view it is totally
irrelevant that programmertype specializes role or that role specializes
audience. He says that the specialization information might be used
somehow but not by the standard matching algorithm. This seems too fuzzy
to me and also dangerous in that it encourages people to use the
undefined feature. When we come up with a meaning for it (perhaps based
upon Erik's ideas) then it will be too late to redefine the behaviour.
Better to outlaw it until we understand it.

This would also go for generic attributes.

I'm not even sure whether it is meaningful to talk about properties as
having an "is-a" relationship to other properties. Mainstream
programming languages certainly have no such concept. More often there
is some kind of a "derived-from" relationship but these are typically
quite complex. "Color" can be derived from "Red", "Green" and "Blue".
Age can be derived from "Date born". But now we're into the semantic
web, not DITA.

Paul Prescod
 


