
# dita message


Subject: RE: [dita] clarifying the href attribute in the language reference

• From: "Grosso, Paul" <pgrosso@ptc.com>
• To: "DITA TC list" <dita@lists.oasis-open.org>
• Date: Thu, 17 May 2007 12:23:16 -0400



> -----Original Message-----
> From: Eliot Kimber [mailto:ekimber@innodata-isogen.com]
> Sent: Thursday, 2007 May 17 7:00
> To: DITA TC list
> Subject: Re: [dita] clarifying the href attribute in the
> language reference
>
> Grosso, Paul wrote:
> >
> > Yes, and we should note that implies certain character
> > restrictions, and we need to say something about what
> > should happen when the value isn't a valid URI (even
> > if we leave that implementation dependent).
>
> I'm not sure I understand why we have to say anything about invalid
> URIs--they're invalid and processing should fail.

Fine, that's saying something.  That's all I'm asking.
(We don't have a global statement in DITA analogous to
XML's not well-formed statement, so it's not clear in
general what should happen when a DITA document doesn't
conform to the constraints in the DITA standard.)

The issue is, what should happen when you see something like

<topicref href="my file#topicid"/>

or

<conref href="file#x1/#id"/>

I suspect some implementation will take the first case above
and work, even though the space should have been percent-escaped.
According to what you are saying, it should fail.  Fine, but
unless the spec says so, we will have users arguing about which
implementation is "broken".
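
As a minimal sketch of what "percent-escaped" means for the first case
(Python is assumed here purely for illustration; the variable names are
hypothetical and nothing below is part of any DITA tool):

```python
from urllib.parse import quote

raw = "my file#topicid"

# Split at the first '#': the part before it names the resource,
# the part after it is the fragment identifier.
path, sep, fragment = raw.partition("#")

# Percent-escape characters that are not allowed in a URI path
# (here, the space).
escaped = quote(path) + sep + fragment
print(escaped)  # my%20file#topicid
```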

In the second case, some implementation will decide that the
last # starts the fragment identifier and will consider the name
of the file to be "file#x1" (which is a fine file name on most file
systems) and the topic id to be "id".  Another will decide the
first # starts the fragid and will consider the name of the file
to be "file" and the topic id to be "x1" and the element to be
conrefed to have an id of "id".  Still another will give an error.

The spec needs to say which behavior is correct so that
implementors and users know what to expect.  (See below for
my suggestion.)
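
To make the ambiguity concrete, here is a small sketch (Python assumed,
variable names hypothetical) of the two literal ways of splitting the
second value, alongside what one widely used library, Python's
urllib.parse, happens to do with it:

```python
from urllib.parse import urlsplit

href = "file#x1/#id"

# Reading A: the first '#' starts the fragment identifier.
a_resource, _, a_fragment = href.partition("#")
print(a_resource, a_fragment)   # file x1/#id

# Reading B: the last '#' starts the fragment identifier.
b_resource, _, b_fragment = href.rpartition("#")
print(b_resource, b_fragment)   # file#x1/ id

# urllib.parse splits at the first '#' and does not reject the
# second, unescaped '#' left inside the fragment.
parts = urlsplit(href)
print(parts.path, parts.fragment)   # file x1/#id
```

Neither split is an error as far as the library is concerned, which is
the kind of silent divergence between implementations described above.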

>
> I suppose the other alternative would be to say that
> processors should
> do whatever they need to make the value be valid (e.g., escape any
> disallowed characters). This is admittedly a fuzzy area in XML.

We are working on making this unfuzzy for XML--see the HRRI
document I referenced in my earlier email.

>
> I think that Paul's concern is that within a tool like
> Arbortext Editor, where you will be creating links to
> local files, is the tool obligated to escape characters
> that are valid to the file system but not valid URIs?

Yes.  In fact, we ran into an issue where the CMS was naming
files based on the topic title, so the file names could have
all sorts of characters in them.

> My take would be that they should be escaped--I think systems that
> pretend that non-URI syntax is OK (e.g., using something like
> "c:\foo\bar.xml" where a URI is required) are doing everyone a
> disservice. But maybe that's just me? It doesn't help that most XML
> processors will silently accept "c:\foo\bar.xml" as a URI--that's a
> convenience but not actually in line with the requirements of any
> specification (including XML) that says a value is a URI or URL.

If our product complains about an invalid URI and another product
does not, we are the ones that will be blamed by users unless the
DITA spec makes clear what should happen in this case.

>
> >> * A URI with a hash must have a valid DITA local identifier
> >
> > s/A URI with a hash/An href value containing a hash/
>
> I think it should be "A URI with a fragment identifier"--fragment
> identifier is the abstract URI component and that's what we
> really care
> about. The spec should not be and does not need to be a
> tutorial on how
> to construct URIs. It needs to be as precise as it can be.
>
> Saying "an href value containing a hash" is ambiguous since
> an escaped
> hash character is both valid and, in the context of URI
> resolution, also
> counts as "containing a hash" that, because it is escaped, is not the
> fragment identifier separator.

I think you are missing my point (but upon rereading the
message history, perhaps my point is too subtle).  I'm saying
we shouldn't be trying to assign semantics to URIs at all
(with or without fragment identifiers--that's the job of
RFC 3986), we should be assigning semantics to (values of)
attributes in our (the DITA) vocabulary.

So instead of:

* The value of a DITA href is a URI.

* A URI without a hash resolves to the top element in the file except
where the top element is a <dita> element and the reference must resolve
to a single topic, in which case the URI resolves to the first contained
topic.

* A URI with a hash must have a valid DITA local identifier as the
portion after the hash. A DITA local identifier consists of
topicID/elementID for a subelement of a topic and of elementID for
topics, maps, and map subelements.

I would suggest something like:

* The value of a DITA href must be a valid URI reference [RFC 3986].  It
is an error if the value is not a valid URI reference.  An
implementation may (but need not) give an error message, and may (but
need not) recover from this error condition by attempting to convert the
value to a valid URI reference (perhaps by treating it as an IRI or HRRI
[appropriate references]).

* An href value consisting of a URI with no fragment identifier resolves
to the top element in the file except where the top element is a <dita>
element and the reference must resolve to a single topic, in which case
the href value resolves to the first contained topic.

* An href value consisting of a URI with a fragment identifier must have
a valid DITA local identifier as the portion after the hash. A DITA
local identifier consists of topicID/elementID for a subelement of a
topic and of elementID for topics, maps, and map subelements.
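
A rough sketch of how an implementation might apply the three suggested
rules (Python assumed; `parse_href` and `HrefTarget` are hypothetical
names, and the character check is only an approximation of RFC 3986, not
a full grammar validation). This version chooses to report an error
rather than recover, which the wording above permits but does not
require:

```python
import re
from typing import NamedTuple, Optional

# Approximation: RFC 3986 unreserved and reserved characters plus
# percent-escapes. Spaces and backslashes are rejected.
_URI_CHARS = re.compile(
    r"^(?:[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=]|%[0-9A-Fa-f]{2})*$"
)

class HrefTarget(NamedTuple):
    resource: str
    topic_id: Optional[str]
    element_id: Optional[str]

def parse_href(href: str) -> HrefTarget:
    # Rule 1: the value must be a valid URI reference. An unescaped
    # second '#' is not allowed inside a fragment identifier.
    if not _URI_CHARS.match(href) or href.count("#") > 1:
        raise ValueError(f"not a valid URI reference: {href!r}")

    resource, hash_sign, fragment = href.partition("#")

    # Rule 2: no fragment identifier, so the target is the top element
    # (or the first topic when that is required).
    if not hash_sign:
        return HrefTarget(resource, None, None)

    # Rule 3: the fragment must be elementID or topicID/elementID.
    ids = fragment.split("/")
    if len(ids) > 2 or not all(ids):
        raise ValueError(f"not a valid DITA local identifier: {fragment!r}")
    if len(ids) == 1:
        return HrefTarget(resource, None, ids[0])
    return HrefTarget(resource, ids[0], ids[1])

print(parse_href("my%20file#topicid"))
print(parse_href("file.dita#t1/e1"))
```

With this sketch, "my file#topicid", "file#x1/#id", and "c:\foo\bar.xml"
all raise ValueError instead of being silently accepted.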

> I think we should avoid IRIs and HRRIs for now.

I think I lean in this direction too (for now).

So we need to make clear (which the description of the href
attribute in the language spec does not) that the value of
the href attribute must be a valid URI reference, and we then
need to say what should/can happen if this constraint is not met.

I believe my suggested wording above is one proposal that does that.

paul

p.s.  Actually, as long as DITA resources use the XML MIME type,
the fragment identifier must be an XPointer which the DITA local
identifiers are not.  So technically, an href attribute value will
only be a valid URI reference if we say that DITA resources have
their own XML-based MIME type and we define the fragment identifier
syntax for the DITA MIME type to be what we want for a DITA local