RE: [dita] mime type for DITA?

Included below is an e-mail message from July 2006 that summarizes my thinking about DITA mime types at that time (so the discussion I mentioned was a good bit more than the year ago that I remembered). This e-mail was internal to PTC/Arbortext and, as I mentioned before, we never acted on it.

-Jeff

From: Ogden, Jeff
Sent: Tuesday, July 18, 2006 11:47 AM
Subject: RE: Browse item MIME type meeting minutes

I spent some time reading RFC 2046, Multipurpose Internet Mail Extensions (MIME) – Part 2, and RFC 3023, XML Media Types, last night. I also looked at the MIME content types that are registered with IANA and didn’t see anything related to DITA.

I’ve pretty much convinced myself that we don’t want to use the MIME content types that are defined in RFC 3023 (text/xml or application/xml). RFC 3023 almost says as much and suggests registering new XML-related content types using the suffix “+xml”. That seems like it might be the thing to do for DITA documents. It might look something like this:

Content-type: application/dita+xml; charset={charset-value}; format={dita | ditamap | ditabase}; type={topic-type-value}; navtitle="navtitle text"

All of the keyword parameters are optional.

charset would be the same as defined in RFC 3023 and basically either matches or overrides the encoding information in the XML declaration.

format, type, and navtitle are similar to the DITA attributes of the same name. In general the values allowed for use with the MIME keywords are more restrictive than the DITA attributes and only the values shown are allowed.

format=dita is assumed if format isn’t specified. ditabase isn’t one of the standard items for the DITA format attribute, but the DITA format attribute does allow other unspecified values.

type only has meaning when format=dita. type values are the usual topic types (topic, concept, reference, task, glossentry or a specialization) accepted by the DITA type attribute. type=topic is assumed if type isn’t specified.

navtitle is the navtitle or other title or other text from the DITA document that could serve as identifying text as determined by the application that saved or stored the DITA content.
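
To make the defaults described above concrete, here is a minimal sketch of how a tool might read such a header. It assumes the parameters are separated by semicolons, as MIME syntax requires. The function name parse_dita_content_type and the returned dictionary layout are hypothetical, and application/dita+xml is only the type proposed in this message, not a registered one.

```python
from email.message import Message

# Values proposed in this message for the format parameter.
ALLOWED_FORMATS = {"dita", "ditamap", "ditabase"}


def parse_dita_content_type(header_value):
    """Return the proposed DITA parameters, or None if the type is not DITA.

    Hypothetical helper: applies the defaults described in the message
    (format=dita, and type=topic when format is dita).
    """
    msg = Message()
    msg["Content-Type"] = header_value
    if msg.get_content_type() != "application/dita+xml":
        return None

    fmt = msg.get_param("format", "dita")
    if fmt not in ALLOWED_FORMATS:
        raise ValueError("unsupported format value: %r" % fmt)

    # type only has meaning when format=dita.
    topic_type = msg.get_param("type", "topic") if fmt == "dita" else None

    return {
        "charset": msg.get_param("charset"),
        "format": fmt,
        "type": topic_type,
        "navtitle": msg.get_param("navtitle"),
    }


if __name__ == "__main__":
    print(parse_dita_content_type(
        'application/dita+xml; charset=utf-8; type=task; navtitle="Installing"'
    ))
```

Because every parameter is optional, a bare application/dita+xml header would be read as a topic of type topic with no charset override and no navtitle.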

How does this look? Does it meet our needs for a content type for use with DITA documents stored in a CMS?

Is topic-type-value good enough or does this need to somehow provide the Public ID value? For DITA purposes, I think the topic-type based on the root element name is fine, but I’m interested to know what Paul thinks.

Is a format of “dita-fragment” and/or “ditamap-fragment” needed?

-Jeff

From: Ogden, Jeff
Sent: Friday, March 28, 2008 10:27 AM
To: Grosso, Paul; 'dita@lists.oasis-open.org'
Subject: RE: [dita] mime type for DITA?

Is there a specific use case that caused the issue of DITA mime type(s) to be raised again now?

At PTC/Arbortext we talked about having an official mime type for DITA objects about a year ago, but we’ve been able to get by without it quite nicely so far.

If we do try to get some sort of official mime type, I’d like to find a way that we could tell more than just that we have a DITA document. Ideally I’d like to know, in order:

1. If we have a DITA document

2. If we have a DITA map, DITA topic, DITA ditabase, or ditaval document

3. What type of DITA document we have (topic, concept, reference, task, glossentry, map, bookmap, learningMap, …)

I’d need to go back and refresh myself on the details of mime type syntax, but I vaguely remember that there were ways to provide more detailed information without creating completely new mime types.

I agree with Paul that having an official mime type may not provide much additional benefit and so I’m not pushing for this myself, but if we do go forward I’d like to be able to get additional information beyond just DITA or not DITA.

And even if we don’t go forward with an official mime type, I’d like to somehow encourage CMS implementers to include this level of detail in the CMS metadata associated with a DITA object so that someone can use or get this information as they search or browse without having to open each DITA object each time.

-Jeff

From: Grosso, Paul [mailto:pgrosso@ptc.com]
Sent: Friday, March 28, 2008 9:59 AM
To: dita@lists.oasis-open.org
Subject: RE: [dita] mime type for DITA?

While a mime type can define a fragment identifier syntax, there is always the question of what tools will recognize and implement that fragment identifier syntax. Presumably, in the DITA case, it will just be DITA tools which already recognize the syntax. So defining a mime type specific fragment identifier syntax does allow us to say our href values are true and official URIs, but it doesn't change too much in practice. (I don't see the argument as either strongly for or against whether we should define a dita mime type.)

Using application/dita+xml to allow tools to recognize dita content sounds like a benefit. Again, though, you have to ask what tools will actually access the mime type, recognize the dita mime type, and do something special with it. The answer again is just dita tools which already recognize dita content. So the only benefit might be making it a bit easier for such tools to know they have dita without looking inside the content.

But note that one can get a mime type only from mime headers, and one has mime headers for a file in only rare cases in practice (and half the time when you do have them, they are wrong or incomplete). The rest of the time, tools guess mime type by looking at the file extension or inspecting the content, and this can be, and already is, done.
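
As an illustration of the guessing described above, here is a rough sketch that tries the file extension first and then falls back to the root element name. The function name guess_dita_kind and both lookup tables are hypothetical. The root-element check is a heuristic only: it covers a few standard element names and would miss specializations, whose class attribute is normally defaulted by the DTD and so is not visible without loading it.

```python
import io
import os
import xml.etree.ElementTree as ET

# Conventional DITA file extensions (assumed mapping, for illustration).
EXTENSION_KINDS = {".dita": "topic", ".ditamap": "map", ".ditaval": "ditaval"}

# A few standard root element names; specializations are not covered.
ROOT_KINDS = {
    "topic": "topic", "concept": "topic", "task": "topic",
    "reference": "topic", "glossentry": "topic",
    "map": "map", "bookmap": "map",
    "dita": "ditabase",
    "val": "ditaval",
}


def guess_dita_kind(filename, content=None):
    """Guess the kind of DITA object without any mime headers.

    Returns "topic", "map", "ditabase", "ditaval", or None if unknown.
    """
    ext = os.path.splitext(filename)[1].lower()
    if ext in EXTENSION_KINDS:
        return EXTENSION_KINDS[ext]
    if content is None:
        return None
    try:
        # Look only at the first start tag, which is the root element.
        for _event, elem in ET.iterparse(io.BytesIO(content), events=("start",)):
            return ROOT_KINDS.get(elem.tag)
    except ET.ParseError:
        return None
    return None
```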

So defining a mime type probably has only a minor benefit in practice.

I would counsel against trying to define multiple mime types. Given the small benefit of mime types in general, if we try to get too complicated here, we'll pretty much guarantee that there will never be two fully interoperable implementations, and I don't see the benefit of multiple mime types.

paul

From: Erik Hennum [mailto:ehennum@us.ibm.com]
Sent: Thursday, 2008 March 27 18:47
To: dita@lists.oasis-open.org
Subject: [dita] mime type for DITA?

Hi, Technical Committee:

Returning to an old question, should the committee take a position with respect to a mime type for DITA?

http://lists.oasis-open.org/archives/dita/200408/msg00055.html

A DITA mime type would let tools declare and recognize DITA content in HTTP headers, email, and so on without actually inspecting the content. As I understand Paul's note, a mime type would also provide a basis for defining the DITA reference syntax within URI standards:

http://lists.oasis-open.org/archives/dita/200705/msg00040.html

One might expect DITA to have a mime type similar to application/dita+xml following ordinary practice for XML vocabularies:

http://en.wikipedia.org/wiki/XML_and_MIME

DITA is an architecture, however, not a vocabulary. Section A.14 in the relevant RFC suggests that an extensible architecture should prepend qualification levels:

ftp://ftp.isi.edu/in-notes/rfc3023.txt

Applied to DITA, that would seem to call for a mime type that separates the declaration of the vocabulary (as defined by the shell for the document type) from the DITA architecture from the XML architecture.

An application that recognizes a document type can process any document that generalizes to a valid instance of the recognized document type. Because of shell pluggability, however, the mime type alone can't reasonably provide a basis for determining the compatibility of a document type accepted by an application with the document type of the supplied document. (The mime type for the document type would have to encode the modules and ancestor modules included by the shell, effectively cramming the value of the domains attribute into an identifier.)

A reasonable compromise might be for the mime type to identify only the base vocabulary. (That compromise also acknowledges the impracticality of registering all DITA shells as mime types.) Applications would have to inspect the domains attribute in the content for more specific evaluation of acceptability. This approach avoids creating a legacy that would have to be accommodated if future work solves the document type compatibility problem some other way.

In summary, this approach would introduce two fundamental mime types for topics and maps:

application/topic+dita+xml
application/map+dita+xml

Because the DITA values file isn't specializable (doesn't provide the architecture attributes), its mime type should identify the XML vocabulary but not the DITA architecture:

application/ditaval+xml
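
To show how an application might read the layering in these proposed names, here is a small sketch that splits a type into its "+"-separated qualification levels and checks whether the DITA architecture is declared. The function names are hypothetical, and the three types are only the ones proposed in this note, not registered mime types.

```python
def split_layers(mime_type):
    """Split e.g. 'application/topic+dita+xml' into its qualification levels."""
    major, _, subtype = mime_type.partition("/")
    return major, subtype.split("+")


def declares_dita_architecture(mime_type):
    """True if the type is XML-based and names 'dita' as one of its levels."""
    _, layers = split_layers(mime_type)
    return layers[-1] == "xml" and "dita" in layers[:-1]


if __name__ == "__main__":
    for candidate in (
        "application/topic+dita+xml",
        "application/map+dita+xml",
        "application/ditaval+xml",
    ):
        print(candidate, declares_dita_architecture(candidate))
```

Under this reading, the topic and map types declare the DITA architecture, while the ditaval type declares only an XML vocabulary.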

Hoping that's useful,

Erik Hennum
ehennum@us.ibm.com
