docstandards-interop-discuss message

Subject: Re: [docstandards-interop-discuss] Clarifications / Scope of the intended work?

From: marbux <marbux@gmail.com>
To: docstandards-interop-discuss@lists.oasis-open.org
Date: Tue, 10 Apr 2007 16:12:02 -0400

I'm starting to get the drift, I think, despite my weakness in
understanding programming concepts. There are more than a few areas
that will be problematic. :-) But I think it might be useful to do
some outreach to involve others who are grappling with the same or
similar problems, both to improve the proposed work product and to
encourage its adoption.

For example, the Apache Cocoon development team has already developed
an XML-based scripting language ("sitemap" in Cocoon parlance) for
specifying the serialized output of transformations in a business
process, whether executed using XSLT, STX, or XInclude with XPointer
framework support, with input and output in a wide variety of data
formats. See <http://cocoon.apache.org/2.1/features.html>. For
example markup in the context of aggregating data from multiple
sources programmatically, see
<http://wiki.apache.org/cocoon/CocoonProtocolExample>.

I haven't checked in to see what happened with it, but there was also
an OASIS proposal roughly a year ago to develop a similar language for
scripting transformations from a wide variety of markup sources
(specifically including at least DocBook and HTML) to Windows Help
(CHM) format. (I wound up boycotting the effort because of the
proposal leader's resistance to identifying accessibility validation
as a foundation-level aspect of the specification.)

I like the idea of raising the abstraction level of transformation,
aggregation, and similar methods to a meta-level hub language,
particularly if it can be implemented in a user-friendly way. But I am
also wary of the approach of designing such a language to encompass
only two document markup languages and then extending it to others.
The concern is that an approach that works for two document languages
might not extend gracefully to others. E.g., the new Microsoft Office
Open XML formats are reputedly very unfriendly to XPath and XSL-FO, so
a mapping method that depended heavily on XPath and XSL-FO might be
far from ideal for transformations to and from MOOXML. So I'm left
wondering whether it might be better to involve others concerned with
many types of automated transformations from the get-go.

On the pageless/paper document distinction, there should be provision,
I think, for preserving paper document metadata from the source data,
e.g., page, paragraph, and list numbering. The particular problems I
have in mind are those of creating citations automagically and quoting
lists. The legal profession is well down the path of citations to
document paragraph numbers rather than page numbers, but all federal
courts, for example, currently require simultaneous eFiling of
documents and filing in paper format as part of the transitional state
to pure eFiling. So for many legal documents, preservation of both
page and paragraph numbers is de rigueur. And changing list numbering
between source and destination documents would violate so many
publishing style books that it would take either a brave or foolish
soul to argue that it doesn't matter. So I'm left wondering about this
excerpt from the iText tutorial:

"Why getPagenumber doesn't work.

Class Document is all about content, not about presentation. Different
writers can listen to the same Document object, so it makes absolutely
no sense asking the Document object for its current pagenumber. Which
number would the Document have to return? The pagenumber of the PDF
representation or the one of the HTML (this makes even less sense)? In
short: you should ask the specific writer for the pagenumber, NOT the
Document object."

Granted, I have spent about 10 minutes in total studying the
tutorial, but this item near the top caught my eye. I think it
essential that the proposal recognize that we live in a time of
transition, in which paper and pageless documents must coexist.
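
A rough sketch of what I mean, in Python, offered as an illustration
only. The names Block, cite, and quote_list are hypothetical and do
not come from iText or from any proposal on this list. The assumption
is simply that each unit of content carries the numbering it had in
the source, so that a writer for any output format can still cite or
quote it faithfully.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Block:
    """One unit of content plus the numbering it carried in the source."""
    text: str
    source_page: Optional[int] = None       # page in the paper rendition
    source_paragraph: Optional[int] = None  # paragraph number in the source
    list_label: Optional[str] = None        # e.g. "3." or "(b)", kept verbatim


def cite(block: Block) -> str:
    """Build a citation from whatever source numbering survived."""
    parts = []
    if block.source_page is not None:
        parts.append(f"p. {block.source_page}")
    if block.source_paragraph is not None:
        parts.append(f"para. {block.source_paragraph}")
    return ", ".join(parts) if parts else "(no source numbering)"


def quote_list(blocks: List[Block]) -> str:
    """Quote list items with their original labels instead of renumbering."""
    return "\n".join(f"{b.list_label or '-'} {b.text}" for b in blocks)
```

With this shape, a quoted excerpt that starts at item 3 of a source
list keeps the label "3." in the destination document, and a citation
can give both page and paragraph when both were recorded.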

I also think it absolutely essential that accessibility validation be
part of the foundation being constructed for this proposal. But
extraction and aggregation of data from a variety of formats presents
some obvious problems for producing accessible documents as output
from such a hub. So I'd say call in the accessibility experts very
early on.

This all assumes, of course, that I have at least a glimmer of the goals. :-)

Best regards,

Marbux

Follow-Ups:
- Re: [docstandards-interop-discuss] Clarifications / Scope of the intended work?
  - From: "Dave Pawson" <dave.pawson@gmail.com>

References:
- RE: [docstandards-interop-discuss] Clarifications / Scope of the intended work?
  - From: "David RR Webber (XML)" <david@drrw.info>
- Re: [docstandards-interop-discuss] Clarifications / Scope of the intended work?
  - From: "Dave Pawson" <dave.pawson@gmail.com>