Subject: Re: [docstandards-interop-discuss] Clarifications / Scope of the intended work?
I'm starting to get the drift, I think, despite my weakness in understanding programming concepts. There are more than a few areas that will be problematic. :-)

But I think it might be useful to do some outreach to involve others who are grappling with the same or similar problems, both to improve the proposed work product and to encourage its adoption. For example, the Apache Cocoon development team has already developed an XML-based scripting language ("sitemap" in Cocoon parlance) for specifying the serialized output of transformations in a business process, whether executed using XSLT, STX, or XInclude with XPointer framework support, with input and output in a wide variety of data formats. See <http://cocoon.apache.org/2.1/features.html>. For example markup in the context of aggregating data from multiple sources programmatically, see <http://wiki.apache.org/cocoon/CocoonProtocolExample>.

I haven't checked in to see what happened with it, but there was also an OASIS proposal roughly a year ago to develop a similar language for scripting transformations from a wide variety of markup sources (specifically including at least DocBook and HTML) to Windows Help (CHM) format. (I wound up boycotting the effort because of the proposal leader's resistance to identifying accessibility validation as a foundation-level aspect of the specification.)

I like the idea of raising the abstraction level of transformation, aggregation, etc. methods to a meta-level hub language, particularly if it can be implemented in a user-friendly way. But I am also wary of the approach of designing such a language to encompass only two document markup languages and then extending it to others. The concern is that an approach that works for two document languages might not extend gracefully to others.
E.g., the new Microsoft Office Open XML formats are reputedly very XPath- and XSL-FO-unfriendly, so a mapping method that depended heavily on XPath and XSL-FO might be far from an ideal approach for transformations to and from MOOXML. So I'm left wondering whether it might be better to involve others concerned with many types of automated transformations from the get-go.

On the pageless/paper document distinction, there should be provision, I think, for preserving paper document metadata from the source data, e.g., page, paragraph, and list numbering. The particular problems I have in mind are those of creating citations automagically and quoting lists. The legal profession is well down the path of citations to document paragraph numbers rather than page numbers, but all federal courts, for example, currently require simultaneous eFiling of documents and filing in paper format as part of the transitional state to pure eFiling. So for many legal documents, preservation of both page and paragraph numbers is de rigueur. And changing list numbering from source to destination documents would violate so many publishing style books that it would take either a brave or foolish soul to argue that it doesn't matter.

So I'm left wondering about this excerpt from the iText tutorial:

"Why getPagenumber doesn't work. Class Document is all about content, not about presentation. Different writers can listen to the same Document object, so it makes absolutely no sense asking the Document object for its current pagenumber. Which number would the Document have to return? The pagenumber of the PDF representation or the one of the HTML (this makes even less sense)? In short: you should ask the specific writer for the pagenumber, NOT the Document object."

Granted, I have spent only about 10 minutes studying the tutorial, but this one near the top caught my eye. I think it essential that the proposal recognize that we live in a time of transition from paper to electronic documents.
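
To make the two points above a little more concrete, here is a rough sketch, offered with all due caution. It is not the iText API; every class and field name below (Document, PagedWriter, PagelessWriter, source_page, source_paragraph) is hypothetical. It only illustrates the design the tutorial excerpt describes, where each writer tracks its own page count, alongside the idea of carrying the paper original's page and paragraph numbers as metadata so that even a pageless output can still cite them.

```python
from dataclasses import dataclass


@dataclass
class Paragraph:
    text: str
    source_page: int       # page number in the paper original (hypothetical field)
    source_paragraph: int  # paragraph number in the paper original (hypothetical field)


class PagedWriter:
    """Hypothetical paginating writer; it alone knows its current page."""

    def __init__(self, paragraphs_per_page):
        self.paragraphs_per_page = paragraphs_per_page
        self.count = 0

    def on_add(self, para):
        self.count += 1

    @property
    def page_number(self):
        # Ceiling division; an empty writer is still on page 1.
        return max(1, -(-self.count // self.paragraphs_per_page))


class PagelessWriter:
    """Hypothetical pageless writer; it has no pages of its own,
    but it can still emit the source numbering for citation purposes."""

    def __init__(self):
        self.lines = []

    def on_add(self, para):
        self.lines.append(
            f"[p. {para.source_page}, para. {para.source_paragraph}] {para.text}"
        )


class Document:
    """Content only: no page number lives here, since several writers
    with different layouts may be listening to the same document."""

    def __init__(self):
        self.writers = []
        self.paragraphs = []

    def add_writer(self, writer):
        self.writers.append(writer)

    def add(self, para):
        self.paragraphs.append(para)
        for writer in self.writers:
            writer.on_add(para)
```

In this sketch the output page number comes from the specific writer, as the tutorial says, while the source page and paragraph numbers travel with the content itself, so neither kind of numbering has to be lost in transformation.
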
I also think it absolutely essential that accessibility validation be part of the foundation being constructed for this proposal. But extraction and aggregation of data from a variety of formats presents some obvious problems for producing accessible documents as output from such a hub. So I'd say call in the accessibility experts very early on.

This all assumes, of course, that I have at least a glimmer of the goals. :-)

Best regards,

Marbux