docstandards-interop-discuss message

Subject: Re: [docstandards-interop-discuss] Clarifications / Scope of the intended work?

From: "Dave Pawson" <dave.pawson@gmail.com>
To: docstandards-interop-discuss@lists.oasis-open.org
Date: Wed, 11 Apr 2007 09:55:09 +0100

On 11/04/07, marbux <marbux@gmail.com> wrote:

> > > I haven't checked in to see what happened with it, but there was also
> > > an OASIS proposal roughly a year ago to develop a similar language for
> > > scripting transformations
> >
> >
> >
> > For what purpose?
> >
> Basically an interchange format for conversions to Windows Help (CHM)

I meant: what might its purpose be with respect to this TC's work?
Why would we want to use such technology?



> > Is this an 'Esperanto' based solution? From format A into Esperanto,
> > then Esperanto out to format B?
> >
>
> No, at least the way I understand what's being proposed. If you
> haven't done so, it might help to take a look at the slides Mike
> linked earlier.
> <http://flatironssolutions.com/Downloads/DITA2007West.pdf>. I'll
> probably describe it more poorly, but it seems like the proposal is
> for a meta-language to be used for scripting the extraction,
> transformation, aggregation, and serialization of data from a variety
> of documentation formats to a variety of documentation formats.

So for every input format there would need to be a 'transform' to
every other 'wanted' format?
I don't think that compares favourably, in terms of effort, with
an 'Esperanto' or hub model:
Each input format transforms to the Esperanto.
Each output format transforms from the Esperanto.
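
To put rough numbers on that comparison, here is a minimal sketch in
Python. It assumes every format is both a source and a target, and that
each direction needs its own transform. The function names are mine,
purely for illustration; nothing here comes from the proposal itself.

```python
def pairwise_transforms(n_formats: int) -> int:
    """Transforms needed if every format converts directly to every other."""
    # Each of the n formats needs a transform to each of the other n - 1.
    return n_formats * (n_formats - 1)


def hub_transforms(n_formats: int) -> int:
    """Transforms needed if everything goes through one hub ('Esperanto')."""
    # One transform into the hub and one out of it, per format.
    return 2 * n_formats


if __name__ == "__main__":
    for n in (2, 3, 5, 10):
        print(n, pairwise_transforms(n), hub_transforms(n))
```

Under those assumptions the two models break even at three formats, and
the hub needs fewer transforms for anything beyond that.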




> So basically an XML scripting language to automate such steps. So rather
> than Esperanto, something more like the "sitemap" XML language used by
> the Apache Cocoon project, something closer to XSL-FO than Esperanto,

FO is layout-oriented, isn't it? I don't favour such an approach at all.


> but abstracted another layer so that processing of a variety of
> transformation languages could be scripted using the same scripting
> meta-language. (That's probably clear as mud and may be wrong to boot,
> so take it with a grain of salt.)

I'll not judge it till I understand how it might fit and its purpose.




> > > On the pageless/paper document distinction, there should be provision,
> > > I think, for preserving paper document metadata from the source data,
> >
> >
> > I disagree, but would like to hear the rationale, including examples
> > where the sender's and recipient's 'pages' are totally different, and how
> > mapping might address this.
> >
> >
> As I understand it we're not discussing a sender-recipient
> interaction, at least in a human sender-recipient sense. We're more
> concerned with automated extraction of data, its transformation, its
> aggregation, and its serialization.


I don't parse that statement. Documents and data are generally for humans,
so surely there must be a sender and a recipient, even if we count
organisations A and B as those?

The hub-based transforms may be automated, but the rationale for the
transform is still to move information from A to B?





>
> Let's assume our implementing app was used in the legal field and part
> of what was being extracted had line numbering and the text being
> extracted had references to line numbers.

Then it would fail as soon as the sender and recipient used differing
line lengths, due to margins, font size, font, paper etc.

Numbered paragraphs may work. IMHO that is para level metadata.
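
A small sketch of why I'd expect line references to break, using
Python's standard textwrap module. The sample sentence and the two
widths are made up for illustration; real layout also depends on fonts
and margins, which this ignores.

```python
import textwrap


def line_of(text: str, word: str, width: int):
    """Return the 1-based line on which `word` lands when `text` is wrapped."""
    for number, line in enumerate(textwrap.wrap(text, width=width), start=1):
        if word in line:
            return number
    return None  # word not present in the text


SAMPLE = (
    "The party of the first part shall indemnify the party of the "
    "second part against all claims arising under this agreement."
)

if __name__ == "__main__":
    # Same text, two different page widths.
    print(line_of(SAMPLE, "claims", 40))
    print(line_of(SAMPLE, "claims", 70))
```

The same word ends up on a different line at each width, so a reference
such as "see line 3" only holds for one particular layout. A paragraph
number carried as metadata does not change when the text is re-wrapped.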



> You'd want the implementing
> app to extract the line numbers and keep them the same once the data
> is transformed, aggregated with other data, and serialized to
> the format needed by the app that will make the presentation.

Scope issue? I'd suggest usage (including presentation) is out of scope
for this TC. I would propose that our scope extends from the exit point
of org A through to the entrance to org B. I.e. how the data is used is
none of our concern.


> Loop the
> same concept for list item numbers and page numbers.

See my earlier posts as to why page numbers are pointless.

List numbering is again metadata on the list.


>
> > >
> > > Granted, I have about 10 minutes of total time spent studying the
> > > tutorial, but this one near the top caught my eye. I think it
> > > essential that the proposal recognize that we live in a relevant time
> > > of transition.
> >
> > My view is that this is not in scope for this TC.
> >
>
> Perhaps. I could easily have got 90 per cent of what's gone by on this
> list wrong. As I said, I'm not a code warrior. But if I am
> understanding what's being proposed, then unique traits of a paper
> format document being extracted might well need to be preserved in
> some fashion.

In which case send paper (or some faithful representation) and you'd want
to sidestep this TC's work as being irrelevant.

Michael mentioned XML as the input format. *If* that remains, then
page-based content such as PDF or Quark is out of scope?





>
> > > I also think it absolutely essential that accessibility validation be
> > > part of the foundation being constructed for this proposal.
> >
> > Accessibility of what please? The source format, the 'Esperanto' solution
> > (as an intermediary language) or the target format?
> >
>
> All three. Say a source document includes VoiceXML document navigation
> code and you want the output data to include Voice XML or some other
> document navigation language. Then you need to preserve and transform
> the relevant accessibility features of the document and have it
> translated into some format with useful accessibility navigation in
> the output.

I'd rule that out of scope (personal view).
If my view of scope prevails, the machine-based exchange format between
input and output would not be used by people. Hence the accessibility
issues would relate to the input and output formats, and would be out of
scope for this TC.


> You might take a look at HawHaw, a PHP web app for
> producing web pages for full screen and mobile devices that can
> transform on the fly a variety of document navigation languages
> including VoiceXML using an XML intermediary language for scripting
> the process. <http://www.hawhaw.de/>. Its intermediary XML language is
> also analogous to what's being discussed here, I think.

Would that be something along the lines of the hub idea, my Esperanto?



>
> It will be wonderful if you can at least monitor what
> happens here and chime in when you think accessibility issues might be
> involved.
<grin/> I always do.



> Got to fly; I'm indentured to my daughter tonight, installing Linux on
> her computer. I'd be interested in feedback on whether I'm coming
> closer to an understanding of the suggested work.

Me too :-)

regards

-- 
Dave Pawson
XSLT XSL-FO FAQ.
http://www.dpawson.co.uk

References:
- RE: [docstandards-interop-discuss] Clarifications / Scope of the intended work?
  - From: "David RR Webber (XML)" <david@drrw.info>
- Re: [docstandards-interop-discuss] Clarifications / Scope of the intended work?
  - From: "Dave Pawson" <dave.pawson@gmail.com>
- Re: [docstandards-interop-discuss] Clarifications / Scope of the intended work?
  - From: marbux <marbux@gmail.com>
- Re: [docstandards-interop-discuss] Clarifications / Scope of the intended work?
  - From: "Dave Pawson" <dave.pawson@gmail.com>
- Re: [docstandards-interop-discuss] Clarifications / Scope of the intended work?
  - From: marbux <marbux@gmail.com>