docstandards-interop-discuss message

Subject: RE: [docstandards-interop-discuss] Clarifications / Scope of the intended work?

From: Michael Priestley <mpriestl@ca.ibm.com>
To: "David RR Webber (XML)" <david@drrw.info>
Date: Tue, 10 Apr 2007 12:29:07 -0400

Hi David,

>By creating a standard around the functions and the processing - we establish that "lingua franca" at the level of
>the processing required - not the underlying vendor specific document syntax goup - that will change every time they release a new product.

Are you suggesting that the output format from an XML document is more stable than the XML format itself? That certainly seems at odds with the generally asserted purpose of XML, to separate content from presentation so that the content is reusable across multiple presentation contexts. It sounds like what you're saying is that content is more reusable when it is combined with presentation. This is counter to the founding assumptions of most XML document standards.

I can see the point of integrating PDF into a document lifecycle as source when the original document source is not available, or is in some vendor-specific format - but that is not the context here. Using PDF as an interchange format between DocBook and DITA, for example, would be very strange indeed.

Michael Priestley
IBM DITA Architect and Classification Schema PDT Lead
mpriestl@ca.ibm.com
http://dita.xml.org/blog/25

"David RR Webber (XML)" <david@drrw.info>

04/10/2007 12:11 PM

To	"Earley,Jim" <Jim.Earley@flatironssolutions.com>
cc	Dave Pawson <dave.pawson@gmail.com>, docstandards-interop-discuss@lists.oasis-open.org
Subject	RE: [docstandards-interop-discuss] Clarifications / Scope of the intended work?

Jim,

I'd argue that you are making my point for me!!!

What we need are FUNCTIONS that match the business requirements you state here.

Your example - "In these cases, the structural and semantic characteristics are equally
important: a procedure may appear as a numbered list presentationally, but
semantically it is very different than a set of items in a sequenced
list."

So - if I was using iText to do this - I can handle this both ways - either get the XML from wherever - and then produce the numbered list (and embed matching XML metacontent) into PDF - or the reverse - find the numbered list in the PDF - extract it out - create the XML.

By creating a standard around the functions and the processing - we establish that "lingua franca" at the level of the processing required - not the underlying vendor specific document syntax goup - that will change every time they release a new product.

The vendors then simply provide implementations to our functional set - and anyone can then create XML-script handling of their documents - inbound or outbound - in a consistent way to our specification.
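
As a rough illustration of what such a functional set might look like, here is a minimal Python sketch. The names DocumentFunctions, InMemoryDocument and check_submission are hypothetical and not part of any proposed specification; the functions simply mirror the checks mentioned elsewhere in this thread (bookmarks, text strings, page counts, word counts). The in-memory class stands in for a vendor implementation.

```python
from dataclasses import dataclass, field
from typing import Protocol


class DocumentFunctions(Protocol):
    """Hypothetical functional set that each vendor would implement."""

    def page_count(self) -> int: ...
    def word_count(self) -> int: ...
    def has_bookmark(self, name: str) -> bool: ...
    def contains_text(self, text: str) -> bool: ...


@dataclass
class InMemoryDocument:
    """Toy stand-in for a vendor implementation (PDF, ODF, ...)."""

    pages: list[str]
    bookmarks: set[str] = field(default_factory=set)

    def page_count(self) -> int:
        return len(self.pages)

    def word_count(self) -> int:
        # Whitespace-separated tokens, summed over all pages.
        return sum(len(page.split()) for page in self.pages)

    def has_bookmark(self, name: str) -> bool:
        return name in self.bookmarks

    def contains_text(self, text: str) -> bool:
        return any(text in page for page in self.pages)


def check_submission(doc: DocumentFunctions, max_pages: int,
                     required_bookmark: str) -> list[str]:
    """Business-level check written only against the functional set."""
    problems = []
    if doc.page_count() > max_pages:
        problems.append(f"too many pages: {doc.page_count()} > {max_pages}")
    if not doc.has_bookmark(required_bookmark):
        problems.append(f"missing bookmark: {required_bookmark}")
    return problems
```

The point of the sketch is that check_submission never touches the underlying document syntax; any implementation that offers the same four functions could be passed in.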

Bottom line is - it's the functional handling equivalence we are wanting.

This may ultimately drive syntax alignment - but we do not have to get into that ourselves.

DW

"The way to be is to do" - Confucius (551-472 B.C.)

-------- Original Message --------
Subject: RE: [docstandards-interop-discuss] Clarifications / Scope of
the intended work?
From: "Earley, Jim" <Jim.Earley@flatironssolutions.com>
Date: Tue, April 10, 2007 11:49 am
To: "David RR Webber (XML)" <david@drrw.info>
Cc: "Dave Pawson" <dave.pawson@gmail.com>,
<docstandards-interop-discuss@lists.oasis-open.org>

David,

Respectfully, I believe the issue isn't at the presentation layer but more at the content layer: How do I leverage/reuse/repurpose content in one XML Standard (say DITA) in my content (say DocBook)? Here the question is more targeted at content interoperability.

For example, Vendor A provides content to an OEM partner who will rebrand it and integrate Vendor A's content into their own doc set (could be PDF, HTML, HTML Help, JavaHelp, or any number of formats). Further down the pipeline, the content is reused in Training material by a different group using TEI.

In these cases, the structural and semantic characteristics are equally important: a procedure may appear as a numbered list presentationally, but semantically it is very different than a set of items in a sequenced list. By abstracting each XML standard's specific content models to a common denominator, you can preserve structure along with semantics in a way that enables other XML standards to leverage the content using their grammar with minimal loss to semantics from the original.

Certainly, there are cases as you mentioned that require the presentational functionality to be preserved "as submitted" that do not apply here. And in these cases, your approach to maintaining the presentational semantics is very interesting.

I've used iText for personal projects, and yes, it is very mature.

Cheers,

Jim

================
Jim Earley
XML Developer/Consultant
Flatirons Solutions
4747 Table Mesa Drive
Boulder, CO 80301
Voice: 303.542.2156
Fax: 303.544.0522
Cell: 303.898.7193
Yahoo.IM: jmearley
MSN.IM: jearley22@hotmail.com
jim.earley@flatironssolutions.com

-----Original Message-----
From: David RR Webber (XML) [mailto:david@drrw.info]
Sent: Tuesday, April 10, 2007 9:02 AM
To: Earley, Jim
Cc: Dave Pawson; docstandards-interop-discuss@lists.oasis-open.org
Subject: RE: [docstandards-interop-discuss] Clarifications / Scope of the intended work?

Jim,

Why not focus on the handling functions instead? That way you are an abstraction layer above the low-level representation syntax.

The xhtml is problematic - especially when it comes to page counts and page content.

Legally also - you need to leave things "as submitted" - because you may reject a submission as say not having content in the right place on a page, or total pages - and yet the original was OK when viewed in the native format.

Also - by going with functions - you put the onus on the individual tool vendors to support those functions consistently - without having to get into the lower level syntax ourselves of how that occurs, either now or future new formats.

At the end of the day it is the BUSINESS FUNCTIONALITY that you want interoperability around - not the raw document.

So from the business stance - if I need to check for certain bookmarks, sections, text strings, page counts, word counts, etc - I can do that.

DW

"The way to be is to do" - Confucius (551-472 B.C.)

-------- Original Message --------
Subject: RE: [docstandards-interop-discuss] Clarifications / Scope of the intended work?
From: "Earley, Jim" <Jim.Earley@flatironssolutions.com>
Date: Tue, April 10, 2007 10:46 am
To: "Dave Pawson" <dave.pawson@gmail.com>, <docstandards-interop-discuss@lists.oasis-open.org>

Dave,

The current thinking with regard to a solution uses XHTML Microformats as the abstraction layer. All of the standards (DITA, DB, ODF) share the same structural characteristics (Headings, paragraphs, lists, tables, images, etc.) albeit in different ways. The premise thus far is:

1. use standard XHTML markup for common semantic/structural components (table, img, p, ol, acronym, strong, em, etc)
2. For structural components that do not have an equivalent XHTML mapping, use <div>
3. For inline semantics that do not have an equivalent XHTML mapping, use <span>

- use the title attribute (available on any XHTML element) to store the original element name
- use the class attribute to store the "semantic category": e.g., "procedural" vs. "list" to delineate between a procedural set of steps compared to a numbered list
- there are a couple of ideas that we're playing with with regard to capturing the attribute values from the original source:
  a) Use the object tag (with child param tags to capture the name/value pairs)
  b) Use a declared namespace to embed the attributes on the element

These are, of course, open for discussion.

Jim

================
Jim Earley
XML Developer/Consultant
Flatirons Solutions
4747 Table Mesa Drive
Boulder, CO 80301
Voice: 303.542.2156
Fax: 303.544.0522
Cell: 303.898.7193
Yahoo.IM: jmearley
MSN.IM: jearley22@hotmail.com
jim.earley@flatironssolutions.com

-----Original Message-----
From: Dave Pawson [mailto:dave.pawson@gmail.com]
Sent: Tuesday, April 10, 2007 8:12 AM
To: docstandards-interop-discuss@lists.oasis-open.org
Subject: Re: [docstandards-interop-discuss] Clarifications / Scope of the intended work?

On 10/04/07, Michael Priestley <mpriestl@ca.ibm.com> wrote:
> - govt worker begins drafting a policy note in ODF with the subject "the use of personal data received via email"
> - govt worker pulls in the text of the relevant statute, which is in a DITA specialization
> - govt worker pulls in the legal disclaimer which must now be included in every government email reply, from a different DITA specialization
> - govt worker pulls in the instructions on how to include the text of the disclaimer in emails, from documentation of the email software written in DocBook
> - technical author 2, using DocBook, creates a customized version of the email software documentation
> - and pulls in portions of the procedures web site, in the form of DITA topics and ODF policy notes

OK, you've described the problem Michael. I hope we can all sympathise with that!

Ignoring how, what do you see as a solution?

A means of 'integrating' n streams?
A way of reading n streams?
A means of generating .... something readable by all.... (lcd solution)

What class of solution is the goal please?

regards

--
Dave Pawson
XSLT XSL-FO FAQ.
http://www.dpawson.co.uk

---------------------------------------------------------------------
To unsubscribe, e-mail: docstandards-interop-discuss-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: docstandards-interop-discuss-help@lists.oasis-open.org
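
The XHTML microformat premise Jim outlines earlier in this thread (standard XHTML where an equivalent exists, div or span otherwise, title for the original element name, class for the semantic category) could be sketched roughly as below. This is only an illustration under stated assumptions: the lookup tables, the element names in them, the namespace URI and the function name to_xhtml are all hypothetical, and the sketch picks option (b), a declared namespace, for carrying the original attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup tables; the element names are illustrative only and
# are not a complete DITA or DocBook mapping.
XHTML_EQUIVALENT = {
    "para": "p",
    "orderedlist": "ol",
    "listitem": "li",
    "procedure": "ol",
    "step": "li",
    "emphasis": "em",
}
SEMANTIC_CATEGORY = {"procedure": "procedural", "orderedlist": "list"}
INLINE_WITHOUT_EQUIVALENT = {"filename", "command"}  # assumed inline elements

# Hypothetical namespace used to carry the source attributes (option b).
SRC_NS = "urn:example:source-attributes"


def to_xhtml(src):
    """Recursively map a source element to an XHTML-style element."""
    name = src.tag
    if name in XHTML_EQUIVALENT:
        tag = XHTML_EQUIVALENT[name]      # rule 1: standard XHTML markup
    elif name in INLINE_WITHOUT_EQUIVALENT:
        tag = "span"                      # rule 3: inline fallback
    else:
        tag = "div"                       # rule 2: structural fallback
    out = ET.Element(tag, {"title": name})  # title keeps the original name
    if name in SEMANTIC_CATEGORY:
        out.set("class", SEMANTIC_CATEGORY[name])
    for key, value in src.attrib.items():
        # Namespaced copy of each original attribute.
        out.set("{%s}%s" % (SRC_NS, key), value)
    out.text = src.text
    out.tail = src.tail
    for child in src:
        out.append(to_xhtml(child))
    return out


if __name__ == "__main__":
    source = ET.fromstring(
        '<procedure id="p1"><step>Open <filename>a.txt</filename></step></procedure>'
    )
    print(ET.tostring(to_xhtml(source), encoding="unicode"))
```

Note that a procedure and an ordered list both become ol here, and only the class value ("procedural" vs. "list") keeps them apart, which is the distinction Jim's example is concerned with.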

References:
- RE: [docstandards-interop-discuss] Clarifications / Scope of the intended work?
  - From: "David RR Webber (XML)" <david@drrw.info>