Subject: RE: [docstandards-interop-discuss] Clarifications / Scope of the intended work?
-------- Original Message --------
Subject: RE: [docstandards-interop-discuss] Clarifications / Scope of
the intended work?
From: "Earley, Jim" <Jim.Earley@flatironssolutions.com>
Date: Tue, April 10, 2007 1:52 pm
To: "David RR Webber (XML)" <david@drrw.info>
Cc: "Dave Pawson" <dave.pawson@gmail.com>,
	<docstandards-interop-discuss@lists.oasis-open.org>

Dave,

Why are we trying to preserve page semantics? What's wrong with the XML DOM? XSLT/XSL-FO? XQuery? These are already in the toolchain.

Jim

================
Jim Earley
XML Developer/Consultant
Flatirons Solutions
4747 Table Mesa Drive
Boulder, CO 80301
Voice: 303.542.2156
Fax: 303.544.0522
Cell: 303.898.7193
Yahoo.IM: jmearley
MSN.IM: jearley22@hotmail.com
jim.earley@flatironssolutions.com

-----Original Message-----
From: David RR Webber (XML) [mailto:david@drrw.info]
Sent: Tuesday, April 10, 2007 11:46 AM
To: Earley, Jim
Cc: Dave Pawson; docstandards-interop-discuss@lists.oasis-open.org
Subject: RE: [docstandards-interop-discuss] Clarifications / Scope of the intended work?

Jim,

NO NO NO - it's not PDF that is the answer! Stop thinking PDF, please. Yes, we know that iText is built for PDF - but that's just the starting point here. What I'm saying is: use the model, not the specific rendering.

Our XML-script syntax is neutral - it will work with ANY document syntax. It's just that PDF already has one implemented, so they are ahead of the game at this point - but it should not take long for people's development teams to adapt the iText code base to work for ODF and more as well.

So the real abstraction is the in-memory page object model that iText uses when it runs - just as the DOM is for XML inside a browser.

DW

"The way to be is to do" - Confucius (551-472 B.C.)

-------- Original Message --------
Subject: RE: [docstandards-interop-discuss] Clarifications / Scope of the intended work?
From: "Earley, Jim" <Jim.Earley@flatironssolutions.com>
Date: Tue, April 10, 2007 1:33 pm
To: "David RR Webber (XML)" <david@drrw.info>
Cc: "Dave Pawson" <dave.pawson@gmail.com>,
	<docstandards-interop-discuss@lists.oasis-open.org>

Dave,

Herein lies the rub, methinks:

>> By creating a standard around the functions and the processing - we
>> establish that "lingua franca" at the level of the processing required -
>> not the underlying vendor-specific document syntax goop - that will
>> change every time they release a new product.

This is why we've proposed going to an abstraction layer: it enables the respective standards to keep evolving to meet their constituents' needs, yet allows interoperability at the markup level - which I contend is much more robust with respect to content reusability, and which is what I hear repeatedly from authors (and not just in the "standard" software Tech Pubs space).

I would argue that by going to PDF and then to iText to produce interoperability markup, you impose the presentation (and inherently the presentational structure, which may or may not be equivalent to the originating markup structure) on the recipient, lock, stock and barrel. DITA topics do not necessarily have to begin on a new page; neither do DocBook sections. These are presentation-specific details that are intentionally left out of the markup to enable reuse across documents and output formats. Formatting is fluid based on many different factors: company branding, output format, localization, even audience, to name a few.

This is why I believe that separating the presentation from the data is absolutely critical. In my experience at a Fortune 50 company that changed its branding every 18-24 months, having the content in structured, semantic markup saved our skins in more ways than you can possibly imagine.
We could, in a few weeks, rebrand thousands of manuals formatted in HTML and PDF in over 9 languages (_because_ the content was in XML) - something that would otherwise have taken months if we had had to tinker with the formatting. I've been down that road with things like MS Word or FrameMaker (both structured and unstructured) and I rapidly run the other way when the topic comes up.

What about effectivity (conditional processing) attributes? What if I create an XML document that contains content embedded for different operating systems, each of which is rendered into a separate output? How do I capture these at the presentation layer and then enable authors to leverage the content appropriately? These are things that XML markup is very effective at.

It's also been my experience that authors are embracing XML markup now because tool support is readily available from numerous vendors (and they don't have to get their hands dirty with the actual markup!). They are seeing the benefits of working with structured markup with respect to content reuse and single sourcing. Now their biggest problem is pulling content from other sources, written in different XML standards, into their own content. I believe that what we've proposed enables authors to do this without a significant amount of retooling.

Jim

-----Original Message-----
From: David RR Webber (XML) [mailto:david@drrw.info]
Sent: Tuesday, April 10, 2007 10:11 AM
To: Earley, Jim
Cc: Dave Pawson; docstandards-interop-discuss@lists.oasis-open.org
Subject: RE: [docstandards-interop-discuss] Clarifications / Scope of the intended work?

Jim,

I'd argue that you are making my point for me! What we need are FUNCTIONS that match the business requirements you state here.
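As a hedged sketch only (the class and method names below are hypothetical illustrations, not part of any proposal here and not iText's actual API), such a functional layer could be expressed as an abstract interface that each format backend implements:

```python
from abc import ABC, abstractmethod

class DocumentFunctions(ABC):
    """Illustrative, format-neutral functional layer.

    Each backend (PDF via iText, ODF, DocBook, ...) would supply its own
    implementation; callers never touch the underlying document syntax.
    All names are hypothetical, not a proposed specification.
    """

    @abstractmethod
    def page_count(self) -> int: ...

    @abstractmethod
    def find_text(self, needle: str) -> bool: ...

    @abstractmethod
    def sections(self) -> list[str]: ...

class InMemoryDocument(DocumentFunctions):
    """Toy backend over a plain list of page strings, for illustration."""

    def __init__(self, pages: list[str]):
        self.pages = pages

    def page_count(self) -> int:
        return len(self.pages)

    def find_text(self, needle: str) -> bool:
        return any(needle in page for page in self.pages)

    def sections(self) -> list[str]:
        # Treat any line starting with "# " as a section heading.
        return [line[2:] for page in self.pages
                for line in page.splitlines() if line.startswith("# ")]

doc = InMemoryDocument(["# Intro\nhello world", "# Scope\nmore text"])
print(doc.page_count())        # 2
print(doc.find_text("hello"))  # True
print(doc.sections())          # ['Intro', 'Scope']
```

The point of the sketch is that a caller can check page counts, text strings, or sections without ever knowing whether the backend is PDF, ODF, or something else.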
Your example: "In these cases, the structural and semantic characteristics are equally important: a procedure may appear as a numbered list presentationally, but semantically it is very different than a set of items in a sequenced list."

So - if I were using iText to do this - I could handle it both ways: either get the XML from wherever, then produce the numbered list (and embed matching XML metacontent) into PDF; or the reverse - find the numbered list in the PDF, extract it out, and create the XML.

By creating a standard around the functions and the processing, we establish that "lingua franca" at the level of the processing required - not the underlying vendor-specific document syntax goop that will change every time they release a new product. The vendors then simply provide implementations of our functional set, and anyone can then create XML-script handling of their documents - inbound or outbound - in a way consistent with our specification.

Bottom line: it's the functional handling equivalence we want. This may ultimately drive syntax alignment, but we do not have to get into that ourselves.

DW

"The way to be is to do" - Confucius (551-472 B.C.)

-------- Original Message --------
Subject: RE: [docstandards-interop-discuss] Clarifications / Scope of the intended work?
From: "Earley, Jim" <Jim.Earley@flatironssolutions.com>
Date: Tue, April 10, 2007 11:49 am
To: "David RR Webber (XML)" <david@drrw.info>
Cc: "Dave Pawson" <dave.pawson@gmail.com>,
	<docstandards-interop-discuss@lists.oasis-open.org>

David,

Respectfully, I believe the issue isn't at the presentation layer but at the content layer: how do I leverage/reuse/repurpose content written in one XML standard (say DITA) in my own content (say DocBook)? Here the question is more targeted at content interoperability.
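To make the content-layer question concrete, here is a hedged sketch: the element names are real DITA and DocBook vocabulary, but the neutral "procedure" shape and the mapping table are illustrative only, not an agreed common denominator.

```python
import xml.etree.ElementTree as ET

# Hypothetical lowest-common-denominator mapping: both a DITA-style
# <steps> block and a DocBook-style <orderedlist> normalize to the same
# neutral ("procedure", [items...]) shape.
NEUTRAL_NAMES = {
    "steps": ("procedure", "step"),            # DITA task steps
    "orderedlist": ("procedure", "listitem"),  # DocBook procedure-as-list
}

def normalize(xml_text: str):
    """Map a vocabulary-specific list to the neutral shape."""
    root = ET.fromstring(xml_text)
    category, item_tag = NEUTRAL_NAMES[root.tag]
    items = ["".join(item.itertext()).strip() for item in root.iter(item_tag)]
    return category, items

dita = ("<steps><step><cmd>Open the lid</cmd></step>"
        "<step><cmd>Press start</cmd></step></steps>")
docbook = ("<orderedlist><listitem><para>Open the lid</para></listitem>"
           "<listitem><para>Press start</para></listitem></orderedlist>")

# Both vocabularies land on the same neutral result.
print(normalize(dita))     # ('procedure', ['Open the lid', 'Press start'])
print(normalize(docbook))  # ('procedure', ['Open the lid', 'Press start'])
```

The semantic category ("procedure") survives the round trip even though the two grammars spell the structure differently, which is the interoperability property being argued for here.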
For example, Vendor A provides content to an OEM partner who will rebrand it and integrate Vendor A's content into their own doc set (which could be PDF, HTML, HTML Help, JavaHelp, or any number of formats). Further down the pipeline, the content is reused in training material by a different group using TEI.

In these cases, the structural and semantic characteristics are equally important: a procedure may appear as a numbered list presentationally, but semantically it is very different than a set of items in a sequenced list. By abstracting each XML standard's specific content models to a common denominator, you can preserve structure along with semantics in a way that enables other XML standards to leverage the content using their own grammar, with minimal loss of semantics from the original.

Certainly, there are cases, as you mentioned, that require the presentational functionality to be preserved "as submitted"; those do not apply here. In such cases, your approach to maintaining the presentational semantics is very interesting. I've used iText for personal projects, and yes, it is very mature.

Cheers,
Jim

-----Original Message-----
From: David RR Webber (XML) [mailto:david@drrw.info]
Sent: Tuesday, April 10, 2007 9:02 AM
To: Earley, Jim
Cc: Dave Pawson; docstandards-interop-discuss@lists.oasis-open.org
Subject: RE: [docstandards-interop-discuss] Clarifications / Scope of the intended work?

Jim,

Why not focus on the handling functions instead? That way you are an abstraction layer above the low-level representation syntax. The XHTML is problematic - especially when it comes to page counts and page content.
Legally, also, you need to leave things "as submitted" - because you may reject a submission for, say, not having content in the right place on a page, or for its total page count - and yet the original was OK when viewed in the native format.

Also, by going with functions, you put the onus on the individual tool vendors to support those functions consistently - without our having to get into the lower-level syntax of how that occurs, either now or for future new formats.

At the end of the day it is the BUSINESS FUNCTIONALITY that you want interoperability around - not the raw document. So from the business stance - if I need to check for certain bookmarks, sections, text strings, page counts, word counts, etc. - I can do that.

DW

"The way to be is to do" - Confucius (551-472 B.C.)

-------- Original Message --------
Subject: RE: [docstandards-interop-discuss] Clarifications / Scope of the intended work?
From: "Earley, Jim" <Jim.Earley@flatironssolutions.com>
Date: Tue, April 10, 2007 10:46 am
To: "Dave Pawson" <dave.pawson@gmail.com>,
	<docstandards-interop-discuss@lists.oasis-open.org>

Dave,

The current thinking with regard to a solution uses XHTML Microformats as the abstraction layer. All of the standards (DITA, DocBook, ODF) share the same structural characteristics (headings, paragraphs, lists, tables, images, etc.), albeit expressed in different ways. The premise thus far is:

1. Use standard XHTML markup for common semantic/structural components (table, img, p, ol, acronym, strong, em, etc.).
2. For structural components that do not have an equivalent XHTML mapping, use <div>.
3. For inline semantics that do not have an equivalent XHTML mapping, use <span>.
   - Use the title attribute (available on any XHTML element) to store the original element name.
   - Use the class attribute to store the "semantic category": e.g., "procedural" vs. "list" to delineate a procedural set of steps from a plain numbered list.

There are a couple of ideas we're playing with for capturing the attribute values from the original source:

a) Use the object tag (with child param tags to capture the name/value pairs).
b) Use a declared namespace to embed the attributes on the element.

These are, of course, open for discussion.

Jim

-----Original Message-----
From: Dave Pawson [mailto:dave.pawson@gmail.com]
Sent: Tuesday, April 10, 2007 8:12 AM
To: docstandards-interop-discuss@lists.oasis-open.org
Subject: Re: [docstandards-interop-discuss] Clarifications / Scope of the intended work?

On 10/04/07, Michael Priestley <mpriestl@ca.ibm.com> wrote:
> - govt worker begins drafting a policy note in ODF with the subject "the use of personal data received via email"
> - govt worker pulls in the text of the relevant statute, which is in a DITA specialization
> - govt worker pulls in the legal disclaimer which must now be included in every government email reply, from a different DITA specialization
> - govt worker pulls in the instructions on how to include the text of the disclaimer in emails, from documentation of the email software written in DocBook
> - technical author 2, using DocBook, creates a customized version of the email software documentation
> - and pulls in portions of the procedures web site, in the form of DITA topics and ODF policy notes

OK, you've described the problem, Michael. I hope we can all sympathise with that! Ignoring how, what do you see as a solution? A means of 'integrating' n streams? A way of reading n streams? A means of generating... something readable by all... (LCD solution)? What class of solution is the goal, please?
regards
--
Dave Pawson
XSLT XSL-FO FAQ.
http://www.dpawson.co.uk

---------------------------------------------------------------------
To unsubscribe, e-mail: docstandards-interop-discuss-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: docstandards-interop-discuss-help@lists.oasis-open.org
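As an illustration of the XHTML Microformats premise discussed in this thread, the following hedged sketch maps a DITA-style <steps> block to XHTML along the lines of the three numbered rules: the original element name goes into the title attribute and the semantic category into the class attribute. The specific class values and the helper name are illustrative, not a finalized mapping.

```python
import xml.etree.ElementTree as ET

def steps_to_xhtml(xml_text: str) -> str:
    """Sketch: DITA <steps> -> XHTML <ol class="procedural" title="steps">.

    title= keeps the original element name; class= keeps the semantic
    category ("procedural" vs. a plain "list"). Values are illustrative.
    """
    steps = ET.fromstring(xml_text)
    ol = ET.Element("ol", {"class": "procedural", "title": steps.tag})
    for step in steps.iter("step"):
        li = ET.SubElement(ol, "li", {"title": "step"})
        li.text = "".join(step.itertext()).strip()
    return ET.tostring(ol, encoding="unicode")

dita = ("<steps><step><cmd>Open the lid</cmd></step>"
        "<step><cmd>Press start</cmd></step></steps>")
print(steps_to_xhtml(dita))
```

A DocBook or ODF consumer could then read the class/title pair to decide whether the <ol> is a true procedure or just a numbered list, which is exactly the distinction the "procedural" vs. "list" rule is meant to preserve.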