docstandards-interop-discuss message

Subject: RE: [docstandards-interop-discuss] Clarifications / Scope of the intended work?

From: "Earley, Jim" <Jim.Earley@flatironssolutions.com>
To: "David RR Webber (XML)" <david@drrw.info>
Date: Tue, 10 Apr 2007 11:52:18 -0600

Dave,

Why are we trying to preserve page semantics?  What's wrong with XML DOM?
XSLT/XSL-FO? XQuery?  These are already in the toolchain.  

Jim


================
Jim Earley
XML Developer/Consultant
Flatirons Solutions
4747 Table Mesa Drive
Boulder, CO 80301

Voice: 303.542.2156
Fax:   303.544.0522
Cell:  303.898.7193

Yahoo.IM: jmearley
MSN.IM: jearley22@hotmail.com

jim.earley@flatironssolutions.com
-----Original Message-----
From: David RR Webber (XML) [mailto:david@drrw.info] 
Sent: Tuesday, April 10, 2007 11:46 AM
To: Earley, Jim
Cc: Dave Pawson; docstandards-interop-discuss@lists.oasis-open.org
Subject: RE: [docstandards-interop-discuss] Clarifications / Scope of the
intended work?

Jim,
 
NO NO NO - it's not PDF that is the answer!!!
 
Stop thinking PDF please.
 
Yes we know that iText is built for PDF - but that's just the start point
here.
 
What I'm saying is use the model - not the specific rendering.
 
Our XML-script syntax is neutral - will work with ANY document syntax.  
 
It's just that PDF already has one implemented - so they are ahead of the
game at this point - but it should not take long for people's development
teams to adapt the iText code base to work for ODF and more as well.
 
So the real abstraction is the in-memory page object model that iText is
using when it runs - just as the DOM is for XML inside a browser....
 
DW

"The way to be is to do" - Confucius (551-472 B.C.)




	-------- Original Message --------
	Subject: RE: [docstandards-interop-discuss] Clarifications / Scope of
	the intended work?
	From: "Earley, Jim" <Jim.Earley@flatironssolutions.com>
	Date: Tue, April 10, 2007 1:33 pm
	To: "David RR Webber (XML)" <david@drrw.info>
	Cc: "Dave Pawson" <dave.pawson@gmail.com>,
	<docstandards-interop-discuss@lists.oasis-open.org>
	
	
	Dave,
	
	
	Herein lies the rub, methinks:
	
	>> By creating a standard around the functions and the processing - we
	>> establish that "lingua franca" at the level of the processing required
	>> - not the underlying vendor specific document syntax goup - that will
	>> change every time they release a new product.
	
	This is why we've proposed going to an abstraction layer to enable the
	respective standards to keep evolving to meet their constituents' needs,
	yet allowing interoperability at the markup level, which I contend is
	much more robust with respect to content reusability, which is what I
	hear repeatedly from authors (and not just in the "standard" software
	Tech Pubs space).
	
	--
	
	I would argue that by going to PDF and then to iText to produce
	interoperability markup, you impose the presentation (and inherently,
	the presentational structure, which may or may not be equivalent to the
	originating markup structure) on the recipient, lock, stock and barrel.
	DITA topics do not necessarily have to begin on a new page, and neither
	do DocBook sections. These are presentation-specific details that are
	intentionally left out of the markup to enable reuse across documents
	and output formats. Formatting is fluid based on many different factors:
	company branding, output format, localization, even audience, to name a
	few.  This is why I believe that separating the presentation from the
	data is absolutely critical.
	
	
	In my experience at a Fortune 50 company that changed its branding every
	18-24 months, having the content in structured, semantic markup saved
	our skins in more ways than you can possibly imagine. In a few weeks we
	could rebrand thousands of manuals formatted in HTML and PDF in over 9
	languages (_because_ the content was in XML), which would otherwise have
	taken months if we had to tinker with the formatting. I've been down
	that road with things like MS Word or FrameMaker (both structured and
	unstructured) and rapidly run the other way when the topic comes up.
	
	What about effectivity (conditional processing) attributes?  What if I
	create an XML document that contains content embedded for different
	operating systems, each of which is rendered into separate outputs?
	How do I capture these at the presentation layer and then enable authors
	to leverage the content appropriately?  These are things that XML markup
	is very effective at.
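
To make the effectivity point concrete, here is a minimal sketch in Python
of conditional processing at the markup level. The element names and the
sample content are invented for illustration and are much simpler than a
real DITA task; `platform` is used as the conditional attribute, and the
helper names (`filter_for_platform`, `render_steps`) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented sample content: one source, two operating systems.
SOURCE = """
<task>
  <title>Install the agent</title>
  <step platform="windows">Run setup.exe.</step>
  <step platform="linux">Run ./install.sh.</step>
  <step>Restart the service.</step>
</task>
"""

def filter_for_platform(root, platform):
    """Drop every element whose platform attribute does not list `platform`.

    Elements without a platform attribute apply to all outputs and are kept.
    """
    for parent in list(root.iter()):
        for child in list(parent):
            value = child.get("platform")
            if value is not None and platform not in value.split():
                parent.remove(child)
    return root

def render_steps(platform):
    """Return the step text that survives filtering for one platform."""
    root = filter_for_platform(ET.fromstring(SOURCE), platform)
    return [step.text for step in root.iter("step")]

if __name__ == "__main__":
    for name in ("windows", "linux"):
        print(name, render_steps(name))
```

The filter runs before any formatting, so the same source feeds each
output. Once the content has been rendered to pages, the attribute that
drove the choice is no longer there to recover.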
	
	It's also been my experience that authors are embracing XML markup now
	because tool support is readily available from numerous vendors (and
	they don't have to get their hands dirty with the actual markup!).
	They are seeing the benefits of working with structured markup with
	respect to content reuse and single sourcing. Now their biggest problem
	is pulling content from other sources that use different XML standards
	into their own content.  I believe that what we've proposed enables
	authors to do this without a significant amount of retooling.
	
	Jim
	
	
	
	-----Original Message-----
	From: David RR Webber (XML) [mailto:david@drrw.info] 
	Sent: Tuesday, April 10, 2007 10:11 AM
	To: Earley, Jim
	Cc: Dave Pawson; docstandards-interop-discuss@lists.oasis-open.org
	Subject: RE: [docstandards-interop-discuss] Clarifications / Scope of the
	intended work?
	
	Jim,
	 
	I'd argue that you are making my point for me!!!
	 
	What we need are FUNCTIONS that match the business requirements you
	state here.
	 
	Your example - "In these cases, the structural and semantic
	characteristics are equally important:  a procedure may appear as a
	numbered list presentationally, but semantically it is very different
	than a set of items in a sequenced list."
	 
	So - if I was using iText to do this - I can handle this both ways -
	either get the XML from wherever - and then produce the numbered list
	(and embed matching XML metacontent) into PDF - or the reverse - find
	the numbered list in the PDF - extract it out - create the XML.
	 
	By creating a standard around the functions and the processing - we
	establish that "lingua franca" at the level of the processing required
	- not the underlying vendor specific document syntax goup - that will
	change every time they release a new product.
	 
	The vendors then simply provide implementations to our functional set
	- and anyone can then create XML-script handling of their documents -
	inbound or outbound - in a consistent way to our specification.
	 
	Bottom line is - it's the functional handling equivalence we are
	wanting.  
	 
	This may ultimately drive syntax alignment - but we do not have to get
	into that ourselves.
	 
	DW
	
	"The way to be is to do" - Confucius (551-472 B.C.)
	
	
	
	
	        -------- Original Message --------
	        Subject: RE: [docstandards-interop-discuss] Clarifications / Scope
	        of the intended work?
	        From: "Earley, Jim" <Jim.Earley@flatironssolutions.com>
	        Date: Tue, April 10, 2007 11:49 am
	        To: "David RR Webber (XML)" <david@drrw.info>
	        Cc: "Dave Pawson" <dave.pawson@gmail.com>,
	        <docstandards-interop-discuss@lists.oasis-open.org>
	        
	        
	        David,
	        
	        Respectfully, I believe the issue isn't at the presentation layer
	        but more at the content layer:  How do I leverage/reuse/repurpose
	        content in one XML Standard (say DITA) in my content (say DocBook)?
	        Here the question is more targeted at content interoperability.
	        For example, Vendor A provides content to an OEM partner who will
	        rebrand it and integrate Vendor A's content into their own doc set
	        (could be PDF, HTML, HTML Help, JavaHelp, or any number of
	        formats).  Further down the pipeline, the content is reused in
	        Training material by a different group using TEI. 
	        
	        In these cases, the structural and semantic characteristics are
	        equally important:  a procedure may appear as a numbered list
	        presentationally, but semantically it is very different than a set
	        of items in a sequenced list.
	        
	        By abstracting each XML standard's specific content models to a
	        common denominator, you can preserve structure along with semantics
	        in a way that enables other XML standards to leverage the content
	        using their grammar with minimal loss to semantics from the
	        original.
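
As a rough illustration of the receiving side, the Python sketch below reads
content that has already been abstracted to a common form and re-expresses
it in DocBook. It assumes the XHTML convention described in the oldest
message of this thread (a class attribute carrying the semantic category).
The sample input and the helper names (`to_docbook`, `convert`) are
hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented neutral content: two lists that look alike but mean different things.
NEUTRAL = """
<div>
  <ol class="procedural" title="steps">
    <li>Open the panel.</li>
    <li>Press reset.</li>
  </ol>
  <ol class="list" title="ol">
    <li>Red</li>
    <li>Green</li>
  </ol>
</div>
"""

def to_docbook(ol):
    """Map one neutral <ol> to a DocBook procedure or orderedlist."""
    procedural = "procedural" in ol.get("class", "").split()
    if procedural:
        container, item = "procedure", "step"
    else:
        container, item = "orderedlist", "listitem"
    out = ET.Element(container)
    for li in ol.findall("li"):
        para = ET.SubElement(ET.SubElement(out, item), "para")
        para.text = li.text
    return out

def convert(neutral_xml):
    """Return the DocBook serialization of every <ol> in the neutral input."""
    root = ET.fromstring(neutral_xml)
    return [ET.tostring(to_docbook(ol), encoding="unicode")
            for ol in root.iter("ol")]

if __name__ == "__main__":
    for fragment in convert(NEUTRAL):
        print(fragment)
```

Both inputs would render as numbered lists, yet the first comes out as a
DocBook procedure and the second as an orderedlist, because the category
travelled with the markup.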
	        
	        Certainly, there are cases as you mentioned that require the
	        presentational functionality to be preserved "as submitted" that do
	        not apply here. And in these cases, your approach to maintaining
	        the presentational semantics is very interesting. I've used iText
	        for personal projects, and yes, it is very mature. 
	        
	        Cheers,
	        
	        Jim
	        
	        -----Original Message-----
	        From: David RR Webber (XML) [mailto:david@drrw.info] 
	        Sent: Tuesday, April 10, 2007 9:02 AM
	        To: Earley, Jim
	        Cc: Dave Pawson; docstandards-interop-discuss@lists.oasis-open.org
	        Subject: RE: [docstandards-interop-discuss] Clarifications / Scope
	        of the intended work?
	        
	        Jim,
	         
	        Why not focus on the handling functions instead?  That way you are
	        an abstraction layer above the low-level representation syntax.
	         
	        The XHTML is problematic - especially when it comes to page counts
	        and page content.  Legally also - you need to leave things "as
	        submitted" - because you may reject a submission as, say, not
	        having content in the right place on a page, or total pages - and
	        yet the original was OK when viewed in the native format.
	         
	        Also - by going with functions - you put the onus on the individual
	        tool vendors to support those functions consistently - without our
	        having to get into the lower level syntax of how that occurs,
	        either now or in future new formats.
	         
	        At the end of the day it is the BUSINESS FUNCTIONALITY that you
	        want interoperability around - not the raw document.
	         
	        So from the business stance - if I need to check for certain
	        bookmarks, sections, text strings, page counts, word counts, etc -
	        I can do that.
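
One way to picture the function-level approach is a small interface that
each vendor implements for its own format. The Python sketch below is only
an illustration: the interface (`DocumentFunctions`), the toy implementation
(`PlainTextPages`) and the check (`check_submission`) are hypothetical names.
They are not part of iText or of any proposed specification.

```python
from abc import ABC, abstractmethod

class DocumentFunctions(ABC):
    """Hypothetical format-neutral function set a vendor would implement."""

    @abstractmethod
    def page_count(self) -> int: ...

    @abstractmethod
    def word_count(self) -> int: ...

    @abstractmethod
    def has_text(self, needle: str) -> bool: ...

class PlainTextPages(DocumentFunctions):
    """Toy implementation: a document is just a list of page strings."""

    def __init__(self, pages):
        self.pages = list(pages)

    def page_count(self) -> int:
        return len(self.pages)

    def word_count(self) -> int:
        return sum(len(page.split()) for page in self.pages)

    def has_text(self, needle: str) -> bool:
        return any(needle in page for page in self.pages)

def check_submission(doc: DocumentFunctions, max_pages: int, required: str) -> list:
    """Business-level check written only against the function set."""
    problems = []
    if doc.page_count() > max_pages:
        problems.append("too many pages")
    if not doc.has_text(required):
        problems.append("missing required text")
    return problems

if __name__ == "__main__":
    doc = PlainTextPages(["Policy note draft", "Legal disclaimer applies"])
    print(check_submission(doc, max_pages=2, required="disclaimer"))
```

The check never touches the underlying syntax, so a PDF-backed or
ODF-backed implementation could be substituted without changing it.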
	         
	        DW
	        
	        "The way to be is to do" - Confucius (551-472 B.C.)
	        
	        
	        
	        
	                -------- Original Message --------
	                Subject: RE: [docstandards-interop-discuss] Clarifications /
	                Scope of the intended work?
	                From: "Earley, Jim" <Jim.Earley@flatironssolutions.com>
	                Date: Tue, April 10, 2007 10:46 am
	                To: "Dave Pawson" <dave.pawson@gmail.com>,
	                <docstandards-interop-discuss@lists.oasis-open.org>
	                
	                
	                Dave,
	                
	                The current thinking with regard to a solution uses XHTML
	                Microformats as the abstraction layer. All of the standards
	                (DITA, DB, ODF) share the same structural characteristics
	                (Headings, paragraphs, lists, tables, images, etc.) albeit
	                in different ways. 
	                
	                The premise thus far is: 
	                
	                1. Use standard XHTML markup for common semantic/structural
	                   components (table, img, p, ol, acronym, strong, em, etc)
	                2. For structural components that do not have an equivalent
	                   XHTML mapping, use <div>
	                3. For inline semantics that do not have an equivalent XHTML
	                   mapping, use <span>
	                
	                - use the title attribute (available on any XHTML element)
	                  to store the original element name
	                - use the class attribute to store the "semantic category":
	                  e.g., "procedural" vs. "list" to delineate between a
	                  procedural set of steps compared to a numbered list
	                
	                - there are a couple of ideas that we're playing with with
	                  regard to capturing the attribute values from the original
	                  source:
	                
	                a) Use the object tag (with child param tags to capture the
	                   name/value pairs)
	                b) Use a declared namespace to embed the attributes on the
	                   element
	                
	                These are, of course, open for discussion. 
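
The premise above can be sketched in a few lines of Python. This is only an
illustration under stated assumptions: the mapping table, the set of inline
element names and the function names (`to_neutral`, `convert`) are invented
for the example. Source attributes are not carried over, since the choice
between the object/param and namespace options is still open.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: source element -> (XHTML element, semantic category).
MAPPING = {
    "p": ("p", None),
    "steps": ("ol", "procedural"),
    "step": ("li", None),
    "ol": ("ol", "list"),
    "li": ("li", None),
}

# Assumed inline source elements that have no XHTML equivalent.
INLINE = {"uicontrol", "filepath"}

def to_neutral(src):
    """Recursively map a source element to the neutral XHTML form."""
    if src.tag in MAPPING:
        tag, category = MAPPING[src.tag]
    else:
        # Rules 2 and 3: fall back to <span> for inline, <div> otherwise.
        tag, category = ("span" if src.tag in INLINE else "div"), None
    out = ET.Element(tag, {"title": src.tag})  # original element name
    if category:
        out.set("class", category)             # semantic category
    out.text = src.text
    for child in src:
        mapped = to_neutral(child)
        mapped.tail = child.tail
        out.append(mapped)
    return out

def convert(source_xml):
    """Return the neutral XHTML serialization of a source fragment."""
    return ET.tostring(to_neutral(ET.fromstring(source_xml)), encoding="unicode")

if __name__ == "__main__":
    print(convert("<steps><step>Click <uicontrol>OK</uicontrol>.</step></steps>"))
```

In this sketch a `steps` element becomes an `ol` with class "procedural",
and an unmapped inline element such as `uicontrol` becomes a `span` that
keeps its original name in the title attribute.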
	                
	                Jim
	                
	                
	                -----Original Message-----
	                From: Dave Pawson [mailto:dave.pawson@gmail.com] 
	                Sent: Tuesday, April 10, 2007 8:12 AM
	                To: docstandards-interop-discuss@lists.oasis-open.org
	                Subject: Re: [docstandards-interop-discuss] Clarifications /
	                Scope of the intended work?
	                
	                On 10/04/07, Michael Priestley <mpriestl@ca.ibm.com> wrote:
	                
	                > - govt worker begins drafting a policy note in ODF with the
	                >   subject "the use of personal data received via email"
	                > - govt worker pulls in the text of the relevant statute,
	                >   which is in a DITA specialization
	                > - govt worker pulls in the legal disclaimer which must now
	                >   be included in every government email reply, from a
	                >   different DITA specialization
	                > - govt worker pulls in the instructions on how to include
	                >   the text of the disclaimer in emails, from documentation
	                >   of the email software written in DocBook
	                
	                > - technical author 2, using DocBook, creates a customized
	                >   version of the email software documentation
	                > - and pulls in portions of the procedures web site, in the
	                >   form of DITA topics and ODF policy notes
	                
	                OK, you've described the problem Michael. I hope we can all
	                sympathise with that!
	                
	                Ignoring how, what do you see as a solution?
	                
	                A means of 'integrating' n streams?
	                A way of reading n streams?
	                A means of generating .... something readable by all....
	                (lcd solution)
	                
	                What class of solution is the goal please?
	                
	                
	                regards
	                
	                
	                -- 
	                Dave Pawson
	                XSLT XSL-FO FAQ.
	                http://www.dpawson.co.uk
	                
	                
	        
	
	                ---------------------------------------------------------------------
	                To unsubscribe, e-mail:
	                docstandards-interop-discuss-unsubscribe@lists.oasis-open.org
	                For additional commands, e-mail:
	                docstandards-interop-discuss-help@lists.oasis-open.org


References:
- RE: [docstandards-interop-discuss] Clarifications / Scope of the intended work?
  - From: "David RR Webber (XML)" <david@drrw.info>