docstandards-interop-discuss message

Subject: RE: [docstandards-interop-discuss] Clarifications / Scope of the intended work?

From: "Earley, Jim" <Jim.Earley@flatironssolutions.com>
To: "David RR Webber \(XML\)" <david@drrw.info>
Date: Tue, 10 Apr 2007 11:33:26 -0600

Dave,


Herein lies the rub, methinks:

>> By creating a standard around the functions and the processing - we
>> establish that "lingua franca" at the level of the processing required - not
>> the underlying vendor specific document syntax goup - that will change
>> every time they release a new product.

This is why we've proposed going to an abstraction layer to enable the
respective standards to keep evolving to meet their constituents' needs, yet
allowing interoperability at the markup level, which I contend is much more
robust with respect to content reusability, which is what I hear repeatedly
from authors (and not just in the "standard" software Tech Pubs space).

--

I would argue that by going to PDF and then to iText to produce
interoperability markup, you impose the presentation (and inherently, the
presentational structure, which may or may not be equivalent to the
originating markup structure) on the recipient, lock, stock and barrel. DITA
topics do not necessarily have to begin on a new page, and neither do DocBook
sections. These are presentation-specific details that are intentionally
left out of the markup to enable reuse across documents and output formats.
Formatting is fluid, based on many different factors: company branding,
output format, localization, even audience, to name a few.  This is why I
believe that separating the presentation from the data is absolutely
critical.


In my experience at a Fortune 50 company that changed its branding every
18-24 months, having the content in structured, semantic markup saved our
skins in more ways than you can possibly imagine. We could, in a few weeks,
rebrand thousands of manuals formatted in HTML and PDF in over 9 languages
(_because_ the content was in XML), work that would otherwise have taken
months if we had to tinker with the formatting. I've been down that road
with things like MS Word or FrameMaker (both structured and unstructured)
and rapidly run the other way when the topic comes up.

What about effectivity (conditional processing) attributes?  What if I
create an XML document that contains content embedded for different
operating systems, each of which is rendered into separate outputs?  How do
I capture these at the presentation layer and then enable authors to
leverage the content appropriately?  These are things that XML markup is
very effective at.
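
As a minimal sketch of the kind of effectivity filtering described above:
the snippet below keeps one source document and renders a separate view per
operating system. The `platform` attribute name is modeled loosely on DITA's
conditional attributes, and the sample content and the `filter_by_platform`
helper are purely illustrative assumptions, not anything proposed in this
thread.

```python
import copy
import xml.etree.ElementTree as ET

# Illustrative single-source content; "platform" is an assumed attribute name.
SOURCE = """<procedure>
  <step>Open a terminal.</step>
  <step platform="windows">Run setup.exe.</step>
  <step platform="linux">Run ./setup.sh.</step>
  <step>Restart the application.</step>
</procedure>"""


def filter_by_platform(root, platform):
    """Return a copy of root keeping only unmarked or matching elements."""
    result = copy.deepcopy(root)
    # Snapshot every element first so removals do not disturb the traversal.
    for parent in list(result.iter()):
        for child in list(parent):
            value = child.get("platform")
            # The attribute may hold several space-separated values.
            if value is not None and platform not in value.split():
                parent.remove(child)
    return result


root = ET.fromstring(SOURCE)
for target in ("windows", "linux"):
    filtered = filter_by_platform(root, target)
    print(target, [step.text for step in filtered.iter("step")])
```

The point is that the condition lives on the markup itself, so the same
source yields each output; once the content has been flattened to a rendered
page, that information is no longer there to filter on.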

It's also been my experience that authors are embracing XML markup now
because tool support is now readily available from numerous vendors
(and they don't have to get their hands dirty with the actual markup!).
They are seeing the benefits of working with structured markup with respect
to content reuse and single sourcing. Now their biggest problem is pulling
in content from other sources using different XML standards into their
content.  I believe that what we've proposed enables authors to do this
without a significant amount of retooling.

Jim



================
Jim Earley
XML Developer/Consultant
Flatirons Solutions
4747 Table Mesa Drive
Boulder, CO 80301

Voice: 303.542.2156
Fax:   303.544.0522
Cell:  303.898.7193

Yahoo.IM: jmearley
MSN.IM: jearley22@hotmail.com

jim.earley@flatironssolutions.com
-----Original Message-----
From: David RR Webber (XML) [mailto:david@drrw.info] 
Sent: Tuesday, April 10, 2007 10:11 AM
To: Earley, Jim
Cc: Dave Pawson; docstandards-interop-discuss@lists.oasis-open.org
Subject: RE: [docstandards-interop-discuss] Clarifications / Scope of the
intended work?

Jim,
 
I'd argue that you are making my point for me!!!
 
What we need are FUNCTIONS that match the business requirements you state
here.
 
Your example - "In these cases, the structural and semantic characteristics
are equally important:  a procedure may appear as a numbered list
presentationally, but semantically it is very different than a set of items
in a sequenced list."
 
So - if I was using iText to do this - I can handle this both ways - either
get the XML from wherever - and then produce the numbered list (and embed
matching XML metacontent) into PDF - or the reverse - find the numbered list
in the PDF - extract it out - create the XML.
 
By creating a standard around the functions and the processing - we
establish that "lingua franca" at the level of the processing required - not
the underlying vendor specific document syntax goup - that will change every
time they release a new product.
 
The vendors then simply provide implementations to our functional set - and
anyone can then create XML-script handling of their documents - inbound or
outbound - in a consistent way to our specification.
 
Bottom line is - it's the functional handling equivalence we are wanting.  
 
This may ultimately drive syntax alignment - but we do not have to get into
that ourselves.
 
DW

"The way to be is to do" - Confucius (551-472 B.C.)




	-------- Original Message --------
	Subject: RE: [docstandards-interop-discuss] Clarifications / Scope of
	the intended work?
	From: "Earley, Jim" <Jim.Earley@flatironssolutions.com>
	Date: Tue, April 10, 2007 11:49 am
	To: "David RR Webber (XML)" <david@drrw.info>
	Cc: "Dave Pawson" <dave.pawson@gmail.com>,
	<docstandards-interop-discuss@lists.oasis-open.org>
	
	
	David,
	
	Respectfully, I believe the issue isn't at the presentation layer but
	more at the content layer:  How do I leverage/reuse/repurpose content in
	one XML standard (say DITA) in my content (say DocBook)? Here the question
	is more targeted at content interoperability. For example, Vendor A
	provides content to an OEM partner who will rebrand it and integrate
	Vendor A's content into their own doc set (could be PDF, HTML, HTML Help,
	JavaHelp, or any number of formats).  Further down the pipeline, the
	content is reused in training material by a different group using TEI. 
	
	In these cases, the structural and semantic characteristics are equally
	important:  a procedure may appear as a numbered list presentationally,
	but semantically it is very different than a set of items in a sequenced
	list.
	
	By abstracting each XML standard's specific content models to a common
	denominator, you can preserve structure along with semantics in a way
	that enables other XML standards to leverage the content using their
	grammar with minimal loss to semantics from the original.
	
	Certainly, there are cases as you mentioned that require the
	presentational functionality to be preserved "as submitted" that do not
	apply here. And in these cases, your approach to maintaining the
	presentational semantics is very interesting. I've used iText for
	personal projects, and yes, it is very mature. 
	
	Cheers,
	
	Jim
	
	-----Original Message-----
	From: David RR Webber (XML) [mailto:david@drrw.info] 
	Sent: Tuesday, April 10, 2007 9:02 AM
	To: Earley, Jim
	Cc: Dave Pawson; docstandards-interop-discuss@lists.oasis-open.org
	Subject: RE: [docstandards-interop-discuss] Clarifications / Scope of the
	intended work?
	
	Jim,
	 
	Why not focus on the handling functions instead?  That way you are an
	abstraction layer above the low-level representation syntax.  
	 
	The XHTML is problematic - especially when it comes to page counts and
	page content.  Legally also - you need to leave things "as submitted" -
	because you may reject a submission as, say, not having content in the
	right place on a page, or total pages - and yet the original was OK when
	viewed in the native format.
	 
	Also - by going with functions - you put the onus on the individual tool
	vendors to support those functions consistently - without having to get
	into the lower level syntax ourselves of how that occurs, either now or
	in future new formats.
	 
	At the end of the day it is the BUSINESS FUNCTIONALITY that you want
	interoperability around - not the raw document.
	 
	So from the business stance - if I need to check for certain bookmarks,
	sections, text strings, page counts, word counts, etc - I can do that.
	 
	DW
	
	"The way to be is to do" - Confucius (551-472 B.C.)
	
	
	
	
	        -------- Original Message --------
	        Subject: RE: [docstandards-interop-discuss] Clarifications / Scope of
	        the intended work?
	        From: "Earley, Jim" <Jim.Earley@flatironssolutions.com>
	        Date: Tue, April 10, 2007 10:46 am
	        To: "Dave Pawson" <dave.pawson@gmail.com>,
	        <docstandards-interop-discuss@lists.oasis-open.org>
	        
	        
	        Dave,
	        
	        The current thinking with regard to a solution uses XHTML
	        Microformats as the abstraction layer. All of the standards
	        (DITA, DB, ODF) share the same structural characteristics
	        (headings, paragraphs, lists, tables, images, etc.), albeit in
	        different ways. 
	        
	        The premise thus far is: 
	        
	        1. Use standard XHTML markup for common semantic/structural
	        components (table, img, p, ol, acronym, strong, em, etc.)
	        2. For structural components that do not have an equivalent
	        XHTML mapping, use <div>
	        3. For inline semantics that do not have an equivalent XHTML
	        mapping, use <span>
	        
	        - use the title attribute (available on any XHTML element) to
	        store the original element name
	        - use the class attribute to store the "semantic category":
	        e.g., "procedural" vs. "list" to distinguish a procedural set
	        of steps from a numbered list
	        
	        - there are a couple of ideas that we're playing with, with
	        regard to capturing the attribute values from the original
	        source:
	        
	        a) Use the object tag (with child param tags to capture the
	        name/value pairs)
	        b) Use a declared namespace to embed the attributes on the
	        element
	        
	        These are, of course, open for discussion. 
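
The premise above can be sketched roughly as follows. This is only an
illustration under stated assumptions: the element mapping table, the set of
inline elements, the `urn:example:source-attributes` namespace and the
`to_xhtml` helper are all hypothetical, and the sketch shows only option (b)
for carrying attributes, assuming the source attributes are not themselves
namespaced.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for carrying the original attributes (option b).
SRC_NS = "urn:example:source-attributes"
ET.register_namespace("src", SRC_NS)

# Illustrative mapping: source element -> (XHTML element, semantic category).
DIRECT = {
    "p": ("p", None),
    "para": ("p", None),
    "ol": ("ol", "list"),
    "orderedlist": ("ol", "list"),
    "steps": ("ol", "procedural"),
    "step": ("li", None),
    "li": ("li", None),
    "listitem": ("li", None),
}
# Illustrative set of source elements that are treated as inline.
INLINE = {"uicontrol", "guilabel", "cmdname"}


def to_xhtml(src):
    """Convert one source element and its subtree to the XHTML abstraction."""
    if src.tag in DIRECT:
        tag, category = DIRECT[src.tag]   # rule 1: direct XHTML equivalent
    elif src.tag in INLINE:
        tag, category = "span", None      # rule 3: unmapped inline -> span
    else:
        tag, category = "div", None       # rule 2: unmapped structure -> div
    out = ET.Element(tag)
    out.set("title", src.tag)             # keep the original element name
    if category:
        out.set("class", category)        # keep the semantic category
    for name, value in src.attrib.items():
        out.set("{%s}%s" % (SRC_NS, name), value)
    out.text, out.tail = src.text, src.tail
    for child in src:
        out.append(to_xhtml(child))
    return out


dita = ET.fromstring(
    '<steps id="install"><step>Click <uicontrol>OK</uicontrol>.</step></steps>'
)
print(ET.tostring(to_xhtml(dita), encoding="unicode"))
```

With this shape, a procedure and a plain numbered list both become `ol`, but
the `class` and `title` values keep them distinguishable for a consumer that
wants to map them back into its own grammar.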
	        
	        Jim
	        
	        
	        -----Original Message-----
	        From: Dave Pawson [mailto:dave.pawson@gmail.com] 
	        Sent: Tuesday, April 10, 2007 8:12 AM
	        To: docstandards-interop-discuss@lists.oasis-open.org
	        Subject: Re: [docstandards-interop-discuss] Clarifications / Scope of the
	        intended work?
	        
	        On 10/04/07, Michael Priestley <mpriestl@ca.ibm.com> wrote:
	        
	        > - govt worker begins drafting a policy note in ODF with the
	        > subject "the use of personal data received via email"
	        > - govt worker pulls in the text of the relevant statute, which
	        > is in a DITA specialization
	        > - govt worker pulls in the legal disclaimer which must now be
	        > included in every government email reply, from a different
	        > DITA specialization
	        > - govt worker pulls in the instructions on how to include the
	        > text of the disclaimer in emails, from documentation of the
	        > email software written in DocBook
	        
	        > - technical author 2, using DocBook, creates a customized
	        > version of the email software documentation
	        > - and pulls in portions of the procedures web site, in the
	        > form of DITA topics and ODF policy notes
	        
	        OK, you've described the problem Michael. I hope we can all
	        sympathise with that!
	        
	        Ignoring how, what do you see as a solution?
	        
	        A means of 'integrating' n streams?
	        A way of reading n streams?
	        A means of generating .... something readable by all.... (lcd
	        solution)
	        
	        What class of solution is the goal please?
	        
	        
	        regards
	        
	        
	        -- 
	        Dave Pawson
	        XSLT XSL-FO FAQ.
	        http://www.dpawson.co.uk
	        
	        
	


References:
- RE: [docstandards-interop-discuss] Clarifications / Scope of the intended work?
  - From: "David RR Webber \(XML\)" <david@drrw.info>