docstandards-interop-discuss message



Subject: RE: [docstandards-interop-discuss] Clarifications / Scope of the intended work?


Jim,
 
If you've ever tried to use FO to create production documents.... ; -)
 
Yes - we do need page semantics. 
 
Government requires paper rules to carry over - so your document can only be 25 pages long; must be formatted for 8.5 x 11 paper - and so on - these are all still legal gotchas.
 
First page must contain title, second page cover letter - and so on.  Then there are things like barcodes, page headers / footers, image scaling / rotation.
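
A minimal sketch of how rules like these might be checked against an in-memory page model. The `Page` class, `check_submission`, and the sample title are hypothetical names for illustration, not part of iText or of any proposed specification; page sizes are assumed to be in PDF points (72 per inch).

```python
from dataclasses import dataclass
from typing import List

LETTER = (612.0, 792.0)  # 8.5 x 11 inches at 72 points per inch


@dataclass
class Page:
    """Hypothetical format-neutral page: size in points plus extracted text."""
    width: float
    height: float
    text: str


def is_letter(page: Page, tolerance: float = 0.5) -> bool:
    """True when the page is 8.5 x 11 within a small rounding tolerance."""
    return (abs(page.width - LETTER[0]) <= tolerance
            and abs(page.height - LETTER[1]) <= tolerance)


def check_submission(pages: List[Page], title: str, max_pages: int = 25) -> List[str]:
    """Return a list of rule violations; an empty list means the rules pass."""
    problems = []
    if len(pages) > max_pages:
        problems.append(f"{len(pages)} pages exceeds the limit of {max_pages}")
    for number, page in enumerate(pages, start=1):
        if not is_letter(page):
            problems.append(f"page {number} is not 8.5 x 11")
    if not pages or title not in pages[0].text:
        problems.append("first page does not contain the title")
    return problems
```

A caller would build the `Page` list from whatever format adapter is available and reject the submission if the returned list is not empty.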
 
I speak first hand here - I've just completed managing a 200,000,000 paper page conversion project for NIH to PDF and XML.
 
DW

"The way to be is to do" - Confucius (551-472 B.C.)


-------- Original Message --------
Subject: RE: [docstandards-interop-discuss] Clarifications / Scope of the intended work?
From: "Earley, Jim" <Jim.Earley@flatironssolutions.com>
Date: Tue, April 10, 2007 1:52 pm
To: "David RR Webber (XML)" <david@drrw.info>
Cc: "Dave Pawson" <dave.pawson@gmail.com>,
<docstandards-interop-discuss@lists.oasis-open.org>

Dave,

Why are we trying to preserve page semantics?  What's wrong with XML DOM?
XSLT/XSL-FO? XQuery?  These are already in the toolchain.  

Jim


================
Jim Earley
XML Developer/Consultant
Flatirons Solutions
4747 Table Mesa Drive
Boulder, CO 80301

Voice: 303.542.2156
Fax:   303.544.0522
Cell:  303.898.7193

Yahoo.IM: jmearley
MSN.IM: jearley22@hotmail.com

jim.earley@flatironssolutions.com
-----Original Message-----
From: David RR Webber (XML) [mailto:david@drrw.info] 
Sent: Tuesday, April 10, 2007 11:46 AM
To: Earley, Jim
Cc: Dave Pawson; docstandards-interop-discuss@lists.oasis-open.org
Subject: RE: [docstandards-interop-discuss] Clarifications / Scope of the intended work?

Jim,
 
NO NO NO - it's not PDF that is the answer!!!
 
Stop thinking PDF please.
 
Yes we know that iText is built for PDF - but that's just the start point here.
 
What I'm saying is use the model - not the specific rendering.
 
Our XML-script syntax is neutral - will work with ANY document syntax.  
 
It just is that PDF already has one implemented - so they are ahead of the
game at this point - but it should not take long for people's development
teams to adapt the iText code base to work for ODF and more as well.
 
So the real abstraction is the in-memory page object model that iText is
using when it runs - just as the DOM is for XML inside a browser....
 
DW

"The way to be is to do" - Confucius (551-472 B.C.)




        -------- Original Message --------
        Subject: RE: [docstandards-interop-discuss] Clarifications / Scope of the intended work?
        From: "Earley, Jim" <Jim.Earley@flatironssolutions.com>
        Date: Tue, April 10, 2007 1:33 pm
        To: "David RR Webber (XML)" <david@drrw.info>
        Cc: "Dave Pawson" <dave.pawson@gmail.com>,
        <docstandards-interop-discuss@lists.oasis-open.org>
        
        
        Dave,
        
        
        Herein lies the rub, methinks:
        
        >> By creating a standard around the functions and the processing - we
        >> establish that "lingua franca" at the level of the processing
        >> required - not the underlying vendor specific document syntax goup -
        >> that will change every time they release a new product.
        
        This is why we've proposed going to an abstraction layer to enable the
        respective standards to keep evolving to meet their constituents' needs,
        yet allowing interoperability at the markup level, which I contend is
        much more robust with respect to content reusability, which is what I
        hear repeatedly from authors (and not just in the "standard" software
        Tech Pubs space).
        
        --
        
        I would argue that by going to PDF and then to iText to produce
        interoperability markup, you impose the presentation (and inherently,
        the presentational structure, which may or may not be equivalent to the
        originating markup structure) on the recipient, lock, stock and barrel.
        DITA topics do not necessarily have to begin on a new page, and neither
        do DocBook sections. These are presentation-specific details that are
        intentionally left out of the markup to enable reuse across documents
        and output formats. Formatting is fluid based on many different factors:
        company branding, output format, localization, even audience, to name a
        few.  This is why I believe that separating the presentation from the
        data is absolutely critical.
        
        
        In my experience at a Fortune 50 company that changed their branding
        every 18-24 months, having the content in structured, semantic markup
        saved our skins in more ways than you can possibly imagine. We could in
        a few weeks rebrand thousands of manuals formatted in HTML and PDF in
        over 9 languages (_because_ the content was in XML) that would otherwise
        take months if we had to tinker with the formatting. I've been down that
        road with things like MS Word or FrameMaker (both structured and
        unstructured) and rapidly run the other way when the topic comes up.
        
        What about effectivity (conditional processing) attributes?  What if I
        create an XML document that contains content embedded for different
        operating systems, each of which is rendered into separate outputs?
        How do I capture these at the presentation layer and then enable authors
        to leverage the content appropriately?  These are things that XML markup
        is very effective at.
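
To illustrate the kind of conditional processing meant here, the following is a minimal sketch using Python's standard `xml.etree.ElementTree`. The attribute name `os`, the element names, and the function `filter_by_os` are assumptions chosen for the example; real vocabularies define their own effectivity attributes and matching rules.

```python
import copy
import xml.etree.ElementTree as ET


def filter_by_os(root, target_os, attr="os"):
    """Return a copy of root keeping only content that applies to target_os.

    Elements without the attribute are treated as applying to every output.
    The attribute value is read as a space-separated list of systems.
    """
    result = copy.deepcopy(root)
    for parent in list(result.iter()):
        previous = None  # last sibling that was kept
        for child in list(parent):
            values = child.get(attr)
            if values is not None and target_os not in values.split():
                # Keep the text that followed the removed element.
                tail = child.tail or ""
                if previous is None:
                    parent.text = (parent.text or "") + tail
                else:
                    previous.tail = (previous.tail or "") + tail
                parent.remove(child)
            else:
                previous = child
    return result


source = ET.fromstring(
    '<task>'
    '<step os="windows">Open Explorer.</step>'
    '<step os="linux mac">Open a terminal.</step>'
    '<step>Log in.</step>'
    '</task>'
)
windows_output = filter_by_os(source, "windows")
print(ET.tostring(windows_output, encoding="unicode"))
```

One source document yields a separate tree per operating system, while the conditions themselves stay in the markup rather than in any rendered output.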
        
        It's also been my experience that authors are embracing XML markup now
        because the tools support is now readily available from numerous vendors
        (and they don't have to get their hands dirty with the actual markup!).
        They are seeing the benefits of working with structured markup with
        respect to content reuse and single sourcing. Now their biggest problem
        is pulling in content from other sources using different XML standards
        into their content.  I believe that what we've proposed enables authors
        to do this without a significant amount of retooling.
        
        Jim
        
        
        
        ================
        Jim Earley
        XML Developer/Consultant
        Flatirons Solutions
        4747 Table Mesa Drive
        Boulder, CO 80301
        
        Voice: 303.542.2156
        Fax:   303.544.0522
        Cell:  303.898.7193
        
        Yahoo.IM: jmearley
        MSN.IM: jearley22@hotmail.com
        
        jim.earley@flatironssolutions.com
        -----Original Message-----
        From: David RR Webber (XML) [mailto:david@drrw.info] 
        Sent: Tuesday, April 10, 2007 10:11 AM
        To: Earley, Jim
        Cc: Dave Pawson; docstandards-interop-discuss@lists.oasis-open.org
        Subject: RE: [docstandards-interop-discuss] Clarifications / Scope of the intended work?
        
        Jim,
         
        I'd argue that you are making my point for me!!!
         
        What we need are FUNCTIONS that match the business requirements you
        state here.
         
        Your example - "In these cases, the structural and semantic
        characteristics are equally important:  a procedure may appear as a
        numbered list presentationally, but semantically it is very different
        than a set of items in a sequenced list."
         
        So - if I was using iText to do this - I can handle this both ways -
        either get the XML from wherever - and then produce the numbered list
        (and embed matching XML metacontent) into PDF - or the reverse - find
        the numbered list in the PDF - extract it out - create the XML.
         
        By creating a standard around the functions and the processing - we
        establish that "lingua franca" at the level of the processing required -
        not the underlying vendor specific document syntax goup - that will
        change every time they release a new product.
         
        The vendors then simply provide implementations to our functional set -
        and anyone can then create XML-script handling of their documents -
        inbound or outbound - in a consistent way to our specification.
         
        Bottom line is - it's the functional handling equivalence we are wanting.
         
        This may ultimately drive syntax alignment - but we do not have to get
        into that ourselves.
         
        DW
        
        "The way to be is to do" - Confucius (551-472 B.C.)
        
        
        
        
                -------- Original Message --------
                Subject: RE: [docstandards-interop-discuss] Clarifications / Scope of the intended work?
                From: "Earley, Jim" <Jim.Earley@flatironssolutions.com>
                Date: Tue, April 10, 2007 11:49 am
                To: "David RR Webber (XML)" <david@drrw.info>
                Cc: "Dave Pawson" <dave.pawson@gmail.com>,
                <docstandards-interop-discuss@lists.oasis-open.org>
                
                
                David,
                
                Respectfully, I believe the issue isn't at the presentation
                layer but more at the content layer:  How do I
                leverage/reuse/repurpose content in one XML Standard (say DITA)
                in my content (say DocBook)? Here the question is more targeted
                at content interoperability. For example, Vendor A provides
                content to an OEM partner who will rebrand it and integrate
                Vendor A's content into their own doc set (could be PDF, HTML,
                HTML Help, JavaHelp, or any number of formats).  Further down
                the pipeline, the content is reused in Training material by a
                different group using TEI.
                
                In these cases, the structural and semantic characteristics are
                equally important:  a procedure may appear as a numbered list
                presentationally, but semantically it is very different than a
                set of items in a sequenced list.
                
                By abstracting each XML standard's specific content models to a
                common denominator, you can preserve structure along with
                semantics in a way that enables other XML standards to leverage
                the content using their grammar with minimal loss to semantics
                from the original.
                
                Certainly, there are cases as you mentioned that require the
                presentational functionality to be preserved "as submitted" that
                do not apply here. And in these cases, your approach to
                maintaining the presentational semantics is very interesting.
                I've used iText for personal projects, and yes, it is very
                mature.
                
                Cheers,
                
                Jim
                
                ================
                Jim Earley
                XML Developer/Consultant
                Flatirons Solutions
                4747 Table Mesa Drive
                Boulder, CO 80301
                
                Voice: 303.542.2156
                Fax:   303.544.0522
                Cell:  303.898.7193
                
                Yahoo.IM: jmearley
                MSN.IM: jearley22@hotmail.com
                
                jim.earley@flatironssolutions.com
                -----Original Message-----
                From: David RR Webber (XML) [mailto:david@drrw.info] 
                Sent: Tuesday, April 10, 2007 9:02 AM
                To: Earley, Jim
                Cc: Dave Pawson; docstandards-interop-discuss@lists.oasis-open.org
                Subject: RE: [docstandards-interop-discuss] Clarifications / Scope of the intended work?
                
                Jim,
                 
                Why not focus on the handling functions instead?  That way you
                are an abstraction layer above the low-level representation
                syntax.

                 
                The xhtml is problematic - especially when it comes to page
                counts and page content.  Legally also - you need to leave
                things "as submitted" - because you may reject a submission as
                say not having content in the right place on a page, or total
                pages - and yet the original was OK when viewed in the native
                format.
                 
                Also - by going with functions - you put the onus on the
                individual tool vendors to support those functions consistently
                - without having to get into the lower level syntax ourselves of
                how that occurs, either now or in future new formats.
                 
                At the end of the day it is the BUSINESS FUNCTIONALITY that you
                want interoperability around - not the raw document.
                 
                So from the business stance - if I need to check for certain
                bookmarks, sections, text strings, page counts, word counts, etc
                - I can do that.
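
One way to picture a function-level contract like this is a small interface that each format adapter implements. This is only a sketch under stated assumptions: `DocumentFunctions`, `InMemoryDocument`, `missing_items`, and all method names are hypothetical and are not taken from iText or from any published specification.

```python
from typing import List, Protocol


class DocumentFunctions(Protocol):
    """Hypothetical function set that each format adapter would implement."""

    def page_count(self) -> int: ...
    def word_count(self) -> int: ...
    def has_bookmark(self, name: str) -> bool: ...
    def pages_containing(self, text: str) -> List[int]: ...


class InMemoryDocument:
    """Toy adapter over plain strings; a real one would wrap a PDF or ODF library."""

    def __init__(self, pages: List[str], bookmarks: List[str]) -> None:
        self._pages = pages
        self._bookmarks = set(bookmarks)

    def page_count(self) -> int:
        return len(self._pages)

    def word_count(self) -> int:
        return sum(len(page.split()) for page in self._pages)

    def has_bookmark(self, name: str) -> bool:
        return name in self._bookmarks

    def pages_containing(self, text: str) -> List[int]:
        # Page numbers are 1-based, as a reviewer would cite them.
        return [n for n, page in enumerate(self._pages, start=1) if text in page]


def missing_items(doc: DocumentFunctions, bookmarks: List[str], phrases: List[str]) -> List[str]:
    """List the required bookmarks and phrases that the document lacks."""
    missing = [b for b in bookmarks if not doc.has_bookmark(b)]
    missing += [p for p in phrases if not doc.pages_containing(p)]
    return missing
```

The business check in `missing_items` depends only on the function set, so the same check would run unchanged against any format for which an adapter exists.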
                 
                DW
                
                "The way to be is to do" - Confucius (551-472 B.C.)
                
                
                
                
                        -------- Original Message --------
                        Subject: RE: [docstandards-interop-discuss] Clarifications / Scope of the intended work?
                        From: "Earley, Jim" <Jim.Earley@flatironssolutions.com>
                        Date: Tue, April 10, 2007 10:46 am
                        To: "Dave Pawson" <dave.pawson@gmail.com>,
                        <docstandards-interop-discuss@lists.oasis-open.org>
                        
                        
                        Dave,
                        
                        The current thinking with regard to a solution uses
                        XHTML Microformats as the abstraction layer. All of the
                        standards (DITA, DB, ODF) share the same structural
                        characteristics (Headings, paragraphs, lists, tables,
                        images, etc.) albeit in different ways.
                        
                        The premise thus far is: 
                        
                        1. Use standard XHTML markup for common
                           semantic/structural components (table, img, p, ol,
                           acronym, strong, em, etc)
                        2. For structural components that do not have an
                           equivalent XHTML mapping, use <div>
                        3. For inline semantics that do not have an equivalent
                           XHTML mapping, use <span>
                        
                        - use the title attribute (available on any XHTML
                          element) to store the original element name
                        - use the class attribute to store the "semantic
                          category": e.g., "procedural" vs. "list" to delineate
                          between a procedural set of steps compared to a
                          numbered list
                        
                        - there are a couple of ideas that we're playing with
                          with regard to capturing the attribute values from
                          the original source:

                        a) Use the object tag (with child param tags to capture
                           the name/value pairs)
                        b) Use a declared namespace to embed the attributes on
                           the element
                        
                        These are, of course, open for discussion. 
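
The premise above can be sketched in a few lines of Python with the standard `xml.etree.ElementTree` module. The mapping table, the DocBook-style element names, and the function `to_xhtml` are illustrative assumptions, not an agreed mapping; source attributes are ignored here because how to capture them is still one of the open questions.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: source element -> (XHTML element, semantic category).
COMMON = {
    "para": ("p", None),
    "orderedlist": ("ol", "list"),
    "procedure": ("ol", "procedural"),
    "listitem": ("li", None),
    "step": ("li", None),
    "emphasis": ("em", None),
}
# Source elements assumed to be inline when no XHTML equivalent exists.
INLINE = {"command", "filename"}


def to_xhtml(src):
    """Map one source element and its children to the XHTML abstraction."""
    tag, category = COMMON.get(src.tag, (None, None))
    if tag is None:
        # No XHTML equivalent: span for inline semantics, div for structure.
        tag = "span" if src.tag in INLINE else "div"
    out = ET.Element(tag, {"title": src.tag})  # title keeps the original name
    if category:
        out.set("class", category)  # class keeps the semantic category
    out.text = src.text
    for child in src:
        mapped = to_xhtml(child)
        mapped.tail = child.tail
        out.append(mapped)
    return out


source = ET.fromstring(
    "<procedure><step><para>Run <command>make</command> now.</para></step></procedure>"
)
print(ET.tostring(to_xhtml(source), encoding="unicode"))
```

Because both a procedure and an ordered list map to `ol`, the `class` value is what lets a consumer tell the two apart, and the `title` value is what would let a round trip restore the original element name.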
                        
                        Jim
                        
                        
                        ================
                        Jim Earley
                        XML Developer/Consultant
                        Flatirons Solutions
                        4747 Table Mesa Drive
                        Boulder, CO 80301
                        
                        Voice: 303.542.2156
                        Fax:   303.544.0522
                        Cell:  303.898.7193
                        
                        Yahoo.IM: jmearley
                        MSN.IM: jearley22@hotmail.com
                        
                        jim.earley@flatironssolutions.com
                        -----Original Message-----
                        From: Dave Pawson [mailto:dave.pawson@gmail.com] 
                        Sent: Tuesday, April 10, 2007 8:12 AM
                        To: docstandards-interop-discuss@lists.oasis-open.org
                        Subject: Re: [docstandards-interop-discuss] Clarifications / Scope of the intended work?
                        
                        On 10/04/07, Michael Priestley <mpriestl@ca.ibm.com> wrote:

                        > - govt worker begins drafting a policy note in ODF
                        >   with the subject "the use of personal data received
                        >   via email"
                        > - govt worker pulls in the text of the relevant
                        >   statute, which is in a DITA specialization
                        > - govt worker pulls in the legal disclaimer which
                        >   must now be included in every government email
                        >   reply, from a different DITA specialization
                        > - govt worker pulls in the instructions on how to
                        >   include the text of the disclaimer in emails, from
                        >   documentation of the email software written in
                        >   DocBook

                        > - technical author 2, using DocBook, creates a
                        >   customized version of the email software
                        >   documentation
                        > - and pulls in portions of the procedures web site,
                        >   in the form of DITA topics and ODF policy notes
                        
                        OK, you've described the problem Michael. I hope we can
                        all sympathise with that!
                        
                        Ignoring how, what do you see as a solution?
                        
                        A means of 'integrating' n streams?
                        A way of reading n streams?
                        A means of generating .... something readable by
                        all.... (lcd solution)
                        
                        What class of solution is the goal please?
                        
                        
                        regards
                        
                        
                        -- 
                        Dave Pawson
                        XSLT XSL-FO FAQ.
                        http://www.dpawson.co.uk
                        
                        
                
        
                        
                
        


