OASIS Mailing List Archives

docstandards-interop-discuss message



Subject: RE: [docstandards-interop-discuss] Clarifications / Scope of the intended work?


Michael,
 
OK - then I believe the focus should be one level up.  I'd postulate that content sharing has to support document formats in a neutral way - through a framework - rather than dictating one uber format or a specific format and then requiring transformation.  From the human/business perspective, so long as the content can be presented consistently for human viewing and searching, the underlying machine-level details are immaterial.
 
What I had been talking to Adobe about is creating XML scripting for handling PDF attachments.  Now that PDF is an ISO submission, this opens up the way for that here.
 
The use case is from eGov - and the PDF is processed in several ways:
 
1) Checked to be valid PDF
   - there are hundreds of "flavours" of PDF - so check that it's one you allow - e.g. reject if locked, not printable, editable, embedded graphics, wrong page size, no signature, wrong type of embedded notes, etc.
   - make sure it's not corrupted and that the CRC etc. is OK.
 
2) Check PDF for required content items
    - simple text headings and other content
    - required bookmarks and links OK
    - if using embedded XML for metacontent - make sure those are there
    - graphics items
    - page counts - total pages
 
3) Post-processing
    - text extraction for knowledge mining
    - re-packaging for review - combining with bookmarks, ToC, adding review pages, etc.
    - add or remove XML metacontent, notes, other flags
    - re-size and rotate graphics and content pages to make them standard orientation and sizes
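 
To make the first two stages above a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the PdfInfo record, its field names and the rule parameters are all hypothetical, and a real implementation would fill that record in from a PDF library such as iText rather than by hand.

```python
from dataclasses import dataclass, field


@dataclass
class PdfInfo:
    """Hypothetical summary of a received PDF (field names are assumptions)."""
    page_count: int
    locked: bool = False
    signed: bool = True
    page_size: str = "8.5x11"
    headings: list = field(default_factory=list)
    bookmarks: list = field(default_factory=list)


def validate_flavour(info, allowed_page_sizes=("8.5x11",)):
    """Step 1: reject PDF flavours that the receiver does not allow."""
    problems = []
    if info.locked:
        problems.append("document is locked")
    if not info.signed:
        problems.append("no signature")
    if info.page_size not in allowed_page_sizes:
        problems.append(f"wrong page size: {info.page_size}")
    return problems


def check_required_content(info, required_headings=(), required_bookmarks=(),
                           max_pages=None):
    """Step 2: confirm that the required content items are present."""
    problems = []
    for heading in required_headings:
        if heading not in info.headings:
            problems.append(f"missing heading: {heading}")
    for bookmark in required_bookmarks:
        if bookmark not in info.bookmarks:
            problems.append(f"missing bookmark: {bookmark}")
    if max_pages is not None and info.page_count > max_pages:
        problems.append(f"page count {info.page_count} exceeds {max_pages}")
    return problems


def process_receipt(info, **rules):
    """Run steps 1 and 2; step 3 (post-processing) would follow only on success."""
    problems = validate_flavour(info) + check_required_content(info, **rules)
    return {"accepted": not problems, "problems": problems}
```

Calling process_receipt(PdfInfo(page_count=3, locked=True), max_pages=1) would report both the lock and the page count, so the sender sees every problem in one pass rather than one rejection at a time.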
 
Attached is a sample of this XML.
 
While all this is specific to PDF - and targeted at the iText OSS implementation initially - you can create the "iText" functional toolset to work against any target document format - Word, ODF, etc.  I would therefore suggest that it makes sense for the framework to consist of these items:
 
1) Guidelines for document exchange - provides means to capture the who and the what - MoU / CPA level agreements
    - can be both XML layout and / or document template.
 
2) Formal ability to express scripts that describe the content items, validations, checks and re-packaging that occur:
   - sample for XML scripting to drive PDF receipt processing
   - reverse scripting - template for generating document that will be filled in.
 
3) Formal set of document handling primitives to work with 2) that can be implemented for various document formats
   - iText library good starting point for creating function set
   - function set would be only a subset of these functions - aimed at exchange use case only
 
What this does therefore is allow exchanges to occur in a variety of document formats, both now and into the future - but it provides a common means to handle these, build them, and fill them in - regardless of the underlying syntax of the documents themselves.
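 
As a rough illustration of item 3), the sketch below shows what a format-neutral function set could look like in Python. Everything in it is an assumption made for the example: the DocumentPrimitives interface, its three methods and the InMemoryDocument stand-in are hypothetical names, not part of iText or any existing library.

```python
from typing import List, Protocol


class DocumentPrimitives(Protocol):
    """Hypothetical format-neutral function set (names are illustrative only)."""

    def page_count(self) -> int: ...

    def extract_text(self) -> str: ...

    def bookmarks(self) -> List[str]: ...


class InMemoryDocument:
    """Stand-in backend for the sketch; a PDF, ODF or Word backend would
    implement the same three methods on top of its own library."""

    def __init__(self, pages, bookmarks=()):
        self._pages = list(pages)
        self._bookmarks = list(bookmarks)

    def page_count(self) -> int:
        return len(self._pages)

    def extract_text(self) -> str:
        return "\n".join(self._pages)

    def bookmarks(self) -> List[str]:
        return list(self._bookmarks)


def contains_topic(doc: DocumentPrimitives, topic: str) -> bool:
    """A check written once against the primitives, usable with any backend."""
    return topic in doc.extract_text()
```

The point of the design is that contains_topic never learns which format it is looking at; only the backend class changes from one document format to the next.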
 
Now of course this is a MUCH bigger elephant!  How much work does the TC want to chew off?
 
Conversely - you could view it the other way around - the PDF / XML approach is "low hanging fruit" - the OSS implementation exists with a large and active community - providing the XML handler there would be quick - and an implementation to support it simple.
 
Once that PDF use case is in place - then extend it out to ODF and Word next....by implementing the iText functional set for those formats too.  This would then enable the third piece of course - transformation - by proxy!  I could open a PDF in iText - call the ODF Java functions to save it to ODF - but then that is getting ahead of ourselves....
 
Thanks, DW

"The way to be is to do" - Confucius (551-479 B.C.)


-------- Original Message --------

Specifically we want to formalize mechanisms for exchanging content between organizations or applications that are using different XML document standards - so not PDF per se, but ODF, DITA, and DocBook, for a start, and hopefully others as we progress.

<?xml version="1.0" encoding="UTF-8"?>
<pdfGenXML xmlns:xmp="http://www.adobe.com/xmp">
  <pdfHeader>
    <!-- This allows setting of various properties for the PDF document -->
    <pdfSettings>
     <pdfSet property="PageSize" value="8.5x11"/>
     <pdfSet property="DPI" value="72"/>
    </pdfSettings>
    <!-- Also XMP metadata tags -->
   <pdfXMP>
      <xmp:Stuff/>
   </pdfXMP>
  </pdfHeader>
  <pdfContent sourceURL='c:\samples\content\docs1'>
     <pdfDefaults>
       <pdfPgHdr lines="2" text="A sample generated PDF //@date()//" align="middle"/>
       <pdfPgFtr lines="2" text="Copyright //@char('#1234')// OASIS pdfGen TC - Page //@page()//" align="left"/>
     </pdfDefaults>
     <pdfPage>
        <pdfSuppress Hdr="true" Ftr="true"/>
        <pdfText syntax="HTML" font="Times Roman" size="3">
          <br/><h1>Our Sample Document</h1>
          <br/><br/><h3>Generated using pdfGen XML scripting</h3>
        </pdfText>
        <pdfBarCode style="3of9" rotated="no" position="10,5" startvalue="120055544"/>
     </pdfPage>
     <pdfPage>
       <pdfTOC style="default">
         <pdfBookMark name="Chapter 1"/>
         <pdfBookMark name="Chapter 2"/>
         <pdfBookMark name="Chapter 3"/>
         <pdfBookMark name="Chapter 4"/>
       </pdfTOC>
     </pdfPage>
     <pdfPage>
         <pdfSetBookMark name="Chapter 1"/>
         <pdfInsert type="PDF" sourceDOC='..\mydoc1.pdf' scaleContent="false"/>
         <pdfValidation>
           <pdfCheck condition="pageCount" max="1" severity="warn">WARNING: //@sourceDOC()// page count more than one.</pdfCheck>
           <pdfCheck condition="required" severity="error">ERROR: //@sourceDOC()// missing.</pdfCheck>
         </pdfValidation>
     </pdfPage>
     <pdfPage>
         <pdfSetBookMark name="Chapter 2"/>
          <pdfText space="preserve" syntax="text" font="Times Roman" size="3">
  Sample Image of Blocked Pipe

          </pdfText>
         <pdfInsert type="JPG" sourceDOC="..\mypic1.jpg" scaleContent="fitToPage"/>
     </pdfPage>
     <pdfPage>
         <pdfInsert type="PDF" sourceDOC="..\mydoc2.pdf" scaleContent="false" editable="flatten" preserveNotes="true"/>
     </pdfPage>
     <pdfPage>
         <pdfSetBookMark name="Chapter 3"/>
         <pdfInsert type="PDF" sourceDOC="..\mydoc3.pdf" scaleContent="adjustPageSize" landscape="rotate"/>
         <pdfValidation>
           <pdfCheck condition="contains" value="Introduction to PDF handling">WARNING: //@sourceDOC()// missing topic - 'Introduction to PDF handling'.</pdfCheck>
           <pdfCheck condition="required" severity="error">ERROR: //@sourceDOC()// missing.</pdfCheck>
         </pdfValidation>
     </pdfPage>
     <pdfPage>
         <pdfSetBookMark name="Chapter 4"/>
         <pdfInsert type="XFO" sourceDOC="..\mydoc3.xml" stylesheet="..\page-layout.xsl"/>
     </pdfPage>
  </pdfContent>
  <pdfOnError>
    <pdfIfError>
     <pdfPage>
        <pdfText space="preserve" syntax="HTML" font="Times Roman" size="3">
          <br/><h1>ERROR OCCURRED:</h1>
          <br/><br/><h3>Generation failed - reason:</h3><br/>
          &lt;pre&gt;
        </pdfText>
        <pdfErrorText/>
        <pdfText space="preserve" syntax="HTML" font="Times Roman" size="3">
          &lt;/pre&gt;
        </pdfText>
     </pdfPage>
    </pdfIfError>
     <pdfReport method="REST" targetURL="my.webservice.com:8044/catchit" syntax="http">
       <pdfPage>
        <pdfIfError>
         <pdfText space="preserve" syntax="HTML" font="Times Roman" size="3">
          <br/><h1>ERROR OCCURRED:</h1>
          <br/><br/><h3>Generation failed - reason:</h3><br/>
          &lt;pre&gt;
         </pdfText>
        </pdfIfError>

        <pdfIfWarn>
         <pdfText space="preserve" syntax="HTML" font="Times Roman" size="3">
          <br/><h1>Warning:</h1>
          <br/><br/><h3>Invalid content - reason:</h3><br/>
          &lt;pre&gt;
         </pdfText>
        </pdfIfWarn>

        <pdfErrorText/>
        <pdfText space="preserve" syntax="HTML" font="Times Roman" size="3">
          &lt;/pre&gt;
        </pdfText>
       </pdfPage>
     </pdfReport>
  </pdfOnError>
</pdfGenXML>
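
For anyone wondering how a handler might act on the pdfValidation entries in the sample, the following is a minimal sketch in Python using only the standard library. It rests on several assumptions: the run_checks and expand helpers are hypothetical, the embedded fragment is a cut-down copy of one pdfPage with the relative path prefix dropped from sourceDOC, placeholders are taken to be delimited by // on both sides as in the page header and footer lines, and page counts are supplied in a plain dictionary instead of being read from real PDF files.

```python
import re
import xml.etree.ElementTree as ET

# Cut-down copy of one page from the sample script (assumed placeholder form).
SCRIPT = """
<pdfPage>
  <pdfInsert type="PDF" sourceDOC="mydoc1.pdf" scaleContent="false"/>
  <pdfValidation>
    <pdfCheck condition="pageCount" max="1" severity="warn">WARNING: //@sourceDOC()// page count more than one.</pdfCheck>
    <pdfCheck condition="required" severity="error">ERROR: //@sourceDOC()// missing.</pdfCheck>
  </pdfValidation>
</pdfPage>
"""

# Matches markers such as //@date()// or //@char('#1234')//.
PLACEHOLDER = re.compile(r"//@(\w+)\([^)]*\)//")


def expand(text, values):
    """Replace //@name()// markers with known values; leave unknown ones as-is."""
    return PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), text)


def run_checks(page_xml, available):
    """Evaluate the pdfCheck entries of one page.

    `available` maps a source document name to its page count; a document
    that is absent from the mapping is treated as missing.
    """
    page = ET.fromstring(page_xml)
    source = page.find("pdfInsert").get("sourceDOC")
    values = {"sourceDOC": source}
    messages = []
    for check in page.iter("pdfCheck"):
        condition = check.get("condition")
        severity = check.get("severity", "warn")
        failed = False
        if condition == "required":
            failed = source not in available
        elif condition == "pageCount":
            failed = (source in available
                      and available[source] > int(check.get("max")))
        if failed:
            messages.append((severity, expand(check.text, values)))
    return messages
```

With this sketch, run_checks(SCRIPT, {}) reports the "required" error for the missing document, while a three-page mydoc1.pdf triggers only the page-count warning.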

