docstandards-interop-discuss message
Subject: RE: [docstandards-interop-discuss] Clarifications / Scope of the intended work?
- From: Michael Priestley <mpriestl@ca.ibm.com>
- To: "David RR Webber (XML)" <david@drrw.info>
- Date: Tue, 10 Apr 2007 10:57:47 -0400
If we just used a presentation format
for interchange, how would you preserve semantics, and how would you get
a different look and feel?
For example, if I pull a DocBook procedure
into a DITA web, I'd like to be able to:
a) identify it as an equivalent
to a DITA task, so I can do things like sort related links appropriately,
and
b) apply my own look-and-feel, including
fonts, generated headings, headers/footers/standard navigation elements,
etc.
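As a rough illustration of point a), here is a minimal Python sketch of mapping a simple DocBook procedure onto a DITA-style task. It assumes un-namespaced DocBook markup with only `title`, `step` and `para` children, and the function name `docbook_procedure_to_dita_task` is hypothetical; a real mapping would need to cover far more of both vocabularies.

```python
import xml.etree.ElementTree as ET


def docbook_procedure_to_dita_task(procedure_xml, task_id):
    """Map a simple DocBook <procedure> onto a DITA-style <task>.

    Assumption: the input is un-namespaced and uses only
    <title>, <step> and <para>; anything else is ignored.
    """
    proc = ET.fromstring(procedure_xml)

    task = ET.Element("task", {"id": task_id})
    title = ET.SubElement(task, "title")
    title.text = proc.findtext("title", default="")

    taskbody = ET.SubElement(task, "taskbody")
    steps = ET.SubElement(taskbody, "steps")

    for db_step in proc.findall("step"):
        step = ET.SubElement(steps, "step")
        cmd = ET.SubElement(step, "cmd")
        # Join the text of all paragraphs in the step into one command.
        cmd.text = " ".join(
            p.text.strip() for p in db_step.findall("para") if p.text
        )
    return task
```

Once the content is recognisably a task, the receiving side can sort it with other tasks and apply its own styling (point b) without caring where it came from.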
There are several degrees of interoperability:
1) sharing content: pull the content
into my deliverable, applying my own look and feel, navigation etc. - this
is relatively simple, but already requires more than PDF as source
2) sharing semantics: pull the content
into my production system, including specialized semantic processing for
specialized elements - like treating task steps in a different way from
generic list items
3) sharing constraints: provide equivalent
constraints on both sides of the interchange, so that you can get robust
integration of processes, and not break down every time someone feeds you
a supposed "DITA task" that breaks the processing expectations
by e.g. allowing multiple lists of steps under a single title, or more
than one level of step nesting.
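To make level 3) concrete, here is a minimal Python sketch of a receiving-side check for the two example violations mentioned above. It assumes the element names `task`, `title`, `steps`, `step` and `substep`, and treats anything nested inside a `substep` as too deep; the function name `task_constraint_problems` is hypothetical and this is not a substitute for schema validation.

```python
import xml.etree.ElementTree as ET


def task_constraint_problems(task_xml):
    """Return a list of human-readable constraint violations (empty if none)."""
    root = ET.fromstring(task_xml)
    problems = []

    if root.tag != "task":
        problems.append("root element is <%s>, expected <task>" % root.tag)

    # Only one list of steps is expected under a single title.
    if len(root.findall(".//steps")) > 1:
        problems.append("more than one <steps> list under a single title")

    # One level of nesting (step -> substep) is accepted; deeper is not.
    for outer in root.iter("substep"):
        nested = [
            inner for inner in outer.iter()
            if inner is not outer and inner.tag in ("step", "substep")
        ]
        if nested:
            problems.append("more than one level of step nesting")
            break

    return problems
```

A sender and receiver that both run the same checks can reject a nonconforming "task" at the boundary instead of failing somewhere inside the production system.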
One of the proposals currently in place,
including an argument for using an XML hub format for interchange with
preservation of semantics, is here:
http://flatironssolutions.com/Downloads/DITA2007West.pdf
- it provides a potential solution for 1) and 2); for 3), DITA has mechanisms
for creating specialized content types that can match other existing standards
while still processing as DITA content, which gives a potential solution
for some cases at least.
Michael Priestley
IBM DITA Architect and Classification Schema PDT Lead
mpriestl@ca.ibm.com
http://dita.xml.org/blog/25
"David RR Webber (XML)" <david@drrw.info>
04/10/2007 10:41 AM
To: Michael Priestley/Toronto/IBM@IBMCA
cc: docstandards-interop-discuss@lists.oasis-open.org
Subject: RE: [docstandards-interop-discuss] Clarifications / Scope of the intended work?
Michael,
OK - then I believe the focus should be one level up.
I'd postulate that content sharing has to be able to support document
formats in a neutral way - a framework - rather than dictating one uber
format or specific format and then requiring transformation. From
the human/business perspective - so long as the content can be presented
consistently for human viewing / searching - the underlying machine-level
stuff is immaterial.
What I had been talking to Adobe about is creating XML
scripting for handling PDF attachments. Now that PDF is an ISO submission,
this opens up the way for that here.
The use case is from eGov - and the PDF is processed in
several ways:
1) Check that it is valid PDF
- there are hundreds of "flavours" of
PDF - so check that it's one you allow - e.g. reject if locked, not printable,
editable, embedded graphics, wrong page size, no signature, wrong type
of embedded notes, etc.
- make sure it's not corrupted and that the CRC etc. are
OK.
2) Check PDF for required content items
- simple text headings and other content
- required bookmarks and links OK
- if using embedded XML for metacontent
- make sure those are there
- graphics items
- page counts - total pages
3) Post-processing
- text extraction for knowledge mining
- re-packaging for review - combining with
bookmarks, ToC, adding review pages, etc.
- add or remove XML metacontent, notes,
other flags
- re-size and rotate graphics and content
pages to make them standard orientation and sizes
Attached is a sample of this XML.
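For readers without the attachment, the first two processing stages above could be sketched in Python roughly as follows. This is only an illustration and not the attached XML scripting or the iText API: `PdfFacts`, `looks_like_pdf` and `check_receipt` are hypothetical names, and the facts are assumed to be filled in by whatever PDF toolkit is in use.

```python
from dataclasses import dataclass, field


@dataclass
class PdfFacts:
    """Facts about a received PDF (assumed to be supplied by a PDF toolkit)."""
    is_locked: bool = False
    is_printable: bool = True
    has_signature: bool = True
    page_count: int = 0
    headings: list = field(default_factory=list)
    bookmarks: list = field(default_factory=list)


def looks_like_pdf(data: bytes) -> bool:
    """Cheap sanity check: PDF header at the start, %%EOF marker near the end."""
    return data.startswith(b"%PDF-") and b"%%EOF" in data[-1024:]


def check_receipt(facts, required_headings, required_bookmarks, max_pages):
    """Apply stage 1 (allowed flavour) and stage 2 (required content) checks."""
    errors = []
    # Stage 1: reject flavours we do not allow.
    if facts.is_locked:
        errors.append("document is locked")
    if not facts.is_printable:
        errors.append("document is not printable")
    if not facts.has_signature:
        errors.append("document has no signature")
    # Stage 2: required content items.
    for heading in required_headings:
        if heading not in facts.headings:
            errors.append("missing heading: " + heading)
    for bookmark in required_bookmarks:
        if bookmark not in facts.bookmarks:
            errors.append("missing bookmark: " + bookmark)
    if facts.page_count > max_pages:
        errors.append("too many pages: %d" % facts.page_count)
    return errors
```

Stage 3 (post-processing) would only run when the returned list is empty.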
While all this is specific to PDF - and targeted at the
iText OSS implementation initially - you can create the "iText"
functional toolset to work against any target document format - Word, ODF,
etc. I would suggest therefore that it would make sense to have the framework
be three items:
1) Guidelines for document exchange - provides means to
capture the who and the what - MoU / CPA level agreements
- can be both XML layout and / or document
template.
2) Formal ability to express scripts that describe the
content items, validations and checks, and the re-packaging that occurs:
- sample for XML scripting to drive PDF receipt
processing
- reverse scripting - template for generating
document that will be filled in.
3) Formal set of document handling primitives to work
with 2) that can be implemented for various document formats
- iText library good starting point for creating
function set
- function set would be only a subset of
these functions - aimed at exchange use case only
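The relationship between items 2) and 3) can be sketched in Python as a format-neutral script run against a small set of primitives. Everything here is an assumption for illustration: the `DocumentHandler` interface, the operation names, and the toy `PlainTextHandler` (which treats a form feed as a page break) are hypothetical and stand in for real PDF, ODF or Word implementations.

```python
from typing import Protocol


class DocumentHandler(Protocol):
    """Hypothetical primitive set each document format would implement."""
    def page_count(self) -> int: ...
    def extract_text(self) -> str: ...
    def has_heading(self, text: str) -> bool: ...


class PlainTextHandler:
    """Toy implementation: pages split on form feed, headings are whole lines."""
    def __init__(self, text):
        self.text = text

    def page_count(self):
        return self.text.count("\f") + 1

    def extract_text(self):
        return self.text.replace("\f", "\n")

    def has_heading(self, text):
        return any(line.strip() == text for line in self.text.splitlines())


def run_script(handler, script):
    """Run (operation, argument) pairs; return the ones that failed."""
    failures = []
    for op, arg in script:
        if op == "max-pages":
            ok = handler.page_count() <= arg
        elif op == "require-heading":
            ok = handler.has_heading(arg)
        elif op == "require-text":
            ok = arg in handler.extract_text()
        else:
            ok = False  # unknown operations count as failures
        if not ok:
            failures.append((op, arg))
    return failures
```

The same script could then be run unchanged against any format for which a handler exists, which is the point of keeping the primitive set small and exchange-focused.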
What this does therefore is allow exchanges to occur in
a variety of document formats, both now, and into the future - but provides
a common means to handle these, build them, and fill them in - regardless
of the underlying syntax of the documents themselves.
Now of course this is a MUCH bigger elephant! How
much work does the TC want to chew off?
Conversely - you could view it the other way around -
the PDF / XML approach is "low hanging fruit" - the OSS implementation
exists with a large and active community - providing the XML handler there
would be quick - and an implementation to support it simple.
Once that PDF use case is in place - then extend it out
to ODF and Word next....by implementing the iText functional set for those
formats too. This would then enable the third piece of course - transformation
- by proxy! I could open a PDF in iText - call the ODF Java functions
to save it to ODF - but that's getting ahead of ourselves....
Thanks, DW
"The way to be is to do" - Confucius (551-472 B.C.)
-------- Original Message --------
Specifically we want to formalize mechanisms for exchanging content between
organizations or applications that are using different XML document standards
- so not PDF per se, but ODF, DITA, and DocBook, for a start, and hopefully
others as we progress.
pdfGenXML-sample.xml