docstandards-interop-discuss message
Subject: RE: [docstandards-interop-discuss] Clarifications / Scope of the intended work?
- From: Michael Priestley <mpriestl@ca.ibm.com>
- To: "David RR Webber \(XML\)" <david@drrw.info>
- Date: Tue, 10 Apr 2007 11:20:29 -0400
Hi David,
So effectively what you're advocating
is a hub format interchange, just like the Flatirons proposal, but using
PDF with embedded XML metatagging instead of XHTML with extra values in
existing attributes.
What are the advantages of using PDF
over XHTML?
Michael Priestley
IBM DITA Architect and Classification Schema PDT Lead
mpriestl@ca.ibm.com
http://dita.xml.org/blog/25
"David RR Webber (XML)" <david@drrw.info>
04/10/2007 11:10 AM
To: Michael Priestley/Toronto/IBM@IBMCA
cc: docstandards-interop-discuss@lists.oasis-open.org
Subject: RE: [docstandards-interop-discuss] Clarifications / Scope of the intended work?
Michael,
OK - I believe the function library approach easily handles
the use case here. There are two XML scripts - one for new content
assembly (your use case below) and the other for document post-processing
and validation (my use case).
So - in your use case, repeated repurposing of content - the function
library allows you to identify the content items regardless of source
(PDF, ODF, et al.).
Here's how I'd see this working:
Step 1 - receive a new content item, or retrieve content from the repository
Step 2 - run the validation script to verify that it is OK, i.e. that it
has the parts needed
Step 3 - run the content creation script, which extracts parts from the
docs and then applies the new layout, etc.
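The three steps can be sketched in Java (the language iText is written in) as a small pipeline. This is a minimal sketch under stated assumptions, not the function library itself: `ContentItem`, `ValidationScript`, `AssemblyScript` and `ContentPipeline` are hypothetical names, a validation script is reduced to a list of required part names, and the "new layout" is a plain string function standing in for real layout work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical content item: its source format plus the named parts already identified in it.
record ContentItem(String format, Map<String, String> parts) {}

// Step 2: a hypothetical validation script, reduced here to a list of required part names.
record ValidationScript(List<String> requiredParts) {
    // Returns the required parts that are absent; an empty list means the item has what it needs.
    List<String> missingParts(ContentItem item) {
        List<String> missing = new ArrayList<>();
        for (String name : requiredParts) {
            if (!item.parts().containsKey(name)) {
                missing.add(name);
            }
        }
        return missing;
    }
}

// Step 3: a hypothetical content creation script that extracts selected parts and applies a new layout.
record AssemblyScript(List<String> partsToUse, Function<String, String> layout) {
    String assemble(ContentItem item) {
        StringBuilder out = new StringBuilder();
        for (String name : partsToUse) {
            out.append(layout.apply(item.parts().get(name)));
        }
        return out.toString();
    }
}

class ContentPipeline {
    // Step 1 is the caller supplying a received or retrieved item; Steps 2 and 3 run here in order.
    static String process(ContentItem item, ValidationScript check, AssemblyScript build) {
        List<String> missing = check.missingParts(item);
        if (!missing.isEmpty()) {
            throw new IllegalArgumentException("content item is missing parts: " + missing);
        }
        return build.assemble(item);
    }
}
```

For example, an item with parts `title` = "Install" and `steps` = "Run setup", checked for both parts and laid out with `s -> "<p>" + s + "</p>"`, assembles to `<p>Install</p><p>Run setup</p>`. Keeping validation and assembly as separate scripts mirrors the two-script split described above.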
BTW - iText has all that "new layout and embedding"
stuff in spades too - way more than is in the original PDF document. As
you indicate, it is trivial to embed XML for DITA metatagging
and so on into the PDF docs that you generate. No surprises there -
PDF is an extremely rich and mature syntax. You can cram as much
DITA as you like into a PDF using its XML metadata support!
DW
"The way to be is to do" - Confucius (551-472 B.C.)
-------- Original Message --------
Subject: RE: [docstandards-interop-discuss] Clarifications / Scope of
the intended work?
From: Michael Priestley <mpriestl@ca.ibm.com>
Date: Tue, April 10, 2007 10:57 am
To: "David RR Webber (XML)" <david@drrw.info>
Cc: docstandards-interop-discuss@lists.oasis-open.org
If we just used a presentation format for interchange, how would you preserve
semantics, and how would you get a different look and feel?
For example, if I pull a DocBook procedure into a DITA web, I'd like to
be able to:
a) identify it as an equivalent to a DITA task, so I can do things like
sort related links appropriately, and
b) apply my own look-and-feel, including fonts, generated headings, headers/footers, standard
navigation elements, etc.
There are several degrees of interoperability:
1) sharing content: pull the content into my deliverable, applying my own
look and feel, navigation etc. - this is relatively simple, but already
requires more than PDF as source
2) sharing semantics: pull the content into my production system, including
specialized semantic processing for specialized elements - like treating
task steps in a different way from generic list items
3) sharing constraints: provide equivalent constraints on both sides of
the interchange, so that you can get robust integration of processes, and
not break down every time someone feeds you a supposed "DITA task"
that breaks the processing expectations by e.g. allowing multiple lists
of steps under a single title, or more than one level of step nesting.
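Degree 3) can be made concrete with a constraint check a receiver might run before accepting a supposed "DITA task". The sketch below assumes element names from the standard DITA task model (`taskbody`, `steps`, `steps-unordered`, `step`, `substeps`, `substep`) and checks exactly the two violations named above; the `TaskConstraints` class is hypothetical. It is meant for samples without a DOCTYPE declaration; a real DITA file usually references its DTD, which the default JDK parser would try to load.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class TaskConstraints {
    // Returns a description of each constraint violation found; an empty list means the task passed.
    static List<String> check(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        List<String> problems = new ArrayList<>();

        // Constraint 1: a task body may hold only one list of steps.
        NodeList bodies = doc.getElementsByTagName("taskbody");
        for (int i = 0; i < bodies.getLength(); i++) {
            Element body = (Element) bodies.item(i);
            int lists = countChildren(body, "steps") + countChildren(body, "steps-unordered");
            if (lists > 1) {
                problems.add("taskbody contains " + lists + " lists of steps");
            }
        }

        // Constraint 2: one level of nesting only (step > substeps > substep), so no substeps inside a substep.
        NodeList substeps = doc.getElementsByTagName("substep");
        for (int i = 0; i < substeps.getLength(); i++) {
            Element substep = (Element) substeps.item(i);
            if (substep.getElementsByTagName("substeps").getLength() > 0) {
                problems.add("substep contains a further level of substeps");
            }
        }
        return problems;
    }

    // Counts the direct element children of parent that have the given name.
    private static int countChildren(Element parent, String name) {
        int count = 0;
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE && child.getNodeName().equals(name)) {
                count++;
            }
        }
        return count;
    }
}
```

Because the same checks can be run by the sender before export and by the receiver on import, both sides of the interchange enforce equivalent constraints.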
One of the proposals currently in place, including an argument for using
an XML hub format for interchange with preservation of semantics, is here:
http://flatironssolutions.com/Downloads/DITA2007West.pdf - it provides
a potential solution for 1) and 2); for 3), DITA has mechanisms for creating
specialized content types that can match other existing standards while
still processing as DITA content, which gives a potential solution for
some cases at least.
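For comparison, the XHTML hub idea (semantics carried as extra values in existing attributes) might look roughly like this. It is an assumption for illustration, not the Flatirons design: the XHTML `class` attribute is used to carry DITA's own class values for task steps (`- topic/ol task/steps ` and `- topic/li task/step `), so a generic receiver sees an ordinary ordered list while a DITA-aware one can recognise the steps. `HubFragment` is a hypothetical helper.

```java
import java.util.List;

class HubFragment {
    // DITA class attribute values for task steps, reused here as XHTML class values (an illustrative assumption).
    static final String STEPS_CLASS = "- topic/ol task/steps ";
    static final String STEP_CLASS = "- topic/li task/step ";

    // Escapes text for use in element content or a double-quoted attribute.
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;")
                   .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Renders step texts as an XHTML ordered list that still carries the DITA semantics.
    static String stepsToXhtml(List<String> steps) {
        StringBuilder out = new StringBuilder("<ol class=\"" + STEPS_CLASS + "\">");
        for (String step : steps) {
            out.append("<li class=\"").append(STEP_CLASS).append("\">")
               .append(escape(step)).append("</li>");
        }
        return out.append("</ol>").toString();
    }

    // True if a class value marks the element as DITA task steps, whatever its generic base.
    static boolean isTaskSteps(String classValue) {
        return (" " + classValue + " ").contains(" task/steps ");
    }
}
```

This covers degrees 1) and 2): the generic `topic/ol` part lets any receiver apply its own look and feel, while the `task/steps` part lets a specialized processor treat steps differently from generic list items.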
Michael Priestley
IBM DITA Architect and Classification Schema PDT Lead
mpriestl@ca.ibm.com
http://dita.xml.org/blog/25
"David RR Webber (XML)" <david@drrw.info>
04/10/2007 10:41 AM
To: Michael Priestley/Toronto/IBM@IBMCA
cc: docstandards-interop-discuss@lists.oasis-open.org
Subject: RE: [docstandards-interop-discuss] Clarifications / Scope of the intended work?
Michael,
OK - then I believe the focus should be one level up. I'd postulate
that content sharing has to be able to support document formats in a neutral
way - a framework - rather than dictating one uber format or specific format
- and then requiring transformation. From the human/business perspective
- so long as the content can be presented consistently for human viewing
/ searching - the underlying machine-level stuff is immaterial.
What I had been talking to Adobe about is creating XML scripting for handling
PDF attachments. Now that PDF is an ISO submission, this opens up the
way for that here.
The use case is from eGov - and the PDF is processed in several ways:
1) Checked to be valid PDF
- there are hundreds of "flavours" of PDF, so check that
it's one you allow - e.g. reject if locked, not printable, editable, embedded
graphics, wrong page size, no signature, wrong type of embedded notes,
etc.
- make sure it's not corrupted and the CRC etc. is OK.
2) Check PDF for content required items
- simple text headings and other content
- required bookmarks and links OK
- if using embedded XML for metacontent - make sure those
are there
- graphics items
- page counts - total pages
3) Post-processing
- text extraction for knowledge mining
- re-packaging for review - combining with bookmarks, ToC,
adding review pages, etc.
- add or remove XML metacontent, notes, other flags
- re-size and rotate graphics and content pages to make them
standard orientation and sizes
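Only the coarsest part of check 1) can be shown without a PDF library. The sketch assumes the submission arrives as raw bytes and that the sender supplies a CRC-32 out of band, since the PDF format does not define a whole-file checksum of its own. It checks the `%PDF-` header, the `%%EOF` marker near the end of the file, and that checksum. The flavour checks (locked, not printable, page size, signatures) would need a real PDF library such as iText and are not attempted here. `PdfReceiptCheck` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

class PdfReceiptCheck {
    // Computes a CRC-32 over the whole file, to compare with a value the sender supplied separately.
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }

    // Returns the problems found; an empty list means the bytes pass these coarse checks.
    static List<String> check(byte[] pdf, long expectedCrc) {
        List<String> problems = new ArrayList<>();
        // A PDF file begins with a header such as "%PDF-1.4".
        String head = new String(pdf, 0, Math.min(pdf.length, 5), StandardCharsets.ISO_8859_1);
        if (!head.equals("%PDF-")) {
            problems.add("missing %PDF- header");
        }
        // The file should end with an end-of-file marker; look for it in the last 1024 bytes.
        int tailStart = Math.max(0, pdf.length - 1024);
        String tail = new String(pdf, tailStart, pdf.length - tailStart, StandardCharsets.ISO_8859_1);
        if (!tail.contains("%%EOF")) {
            problems.add("missing %%EOF marker");
        }
        if (crc32(pdf) != expectedCrc) {
            problems.add("checksum does not match the value supplied by the sender");
        }
        return problems;
    }
}
```

Returning a list of problems rather than failing on the first one lets the receipt processing report every reason for rejection back to the submitter at once.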
Attached is a sample of this XML.
While all this is specific to PDF - and targeted at the iText OSS implementation
initially - you could create the "iText" functional toolset to work against
any target document format (Word, ODF, etc.). I would therefore suggest
that it makes sense for the framework to have these three items:
1) Guidelines for document exchange - provides means to capture the who
and the what - MoU / CPA level agreements
- can be both XML layout and / or document template.
2) Formal ability to express scripts that describe the content items,
validations and checks, and re-packaging occurring:
- sample for XML scripting to drive PDF receipt processing
- reverse scripting - template for generating document that will
be filled in.
3) Formal set of document handling primitives to work with 2) that can
be implemented for various document formats
- iText library good starting point for creating function set
- function set would be only a subset of these functions - aimed
at exchange use case only
What this does, therefore, is allow exchanges to occur in a variety of document
formats, both now and into the future, while providing a common means to
handle these, build them, and fill them in, regardless of the underlying
syntax of the documents themselves.
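Items 2) and 3) separate the script (what to check) from the primitives (how each format answers). A minimal sketch, assuming an invented script vocabulary (`minPages`, `bookmark`, `text`) rather than the attached sample, and a hypothetical `DocumentPrimitives` interface that a PDF, ODF or Word backend would each implement; `FakeDocument` is an in-memory stand-in used only to exercise the runner.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical format-neutral primitives; each document format would get its own implementation.
interface DocumentPrimitives {
    int pageCount();
    boolean hasBookmark(String name);
    String text();
}

// In-memory stand-in for a real backend, used only to exercise the script runner.
record FakeDocument(int pageCount, Set<String> bookmarks, String text) implements DocumentPrimitives {
    public boolean hasBookmark(String name) {
        return bookmarks.contains(name);
    }
}

class CheckScript {
    // Runs each child element of the script root as one check; returns the failures found.
    static List<String> run(String scriptXml, DocumentPrimitives doc) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(scriptXml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("unreadable script", e);
        }
        List<String> failures = new ArrayList<>();
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            Element check = (Element) n;
            switch (check.getTagName()) {
                case "minPages" -> {
                    int min = Integer.parseInt(check.getAttribute("value"));
                    if (doc.pageCount() < min) failures.add("fewer than " + min + " pages");
                }
                case "bookmark" -> {
                    String name = check.getAttribute("name");
                    if (!doc.hasBookmark(name)) failures.add("missing bookmark " + name);
                }
                case "text" -> {
                    String phrase = check.getAttribute("contains");
                    if (!doc.text().contains(phrase)) failures.add("missing text " + phrase);
                }
                default -> failures.add("unknown check " + check.getTagName());
            }
        }
        return failures;
    }
}
```

For example, `<checks><minPages value='5'/><bookmark name='Appendix'/></checks>` run against a three-page document without that bookmark yields two failures. The same script runs unchanged against any backend that implements the interface, which is the format neutrality described above.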
Now of course this is a MUCH bigger elephant! How much work does
the TC want to chew off?
Conversely - you could view it the other way around - the PDF / XML approach
is "low hanging fruit" - the OSS implementation exists with a
large and active community - providing the XML handler there would be quick
- and an implementation to support it simple.
Once that PDF use case is in place, extend it out to ODF and Word
next, by implementing the iText functional set for those formats too.
This would then enable the third piece of course - transformation
- by proxy! I could open a PDF in iText and call the ODF Java functions
to save it to ODF - but that's getting ahead of ourselves...
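Transformation by proxy can be sketched with readers and writers that meet in a neutral model, so each format needs one reader and one writer rather than a converter for every pair of formats. Everything here is hypothetical: `ContentBlock` is a deliberately tiny model (headings and paragraphs only), and a toy plain-text reader and XHTML writer stand in for real PDF and ODF backends.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical neutral model: each block is either a heading or a paragraph.
record ContentBlock(boolean heading, String text) {}

interface FormatReader {
    List<ContentBlock> read(String source);
}

interface FormatWriter {
    String write(List<ContentBlock> blocks);
}

// Toy reader: lines starting with "# " are headings, other non-blank lines are paragraphs.
class PlainTextReader implements FormatReader {
    public List<ContentBlock> read(String source) {
        List<ContentBlock> blocks = new ArrayList<>();
        for (String line : source.split("\n")) {
            if (line.isBlank()) {
                continue;
            }
            if (line.startsWith("# ")) {
                blocks.add(new ContentBlock(true, line.substring(2)));
            } else {
                blocks.add(new ContentBlock(false, line));
            }
        }
        return blocks;
    }
}

// Toy writer: emits XHTML-style h1/p elements with the text escaped.
class XhtmlWriter implements FormatWriter {
    public String write(List<ContentBlock> blocks) {
        StringBuilder out = new StringBuilder();
        for (ContentBlock block : blocks) {
            String tag = block.heading() ? "h1" : "p";
            out.append('<').append(tag).append('>')
               .append(escape(block.text()))
               .append("</").append(tag).append('>');
        }
        return out.toString();
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}

class ProxyConversion {
    // Converts by reading into the neutral model and writing it back out in the target format.
    static String convert(String source, FormatReader from, FormatWriter to) {
        return to.write(from.read(source));
    }
}
```

Anything the neutral model cannot represent is lost in transit, which is why the richness of that shared model matters as much as the individual readers and writers.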
Thanks, DW
"The way to be is to do" - Confucius (551-472 B.C.)
-------- Original Message --------
Specifically we want to formalize mechanisms for exchanging content between
organizations or applications that are using different XML document standards
- so not PDF per se, but ODF, DITA, and DocBook, for a start, and hopefully
others as we progress.
---------------------------------------------------------------------
To unsubscribe, e-mail:
docstandards-interop-discuss-unsubscribe@lists.oasis-open.org
For additional commands, e-mail:
docstandards-interop-discuss-help@lists.oasis-open.org