Subject: Re: [docbook-apps] tagged and accessible PDF document with DocBook

On 24/04/2017 17:24, Holger Bast wrote:
> I started writing a specification document that maps the DocBook
> elements to the necessary PDF structure elements. Well, I started
> with FOP (based on PDF v1.5) as basis, but I think that the other
> processors act the same way. It would be great if someone familiar
> with XEP, Antenna, DocMill, etc. could have a look at the document
> and give some feedback. The document can be found over here:

1. In the section 'Strong vs. weak block-level structures in PDF
files', you say 'DocBook already provides a very strong structure' [1],
but then your examples use the 'H1' to 'H5' PDF tags. The definition of
'Strongly structured' that you copied from the PDF 1.5 reference says to
use 'H' in strongly structured PDF files. Section 7.4.4, 'Unnumbered
headings', of ISO 14289 includes both "'H' ... should be used in
strongly structured documents" and "Documents that are strongly
structured may use numbered headings.", so ISO 14289 would also prefer
that you use 'H' in strongly structured PDF.
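
To make the difference concrete, here is a minimal Python sketch that renders a section tree both ways: strongly structured, where nesting comes from 'Sect' elements and every heading is 'H', and weakly structured, where the level is encoded as 'H1' to 'H6'. The `outline` function and its (title, children) input shape are hypothetical, for illustration only.

```python
from typing import List, Tuple

# Hypothetical input shape: (title, [child sections, ...])
Section = Tuple[str, list]


def outline(sections: List[Section], strong: bool, depth: int = 1) -> List[str]:
    """Render headings as indented 'Tag: title' lines, in document order."""
    lines: List[str] = []
    indent = "  " * (depth - 1)
    for title, children in sections:
        if strong:
            # Strongly structured: nesting is carried by nested 'Sect'
            # elements, so every heading uses the unnumbered 'H' tag.
            lines.append(f"{indent}Sect")
            lines.append(f"{indent}  H: {title}")
        else:
            # Weakly structured: a flat sequence where the heading level
            # is encoded in the tag name (PDF defines only H1..H6).
            lines.append(f"H{min(depth, 6)}: {title}")
        lines.extend(outline(children, strong, depth + 1))
    return lines


doc = [("Introduction", [("Scope", [])])]
print("\n".join(outline(doc, strong=True)))
print("\n".join(outline(doc, strong=False)))
```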

2. It really shouldn't be necessary to specify that 'fo:static-content'
is tagged as 'Artifact'. It should happen automatically, in line with
Section 7.8, 'Page headers and footers', of ISO 14289.

3. Similarly, it shouldn't be necessary to supply PDF tags for
'fo:list-block', 'fo:list-item', 'fo:list-item-label', and
'fo:list-item-body': the XSL-FO formatter should provide the right
tags for those FOs. AH Formatter does this, and judging from the code
in Section 3.1.2, 'Automatic tagging by Apache FOP', FOP does too.
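
As a rough illustration, assuming the usual PDF list structure (an 'L' containing 'LI' elements, each with a 'Lbl' and an 'LBody'), this Python sketch lists the tags a formatter would be expected to derive from the list FOs on its own. The mapping table and `list_tags` are hypothetical; the tags a particular formatter actually emits may differ, so check its documentation.

```python
import xml.etree.ElementTree as ET

FO = "{http://www.w3.org/1999/XSL/Format}"

# Assumed default mapping from list FOs to PDF standard structure types.
DEFAULT_LIST_TAGS = {
    FO + "list-block": "L",
    FO + "list-item": "LI",
    FO + "list-item-label": "Lbl",
    FO + "list-item-body": "LBody",
}


def list_tags(fo_xml: str) -> list:
    """Return the PDF tags implied by the list FOs, in document order."""
    root = ET.fromstring(fo_xml)
    # iter() walks the whole tree, including the root element itself.
    return [DEFAULT_LIST_TAGS[el.tag] for el in root.iter() if el.tag in DEFAULT_LIST_TAGS]


sample = """<fo:list-block xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:list-item>
    <fo:list-item-label><fo:block>1.</fo:block></fo:list-item-label>
    <fo:list-item-body><fo:block>First item</fo:block></fo:list-item-body>
  </fo:list-item>
</fo:list-block>"""
print(list_tags(sample))  # ['L', 'LI', 'Lbl', 'LBody']
```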

4. AFAICT, PDF tags are case-sensitive, so you probably should use the
specified forms in your examples, e.g., 'Document' instead of 'DOCUMENT'.

5. Within your 'TOCI' for a ToC entry, you should use 'Lbl' for the
entry's title, 'NonStruct' for the leader, and 'Reference' for the page
number citation.
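
The suggested nesting for one ToC entry can be modelled as a small tree. This Python sketch uses ElementTree purely as a convenient way to write down the logical structure (it does not produce PDF); `toc_entry` is a hypothetical helper and the dot leader text is a placeholder.

```python
import xml.etree.ElementTree as ET


def toc_entry(title: str, page: int) -> ET.Element:
    """Model one ToC entry: title in 'Lbl', leader in 'NonStruct',
    page number citation in 'Reference' (a logical model, not PDF)."""
    toci = ET.Element("TOCI")
    ET.SubElement(toci, "Lbl").text = title
    ET.SubElement(toci, "NonStruct").text = "....."  # placeholder for the leader
    ET.SubElement(toci, "Reference").text = str(page)
    return toci


toc = ET.Element("TOC")
toc.append(toc_entry("Introduction", 1))
print(ET.tostring(toc, encoding="unicode"))
# <TOC><TOCI><Lbl>Introduction</Lbl><NonStruct>.....</NonStruct><Reference>1</Reference></TOCI></TOC>
```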

6. My understanding of the 'NonStruct' tag changes on alternate days,
but you might be able to use it on some of the 'fo:block' elements (the
ones you can't magic away with your post-processing) to indicate that
the 'fo:block' has 'no inherent structural significance'.

7. Putting PDF tag names in @role won't do anything in AH Formatter. If
you need to override the default PDF tag [2] for a particular FO, you
should use @axf:pdftag [3].
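
If your FO already carries PDF tag names in @role, one option is a post-processing pass that copies them to @axf:pdftag. The sketch below is hypothetical: `role_to_pdftag` is an illustrative helper, it assumes the 'axf' prefix is bound to the Antenna House extension namespace shown, and it assumes a plain standard tag name is an acceptable @axf:pdftag value; see [3] for the property's actual syntax.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
AXF_NS = "http://www.antennahouse.com/names/XSL/Extensions"  # assumed AH extension namespace
ET.register_namespace("fo", FO_NS)
ET.register_namespace("axf", AXF_NS)


def role_to_pdftag(fo_xml: str, allowed: set) -> str:
    """Copy @role to @axf:pdftag when @role holds an allowed PDF tag name.

    Hypothetical helper: assumes a bare tag name is a valid axf:pdftag value.
    """
    root = ET.fromstring(fo_xml)
    for el in root.iter():
        role = el.get("role")
        if role in allowed:
            el.set(f"{{{AXF_NS}}}pdftag", role)
    return ET.tostring(root, encoding="unicode")


sample = f'<fo:block xmlns:fo="{FO_NS}" role="H">Chapter title</fo:block>'
print(role_to_pdftag(sample, {"H", "P"}))
```

The `allowed` set keeps arbitrary @role values (for example, DocBook element names) from being passed through as PDF tags.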


Tony Graham.
Senior Architect
XML Division
Antenna House, Inc.
Skerries, Ireland

[1] Though true, I think that strongly/weakly is a binary distinction,
so I don't think you can have a 'very strongly' or a 'mildly strongly'
structured document.
[2] https://www.antennahouse.com/product/ahf64/ahf-pdf.html#taggedpdf
[3] https://www.antennahouse.com/product/ahf64/ahf-ext.html#axf.pdftag
