
docbook-apps message


Subject: Re: [docbook-apps] tagged and accessible PDF document with DocBook

Hi Holger,
No, you didn't miss anything. The DocBook XSL stylesheets currently do not provide support for tagged PDFs.

That said, my short investigation shows that implementing such support is nontrivial. Keep in mind that the DocBook stylesheets don't actually create a PDF. The stylesheets generate an FO version of the document, and then an XSL-FO processor converts that to a PDF. So DocBook XSL has to generate additional markup in the FO output that an XSL-FO processor can convert to PDF tags.

It looks like each of the XSL-FO processors commonly used with DocBook (FOP, XEP, and Antenna House) has different extensions for implementing the FO needed to generate the PDF accessibility tags. For example:

FOP has fox:alt-text
XEP has rx:pdf-structure-tag
AH expects axf:pdftag

This situation is similar to when PDF bookmarks were first implemented. Each XSL-FO processor had its own extensions to implement that feature, and DocBook XSL had to support all three extensions. When XSL 1.1 standardized the markup for bookmarks, all the XSL-FO processors eventually implemented that standard and so did DocBook XSL.

For accessibility, XSL 1.1 suggests outputting a role attribute with this content:

"To aid alternate renderers, the <string> value should be the qualified name (QName [XML Names] or [XML Names 1.1]) of the element from which this formatting object is constructed. If a QName does not provide sufficient context, the <uri-specification> can be used to identify an RDF resource that describes the role in more detail. This RDF resource may be embedded in the result tree and referenced with a relative URI or fragment identifier, or the RDF resource may be external to the result tree. This specification does not define any standard QName or RDF vocabularies; these are frequently application area dependent. Other groups, for example the Dublin Core, have defined such vocabularies."

If we used "name of the element from which this formatting object is constructed", that would be DocBook element names, which would not be recognized by any of the XSL-FO processors. Providing an RDF description of the mapping of such element names would also not be recognized by the XSL-FO processors, as far as I can tell.

You suggested that FOP expects HTML element names in the role attribute, but I wonder if that is the case with the other XSL-FO processors?
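
For FOP alone, a minimal customization-layer sketch of that idea might look like the following. It is untested and rests on several assumptions: that the import path matches your installation, that the section title attribute sets are the right place to attach the attribute, that mapping section levels 1 and 2 to H1 and H2 is the structure you want, and that FOP (with accessibility enabled) maps these role values to PDF structure tags. XEP and Antenna House would presumably ignore this attribute and need their own extension attributes instead.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <!-- Assumed location of the stock FO stylesheet; adjust to your install
       (for example, the namespaced docbook-xsl-ns variant for DocBook 5). -->
  <xsl:import href="http://docbook.sourceforge.net/release/xsl/current/fo/docbook.xsl"/>

  <!-- Attribute sets with the same name are merged in XSLT, so these
       definitions add a role attribute to the fo:block that wraps each
       section title without replacing the stock font properties. -->
  <xsl:attribute-set name="section.title.level1.properties">
    <xsl:attribute name="role">H1</xsl:attribute>
  </xsl:attribute-set>

  <xsl:attribute-set name="section.title.level2.properties">
    <xsl:attribute name="role">H2</xsl:attribute>
  </xsl:attribute-set>

</xsl:stylesheet>
```

This only covers section headings. Lists, tables, figures, and alternative text would each need their own handling, which is where the per-processor differences start to matter.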

I would be interested in adding PDF tagging to DocBook XSL. It would help if there were a clear spec for how to do so. If I have to figure it out for each of three XSL-FO processors, that's going to take some time.

Bob Stayton
Sagehill Enterprises

On 4/3/2017 6:25 AM, Holger Bast wrote:
Dear all,
I'm trying to generate "tagged" and "accessible" PDF documents, via docbook5 -> XSL-FO -> pdf (with Apache FOP and docbook-xsl v1.79.1). I tried both FOP parameters (accessibility/PDF-UA) and received a tagged PDF file. But I found out that there is no structural information inside the PDF; 'everything' is tagged as p(aragraph). The XSL-FO output also lacks this kind of information. The Apache FOP Accessibility help recommends using the role attribute for tagging information inside the document:

<fo:block role="H1" font-weight="bold">I. A Level 1 Heading</fo:block>

Did I miss something (parameter, wrong stylesheet)?
Has anyone already generated accessible PDF documents based on DocBook?

Thanks, Holger
