
docbook-apps message



Subject: Re: [docbook-apps] Rendering MathML in Docbook


Was my previous message readable to folks, or did it appear empty
(except for signature and such)? I think I've run into a bug in Gnus
(first MIME part not marked as such under certain circumstances)...

Here's the content again:


For TeX->MathML conversion I use TtM
(http://hutchinson.belmont.ma.us/tth/mml/), which works quite well and
is well maintained (I've had bugs fixed only days after I pointed them
out).
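TtM reads LaTeX on standard input and writes its translation to standard output; that is how the script below drives it, as `ttm -r`. Calling it from Perl therefore comes down to a bidirectional pipe. Here is a minimal sketch using the core IPC::Open2 module. The name `run_filter` is made up for illustration, and `cat` stands in for ttm so that the example runs without TtM installed:

```perl
use strict;
use warnings;
use IPC::Open2;

# Hypothetical helper: feed $input to an external filter command on its
# standard input and return everything it writes to standard output.
# open2() raises an exception if the command cannot be started.
sub run_filter {
    my ($cmd, $input) = @_;
    my $pid = open2(my $from_child, my $to_child, $cmd);
    print $to_child $input;
    close($to_child);          # signal end of input to the filter
    local $/;                  # slurp mode: read all output at once
    my $output = <$from_child>;
    close($from_child);
    waitpid($pid, 0);          # reap the child process
    return defined $output ? $output : '';
}

# With TtM installed, this would be something like:
#   my $html = run_filter('ttm -r', $latex_document);
print run_filter('cat', "x^2\n");
```

Writing all the input before reading any output is fine for formula-sized input. It could deadlock, however, with a filter that produces a lot of output before it has consumed all of its input.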

My automated setup works as follows:

- I include math as follows in the XML source:

      <informalequation>
	<mediaobject>
	  <textobject role="tex">
	    <phrase>&texmath;<![CDATA[
	      P(X \mid Y_1, \ldots, Y_m)
	      ]]></phrase>
	  </textobject>
	  <textobject role="html">
	    <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
	      href="texmath-simple-model.xml"/>
	  </textobject>
	</mediaobject>
      </informalequation>

  This also works with <inlineequation>, and could probably be made to
  work with <equation> quite easily.

  The &texmath; entity is optional and points to a file called
  texmath.tex containing declarations that the LaTeX math relies on,
  e.g.:

      \newcommand{\emphasis}[1]{{\color{blue}#1}}
      \newcommand{\key}[1]{{\color{red}#1}}

- The makefile calls a perl script (appended below) that parses the
  DocBook XML file (brute force, no true XML parsing), extracts
  content from between the lines containing the <phrase>s in the
  proper places, and does the following for each such block of content:

  - embed it into a LaTeX stub inside a displaymath environment

  - pipe the result through TtM

  - extract the MathML body from TtM's HTML output

  - post-process the MathML code to work around bugs and limitations
    of TtM and Mozilla

  - add a mml: namespace prefix

  - obtain the output file name from the corresponding XInclude
    statement, and write the processed MathML code into it, using a
    proper DOCTYPE declaration.
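The purely textual steps of this pipeline (wrapping, extraction, prefixing) can be sketched on their own. The script below chops a TtM-version-dependent number of lines off the output. This sketch instead assumes that the output contains a single `<math ...>...</math>` element and extracts its content by pattern. All three sub names are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: wrap one formula and the shared declarations
# (e.g. the contents of texmath.tex) in a complete LaTeX document,
# mirroring the stub that genMathML pipes into TtM.
sub tex_stub {
    my ($decls, $tex) = @_;
    return "\\documentclass{article}\n" . $decls
         . "\\begin{document}\\begin{displaymath}\n" . $tex
         . "\\end{displaymath}\\end{document}\n";
}

# Hypothetical helper: return the markup between <math ...> and </math>,
# or undef if there is none. Assumes one non-nested math element.
sub mathml_body {
    my ($html) = @_;
    return $html =~ m{<math\b[^>]*>(.*?)</math>}s ? $1 : undef;
}

# Hypothetical helper: prefix element names with mml:, leaving comments
# (<!-- ... -->) and processing instructions (<? ... ?>) untouched.
sub add_mml_prefix {
    my ($mathml) = @_;
    $mathml =~ s{<(/?)(?=[A-Za-z])}{<${1}mml:}g;
    return $mathml;
}

print tex_stub('', "x^2\n");
my $html = '<p><math xmlns="http://www.w3.org/1998/Math/MathML">'
         . '<mrow><mi>x</mi></mrow></math></p>';
print add_mml_prefix(mathml_body($html)), "\n";
# prints <mml:mrow><mml:mi>x</mml:mi></mml:mrow>
```

Restricting the prefix substitution to tags that start with a letter is a design choice. The script's own substitution prefixes every `<`, which works as long as the chopped TtM output contains nothing but elements.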

This results in valid MathML files and a valid overall DocBook XML
source (validating against DocBook with the MathML extensions, using
xmllint with --postvalid).

My perl script (with TtM) does all I have needed so far, which covers
quite a large subset of LaTeX math and relatively complex formulas
(browse around
http://www.montefiore.ulg.ac.be/~piater/Cours/INFO0013/plan.php for
examples), but it is still limited. The DocBook parsing would be more
elegant and less error-prone if a real XML parser were used; this
should probably be quite easy to code using existing Perl XML modules.

The postprocessing cannot be done entirely by an XML parser because
some valid LaTeX input results in XML output that is not well-formed.
This primarily concerns nontrivial \mbox{} hackery. (This is based on
the now-outdated TtM 3.49.)
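Such output can at least be flagged before it reaches xmllint. The crude tag-balance check sketched below is a heuristic, not an XML parser: tags inside comments or CDATA sections are counted as markup. `tags_balanced` is a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical heuristic: return 1 if start and end tags in $xml nest
# properly, 0 otherwise. Self-closing tags such as <mspace/> are skipped;
# tags inside comments or CDATA sections are (wrongly) counted as markup.
sub tags_balanced {
    my ($xml) = @_;
    my @open;
    while ($xml =~ m{<(/?)([A-Za-z][\w:.-]*)[^>]*?(/?)>}g) {
        my ($closing, $name, $selfclosing) = ($1, $2, $3);
        next if $selfclosing;
        if ($closing) {
            return 0 unless @open && $open[-1] eq $name;
            pop @open;
        } else {
            push @open, $name;
        }
    }
    return @open ? 0 : 1;
}

print tags_balanced('<mrow><mi>x</mi><mspace width="1em"/></mrow>'), "\n";  # 1
print tags_balanced('<mtext><mrow>a</mtext></mrow>'), "\n";                 # 0
```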


If anyone out there goes down the same route, I would like to hear
from you, and especially so if you use my perl script.

Enjoy,
Justus


Jirka Kosek <jirka@kosek.cz> wrote on Wed, 18 Feb 2004 09:56:54 +0100:

> Justus Piater wrote:
>
>> Note though that PassiveTeX's MathML capabilities are limited. I
>> code all non-trivial math in TeX, which is automatically converted
>> to MathML for the HTML backend.
>
> Could you tell us what software you are using for TeX -> MathML
> conversion?


#!/usr/bin/perl

$debug = 0;
$raw = 0;

use IPC::Open2;

my $texdecls = '';
my $texdeclsfile = 'texmath.tex';
if (open(TEXDECLS, $texdeclsfile)) {
    while (<TEXDECLS>) {
	$texdecls .= $_;
    }
    close(TEXDECLS);
    print "texmath2xml: using tex decls file $texdeclsfile\n";
}
else {
    print "texmath2xml: no tex decls file $texdeclsfile found\n";
}
    
while (<>) {
    $debug && print "Base: $_";

    if (/\<!ENTITY texmath/o) {
	print "ENTITY texmath found\n";
	while (($_ = <>) && !/\]\]\>/o) {
	    $texdecls .= $_;
	}
    }
    
    if (/\<((informal|inline)?)equation/o) {
	my $eqntype = $1."equation";
	$debug && printf "found $eqntype\n";
	while (<>) {
	    /\<\/((informal|inline)?)equation/o && last;
	    if (/\<((inline)?)mediaobject/o) {
		$debug && printf "found mediaobject\n";
		while (<>) {
		    /\<\/((inline)?)mediaobject/o && last;
		    if (/\<textobject([^>]*)/o &&
			$1 =~ /role=\"tex\"/) {
			$debug && printf "found textobject role=\"tex\"\n";
			# slurp in tex code; the alternation is grouped so that "<"
			# (and "</") applies to both element names:
			my $tex;
			while (<>) {
			    /\<\/textobject/o && last;
			    if (/\<(?:para|phrase)\b/o) {
				while (($_ = <>) && !/\<\/(?:para|phrase)\b/o) {
				    $tex .= $_;
				}
			    }
			}
			my $mathmlfilename = &getXIncludeFileName();
			(length($mathmlfilename) > 0)
			    && &genMathML($tex, $mathmlfilename, $texdecls);
		    }
		}
	    }
	}
    }
}


sub genMathML {
    my ($tex, $mathmlfilename, $texdecls) = @_;
    my $ttmcmd = 'ttm -r';
    my $texpreheader = "\\documentclass{article}";
    my $texpostheader =	"\\begin{document}\\begin{displaymath}\n";
    my $texfooter = "\\end{displaymath}\\end{document}\n";
    my $mathmlheader = "<?xml version='1.0'?> <!-- -*- coding: utf-8; -*- -->
<!-- created automatically by texmath2html/ttm -->
<!DOCTYPE mml:math PUBLIC \"-//W3C//DTD MathML 2.0//EN\"
 \"http://www.w3.org/TR/MathML2/dtd/mathml2.dtd\" [
 <!ENTITY % MATHML.prefixed \"INCLUDE\">
 <!ENTITY % MATHML.prefix \"mml\">
]>

<mml:math>
";

    # Work around a LaTeX construct TtM does not implement (subarray),
    # matching the braces literally:
    $tex =~ s/\{subarray\}/{array}/go;

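    # open2() dies by itself if ttm cannot be started; waitpid() reaps
    # the child so that no zombie is left behind for each equation.
    my $pid = open2(my $fromttm, my $tottm, $ttmcmd);
    print $tottm $texpreheader.$texdecls.$texpostheader.$tex.$texfooter;
    close($tottm);
    my $mathml;
    while (<$fromttm>) {
	$mathml .= $_;
    }
    close($fromttm);
    waitpid($pid, 0);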

    if ($raw) {
	print "==== Begin TTM output for $mathmlfilename ====\n";
	print $mathml;
	print "==== End TTM output for $mathmlfilename ====\n";
	return;
    }	

    # chop off extraneous leading and trailing lines:
    #  8 for ttmL 3.38
    # 11 for ttmL 3.45
    # 11 for ttmL 3.49
    for ($i = 0; $i < 11; $i++) {
	$mathml = substr($mathml, index($mathml, "\n") + 1);
    }
    # 3 for ttmL 3.38
    # 3 for ttmL 3.45
    # 3 for ttmL 3.49
    for ($i = 0; $i < 3; $i++) {
	$mathml = substr($mathml, 0, rindex($mathml, "\n"));
    }

    # work around strange "features" and bugs of ttm:
    $mathml =~ s/\&emsp;/\&thinsp;/go; # still exists in 3.49
    $mathml =~ s/\'/\&prime;/go;       # still exists in 3.49
    $mathml =~ s/\<font color=/\<mstyle mathcolor=/go;
    $mathml =~ s/\<\/font\>/\<\/mstyle\>/go;
    #$mathml =~ s/\<mi\>\&\<\/mi\>/\<mo\>&amp;\<\/mo\>/go;  # correct in 3.45?
    $mathml =~ s/\>:(\<\/m[^o])/\>\<mo\>:\<\/mo\>$1/go;	# persists in 3.49
    # ~/Cours/2003-04/INFO0055/Notes/10/texmath-fct-verification.xml
    
    # no &nbsp; allowed outside of markup; no mspace allowed inside mtext:
    # Caveat: malignmark and mglyph are allowed inside mtext
    # Caveat: does not work for \hspaces inside \mbox
    #$mathml =~ s/\&nbsp;/\<mspace width="0.2em"\/\>/go;
    $mathml =~			# still exists in 3.49
	s/(\>[^\<\&]*)((\&nbsp;)+)([^\<]*\<)/$1\<mtext\>$2\<\/mtext\>$4/smgo;
    # no mrow inside mtext allowed: (persists in 3.49)
    $mathml =~ s/\<mtext\>[^\<]*\<mrow\>/\<mtext\>/smgo;
    $mathml =~ s/\<\/mrow\>[^\<]*\<\/mtext\>/\<\/mtext\>/smgo;
    # no mstyle in mtext allowed: (persists in 3.49)
    $mathml =~ s/\<mtext\>[^\<]*\<mstyle/\<mtext/smgo;
    $mathml =~ s/\<\/mstyle\>[^\<]*\<\/mtext\>/\<\/mtext\>/smgo;
    # need to enclose () in mrow to get stretching right: (3.49)
    $mathml =~ s/(\<mo[^>]*\>\(\<)/\<mrow\>$1/smgo;
    $mathml =~ s/(\>\)\<\/mo\>)/$1\<\/mrow\>/smgo;

    # work around apparent bug in Mozilla:
    # wrap munder/over inside extra mrow
    $mathml =~ s/\<munderover\>/\<mrow\>\<munderover\>/go;
    $mathml =~ s/\<\/munderover\>/\<\/munderover\>\<\/mrow\>/go;
    $mathml =~ s/\<munder\>/\<mrow\>\<munder\>/go;
    $mathml =~ s/\<\/munder\>/\<\/munder\>\<\/mrow\>/go;
    $mathml =~ s/\<mover\>/\<mrow\>\<mover\>/go;
    $mathml =~ s/\<\/mover\>/\<\/mover\>\<\/mrow\>/go;

    $mathml =~ s/(\<\/?)/${1}mml:/go;

    print "creating $mathmlfilename\n";
    open(MATHML, '>', $mathmlfilename)
	or do { print "cannot write to $mathmlfilename: $!\n"; return; };
    print MATHML $mathmlheader.$mathml."\n";
    close(MATHML);
}


sub getXIncludeFileName {
    while (<>) {
	$debug && print "getX: $_";
	# look for textobject with role="html" inside mediaobject:
	/\<\/((inline)?)mediaobject/o && last;
	if (/\<textobject([^>]*)/o &&
	    $1 =~ /role=\"html\"/) {
	    $debug && printf "found textobject role=\"html\"\n";
	    while (<>) {
		# look for xi:include:
		/\<\/textobject/ && last;
		if (/\<xi:include/) {
		    $debug && printf "found xi:include\n";
		    if (/\<xi:include([^>]*)/o &&
			# filename on same line
			$1 =~ /href=\"([^\"]+)\"/) {
			return $1;
		    }
		    else {
			# look for filename in following lines
			while (<>) {
			    if (/([^>]*)/o &&
				$1 =~ /href=\"([^\"]+)\"/) {
				return $1;
			    }
			    /\>/ && last; # found end of xi:include
			}
		    }
		}
	    }
	}
    }
    return '';
}
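getXIncludeFileName reads the input line by line, which is why it has to handle the href attribute both on the `<xi:include` line and on a following line. If each `<mediaobject>` were first slurped into one string, the lookup could be a single match. The following sketch assumes exactly that. It still uses pattern matching rather than XML parsing, and `xinclude_href` is a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical helper: given the text of one <mediaobject> (or
# <inlinemediaobject>), return the href of the xi:include inside the
# textobject whose role is "html", or '' if there is none. Like the
# original script, this is pattern matching, not XML parsing.
sub xinclude_href {
    my ($mediaobject) = @_;
    while ($mediaobject =~ m{<textobject\b([^>]*)>(.*?)</textobject>}gs) {
        my ($attrs, $content) = ($1, $2);
        next unless $attrs =~ /role="html"/;
        return $1 if $content =~ /<xi:include\b[^>]*?\bhref="([^"]+)"/s;
    }
    return '';
}

my $block = <<'XML';
<mediaobject>
  <textobject role="tex"><phrase>x</phrase></textobject>
  <textobject role="html">
    <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
      href="texmath-simple-model.xml"/>
  </textobject>
</mediaobject>
XML
print xinclude_href($block), "\n";   # texmath-simple-model.xml
```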


-- 
Justus H. Piater, Ph.D.         http://www.montefiore.ulg.ac.be/~piater/
Institut Montefiore, B28        Phone: +32-4-366-2279
Université de Liège, Belgium    Fax:   +32-4-366-2620


