Subject: [docbook] Re: equations
From: Doug du Boulay <ddb@owari.msl.titech.ac.jp>
To: Norman Walsh <ndw@nwalsh.com>
Date: Sat, 5 Feb 2005 18:27:54 +0900 (JST)
On Thursday 03 February 2005 07:33, Norman Walsh wrote:
> / Doug du Boulay <ddb@owari.msl.titech.ac.jp> was heard to say:
> | Norman Walsh <ndw@nwalsh.com> wrote:
> |> DocBook NG still has both the formal and informal versions.
> |>
> |> The odd man out in all this is equation which, for backwards
> |> compatibility reasons in DocBook *4* still has an *optional* title,
> |> even though there's also an informalequation element. I see two
> |> possible ways forward:
> |>
> |>   1. Keep the formal/informal distinction and make title on equation
> |>      required. (This is what I've actually done in DocBook NG.)
> |>
> |>   2. Drop the distinction, drop informal{equation,table,example,figure}
> |>      and make title on all those elements optional.
> |>
> |> Option 1 is probably easier for users and for tools, so I'm inclined
> |> to go that way at the moment. The only advantage to option 2, really,
> |> is that DocBook becomes four elements smaller. But the semantic
> |> disjunction is probably too high a price to pay.
> |
> | For the record, even though many equations do actually have names,
> | in the scientific literature I think you will find exactly zero instances
> | of titles on equations (for that matter there would be no TOC equation
> | lists either). For this reason IMHO option 1 would actually be a
> | backwards step hindering the adoption of DocBook amongst a broader
> | community.
>
> Hmm. So requiring title on equation is likely to be controversial :-)
>
> | Customarily equation blocks fall into two classes, these being
> | numerically labelled and unlabelled equations. The existing equation and
> | informalequation elements provide a useful method for distinguishing
> | between those cases and my hope is that they could be retained.
>
> Ah, so you use
>
>   <equation>
>     equation content
>   </equation>
>
> (without a title) for equations that should be labeled with a number
> and
>
>   <informalequation>
>     equation content
>   </informalequation>
>
> for ones that shouldn't be numbered?

Yep. In my experiments, I also used the "role" attribute on both of those
as a means of identifying when some sequences of labelled and
unlabelled equations should maintain some semblance of horizontal
alignment.


> | It's a shame there isn't a third option:
> |
> |     3. The equation element be shifted out from the formal list into
> |         a group of its own, because in reality it has a completely
> |         different usage model.
>
> Voila! A third option :-)

If it really *is* a viable option, IMHO, retaining backward compatibility
would be best. I have actually encountered some online DocBook docs that
had titles on their equations, presumably written by authors keeping an eye
on the future, since titles were slated to become compulsory in version 5
according to "The Definitive Guide".

But scientific documents tend to use mathematics as a tool to demonstrate
the why and wherefore of relationships, and in that use extraneous titles
interrupt the flow of ideas.

Other styles of documentation might treat equations more like curious 
pictures to look at, in which case titles could be useful but perhaps the 
whole block would be better treated as a <figure> element?
(dunno. no experience)


> | Alternatively, if option 2 was adopted could some other standardised
> | means be established to discriminate between labelled and unlabelled
> | equation blocks?
>
> Yes, let's forget the informal/formal distinction for equations. 
>
> How would you suggest distinguishing between labeled and unlabeled
> equations?


For labelled equations there are a few different approaches I guess. 

(1) The label can be determined automagically from its sequence
within the document/chapter and displayed and crossreferenced 
uniformly in the output rendering via the stylesheet transformations.

(2) an explicitly specified label could be embedded within the actual 
encoded math content and rendered into the mediaobject or mathML but
it would thereby remain inaccessible to the DocBook stylesheets for 
consistent crossreferencing. Insertion of new labelled
equations into the document would likely screw up embedded sequential 
labelling schemes. 

(3) Tentatively, the label from (1) could be automatically inserted into
the actual encoded math content of (2), leaving the presentation
of the label to the math renderer. 

Possibly (3) is best but difficult to implement. (2) has huge scope
for confusion, but I guess it is what most folk are actually doing? Manually
relabelling would be ludicrous for large documents undergoing review. (1)
is something which seems to work but has a few drawbacks.
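
For what it's worth, approach (1) can be sketched outside of XSLT in a few
lines. This is only an illustration under my own assumptions: the function
name number_equations, the chapter.sequence label format and the idea of
writing the computed value back into a "label" attribute are all made up
for the example and are not part of any DocBook stylesheet.

```python
import xml.etree.ElementTree as ET


def number_equations(book_xml):
    """Return {id: label} for every <equation>; <informalequation> is skipped.

    Labels are "chapter.sequence", counted in document order (an assumption).
    """
    root = ET.fromstring(book_xml)
    labels = {}
    for chap_no, chapter in enumerate(root.iter("chapter"), start=1):
        eq_no = 0
        # iter("equation") matches the exact tag name only, so
        # informalequation elements do not consume a number.
        for eq in chapter.iter("equation"):
            eq_no += 1
            label = f"{chap_no}.{eq_no}"
            eq.set("label", label)  # hypothetical: store the computed label
            if eq.get("id"):
                labels[eq.get("id")] = label
    return labels


SAMPLE = r"""<book>
  <chapter>
    <equation id="E-1"><alt role="tex">\[a=b+c\]</alt></equation>
    <informalequation><alt role="tex">\[=d+e\]</alt></informalequation>
    <equation id="E-2"><alt role="tex">\[=42\]</alt></equation>
  </chapter>
</book>"""

labels = number_equations(SAMPLE)
print(labels)
```

Inserting a new labelled equation between the two simply shifts the later
numbers, which is the main attraction of computing labels rather than
embedding them in the math.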


The examples below are encoded using latex math, because it's
compact and easy to edit by hand. IIRC the OpenMath crowd can convert
latex math to MathML through some process, iff it is written in the right
style, but I haven't investigated that thoroughly.

In my experience, equation labelling on the right hand side seems to 
predominate in hardcopy books, but for online docs that potentially extend 
beyond the browser screen, labels on the left would be easier to identify.
Unfortunately, as in examples (iv) and (v), it looks a bit incongruous.



Example (i)
Mostly you just want a nicely centred equation and a label
IDEAL
                         a=b+c                          (1.1)
MARKUP
 <equation label="yes" id="E-1">
     <alt role="tex">\[a=b+c\] </alt>
 </equation>

(hate to think what the equivalent two pages of MathML looks like)
rendered as 
RESULT
(1.1)                   a=b+c

where for html, an <xref linkend="E-1"/> results in <a href="#E-1">(1.1)</a>
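
A rough sketch of that xref resolution, assuming a table mapping ids to
labels has already been computed by whatever numbering scheme is in use.
The name render_xref is hypothetical and the ids are assumed to need no
escaping.

```python
def render_xref(linkend, labels):
    """Render an xref as an HTML anchor showing the equation label."""
    label = labels[linkend]  # raises KeyError if the target has no label
    return f'<a href="#{linkend}">({label})</a>'


print(render_xref("E-1", {"E-1": "1.1"}))
```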




Example (ii)
Often an equation should be annotated with inline text
and inlineequation fragments  
IDEAL
           i.e.    a=b+c     where b and c are unknown.   (1.2)   

(a)
MARKUP
 <equation label="yes">
     <alt role="tex">\[ \text{i.e.}\qquad a=b+c
       \qquad \text{where $b$ and $c$ are unknown.} \] </alt>
 </equation>
this would work.
or (b)
MARKUP
 <equation label="yes">
      i.e.
     <alt role="tex">\[a=b+c\] </alt>
      where <inlineequation><alt role="tex">\(b\)</alt></inlineequation> and 
      <inlineequation><alt role="tex">\(c\)</alt></inlineequation> are unknown.
 </equation>
this wouldn't, but it could ultimately be more consistent.


Example (iii) 
There are some instances where two (or more?) large equations
are placed on one line, labelled or unlabelled, with some
text in between:

IDEAL
    Variance s^2= Sum (x-<x>)^2/ n   where  <x>=Sum x/ n     (1.3)

(a) MARKUP
 <equation label="yes">
       Variance 
     <alt role="tex">\[s_x^2=\frac{\sum_{i=1}^n(x_i-&lt;x&gt;)^2}{n}\]</alt>
        where 
     <alt role="tex">\[&lt;x&gt;=\frac{\sum_{i=1}^n x_i}{n}\]</alt>
        and 
     <inlineequation><alt role="tex">\(x\)</alt></inlineequation> 
       is an observation
 </equation>
Nope, that doesn't work either, but for consistency of para text
it would be nicer if it could.
(b) MARKUP
 <equation label="yes">
     <alt role="tex">\[\text{Variance}\qquad
       s_x^2=\frac{\sum_{i=1}^n(x_i-&lt;x&gt;)^2}{n}
        \qquad \text{where} \qquad 
          &lt;x&gt;=\frac{\sum_{i=1}^n x_i}{n}
         \text{ and $x$ is an observation.}\]
    </alt>
 </equation>
this would probably work.



Example (iv)
And occasionally you have some small set of related equations and labels
IDEAL
       a=b+c                                              (1.4a)
       b=d+e        for d>200 and e unknown.              (1.4b)

(a) 
MARKUP
 <equation label="yes" id="myEq3">
     <alt role="tex">\begin{align}a &amp;= b+c \tag{a}\\
           b &amp;= d + e \qquad \text{for $d>200$ and $e$ unknown.}\tag{b}
          \end{align}
     </alt>
 </equation>
The &amp; marks work as horizontal alignment boundaries and the
trailing "\\" on a line is an equation newline.
Possibly <xref linkend="myEq3" linkterm="a"/> might be renderable as (1.4a).
Also, the latex \tag{a} might be a bad thing to use in general, but perhaps
some better alternative exists ...?
 
RESULT
                 a=b+c                               (a)
(1.4)
                 b=d+e     for d>200 and e unknown.  (b) 

But in this case the \text{} would likely be in a different font to 
surrounding paras, which looks bad.

(b) MARKUP
 <equation label="yes" id="myEq3">
     <alt role="tex">\begin{align}a &amp;= b+c \tag{a}\\
              b &amp;= d + e
     </alt>
      for <inlineequation><alt role="tex">\(d>200\)</alt></inlineequation> and 
      <inlineequation><alt role="tex">\(e\)</alt></inlineequation> unknown.
     <alt role="tex"> \tag{b}  \end{align}  </alt>
 </equation>
This could look nicer, but I can't see how it would work in practice though.






Example (v)
Very rarely you might encounter "case" statements 
like this:
IDEAL

                   {    -x^2,       if x<0;                   (1.5a)          
        f(x) =     {    alpha+x,    if 0<=x<=1;               (1.5b)
                   {    x^2,        otherwise.                (1.5c)

MARKUP
 <equation label="yes">
   <alt role="tex">\[
      f(x)=
     \begin{cases}
         -x^{2},   &amp;\text{if } x &lt; 0; \tag{a}\\
         \alpha +x,  &amp;\text{if } 0\leq x\leq 1; \tag{b}\\
         x^{2},       &amp;\text{otherwise.} \tag{c}
      \end{cases}
    \]</alt>
 </equation>


RESULT
                        {    -x^2,       if x<0;                   (a)          
(1.5)        f(x) =     {    alpha+x,    if 0<=x<=1;               (b)
                        {    x^2,        otherwise.                (c)

again problems with fonts in the \text{}




Example (vi)
In some instances  all you want is some semblance of order between
the various important and inconsequential steps of a derivation
IDEAL
       a=b+c                                   (1.6)
        =(d+e) +  2 pi sqrt(d)
        = f + Integral(g) dx
        = 42                                   (1.7)

(a) MARKUP, assuming Norm's option (2)
 <equation label="yes" role="cont">
   <alt role="tex">\begin{align}a &amp;=b+c \\</alt>
 </equation>
 <equation             role="cont">    <!-- no labels here! -->
   <alt role="tex">           &amp; = (d+e) +  2 \pi \sqrt{d}\\
                              &amp; = f + \int g \, dx \\
    </alt>
 </equation>
 <equation label="yes" role="terminal">
   <alt role="tex">    &amp;=42
       \end{align}
   </alt>
 </equation>
yep three distinct equation elements there.

(b) MARKUP, assuming option (3) 
 <equation role="cont">
   <alt role="tex">\begin{align}a &amp;=b+c \\</alt>
 </equation>
 <informalequation role="cont">    <!-- no labels here! -->
   <alt role="tex">           &amp; = (d+e) +  2 \pi \sqrt{d}\\
                              &amp; = f + \int g \, dx \\
    </alt>
 </informalequation>
 <equation role="terminal">
   <alt role="tex">    &amp;=42
       \end{align}
   </alt>
 </equation>
(I have an html xsl customization which seems to work with this
   for html output and dvi2bitmap. It automatically generates the label
   and image filename.)


Not much to choose between those last two MARKUP cases really.
Both would work ok. 
For option (2) I am not sure if there is any scope for having an extra 
attribute sublabel="a b c", or somesuch as a future sublabelling 
mechanism for automatic insertion in the latex \tag{} or its equivalent. 
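
To make the sublabel idea a bit more concrete, here is a rough sketch of
how a preprocessing step might push the computed label plus each sublabel
into \tag{} before the tex is handed to the renderer. Everything here is
an assumption for illustration: the function name tag_rows, the
space-separated sublabel format, and the naive split on "\\", which would
break on nested environments such as cases.

```python
def tag_rows(tex_body, label, sublabels):
    """Append \\tag{<label><sublabel>} to each row of an align body.

    tex_body  : rows separated by the latex newline "\\\\" (no \\begin/\\end)
    label     : the automatically computed label, e.g. "1.4"
    sublabels : space-separated sublabels, e.g. "a b" (hypothetical attribute)
    """
    rows = [r.strip() for r in tex_body.split("\\\\") if r.strip()]
    subs = sublabels.split()
    if len(rows) != len(subs):
        raise ValueError("need exactly one sublabel per equation row")
    tagged = [f"{row} \\tag{{{label}{sub}}}" for row, sub in zip(rows, subs)]
    return " \\\\\n".join(tagged)


print(tag_rows(r"a &= b+c \\ b &= d+e", "1.4", "a b"))
```

The stylesheet would still own the label value, so crossreferences stay
consistent, while the math renderer only has to print what it is given.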

Just to confuse the issue, for latex markup a \label{} is analogous to a 
named unique xml id="" attribute and doesn't appear in the text, whereas a 
\tag{} replaces the printed numeric label with explicit text (and, as far
as I can tell, still needs its own \label{} to be crossreferenced).

Possibly, for completeness, there should be an attribute for 
explicit non-numeric labelling of <equation/> elements also?


And no doubt there are problems with all this for the dblatex pdf/ps 
output folk who might be doing all equation labelling in pure tex within 
the <alt role="tex"/> stuff. Maybe everything just works for them. I don't
know.

How are equations done with FO, purely using mediaobjects?

Does MathML support equation labelling?


better ideas?
help!

best regards
Doug

P.S. sorry about the broken list thread.