
Subject: Unicode relative spaces -- Was, proposal for new position and space attributes for the list level

On 2/19/07, Oliver-Rainer Wittmann - Software Engineer - Sun
Microsystems <Oliver-Rainer.Wittmann@sun.com> wrote:
> Ok. Thus, we arrive at the conclusion that we have more than one
> view/opinion on my proposal - that's somehow natural. But, it doesn't
> seem that anything is unclear in my proposal.
> Let's vote on it.

Might we delay a bit on voting? This also has ramifications for ODF
1.1 section 4.1.1 (Headings).
<http://develop.opendocumentfellowship.org/spec/?page=4#4.1>. Also,
I've been working on a proposal that might impact how horizontal
spaces are handled in lists. I had hoped to put in another day on it,
but y'all have forced my hand. :-)

The ODF specification currently lacks support for the Unicode em-based
relative typographic spaces. Adding such support to the specification
is one of my goals. At the very least, I hope we might avoid defining
lists in a way that interferes with later adoption of such support.

ODF 1.1 section 16.1 currently provides in relevant part:



A (positive or negative) physical length, consisting of magnitude and
unit, in conformance with §5.9.11 of [XSL:FO]. Supported units are
"cm", "mm", "in", "pt" and "pc". Applications *shall* support all
these units. Applications *may* also support "px" (pixel). Where the
description of an attribute explicitly states that pixel lengths are
supported, applications should support them.

Examples for valid lengths are "2.54cm" and "1in".


So the spec apparently supports only absolute lengths and margin
percentages for horizontal measurements. That seems to me as though it
could lead to issues in transformations. The XSL:FO section cited
there calls for use of the em as the unit of measurement for relative
lengths, but despite the citation to that section in the ODF spec, ODF
does not reflect XSL:FO in that regard. See
<http://www.w3.org/TR/2006/PR-xsl11-20061006/#d0e5490>.

That is a major wart in ODF from my perspective as a former
typographer before a mid-life career change. The em-based scalable
typographical widths are more than 500 years old and are incorporated
in virtually all digital type faces in any human language.

The em-based relative units of measurement are defined in the Unicode
standard, <http://www.unicode.org/charts/PDF/U2000.pdf>, pg. 167. (But
ignore the em and en quads at least for now; they are, I believe,
obsolete.) There is a better compilation of them that adds some listed
in other places here,
<http://www.cs.tut.fi/~jkorpela/chars/spaces.html>. But as I
understand it, the em, en, and the various thin spaces, although
identified as Unicode characters, have to be implemented at the
application level; they are not included as characters in digital
typefaces in general usage since they are non-visible.

Adding tags for the typographic spaces to the ODF and ODF applications
repertoire should enhance ODF's compatibility not only with XSL:FO,
but also with CSS. Use of the typographic spaces is recommended by the
W3C Web Content Accessibility Guidelines Working Group in their best
practices guide for CSS stylesheets.
<http://www.w3.org/WAI/GL/css2em.htm>. That page recommends that type
sizes and horizontal spaces be expressed in CSS stylesheets in ems
(scalable) rather than using percentages or absolute measurements
unless there is a specific requirement for non-scalable widths.
Microsoft's take on implementing these spaces is here.
The Unicode em, en, and thin space are also supported in HTML.

<!ENTITY ensp    CDATA "&#8194;" -- en space, U+2002 ISOpub -->
<!ENTITY emsp    CDATA "&#8195;" -- em space, U+2003 ISOpub -->
<!ENTITY thinsp  CDATA "&#8201;" -- thin space, U+2009 ISOpub -->
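As a quick check of the entity declarations above, here is a minimal Python sketch that resolves the three named entities with the standard library and reports their code points and Unicode names. The dictionary name SPACES is only illustrative.

```python
import html
import unicodedata

# Resolve the three named HTML entities to the characters they stand for.
SPACES = {entity: html.unescape(entity)
          for entity in ("&ensp;", "&emsp;", "&thinsp;")}

for entity, ch in SPACES.items():
    # Show the entity, its code point in U+XXXX form, and its Unicode name.
    print(entity, f"U+{ord(ch):04X}", unicodedata.name(ch))
```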


Here is a bit of explanation about the em-based typographic spaces in
case you are not familiar with them. They are basically tools for
horizontal alignment of visible characters. I was a typographer for
some 20 years before a career change to the law so this is second
nature to me. But the em system is simple to learn and work with.

The em space is the square of the typeface's point size, exclusive of
any variation in vertical leading or linespacing.  So the em in eight
point type is eight points wide and the em in 24 point type is 24
points wide. See XSL:FO, section 5.9.72.

The en space is half the width of the em, so in eight point type would
be 4 points wide and in 24 point type would be 12 points wide. The
various thin space widths are calculated as fractions of the em,
generally corresponding to the width of various punctuation marks. The
Unicode spaces also include the "punctuation space," which corresponds
to the width of the more common punctuation marks.
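The arithmetic above can be sketched in a few lines of Python. The table covers only the Unicode spaces whose width is a fixed fraction of the em by definition (em, en, three-per-em, four-per-em, six-per-em). The punctuation and thin spaces are left out because their widths depend on the typeface. The names EM_FRACTIONS and space_width_pt are illustrative, not part of any specification.

```python
from fractions import Fraction

# Fraction of an em occupied by each fixed-width Unicode space.
EM_FRACTIONS = {
    "\u2003": Fraction(1),      # EM SPACE
    "\u2002": Fraction(1, 2),   # EN SPACE
    "\u2004": Fraction(1, 3),   # THREE-PER-EM SPACE
    "\u2005": Fraction(1, 4),   # FOUR-PER-EM SPACE
    "\u2006": Fraction(1, 6),   # SIX-PER-EM SPACE
}

def space_width_pt(space, point_size):
    """Width in points of a fixed-width space at the given type size."""
    return EM_FRACTIONS[space] * point_size

for size in (8, 24):
    print(size, "pt type: em =", space_width_pt("\u2003", size),
          "pt, en =", space_width_pt("\u2002", size), "pt")
```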

The typographic spaces scale when type sizes are changed, for example,
when a person with visual disabilities uses an enlarged typeface for
viewing an online web page. The em system should fit well with a rich
file format designed with transformations of text to other formats in
mind, since the spaces scale automatically if there is a change in the
type's point size; i.e., there is no need to revisit horizontal white
space specified in ems during transformations, unlike white space
identified by fixed measurements.

The first big trick to understanding the em system of measurements is
to keep in mind that in type faces with variable widths, not all
characters are variable in width. Many characters' widths are tied in
a standardized way to the widths of the em, the en, and the various
thin spaces. For example, in Latin 1 typefaces, the em dash and em
leader are both one em wide. Numbers, currency symbols, the slashes,
the plus and minus signs, the en dash, the left and right double
quotes are all one en wide. The Unicode punctuation space character
corresponds to the width of the typeface's period, comma, semicolon,
exclamation point, and hyphen. The width of the em is unaffected by
variations in the type face, e.g., the em stays 8 points wide in both
8 point Stymie light and 8 point Stymie extra bold.

(There at least used to be some variability in typefaces on the width
of the hyphen, with some typefaces making them 1/5 em wide and some
1/6 em wide. I haven't looked yet to see if that has been standardized
more recently, although I suspect it has; witness HTML's support for
only one size of thin space. The equivalencies I list above are
examples only; the list is not comprehensive.)

The second big trick to understanding the system is to recognize that
the typographical spaces were expressly designed as a method of
horizontally aligning text. Before the advent of digital tabs, tables,
and the like, they were the standard method of aligning columnar
matter within consecutive lines of justified type, except in the
typewriter world where tab stops were used.

So, for example, if you are setting 8 point type using an en-width
bullet padded on the right with an en space, the indent for subsequent
lines in the paragraph would be one em. (I'm ignoring the left
indentation applied to all list items.) If the number were followed by
a period, the subsequent lines would be one em space plus a
punctuation space. If the type is reformatted as 12 point, the
relative measurements still apply without resetting
<text:space-before> or <text:space-after>.
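A minimal sketch of that indent arithmetic, in Python. Widths are expressed as fractions of an em, so the same description yields the right indent at any type size. The 1/5 em punctuation space is an assumed value for illustration only; the real width depends on the typeface.

```python
from fractions import Fraction

EM = Fraction(1)
EN = Fraction(1, 2)
PUNCT = Fraction(1, 5)  # assumed width of the punctuation space, in ems

def hanging_indent_pt(point_size, units):
    """Indent for subsequent lines: sum of em-relative widths times type size."""
    return sum(units) * point_size

bullet = [EN, EN]       # en-width bullet padded with an en space
numbered = [EM, PUNCT]  # one em space plus a punctuation space

for size in (8, 12):
    print(size, "pt:", hanging_indent_pt(size, bullet), "pt (bullet),",
          hanging_indent_pt(size, numbered), "pt (numbered)")
```

Nothing in the bullet or numbered descriptions changes when the type size does; only the point_size argument differs.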

A more complex horizontal alignment issue is presented by multiple
columnar matter. E.g.,

[TextA]  $ 1,037       E    824
[TextB]       17             15
[TextC]       --             --
[TextD]   18,046         14,074

(The "E" is my substitute for the Euro currency symbol and I've made
no attempt to provide actual currency conversion values.) Assuming the
following tags:

<justify-to-fill> = point where variable space is inserted to fill the line
<emsp> = em space width
<ensp> = en space width
<punctsp> = thin space corresponding to width of a period (and comma) [1]
<emdash> = a dash character one em wide

The following example is how alignment of columnar matter had to be
done in the hot type days of typography. Working with the above
example of columnar matter, marking up to left align the left column
and right align the center and right columns (and ignoring paragraph
ending tags) we would have:


So with that loosening up of the mind muscles, :-) we can now turn to
the problem of the numbered paragraph in ODF. For indented paragraphs
the following marked up text would produce uniformly indented right
aligned paragraph numbers and a left aligned first text character in
each paragraph whether text is set to left justified or fully justified:


<emsp><emsp>1.<emsp>Fourscore and seven years ago, our forefathers
brought forth on this continent a new nation, conceived in liberty,
and dedicated to the proposition that all ideals tend to degenerate in


<emsp><ensp>15.<emsp>Now is the time for all good men to come to the
aid of the party of their choice.


<emsp>142.<emsp>Ninety-nine bottles of beer on the wall, ninety-nine
bottles of beer. Take one down and pass it around, ninety-eight
bottles of beer on the wall.


And that relative spacing would survive cross-application
transformation to other type sizes or faces.
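The alignment claim can be checked with a little arithmetic. The Python sketch below assumes, as described earlier, that each digit is one en wide, and it assumes a period width of 1/5 em purely for illustration. Any fixed period width gives the same result, since every line contains exactly one period.

```python
from fractions import Fraction

EM = Fraction(1)
EN = Fraction(1, 2)
PERIOD = Fraction(1, 5)  # assumed width of the period, in ems

def number_right_edge(leading_spaces, number):
    """Offset (in ems) where the paragraph number ends, before its period."""
    return sum(leading_spaces) + len(str(number)) * EN

def text_start(leading_spaces, number):
    """Offset (in ems) of the first text character after number, period, em space."""
    return number_right_edge(leading_spaces, number) + PERIOD + EM

paragraphs = [([EM, EM], 1), ([EM, EN], 15), ([EM], 142)]
edges = [number_right_edge(sp, n) for sp, n in paragraphs]
starts = [text_start(sp, n) for sp, n in paragraphs]
print(edges, starts)
```

All three numbers end at 2.5 ems from the margin, so they are right aligned, and the text of each paragraph starts at the same offset.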

Now consider what happens if tab stops are set by em-width positions
rather than by absolute measurements or margin percentages.  Suddenly,
you have lists whose indentations for subsequent lines survive
transformation to other type sizes as well, without an algorithm for
tweaking the tab settings.

Beyond list issues, support for the Unicode typographic spaces in ODF
applications would bring ODF into conformance with the major
publishing stylesheets that require their use, e.g., in paragraph
indentations, and separation of the em dash from surrounding text by
hair thin spaces.

It would also allow applications to offer more flexibility to users.
For example, an "insert leaders to fill line" feature. Currently in
most (all?) word processors, there is no way to create lines with dot
leaders aligned and separated by a user-specified interval other than
by setting aligning tab stops. E.g., if a user wanted to create
fully justified lines such as the following (but with the dot leaders,
currency symbol, and the right column properly aligned horizontally):

Bolts .  .  .  .  .  .  .  .  . $     1.09
Bolt cutters   .  .  .  .  .  .      15.99
Surgical bolt removers  .  .  .  10,999.99
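A rough sketch of what an "insert leaders to fill line" feature might do, in Python. It works in monospaced character cells rather than em-relative widths, which is a simplification, and the function name and the interval parameter are hypothetical. Dots are placed only on columns that are multiples of the interval, so leaders on consecutive lines align vertically. The currency-symbol column from the example is left out for brevity.

```python
def leader_line(label, value, width, interval=3):
    """Left-align label, right-align value, and fill between with spaced leaders."""
    if len(label) + len(value) > width:
        raise ValueError("label and value do not fit in the line width")
    line = [" "] * width
    line[:len(label)] = label        # label occupies the leftmost cells
    value_start = width - len(value)
    line[value_start:] = value       # value occupies the rightmost cells
    for col in range(0, width, interval):
        # Keep at least one blank cell after the label and before the value.
        if len(label) < col < value_start - 1:
            line[col] = "."
    return "".join(line)

for label, value in [("Bolts", "1.09"),
                     ("Bolt cutters", "15.99"),
                     ("Surgical bolt removers", "10,999.99")]:
    print(leader_line(label, value, 44))
```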

Where it gets nasty in current word processors with no work-around is
in automatically generated page indexes. Users are simply offered no
option to specify leaders separated by uniform spaces with the dot
leaders aligned. Instead, they get only an ugly, solid mass of dot
leaders that visually overpower text, unseparated by spaces. E.g.,

I.   Summary .............................................................    1
II.  Ecma 376 Is Less Than Completely Open ...............................   17
     A. About those 'compatible' but unspecified binary formats ..........   19
     B. Miscellaneous Ecma 376 warts ..................................... 5082
III. Poppa Sang Bass; Momma Played Fiddle ................................ 6040

The difference is visually profound. What **cannot** currently be
done automatically is what is generally recognized as good
typographical layout. What **can** currently be done automatically
fails the final exam in Typographic Layout 101.

I cannot point to an application developer who currently would
support the Unicode relative typographic spaces. But I hope that we
might lay the groundwork for the future by adding tags for those
spaces. Or we might at least try to avoid specification of lists that
would interfere with such tags' later adoption. I'll leave it to those
with greater understanding of the lists issues to determine whether
either of the suggested approaches would create such a barrier.

Also, I'll point out that implementing them in applications with users
able to insert the various relative spaces manually could move word
processors quite a bit closer to desktop publishing solutions'
capabilities. And to me, it makes more sense to specify horizontal
widths in relative units rather than absolute measurements.


More resources on the Unicode em-based relative spaces:


Unicode typographical spaces discussed:
<http://www.unicode.org/versions/Unicode4.0.0/ch06.pdf>, pp. 154-155.



[1] In my opinion, the ODF specification also needs a
<text:space-to-fill> tag to indicate the insertion point for space
needed to justify a line left and right. Where two or more tags occur
per line, the implementing application should divide the space to fill
equally in the specified position. For example, take the common book
header line that includes a page number, the chapter title, and the
book title:


The One That Got Away          Trolling for Tags             17


The markup would be:


The One That Got Away<text:space-to-fill>Trolling for Tags<text:space-to-fill>17
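A minimal sketch of how an application might resolve the proposed tag, written in Python and measured in monospaced character cells for simplicity. The function name is hypothetical, and handing any leftover odd cell to the leftmost gaps is an assumption; the proposal above only says the space should be divided equally.

```python
def space_to_fill(segments, line_width):
    """Join segments, dividing the unused line width equally among the gaps."""
    gaps = len(segments) - 1
    leftover = line_width - sum(len(s) for s in segments)
    if gaps < 1 or leftover < 0:
        raise ValueError("need at least two segments that fit in the line")
    base, extra = divmod(leftover, gaps)
    parts = []
    for i, segment in enumerate(segments[:-1]):
        parts.append(segment)
        # Assumption: the first `extra` gaps each absorb one remaining cell.
        parts.append(" " * (base + (1 if i < extra else 0)))
    parts.append(segments[-1])
    return "".join(parts)

header = space_to_fill(["The One That Got Away", "Trolling for Tags", "17"], 60)
print(header)
```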


WordPerfect has supported an equivalent tag at least since WP v. 5.1.
I think the kind of contortions it presently takes for users to create
an equivalent effect using ODF are well captured by the KWord
documentation, section 11, which itemizes 35 steps for users to set up
such a header for alternating pages with page numbers that remain on
the outside margin. An Insert > Space to Fill menu option that inserts
the proposed <text:space-to-fill> tag would dramatically simplify the
process for users.

Best regards,

