office message

Subject: RE: [office] Proposal for lists/numbered paragraphs
From: Philip Boutros <Philip.Boutros@stellent.com>
To: Daniel Vogelheim <Daniel.Vogelheim@sun.com>, office@lists.oasis-open.org
Date: Mon, 17 Mar 2003 11:35:25 -0600
Hi all

I am in the unfortunate position of trying to argue against Daniel's options 1 and 2 and for his option 3. While I agree that option 1 can express the formatting required to display the lists correctly, it fails to capture the real semantic information in the major word processors. In all major word processors, lists and numbering are closely associated with paragraph style. Going with option 1 or option 2 removes this association and loses important (from the word processor's point of view) information in the process. Without this association we are guaranteeing that a conversion from Word to OO and back to Word is significantly lossy in this area. Like it or not, numbering is not structure in any real sense in these formats; it's just sophisticated style. We would be doing a disservice to the word processing community if we chose to drop this type of numbering in favor of a lossy (from the word processor's point of view) structure-based list scheme purely to make XSLT writers' lives easier.

Unfortunately an important work commitment has come up and I will miss today's call. I hope someone picks up my banner :-)

-Phil

-----Original Message-----
From: Daniel Vogelheim [mailto:Daniel.Vogelheim@sun.com]
Sent: Thursday, March 13, 2003 2:11 PM
To: office@lists.oasis-open.org
Subject: Re: [office] Proposal for lists/numbered paragraphs


Hello Paul,

Paul Grosso wrote:
 >> [... proposal for lists + number element ...]

> Coming from a structured markup world and not a WP one, I don't really
> understand this stuff.  I can't really imagine what the use case is (I'm
> sure there are many in WP land, I just have a hard time imagining most
> things in WP land).
> 
> Is the number that is put in front of a numbered paragraph given as part
> of the style, or is it supposed to be computed automatically by counting
> something?  And if the latter, what is counted?

In internal word-processor implementations (apparently, OOo Writer, 
WordPerfect, KOffice, Word share this trait), the style assigns the 
paragraph into a certain numbering domain. So all paragraphs in the same 
numbering domain receive a hierarchical numbering based on their 
numbering level.
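
To make the numbering-domain idea concrete, here is a minimal sketch in Python. It assumes each paragraph carries a (domain, level, text) triple derived from its style; the function name, the tuple layout and the handling of skipped levels are illustrative assumptions, not taken from any of the word processors mentioned.

```python
from collections import defaultdict


def number_paragraphs(paragraphs):
    """Assign hierarchical labels such as '1.2' to flat paragraphs.

    paragraphs: iterable of (domain, level, text), with level starting at 1.
    Each numbering domain keeps its own independent set of counters.
    """
    counters = defaultdict(list)  # domain -> one counter per open level
    labelled = []
    for domain, level, text in paragraphs:
        c = counters[domain]
        if len(c) >= level:
            del c[level:]        # deeper levels restart
            c[level - 1] += 1    # next number at this level
        else:
            # Opening one or more deeper levels: each starts at 1
            # (assumption: skipped levels are numbered 1 as well).
            c.extend([1] * (level - len(c)))
        labelled.append((".".join(str(n) for n in c), text))
    return labelled


paragraphs = [
    ("headings", 1, "Intro"),
    ("headings", 2, "Scope"),
    ("steps", 1, "Unpack"),
    ("headings", 2, "Terms"),
    ("headings", 1, "Design"),
    ("steps", 1, "Install"),
]
for label, text in number_paragraphs(paragraphs):
    print(label, text)
```

Note that the paragraphs of the two domains are interleaved in document order, yet each domain counts on its own. Nothing in this model requires the numbered paragraphs to be adjacent or nested.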

The user interfaces often try to make this look more list-like, so one 
could argue that this is somewhat of a legacy. I'm tempted to take this 
position for Writer, but I'd probably get in trouble with my 
colleagues... ;-)


> Since I don't really have my head wrapped around this stuff, I can't really
> tell how this might work in an XSL world, but I'm nervous that this could
> cause problems for XSL.

I see your point. The problem, as I see it, is that the word-processing 
world has one kind of concept, and HTML, XSL, and basically any other 
XML format, have a different kind of concept. We want to do good XML, we 
need to represent word-processing documents, and hence we have a 
problem: How do we join these two? Both issues are listed in the charter 
as requirements, the former as  4) and 6), the latter as 1).

The OOo team has given an answer, one which has been accepted as the 
base specification: Use HTML-style lists, plus an extra 
'continue-numbering' attribute. For all I can tell, this covers both 
requirements (being reasonable XML-wise, and being able to represent all 
documents), and hence it is a good answer. However, nothing is perfect, 
and this does place a burden on implementers for file format converters.

Instead, one could choose a representation that is close(r) to the 
word-processing side of things. This would make it easier for those 
people, but would make it harder for the structured markup people. And 
this is, in my view, the core of the discussion: There is inherent 
complexity in trying to bridge two worlds, and the committee gets to 
decide where to push that complexity, and how to slice it up.

The first suggestion is to do structured lists + continue-numbering.
+ structure where there is structure
- pseudo-structure in some other places where list numbering is used
   without a proper list.
+ easy for XML conversion
- the structure may be hard to generate during conversions, and the
   pseudo-structure is ugly
Essentially, the conversions are where we push the complexity.
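
As a rough illustration of what 'continue-numbering' asks of a consumer, here is a small Python sketch. It assumes the simplest possible reading, namely that a list flagged to continue picks up where the immediately preceding list stopped; the base specification may define this differently, and the function name and input layout are hypothetical.

```python
def start_numbers(lists):
    """Compute the first item number of each list in document order.

    lists: sequence of (item_count, continue_numbering) pairs.
    Assumption: a continuing list resumes after the list directly before it.
    """
    starts = []
    next_number = 1
    for item_count, continue_numbering in lists:
        start = next_number if continue_numbering else 1
        starts.append(start)
        next_number = start + item_count
    return starts


# Four separate lists; the second and fourth continue the previous one.
print(start_numbers([(3, False), (2, True), (4, False), (1, True)]))
```

The pseudo-structure mentioned above shows up here: two lists that are separate elements in the XML still form one numbering sequence.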

The next suggestion (Michael) uses that as base, but adds an 'escape' 
[i.e. declaring an individual paragraph to be listed at some level] to 
take away some of the burden from the filter people:
+ (optional) structure where there is structure
+ easy generation using the escape
- redundant representation: there may be badly-behaved documents that
   don't generate any lists even where suitable
- corollary from above: For complete processing one really needs to
   fully support both.
Here, we distribute the complexity more equally, and various parties get 
to choose how much effort they want to put into it. The problem may be 
that the process just does not work very well if everyone chooses the 
easy way out.

The third suggestion would be to do things word-processor style 
completely, i.e. by having lists as a side-effect of formatting.
+ easy to generate from existing wp formats
- structure may be hard to generate during XML processing
- one may need alternative list structures for other parts of the
   format, i.e. for presentation modules.
Here, we push the complexity over to the XML/structured markup side. You 
want lists? You make them!
To some degree, this solution would fail the previously mentioned 
requirements 4)+6), while doing quite well on the 1) side. I'm not sure 
how good of a solution this is, given that the previous ones seem to 
cover all bases properly.
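
To give an idea of the work this third option pushes onto the structured-markup side, here is a minimal Python sketch that rebuilds nested lists from flat paragraphs carrying only a numbering level. The function name, the (level, text) input and the clamping of level jumps are assumptions made for illustration; a real converter would also have to deal with numbering domains and unnumbered paragraphs in between.

```python
def nest(paragraphs):
    """Turn flat (level, text) paragraphs into nested (text, children) nodes.

    Levels start at 1. A paragraph that jumps more than one level deeper
    is attached directly under the previous paragraph (assumption).
    """
    root = []
    stack = [root]  # stack[k] is the child list open at depth k
    for level, text in paragraphs:
        depth = max(1, min(level, len(stack)))  # clamp invalid jumps
        del stack[depth:]                       # close deeper lists
        node = (text, [])
        stack[-1].append(node)
        stack.append(node[1])                   # children may follow
    return root


flat = [(1, "a"), (2, "b"), (2, "c"), (1, "d")]
print(nest(flat))
```

The same logic is expressible in XSLT, but grouping siblings by level there is considerably more awkward than this loop suggests.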

There has been an argument that going the first route may be trying to 
be overly clean on this fairly arbitrary issue, while taking the easier 
way with other items (e.g. headers vs DocBook style sections). I'm not 
sure if I can follow this, but it was one of the given arguments and 
thus should be mentioned.


Paul, you said you sense trouble for XSL. Well, I think the necessary 
conversions are expressible as XSLT, if ugly. Given my taxonomy above, 
the third (and possibly second) solution places the complexity squarely 
in your lap, so you're quite right about sensing trouble. As I said 
before, I think it's doable, but I also think the other solutions are a 
better way to go.


As a general guideline, I would say that 'clean' XML solutions are to be 
preferred _if_ they can represent existing documents. This seems to be 
the case with solutions 1 (and 2) above.
If we _cannot_ find a 'clean' XML way to represent existing documents, 
we have a problem and will need to do funny stuff. Example: tabs. I 
don't think lists qualify here, because a workable, XML-like solution 
exists.



Hope this helps!

Daniel



Subject: Back-Tab in WordPerfect and Unicode specifications
From: Paul Langille <Paul.Langille@corel.com>
To: office@lists.oasis-open.org
Date: Wed, 19 Mar 2003 21:04:19 -0500

As of Thursday, I have not found ANY reference to any back-tab
representation in the Unicode 3.2 specification I have read so far. I do
not expect to see any such code, as the main three sections that deal
with 'formatting' characters do not seem to have any such representation
in any context outside of vertical and horizontal tab. The WordPerfect
file-format specification has a large number of codes that will present
any number of difficulties in representation in the Open-Office XML file
format. 

The WordPerfect File Format Documentation lists this code as
variable-function code 224 (0xE0) (documentation which formerly was
available on the Corel website). The 'File Format' documentation was
available as part of the help.

My only distributable source is 'offsite'. A link to non-Corel provided
information is at:

http://jdan.com/perfectscript/macros/ch12_b1.htm#_VPID_12_292

I am currently trying to find out what happened to the SDK, but I cannot
guarantee that it will be back for general use and download.

( Note: Unfortunately, I will not be attending the TCON on Monday due to
prior commitments. )


Paul Langille
Software Developer
Corel Corporation
Follow-Ups:
- Re: [office] Proposal for lists/numbered paragraphs
  - From: David Faure <faure@kde.org>