office-collab message

Subject: Re: [office-collab] Position XML

From: robert_weir@us.ibm.com
To: office-collab@lists.oasis-open.org
Date: Thu, 24 Apr 2014 10:11:33 -0400

<office-collab@lists.oasis-open.org> wrote on 04/24/2014 09:32:59 AM:

> From: Svante Schubert <svante.schubert@gmail.com>
> To: office-collab@lists.oasis-open.org
> Date: 04/24/2014 09:28 AM
> Subject: Re: [office-collab] Position XML
> Sent by: <office-collab@lists.oasis-open.org>
>
> Hi Peter,
>
> On 24.04.2014 14:30, Peter Rakyta wrote:
> > Dear TC members!
> >
> > Yesterday on the call we discussed the representation of the position
> > attribute, including tables.
> > Now I just want to make sure I understand correctly.
> >
> > 1) Let's say we have an empty table named Table1 placed at the 3rd
> > paragraph of a document.
>
> Allow me to be picky: the table is the third component, not the third
> paragraph, as the table is being inserted on root level (e.g. beyond
> office:text).
>
> > We add the string "Hello world" to cell A1.
> > Now we will have the undo operation (regarding the inserted string):
> > <del e="/3/1/1/1/12" s="/3/1/1/1/1" type="text" />
>
> Just another minor correction: We are counting the characters being
> deleted, therefore it is from 1 to 11, as there are 11 characters in
> the example: <text:p>Hello world</text:p>
> Or, to make the counting easier to compare:
> 123456789AB
> Hello world
>

I hope in the actual specification text we can be precise about character counting. As we all know, with XML we're dealing with lexical strings, which might include character entities, as well as parsed XML, where there are Unicode characters, but even then there are different conventions for dealing with composition sequences, etc. We probably want to cite a specific Unicode normalization form to do the counting on:

http://www.w3.org/TR/2005/WD-charmod-norm-20051027/

It looks like "Form C" is what the W3C is recommending for processing, but I am not certain:

http://www.unicode.org/reports/tr15/tr15-25.html#Specification
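
To illustrate why the normalization form matters for counting, here is a minimal Python sketch. It assumes that "character" means a Unicode code point after normalization, which is only one possible convention and not something the TC has decided; the helper name count_chars is hypothetical.

```python
import unicodedata


def count_chars(text, form="NFC"):
    """Count code points after applying the given Unicode normalization form.

    Assumption: one 'character' equals one code point of the normalized string.
    """
    return len(unicodedata.normalize(form, text))


plain = "Hello world"
composed = "caf\u00e9"     # 'é' as one precomposed code point (U+00E9)
decomposed = "cafe\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT

# Pure ASCII text is unaffected by normalization.
assert count_chars(plain, "NFC") == count_chars(plain, "NFD") == 11

# The same visible text has different raw lengths...
assert len(composed) == 4 and len(decomposed) == 5

# ...but the counts agree once both sides use the same form.
assert count_chars(composed, "NFC") == count_chars(decomposed, "NFC") == 4
assert count_chars(composed, "NFD") == count_chars(decomposed, "NFD") == 5
```

Two implementations that count on different forms would disagree by one on the offset of everything after the accented letter, which is the kind of divergence a fixed normalization form in the specification would avoid.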

Note: This came up in the OpenFormula discussions, since we have spreadsheet functions that deal with extracting substrings at given offsets. In that case, implementations diverged enough that we were only able to mark some functions as "normalization-sensitive", a form of implementation-dependent behavior. I really hope that with CT, since we're starting fresh, we can specify exactly what normalization form to use.

Regards,

-Rob

> > 2) Now let's say we have placed another empty table, Table2, into the empty cell
> > A2 of Table1. Then we add the string "Hello world" to cell A1 of Table2.
> > This will result in the undo operation (again regarding the inserted string):
> > <del e="/3/2/1/1/1/1/1/12" s="/3/2/1/1/1/1/1/1" type="text" />
>
> Exactly, as the inner table (table 2) goes directly into the cell,
> while the text is still embedded within a paragraph.
>
> > And so on, if we have further recursively inserted tables.
> > Am I correct?
>
> Yes, in my opinion. Are there other opinions from the implementors?
>
> Best regards,
> Svante
>
> PS: Nicely done with the colors.
>
> > Best regards, Peter
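
For readers following the position strings in the quoted examples, here is a rough Python sketch of how the s and e values can be split into components and checked against Svante's correction (an inclusive range from 1 to 11 for "Hello world"). The function names are hypothetical, and the meaning attached to each component is only a reading of this thread, not the specification.

```python
def parse_position(path):
    """Split a position string such as '/3/1/1/1/1' into integer components."""
    return [int(part) for part in path.strip("/").split("/")]


def deleted_length(start, end):
    """Number of characters covered by an inclusive start/end pair.

    Assumption: both positions share the same parent (everything except the
    last component) and the last component is a 1-based character offset.
    """
    s = parse_position(start)
    e = parse_position(end)
    if s[:-1] != e[:-1]:
        raise ValueError("start and end must share the same parent")
    return e[-1] - s[-1] + 1


# Case 1: text in cell A1 of the table that is the 3rd root component.
assert deleted_length("/3/1/1/1/1", "/3/1/1/1/11") == len("Hello world")

# Case 2: text in cell A1 of a table nested directly in cell A2 of Table1.
assert deleted_length("/3/2/1/1/1/1/1/1", "/3/2/1/1/1/1/1/11") == 11
```

With the end offset written as 12, as in the original examples, the same calculation would give 12 characters, one more than the string contains.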

Follow-Ups:
- Re: [office-collab] Position XML
  - From: Svante Schubert <svante.schubert@gmail.com>

References:
- Position XML
  - From: Peter Rakyta <rakyta.peter@multiracio.hu>
- Re: [office-collab] Position XML
  - From: Svante Schubert <svante.schubert@gmail.com>