
office-collab message

Subject: Re: Documents being equal (from a user perspective level)

Patrick made some good comments on this topic in our last talk, which I would like to add to the list:
He stated that the edit distance known from text files can naturally also be applied to markup languages.
In general, when abstracting to a logical model, as we do for collaboration, we end up with a graph and could use a graph edit distance.
But we should keep in mind that the focus should be on semantics. It doesn't matter if spaces were exchanged for tabs, whereas a single changed number in an invoice could make a huge difference.
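
To illustrate the point about semantics, here is a minimal sketch using the classic Levenshtein edit distance. The function name `edit_distance` and the invoice line are made up for illustration, and tokenizing with `split()` is only a crude stand-in for a real logical model:

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute x by y
        prev = curr
    return prev[-1]


# Hypothetical invoice line in three variants.
original = "Total:    100 EUR"   # four spaces
retabbed = "Total:\t100 EUR"    # spaces exchanged for a tab
altered = "Total:    900 EUR"   # one digit changed

# Character level: the harmless whitespace change looks larger
# than the change of the amount.
chars_whitespace = edit_distance(original, retabbed)
chars_number = edit_distance(original, altered)

# Token level: whitespace no longer counts, the changed amount still does.
tokens_whitespace = edit_distance(original.split(), retabbed.split())
tokens_number = edit_distance(original.split(), altered.split())
```

The raw distance alone says little: the whitespace change costs more character edits than the changed amount, so any useful measure would have to be computed on a normalized, semantic representation.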


2016-11-01 15:09 GMT+01:00 Svante Schubert <svante.schubert@gmail.com>:
Dear SC,

If users receive two or more ODT documents, e.g. via email, how can they tell whether those documents are equal (from a user perspective)?

We know that ODT documents can be equal without being binary identical. For instance, if two ODT documents differ only in their automatic style names, the result should be that the documents are equal from a user perspective.
Automatic style names are an implementation detail, yet a text- or XML-based comparison would report the documents as different.
On automatic style names, the last paragraph of 19.498.2 in part 1 of our ODF 1.2 spec reads: "Note: If the document is produced multiple times, it cannot be assumed that the same name is generated each time."
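
A minimal sketch of how such a difference could be normalized away before comparing, assuming we only look at content.xml. The function name, the canonical names `auto1`, `auto2`, ... and the rule "every attribute ending in `style-name` is a style reference" are simplifying assumptions for illustration; among other things, the sketch ignores that style names are scoped by style family and that styles may be declared in a different order:

```python
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
STYLE_NAME = f"{{{STYLE_NS}}}name"


def normalize_automatic_style_names(content_xml: str) -> str:
    """Rename automatic styles to auto1, auto2, ... in declaration order."""
    root = ET.fromstring(content_xml)
    auto = root.find(f"{{{OFFICE_NS}}}automatic-styles")
    auto_styles = list(auto) if auto is not None else []

    # Map each generated name to a canonical one.
    mapping = {}
    for style in auto_styles:
        name = style.get(STYLE_NAME)
        if name is not None and name not in mapping:
            mapping[name] = f"auto{len(mapping) + 1}"

    # Rewrite the definitions and every attribute that looks like a reference.
    auto_ids = {id(style) for style in auto_styles}
    for elem in root.iter():
        for key, value in list(elem.attrib.items()):
            is_definition = key == STYLE_NAME and id(elem) in auto_ids
            is_reference = key.endswith("style-name")
            if (is_definition or is_reference) and value in mapping:
                elem.set(key, mapping[value])
    return ET.tostring(root, encoding="unicode")


# Two toy documents that differ only in the generated style name.
DOC_A = """<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <office:automatic-styles>
    <style:style style:name="P1" style:family="paragraph"/>
  </office:automatic-styles>
  <office:body><office:text>
    <text:p text:style-name="P1">Hello</text:p>
  </office:text></office:body>
</office:document-content>"""
DOC_B = DOC_A.replace("P1", "P7")

same_as_text = DOC_A == DOC_B
same_after_normalization = (
    normalize_automatic_style_names(DOC_A)
    == normalize_automatic_style_names(DOC_B)
)
```

With these toy inputs the plain text comparison reports a difference, while the normalized forms compare equal.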

How does a newly added digital signature change a document?
Is it a different document? I am still undecided how to categorize the different types of changes. I would start with the following:
  1. Binary equal
  2. Logical / Equal - Different automatic style names, nested spans, etc.
  3. Logical / Equal, but different names for bookmarks/IDs/references.
    If an internal ID within the ZIP is not changed in parallel with the internal references to it, the document has changed and is different (group 4 below)
    1. Different xml:id 
    2. Different Bookmarks & References
    3. Different names of files & directories in the ZIP (might break external references)
  4. (Logical) Different
In the end, I would like to have a function that compares two documents and tells me at what level they are equal.
    Jos suggested easing the problem by normalization; still, the items to be normalized have to be listed.

    Does it make sense to you to collect such a list of items to be normalized at the OASIS level?
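
    A minimal sketch of such a function, assuming the comparison is done by applying one normalization stage per level and stopping at the first stage where both documents match. All names (`Equality`, `equality_level`, the two toy normalizers) are hypothetical, and the normalizers work on plain strings and simply blank out names, so they would not catch an ID that was renamed without its references:

```python
import re
from enum import IntEnum
from typing import Callable, Sequence, Tuple


class Equality(IntEnum):
    BINARY = 1            # level 1: identical as-is
    LOGICAL = 2           # level 2: equal after ignoring automatic style names etc.
    LOGICAL_RENAMED = 3   # level 3: equal after also ignoring IDs/bookmark names
    DIFFERENT = 4         # level 4: (logically) different


Normalizer = Callable[[str], str]


def equality_level(a: str, b: str,
                   stages: Sequence[Tuple[Equality, Normalizer]]) -> Equality:
    """Return the first level at which a and b compare equal."""
    if a == b:
        return Equality.BINARY
    for level, normalize in stages:
        # Stages are cumulative: each one works on the previous result.
        a, b = normalize(a), normalize(b)
        if a == b:
            return level
    return Equality.DIFFERENT


# Toy stand-ins for the real, still to be listed, normalization items.
def blank_style_names(doc: str) -> str:
    return re.sub(r'style="[^"]*"', 'style="_"', doc)


def blank_ids(doc: str) -> str:
    return re.sub(r'id="[^"]*"', 'id="_"', doc)


STAGES = [(Equality.LOGICAL, blank_style_names),
          (Equality.LOGICAL_RENAMED, blank_ids)]

base = '<p style="P1" id="a">Total: 100</p>'
restyled = '<p style="P7" id="a">Total: 100</p>'
renamed = '<p style="P7" id="b">Total: 100</p>'
changed = '<p style="P1" id="a">Total: 900</p>'

levels = [equality_level(base, other, STAGES)
          for other in (base, restyled, renamed, changed)]
```

    The list of stages is exactly where an agreed list of normalization items would plug in.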

    I would love to discuss this further on tomorrow's call!

    Kind regards,
