OASIS Mailing List Archives

xliff message


Subject: RE: [xliff] Code (and content) comparison

Hi Helena,


I left normalization aside because I was talking about inline elements and in most cases you cannot normalize the original inline markup.


We can consider normalization when comparing higher-level elements, like <source> or <unit>. In those cases we need to consider whether an “xml:space” attribute is present and affects the text.
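As a rough illustration of why “xml:space” matters for such comparisons, here is a minimal Python sketch. It assumes a tool policy (not something stated in this thread) that collapses runs of XML whitespace unless xml:space="preserve" is in effect. The helper name comparable_text and the inherited parameter are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# ElementTree exposes xml:space under the predefined XML namespace.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

def comparable_text(el, inherited="default"):
    """Return the text of el in a form suitable for comparison.

    Assumption: when xml:space is not "preserve", runs of XML whitespace
    (space, tab, CR, LF) are collapsed to one space and trimmed.
    `inherited` stands in for a value set on an ancestor element.
    """
    mode = el.get(XML_SPACE, inherited)
    text = "".join(el.itertext())
    if mode == "preserve":
        return text
    return re.sub(r"[ \t\r\n]+", " ", text).strip(" \t\r\n")

a = ET.fromstring("<source>Hello   world</source>")
b = ET.fromstring("<source>Hello world</source>")
c = ET.fromstring('<source xml:space="preserve">Hello   world</source>')

print(comparable_text(a) == comparable_text(b))  # True
print(comparable_text(c) == comparable_text(b))  # False
```

With this policy the first two <source> elements compare equal, while the one carrying xml:space="preserve" keeps its three spaces and does not.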





Rodolfo M. Raya   <rmraya@maxprograms.com>

Maxprograms      http://www.maxprograms.com


From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Helena S Chapman
Sent: Thursday, October 06, 2011 11:26 PM
To: Rodolfo M. Raya
Cc: xliff@lists.oasis-open.org
Subject: RE: [xliff] Code (and content) comparison


Agreed for the most part. However, why not add an option to specify a normalization mode when comparing the values? My suggestion is to keep it as simple as possible for 80% of the use cases:

- default: no normalization. Everything compares at code point value.
- or NFC or NFD. (Quite bluntly, NFKD or NFKC do not make sense to me in the context of localization)

Anything beyond normalization (NFC/NFD) for comparison, e.g. Unicode-based collation, should not be considered: it is overly complicated and therefore not needed.
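The option described above could look roughly like the following Python sketch. The function name texts_equal and its form parameter are hypothetical; only unicodedata.normalize is a real standard-library call. The compatibility forms are rejected here purely to mirror the suggestion above.

```python
import unicodedata
from typing import Optional

def texts_equal(a: str, b: str, form: Optional[str] = None) -> bool:
    """Compare two strings by code point, optionally after normalization.

    form=None    -> default: no normalization
    form="NFC"   -> compare canonical composed forms
    form="NFD"   -> compare canonical decomposed forms
    """
    if form is None:
        return a == b
    if form not in ("NFC", "NFD"):
        # Assumption: NFKC/NFKD are deliberately not offered.
        raise ValueError("unsupported normalization form: " + form)
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

composed = "caf\u00e9"     # e with acute as a single code point
decomposed = "cafe\u0301"  # e followed by a combining acute accent

print(texts_equal(composed, decomposed))         # False
print(texts_equal(composed, decomposed, "NFC"))  # True
print(texts_equal(composed, decomposed, "NFD"))  # True
```

The two strings render the same but differ at code point level, so only the normalized comparisons treat them as equal.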

Best regards,

Helena Shih Chapman
Globalization Technologies and Architecture
+1-720-396-6323 or T/L 938-6323
Waltham, Massachusetts

From:        "Rodolfo M. Raya" <rmraya@maxprograms.com>
To:        <xliff@lists.oasis-open.org>
Date:        10/06/2011 06:08 PM
Subject:        RE: [xliff] Code (and content) comparison
Sent by:        <xliff@lists.oasis-open.org>

Hi Yves,

Assuming that nesting is not allowed in inline markup, I would say that two inline elements are identical if two conditions are met:

1) Both elements have the same attributes with the same values. Attribute order is not relevant.
2) The PCData content of both elements, if any, is identical.

Notice that I'm deliberately ignoring any XML comment, processing instruction and CDATA section that could be part of any of the elements being compared.

If you need two levels of "identicality", you could say "fully identical" when both conditions are met and just "identical" when only the first condition is met.
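The two conditions and the two levels can be sketched in Python with the standard xml.etree.ElementTree module. The function names are hypothetical, and the element-name check is an added assumption, since the conditions above only mention attributes and content. The default parser leaves comments and processing instructions out of the tree, but it does expose CDATA content as ordinary text, which differs from the choice above to ignore CDATA sections.

```python
import xml.etree.ElementTree as ET

def pcdata(el):
    """Character data of el; comments and PIs are not in the parsed tree."""
    return "".join(el.itertext())

def compare_codes(a, b):
    """Return 'fully identical', 'identical' or 'different'."""
    # Assumption: elements with different names are never identical.
    # attrib is a dict, so attribute order does not affect equality.
    if a.tag != b.tag or a.attrib != b.attrib:
        return "different"
    if pcdata(a) == pcdata(b):
        return "fully identical"  # conditions 1 and 2 both hold
    return "identical"            # only condition 1 holds

x = ET.fromstring('<ph id="1" ctype="bold">&lt;b&gt;</ph>')
y = ET.fromstring('<ph ctype="bold" id="1">&lt;b&gt;</ph>')
z = ET.fromstring('<ph id="1" ctype="bold">&lt;strong&gt;</ph>')
w = ET.fromstring('<ph id="2" ctype="bold">&lt;b&gt;</ph>')

print(compare_codes(x, y))  # fully identical
print(compare_codes(x, z))  # identical
print(compare_codes(x, w))  # different
```

The sketch relies on the assumption stated above that inline markup is not nested; with nested elements, child attributes would need to be compared as well.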

Hope this helps,
Rodolfo M. Raya   <rmraya@maxprograms.com>

> -----Original Message-----
> From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf
> Of Yves Savourel
> Sent: Thursday, October 06, 2011 7:33 PM
> To: xliff-inline@lists.oasis-open.org
> Cc: xliff@lists.oasis-open.org
> Subject: [xliff] Code (and content) comparison
> Hi all,
> This is a topic for the inline SC, but it may be of interest to the TC as a whole,
> so I'm CCing the TC list.
> I think one of the items we have to provide for inline codes is a description of
> when two codes are identical, and by extension when two contents are
> identical, at least in the context of the specification.
> Why?
> -- Elsewhere in the specification we may have to make reference to having
> two entries identical. For example when working on <match> we probably
> will have some type of score attribute, and we will want to define what "100"
> (or whatever value we use for an exact match) means. And that will likely
> entail saying that two source contents are identical.
> -- At some point we may have some tool to check processing expectations.
> Such tools, for example, could compare a 'before' and an 'after' document
> and flag any unexpected changes (See the interesting presentation from
> Andrew Pimlott for some scenarios). That means we'll have to compare
> contents, including codes.
> To get things started, here are some thoughts in no specific order:
> - We may want to define different kinds of 'identical'. For example one that
> excludes the original content and one that takes it into account. No specific
> use case in mind yet, though.
> - For the text part of the content: is there an official XML way to compare?
> Should we involve normalized forms (e.g. NFD or NFC,
> http://unicode.org/reports/tr15/) when doing our definitions?
> - It could be useful to provide (in some annex) the different regular
> expressions that match the inline codes. That could save users time and
> ensure better tools or better process. Could be used in SRX for example.
> - How should we define the comparison(s)? A list of 'parts' of the code
> (attributes, content, etc.) and whether or not they need to be the same to
> match one definition of identical?
> Any and all input welcome.
> Cheers,
> -yves
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: xliff-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: xliff-help@lists.oasis-open.org
