OASIS Mailing List Archives

xliff message

Subject: Re: [xliff-inline] Code (and content) comparison

Hi Yves,

See my comments below. Nothing earth-shattering in them, but maybe worth the few minutes to read them.

> Hi all,
> This is a topic for the inline SC, but it may be of interest to the TC as a whole, so I'm CCing the TC list.
> I think one of the items we have to provide for inline codes is a description of when two codes are identical, and by extension when two contents are identical, at least in the context of the specification.

I think such a description would be very useful. The scenarios you describe could be very important, especially to someone trying to diagnose why something won't work right when going back from XLIFF. It would be very easy to mess up a tag in a way that is difficult to spot, and having a way to check it would be invaluable.

> Why?
> -- Elsewhere in the specification we may have to make reference to two entries being identical. For example, when working on <match> we probably will have some type of score attribute, and we will want to define what "100" (or whatever value we use for an exact match) means. And that will likely entail saying that two source contents are identical.
> -- At some point we may have some tool to check processing expectations. Such tools, for example, could compare a 'before' and an 'after' document and flag any unexpected changes (see the interesting presentation from Andrew Pimlott for some scenarios). That means we'll have to compare contents, including codes.

I think that such a function would be vital in some cases.

> To get things started, here are some thoughts in no specific order: 
> - We may want to define different kinds of 'identical'. For example, one that excludes the original content and one that takes it into account. No specific use case in mind yet, though.

Agreed. Both are probably useful for different reasons.

> - For the text part of the content: is there an official XML way to compare? Should we involve normalized forms (e.g. NFD or NFC, http://unicode.org/reports/tr15/) in our definitions?

This can get tricky. In general, I think it is probably best to flag differences in normalization in some fashion, because they could have a downstream impact. For example, if you're localizing content for a database and you normalize something that shouldn't be normalized, it could result in an error that is very hard to diagnose from the appearance of the text. Some systems might also expect one form or the other, and seeing the difference can be very difficult without a compare utility. Perhaps a tool should have an option to flag these things or not: I can see many cases where users simply wouldn't care because it wouldn't matter.
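To make the "flag it, optionally" idea concrete, here is a minimal Python sketch of a check that separates differences caused only by normalization from real differences. The function name compare_text and its three result labels are illustrative assumptions, not anything defined by XLIFF; the only library call used is the standard unicodedata.normalize.

```python
import unicodedata


def compare_text(a: str, b: str) -> str:
    """Classify two strings as 'identical', 'normalization-only' or 'different'.

    The labels are hypothetical; a real tool could map them to warnings.
    """
    if a == b:
        # Same sequence of code points.
        return "identical"
    if unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b):
        # Same text once both sides are brought to NFC.
        return "normalization-only"
    return "different"


composed = "caf\u00e9"     # precomposed e-acute (NFC form)
decomposed = "cafe\u0301"  # 'e' followed by a combining acute accent (NFD form)

print(compare_text(composed, composed))
print(compare_text(composed, decomposed))
print(compare_text(composed, "cafe"))
```

A tool could report the "normalization-only" case as a warning that the user can switch off, which keeps the difference visible for those who need it without bothering those who don't.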

> - It could be useful to provide (in some annex) the different regular expressions that match the inline codes. That could save users time and ensure better tools or better processes. Could be used in SRX, for example.

Do you mean the encapsulating markup as defined in XLIFF, or the actual inline codes that are to be encapsulated? Obviously the latter would be rather difficult (to put it mildly), and I presume you mean the former, but I just want to make sure.

> - How should we define the comparison(s)? A list of 'parts' of the code (attributes, content, etc.) and whether or not they need to be the same to match one definition of identical?

That would depend on the purpose of the comparison. I don't think there is a single best answer to this question.
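As a rough sketch of how a purpose-dependent comparison might look, the Python below compares two inline codes part by part and lets the caller choose which parts count. Everything here is an assumption for illustration: the function name codes_identical, the choice of "id" as the default attribute, and the XLIFF 1.2-style <ph> samples. It is not a proposal for the actual definition.

```python
import xml.etree.ElementTree as ET


def codes_identical(a: str, b: str, attributes=("id",), include_original=True) -> bool:
    """Compare two inline code elements on a configurable set of parts."""
    ea, eb = ET.fromstring(a), ET.fromstring(b)
    # Part 1: the element name must match.
    if ea.tag != eb.tag:
        return False
    # Part 2: only the listed attributes are compared.
    for name in attributes:
        if ea.get(name) != eb.get(name):
            return False
    # Part 3: the original (native) content, only if requested.
    if include_original and (ea.text or "") != (eb.text or ""):
        return False
    return True


before = '<ph id="1">&lt;br/&gt;</ph>'
after = '<ph id="1">&lt;br /&gt;</ph>'  # a space was added in the original code

print(codes_identical(before, after))                          # strict comparison
print(codes_identical(before, after, include_original=False))  # ignores original content
```

The two calls give different answers for the same pair of codes, which is the point: a checker for unexpected changes would likely want the strict form, while a match score might reasonably ignore the original content.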

> Any and all input welcome.
> Cheers,
> -yves

