November 19, 2013 XLIFF TC minutes

Hi all, the minutes do not fit into the Kavi field again, so I am posting them on the mailing list:

Thanks to Asanka for the excellent minutes.

19/11/2013

Yves, Asanka, DW, dF, Fredrik, Joachim, Kevin, Ray, Shirley, Tom, Uwe, Victor, Lucia, Helena, DavidO

dF:

We have 13 voters out of 14; the meeting has quorum;

Moving to approve last meeting minutes. Tom seconds. No objections.

We should try to resolve all material issues in this meeting if we want to have the third public review before Christmas;

1st item: order of core, module and extension elements, owned by Fredrik in the tracker; this should be addressed together with the schema ambiguity; we had some discussion between Tom and Yves saying that Schema [v1] is not expressive enough to discern between module element occurrences explicitly allowed by the specification and module occurrences allowed by wild-card; Tom explained in the discussion that the issue is due to modules being optional; the issue does not exist where the module exists next to required elements with a given order; Fredrik, did you make any proposal?

F: No. The most flexible approach would be to have all modules appear in any order before the standard elements and also allow third-party extensions to appear in .. any order before the required or the standard elements; right now the modules go before and in a strict order, and the extension namespaces go after the standard elements; the first thing that makes a problem is that in order to put module information in the correct place you need to know where among all the other modules the content should go, which means you need to have some knowledge about modules that you don't support in order to meet the requirements of the schema, which is bad, and that is solved by not having a strict order but allowing modules to appear in any order; and the same thing with the extension namespaces: appearing after the standard elements means that you can't really develop a future module completely as an extension, since it will go in a different place in the DOM than if it were an official module; and you would need to basically parse beyond all the standard elements to find third-party namespaces, which makes streaming processing difficult. <presents example scenario>

dF: I think we have 2 conflicting requirements: 1) we are not able to use Schema to see whether a module is allowed explicitly or by wild-card; 2) to be able to not change the DOM position of an extension if it is promoted to a module; I think that the ambiguity issue could be resolved if the modules were always separated from the wild-card by required elements.

Tom: That's correct; people are using the schema now for development; we've taken the explicit module references out of the schema, so that the ambiguity does not exist; those elements are allowed with a wild-card so we can still do all of the same things as the schema is defined now, this is just a stop gap, not necessarily the solution. Having a required element between those explicit modules and the wild-cards solves the ambiguity issue; e.g. this is the way file is defined.

F: Would it be OK to just allow the wild-card before the required standard elements and simply not refer to the modules explicitly ... so the modules would be allowed by wild-card?

T: That could lead to a different ambiguity with any situation where we have an optional element in core that could be satisfied by the wild-card token; I'd have to check on that; but it may be possible;

F: That is definitely a problem since any element is allowed, which would include core elements; so you'd basically ... an element not in the current namespace; not sure you can express that;

T: The only other thing that we could do, I think, is to put the extension point inside a separate element; that would resolve the ambiguity, because it would allow a required element nesting the wild-card elements;

dF: I believe that the technical nature of this is clear; I think the options basically are either you separate wild-cards and modules by core so that it is unambiguous, another thing would be to change the schema language to go for RELAX NG or add Schematron based validation for the explicit validation of modules; I believe it is very important to be able to discern between XLIFF defined elements and any element; are there any opinions except F and T?

Fredrik, are you very strongly opposed to the solution of separating wild-card from explicitly allowed modules to be able to stick with Schema [not changing schema language]?

F: No, I don't have a hard requirement, I am basically against 3rd party namespaces in XLIFF ... so why not make them harder to use; It's not the cleanest solution..

dF: Any other viable solutions? Tom, would you be able to add a Schematron-based device that would be able to discern between modules explicitly allowed and extensions allowed by wild-cards?

T: We could do that;

dF: It seems to me that the way of the least complexity and easiest to achieve would be to split the wild-cards from the modules; Tom, would you be able to come up with a solution?

T: Sure.

dF: Is there dissent?

...

OK, we have consensus on resolving this.

F: Just to clarify, the proposal is that all the modules are allowed in any order [a single choice group] before specified required elements, and all third party namespace extensions are allowed in any order after the required elements.
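The ordering rule in this clarification can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the schema solution Tom is to propose: the namespace URIs and the helper names `classify` and `order_is_valid` are assumptions made for this sketch, and the check works on a flat list of child element namespaces rather than on a real XLIFF document.

```python
# Assumed namespace values, used here for illustration only.
CORE_NS = "urn:oasis:names:tc:xliff:document:2.0"
MODULE_NS_PREFIX = "urn:oasis:names:tc:xliff:"


def classify(namespace):
    """Return 'core', 'module' or 'extension' for an element namespace."""
    if namespace == CORE_NS:
        return "core"
    if namespace.startswith(MODULE_NS_PREFIX):
        return "module"
    return "extension"


def order_is_valid(child_namespaces):
    """Check the proposed order: modules (any order among themselves),
    then the required core elements, then third-party extensions."""
    rank = {"module": 0, "core": 1, "extension": 2}
    ranks = [rank[classify(ns)] for ns in child_namespaces]
    # At least one core element must be present, since the required
    # elements are what separates modules from the wild-card extensions.
    has_core = 1 in ranks
    return has_core and ranks == sorted(ranks)
```

A processor using such a check would not need to know which modules exist, only how to tell module namespaces from core and third-party ones.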

dF: This situation occurs on multiple levels;

F: I understand that; each tree level where we have extension points;

dF: I don't care so much if the modules go in a given order or not; Tom, what is simpler, from Schema point of view?

T: What we've done right now is take the explicit references to the modules out; 1) one approach that we could use is to simply document how we would propose that the particular modules be referenced; 2) to put some sort of wrapper elements, either around module references or around the wild-card elements; 3) making sure there is a required element between the module elements and the extensions; we could also take a look at RELAX NG, but I am not sure it will give us a solution or whether I have the bandwidth to change the schema etc.

dF: It is not realistic to change the schema language now.

F: The main issue that was reported was that you could not implement ... core and .. modules without having knowledge of all other modules; .. the knowledge about what other modules exist is a part of core; if you support core, you must also know where in the schema all modules go; .. if we go with strict order for the modules, <noise> allowing any order is clearly much preferable;

dF: We can agree on having modules separated from wild-cards by required elements but being in any order, is that doable Tom?

T: It is doable;

F: It should be ... xs:choice .. any number of

dF: Tom will follow up on the mailing list; just confirming (clarifying) that we have consensus to separate modules from wild-cards and have modules in any order.

Moving to item 2.

Internal and external references; issues around ID scope, ID uniqueness scope, references being able to point to elements with IDs; there was a discussion again between Yves and myself; I tried to split this one into sub-issues on the mailing list; I am trying to summarise what seems to be the baseline consensus for building the solutions; it seems that one of the possible ways to resolve fragment identification and referencing would be to have 2 separate ID scopes in XLIFF; one at each unit level; nothing core in the unit can compete for IDs; the same situation repeats one level higher; each file has an ID scope down to unit, units and groups share the same ID scope; to be able to reference externally we need a unique identifier for file; the latest proposal that seems to be acceptable to Yves and me is .. having the file IDs as UUIDs, so it's possible to allow for grouping operations and so on;

Y: That is one of the options; not the perfect option in many areas; <gives example scenario>

dF: This is just in case of external identifying; internal identifying will just work fine with hash plus IDs..

Y: No. You should be able to refer to relative paths from within the file, except if you decide that any relative path is to be just within the scope of that specific file; that's very restrictive again.

dF: We haven't had anything else ever;

Y: Because we never solved that problem in XLIFF 1.2

dF: Why not stick to it?

Y: Because this time we use URI or IRI value for reference for definition; so marker has to refer to something; <example scenario>

dF: if we say that it always references locally, we can say that

Y: Well, the fragment .. it is the secondary resource, when you define a URI we have the primary URI resource which is before a fragment, and the secondary resource within the primary resource but it does not mean... except if we define it that way..

dF: I think we can define it like that; you said at the beginning that we can specify our own fragment identifying

Y: I never said that; I said it is complex; I am not sure of the best solution;

F: What is the problem we are trying to solve? Is it to be able to refer to a file node in an XLIFF 2 file without knowing the file name of the file?

Y: I think the main problem is how we use URIs and fragment identifiers with XLIFF 2.0.

F: In that case, if we have a 3rd level of ID uniqueness at the document level, with each file just required to have a unique ID within the XLIFF document, and then you have your ID scope inside the file up to unit, and then you have the unit scope, ... I don't see why a UUID is more useful than an NMTOKEN

Y: No.. it just needs to be unique within the document. Basically the problem is that if you have two files that you merge together during the process, you may end up with the same ID on the file; this is because we assume we don't change the file ID; if there is a process requirement that says you should look at it and you may actually be able to change the file ID, then that problem goes away; and the extractors should still be able to use the original URI to identify exactly the file without relying on the IDs; so the ID would be used for internal referencing within the document, which would be kind of dynamic, and the origin would be used by the extractor .. what exactly is the file

F: It makes a problem when using absolute URIs within content in the file actually; in that case UUIDs are a better solution

Y: That's true; when we say we'd change everything within a unit to share the same scope for the IDs, this is a massive change <gives example scenario>;

F: I think we already have that requirement; inline IDs are unique within the unit

Y: Yes, but if you have IDs at the segment .. you have IDs for other things maybe, same thing for data; not sure what you mean when you say everything in the unit has a unique ID, because to me currently they don't; I agree within the content; and even the segment with the ID of the inline, but there are other objects with IDs

dF: I meant content; data are defined with their own separate uniqueness scope;

Y: That means there are several groups of scopes there; and using the mechanism providing just levels is not going to work; <example scenario>

dF: The elements where you have these issues are data and note; is there anything else? I don't think so.

Y: If I look at the current definition of ID, and this is the current state; we have different values for group, for unit, for segment and ... we said basically that the inline and segment could be the same scope; and group and unit should be the same scope; there is nothing on notes; it should be there.

dF: I am not against including data and notes in one of these scopes to make things simpler;

Y: But it will make things so complicated when you create or modify things..; because you have to know everything about the unit and every single ID in the unit.

dF: We could have two content scopes, like the file scope and the unit scope, and then notes and data would have their own scope together?

Y: There is another solution: you could possibly prefix the ID with something that tells what type of scope you are currently looking at; maybe just the level of the file; and within the file you'd know exactly where to go;

dF: so basically the prefix would be f-file d-data u-, for example

Y: It is not the prefix of ID, I mean prefix of the URI

dF: I know, we are free to define that

Y: I have never seen something as complex as that one for fragment URIs though;

dF: Can we agree that we have the three scopes, document, file, unit?

Y: We can agree we have scopes, but we don't agree yet on which ones are there

dF: Sure, there is the unit scope; this is relatively stable.

Y: Well, depending on what you mean by unit scope; do you include data or do you include notes; again that is not defined yet;

dF: Will you be opposed to having also the data and notes included in the units scope?

Y: To me that seems complicated; it forces you to keep a really good handle on all the IDs of the document and objects

dF: If you don't do that we are forced into prefixes and the identification mechanism grows too complex..

Y: I don't know what the implementation implications are;

F: I think if we already require somebody to support the full core - it is not that big an implication; .. some support for all the core elements and you collect all the IDs you've seen; if we expect people to implement a sub-set of core; then obviously this is....

dF: sub-sets can be defined by processes and agents; if you are a modifier, you need to know about everything; if you are an extractor, your position is simpler, because you are creating everything;

F: <example scenario>; I would need to know all inline markup IDs to generate new comment IDs etc.

dF: I would summarise what seems to be viable common ground: unit content is one scope; file, unit, group is one scope; data and notes seem to be a separate scope for now; to be continued in email discussion.
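The common ground summarised here can be sketched as a per-scope uniqueness check. This is a minimal Python sketch under assumptions: the function name `find_duplicate_ids` and the scope keys (`"file"`, `"content"`, `"data"`) are hypothetical labels chosen for illustration, and the question of which scope notes and data finally belong to was left open in the meeting.

```python
from collections import defaultdict


def find_duplicate_ids(entries):
    """Return the set of (scope, id) pairs that occur more than once.

    entries: iterable of (scope, id_value) pairs, where scope is any
    hashable key naming the uniqueness scope the ID lives in.
    """
    seen = defaultdict(set)
    duplicates = set()
    for scope, id_value in entries:
        if id_value in seen[scope]:
            duplicates.add((scope, id_value))
        seen[scope].add(id_value)
    return duplicates


# Hypothetical scope keys: units and groups share the scope of their file,
# content IDs are scoped to their unit, data has its own scope per unit.
entries = [
    (("file", "f1"), "u1"),           # a unit in file f1
    (("file", "f1"), "g1"),           # a group in file f1
    (("content", "f1", "u1"), "1"),   # segment or inline ID in unit u1
    (("content", "f1", "u2"), "1"),   # same value in another unit: allowed
    (("data", "f1", "u1"), "1"),      # data in a separate scope: allowed
    (("file", "f1"), "u1"),           # clashes with the first unit
]
duplicates = find_duplicate_ids(entries)
```

With these entries, only the repeated `u1` in the file scope is reported; the three IDs with value `1` do not clash because they sit in different scopes.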

Moving to third item.

There was discussion between Y and dF, options called solution 0) and solution C); just before this meeting it appears we have Ca) Yves's proposal to have "firstNo", the 3rd value, always obligatory; and Cb) the third value would not be obligatory if there was only one non-reorderable scope .. that was meant to simplify the situation for extractors ..; are there any opinions on these options?

F: I think that C looks quite well worked out; <noise/voice not clear>

dF: Would people from MS / Oracle / IBM have an opinion if they want the third value for their extractors?

Y: As soon as you are going to have someone supporting that, he is going to do that using an extension;

dF: As we don't hear any other opinions, I suggest Ca) is the consensus solution; don't hear any dissent; Yves, would you be willing to implement this in the spec?

Y: I can try;

dF: Thank you.

Moving to the fourth item.

This was pretty much resolved; I think F, Y, dF agreed that removing translate from segment simplifies both the translatability state algorithm and the re-segmentation PRs; unless I hear any dissent I would consider the translatability solution resolved and I can implement that and point to it once it is printed out;

Moving to fifth item. Yves said he encountered a number of issues with the re-segmentation PRs; Y, could you give us an idea?

Y: I've tried to implement it and I am running into many issues; I've managed to implement the splitting but not joining yet; and I don't think it's appropriate for me to comment till I have all the comments ready; I didn't have time to go through this problem.

dF: Part of those might have gone with the translatability;

Y: Possibly, some, yes. There are, I think, almost 30 different processing requirements listed; there are lots of things said over and over again etc. I started to try to lay that out.

dF: do you have a feeling if this is material or clarifications?

Y: Not sure.

V: Question about translatability issue; I don't understand exactly what is the translatability state solution.

dF: We discussed this last time; the defaults are inherited from structural elements; we agreed on an algorithm for how local markers sm and em will override the inherited values; later on, we figured that the same expressiveness will be there if translate ("yes" or "no") is not allowed on segments; this removes complexity from the algorithms; the resolution has not been implemented yet; I will implement it after the meeting and point to it on the public mailing list;
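The idea of inherited defaults overridden by local sm and em markers can be sketched roughly as follows. This is not the algorithm agreed by the TC, only a minimal Python illustration under assumptions: the function name `resolve_translate` and the event tuples are hypothetical, markers are assumed to be properly nested, and no translate value is read from segments.

```python
def resolve_translate(inherited, events):
    """Return (text, translatable) pairs for the content of one unit.

    inherited: translate state inherited from the structural elements.
    events: ("text", str), ("sm", True/False/None) or ("em",) tuples;
            None on "sm" means the marker carries no translate override.
    """
    stack = [inherited]
    result = []
    for event in events:
        kind = event[0]
        if kind == "text":
            # Text takes the innermost state currently in effect.
            result.append((event[1], stack[-1]))
        elif kind == "sm":
            override = event[1]
            stack.append(stack[-1] if override is None else override)
        elif kind == "em":
            # Closing a marker restores the state that was in effect before it.
            stack.pop()
    return result


events = [
    ("text", "Click "),
    ("sm", False),        # start of a non-translatable span
    ("text", "OK"),
    ("em",),              # end of that span
    ("text", " now"),
]
states = resolve_translate(True, events)
```

Here the inherited value applies to the text outside the markers, and the span between sm and em takes the overriding value.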

V: If you have the same expressivity, that's good..

dF: It kills about 8 re-segmentation PRs in the re-segmentation sections; you don't need to bother about the translate yes or no on segment..

Yves, would you be able to send your reservations against the PRs?

Y: I can try to do that before the next meeting;

dF: We've been effective; I will try to prepare a committee draft for next meeting, obviously I can only do it if the fifth item is resolved in the meantime.

Meeting adjourned.

Dr. David Filip

=======================

LRC | CNGL | LT-Web | CSIS

University of Limerick, Ireland

telephone: +353-6120-2781

cellphone: +353-86-0222-158

facsimile: +353-6120-2734

http://www.cngl.ie/profile/?i=452

mailto: david.filip@ul.ie
