Subject: URI in XLIFF2
Hi all,

Here are some thoughts about the XLIFF 2.0 URIs to continue the discussion on that topic.

We have IDs in <file>, <group>, <unit>, <segment>/<ignorable>, <pc>/<ph>/<sc>/<mrk>/<sm>, <note>, and <data>.

- The IDs of <file> must be unique within the document, with the caveat that additional <file> elements can be added to the document later on. This led us to tentatively say that maybe the file's id should be a UUID. Note that it's not that easy to implement: for example, I don't think XSLT has a way to create UUIDs. It would have to rely on a third-party extension for this. That may be the case in other programming languages too.
- The IDs of <group> must be unique within the <file>.
- The IDs of <unit> must be unique within the <file>. Those may or may not share the same ID scope as the groups.
- The IDs of <note> must be unique within the <file> (<note> can be at the <file>, <group> or <unit> level). So creating a new note means using a UUID or knowing all notes' IDs in that given file.
- The IDs for <segment>/<ignorable> must be unique within the <unit>.
- The IDs for <pc>/<ph>/<sc>/<mrk>/<sm> must be unique within the <unit> (with the usual source/target caveat). We have a tentative agreement that those elements could share the <segment>/<ignorable> ID scope (I'll refer to it as "segOrInlineIDs").
- The IDs for <data> must be unique within the <originalData>.

A few additional constraints:

- Our "segOrInlineIDs" can be duplicated: one in the source, the other in the target. The URI should be able to indicate which one it points to.
- The Match module is bringing additional headaches to the <unit>. A match has its own <source>, <target> and <data> elements. So we'll have to somehow distinguish them from the "main" ones.
- Various modules (and obviously any extension) can use references as well. The Glossary is an example of this.
  Currently the definition of id for glossentry does not offer scope information (http://docs.oasis-open.org/xliff/xliff-core/v2.0/xliff-core-v2.0.html#gls_id). We'll have to resolve this somehow.

=== Using levels

A first potential solution is the one David described: using a hierarchy of IDs with a separator.

The separator is a separate question. It simply needs to be a character allowed in a URI but not allowed in an NMTOKEN. #, /, ~, etc. would work. We just have to pick one. I'll use / throughout this email.

For example:

#fileID/groupOrUnitID/segOrInlineID

A first issue, and I think a show-stopper: the notes can appear at different levels, so it's not really possible to use them in such a hierarchy.

=== Using prefixes

Another potential solution could be to use prefixes along with a more flexible hierarchy.

Most IDs in the fragment would be represented like this: <prefixLetter>=IdValue

f for files
g for groups
u for units
n for notes
d for data
A non-prefixed value would be one of the segOrInlineIDs.

For example:

#f=550e8400-e29b-41d4-a716-446655440000/g=id1
-> the group id='id1' anywhere in the file id='550e8400-e29b-41d4-a716-446655440000'

#f=550e8400-e29b-41d4-a716-446655440000/u=id1
-> the unit id='id1' anywhere in the file id='550e8400-e29b-41d4-a716-446655440000'

#f=550e8400-e29b-41d4-a716-446655440000/n=id1
-> the note id='id1' anywhere in the file id='550e8400-e29b-41d4-a716-446655440000'

#f=550e8400-e29b-41d4-a716-446655440000/u=u1/s1
-> the segment id='s1' in the unit id='u1' anywhere in the file id='550e8400-e29b-41d4-a716-446655440000'

#f=550e8400-e29b-41d4-a716-446655440000/u=u1/m1
-> the annotation id='m1' in the unit id='u1' anywhere in the file id='550e8400-e29b-41d4-a716-446655440000'

#f=550e8400-e29b-41d4-a716-446655440000/u=u1/1
-> the code id='1' in the unit id='u1' anywhere in the file id='550e8400-e29b-41d4-a716-446655440000'

#f=550e8400-e29b-41d4-a716-446655440000/u=u1/d=d1
-> the data id='d1' in the unit id='u1' anywhere in the file id='550e8400-e29b-41d4-a716-446655440000'

We could maybe resolve the source/target issue with a final '~s' or '~t' after the segOrInlineID value. The ~ would distinguish the suffix from the ID value.

For example:

#f=550e8400-e29b-41d4-a716-446655440000/u=u1/m1~s
-> the annotation id='m1' in the source content of the unit id='u1' anywhere in the file id='550e8400-e29b-41d4-a716-446655440000'

#f=550e8400-e29b-41d4-a716-446655440000/u=u1/s1~t
-> the target element in the segment id='s1' of the unit id='u1' anywhere in the file id='550e8400-e29b-41d4-a716-446655440000'

You could even imagine this working for the unit:

#f=550e8400-e29b-41d4-a716-446655440000/u=u1~t
-> the whole target content of the unit id='u1' anywhere in the file id='550e8400-e29b-41d4-a716-446655440000'

Not sure if it's really useful or even needed, because it doesn't always correspond to a single physical element. Just a thought.

So far it seems it would work.

--- Now for relative fragments:

We could infer any missing part from the location of the reference attribute.

#n=n123
-> the note id='n123' in the current <file>

#u=234/10~s
-> the source inline code or segment/ignorable id='10' in the unit id='234' in the current file

We would have invalid values for the cases where the position of the reference attribute does not provide the proper context. For example, #10~s used at a group level would not be valid, as there is no unit context.

I think it would be relatively easy to implement for most applications. But the solution requires relatively complex parsing of the fragment. Bryan will have to see if XSLT can support such a mechanism.

--- Modules

Now there is the issue of the modules.
A possible option is to require two things:

- a) any ID in a module must be set using the attribute id in the module/extension namespace (an even simpler alternative would be to require xml:id)
- b) any ID value in a module must be a UUID

We could then use a special prefix for it:

#f=550e8400-e29b-41d4-a716-446655440000/m=47ab0064-d9d4-4ef9-9805-c3ad88f0bae6
-> the module/extended element id='47ab0064-d9d4-4ef9-9805-c3ad88f0bae6' anywhere in the file id='550e8400-e29b-41d4-a716-446655440000'

This guarantees that even the core can find such an ID and that it can be referenced uniquely within each file. That's not pretty, and it comes with the issue of generating UUIDs in some programming languages. But I can't think of another solution so far.

--- Matches

The solution above still does not handle using <source>/<target>/<data> in matches. Technically you could have the same IDs used in the match elements and in the unit where these matches are.

Still thinking...

Cheers,
-ys
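
PS: To give an idea of how much parsing the prefixed fragments would need, here is a rough Python sketch. It is only an illustration under assumptions, not a proposal for the specification: the names parse_fragment, resolve, PREFIXES, the "segOrInline" and "side" keys, and the context dictionary are all made up for this example. It uses / as the separator, the f/g/u/n/d prefixes plus the tentative m prefix for modules, and the optional final ~s / ~t. It does not address the matches issue.

```python
from typing import Dict

# Prefix letters from the proposal above; 'm' is the tentative module prefix.
PREFIXES = {"f": "file", "g": "group", "u": "unit",
            "n": "note", "d": "data", "m": "module"}


def parse_fragment(fragment: str) -> Dict[str, str]:
    """Split a fragment such as '#f=F/u=U/m1~s' into named parts."""
    if not fragment.startswith("#"):
        raise ValueError("fragment must start with '#'")
    body = fragment[1:]
    parts: Dict[str, str] = {}

    # Optional final '~s' or '~t' selecting source or target.
    if body.endswith(("~s", "~t")):
        parts["side"] = "source" if body[-1] == "s" else "target"
        body = body[:-2]

    for step in body.split("/"):
        prefix, sep, value = step.partition("=")
        if sep:
            if prefix not in PREFIXES:
                raise ValueError(f"unknown prefix: {prefix!r}")
            key = PREFIXES[prefix]
        else:
            # A non-prefixed value is a segment/ignorable or inline ID.
            key, value = "segOrInline", step
        if not value or key in parts:
            raise ValueError(f"invalid step: {step!r}")
        parts[key] = value
    return parts


def resolve(fragment: str, context: Dict[str, str]) -> Dict[str, str]:
    """Fill in missing parts from where the referencing attribute sits.

    'context' is a hypothetical description of that location,
    for example {"file": "f1", "unit": "u1"}.
    """
    parts = parse_fragment(fragment)
    if "file" not in parts:
        if "file" not in context:
            raise ValueError("no file context")
        parts["file"] = context["file"]
    # segOrInline and data IDs are only unique inside a unit.
    if ("segOrInline" in parts or "data" in parts) and "unit" not in parts:
        if "unit" not in context:
            raise ValueError("no unit context")
        parts["unit"] = context["unit"]
    return parts
```

With this sketch, resolve("#u=234/10~s", {"file": "f1"}) picks up the current file from the context, while resolve("#10~s", {"file": "f1", "group": "g1"}) raises an error because a group-level reference has no unit context.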