Subject: RE: [xliff] some comments on xliff names
Hi Eric,

Thanks for putting together the comments on XLIFF names.

If I understood correctly, there is some overlap between your comments and some of the questions that have already been raised (see the snippets from the mail I posted on 25 Jan 2002, and the information I got from Yves). I have therefore tried to address some of your notes by reference to that document, and I hope that the rest of the group will pitch in on the others.

Best,
Christian

-----Original Message-----
From: Friedman, Eric [mailto:eric@eTranslate.com]
Sent: Tuesday, 19 February 2002 19:47
To: xliff@lists.oasis-open.org
Subject: [xliff] some comments on xliff names

While working through the XLIFF DTD, I noticed some general problems that I think we should address as soon as possible. This is not a comprehensive list of all problems; rather, it is a summary of problems for which more than one instance can be found.

1. inconsistent hyphenation: why is content-type hyphenated but datatype is not?

CL> 24:<!-- R: naming conventions: for multiwords sometimes mix of uppercase
CL> and lowercase (eg. CodeContent), sometimes hyphenated form (eg. source-language),
CL> sometimes all lowercase (eg. minheigth) -->

2. inconsistent use of generic and qualified names: several elements have generic "name" attributes; others have qualified names, such as "phase-name".

3. failure to exploit ID datatype for unique attribute values: a number of elements have "id" attributes which are documented as being unique identifiers, but the DTD assigns them either a CDATA or an NMTOKEN datatype instead of an ID type (which a validating parser can check to guarantee uniqueness).

4. failure to exploit IDREF datatype for references to unique IDs: a number of elements have attributes (like "phase-name") which are documented as references to other elements' unique identifiers, but the DTD assigns them either a CDATA or an NMTOKEN datatype instead of an IDREF type (which some XML toolkits will auto-magically resolve for the application program).
Personal opinion: the function of attributes of type IDREF is easier to understand if "ref" is part of their name. For example,

<abc id="unique">
[...]
<xyz abc-ref="unique">
<!-- it's very clear that the abc-ref attribute refers to an instance of abc -->

CL> From my understanding, the ID/IDREF might put unwanted restrictions
CL> on the possible phase-names since it must match the XML name production
CL> (ie. "Name ::= (Letter | '_' | ':') (NameChar)*").

YS> Furthermore, ID type attributes have to be unique, and because an XLIFF doc could join/split
YS> several <file> elements, and we don't have a way to ensure IDs are universally
YS> unique (not just within a <file>), it is possible to make a doc
YS> invalid by joining two <file> elements in the same doc. I guess the issue
YS> is that the advantages of using ID/IDREF do not outweigh the
YS> disadvantages.
YS> However, one may consider making all id-type elements at least NMTOKEN.
YS> CDATA, the current type, might be too 'loose' (it allows for example
YS> spaces).

5. attribute names which are unclear, even in the context of their element. Example: the "file" element has an "original" attribute. It is not at all obvious that the value of original is supposed to be "the name of the original file from which the contents of the <file> have been extracted." Why not "original-name" or even "extracted-from"? Similarly, the meaning of "category" is just as opaque unless you read the associated definition in the spec.

CL> 137:<!-- Q: What's the semantics of attribute 'phase-name' for element
CL> 'phase'? I am not really able to distinguish it from attribute 'process-name'. -->

YS> I guess it is to allow making a difference between two different passes
YS> of the same process. For example 2 edits.

6. embedded "little languages": the "coord" attribute defines a little language to represent screen coordinates, including a special character for null values.
Why foist this on the application programmer when XML can do the job for us with attributes like x-coord, y-coord, etc.?

7. ambiguous parts of speech in naming: the "clone" attribute has values "yes" or "no". There are (at least) three different ways to interpret its meaning:

(a) Is it an imperative, as in "yes, this should be cloned"?
(b) Is it a description of state, as in "yes, this is a clone"?
(c) Or is it a description of an element's capabilities, as in "yes, this element may be cloned"?

Reading the spec reveals that the answer is (c). Hence, a better name would be "cloneable", which cannot be interpreted as either (a) or (b).

8. terseness leading to confusion: "ctype" is unnecessarily opaque. Would "content-type" really be so onerous?

CL> 70:<!-- R: I do not see a strong need for using an abbreviated name for
CL> element 'skl' since compared to other elements we will not see many
CL> occurrences of the element. Thus, using a long form like 'skeleton-file'
CL> should not increase file size too much but would increase readability
CL> (we already have elements 'internal-file' and 'external-file'). -->

9. redundancy in attribute names: the <mrk> element has an attribute "mtype" which specifies the type of the marker to which it belongs. Why is this not simply "type"? Or, if you don't buy that, why isn't it "marker-type" in the same way that the <count> element has "count-type"?

Eric
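
To make the ID/IDREF trade-off from points 3 and 4 a bit more concrete, here is a rough Python sketch of the two objections raised in the CL> and YS> comments. Everything in it is an assumption made for illustration: the sample document is a heavily simplified, hypothetical XLIFF-like fragment, the helper names duplicate_ids and could_be_id are made up, and the regular expression is only an ASCII approximation of the XML Name production. It does not run a validating parser; it just mimics by hand the document-wide uniqueness check such a parser would apply to ID attributes.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, heavily simplified XLIFF-like document (not a real XLIFF file):
# two <file> elements joined into one document, each numbering its units from 1.
JOINED = """\
<xliff>
  <file original="a.properties">
    <trans-unit id="1"/>
    <trans-unit id="2"/>
  </file>
  <file original="b.properties">
    <trans-unit id="1"/>
  </file>
</xliff>
"""

# ASCII-only approximation of the XML 1.0 Name production; the real production
# also admits many non-ASCII letters.
ASCII_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9._:-]*")


def duplicate_ids(scope):
    """Return the id values that occur more than once inside `scope`."""
    counts = Counter(
        el.get("id") for el in scope.iter() if el.get("id") is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)


def could_be_id(value):
    """True if `value` matches the ASCII approximation of an XML Name."""
    return ASCII_NAME.fullmatch(value) is not None


root = ET.fromstring(JOINED)

# Uniqueness checked per <file>: each file is fine on its own.
per_file = [duplicate_ids(f) for f in root.findall("file")]

# Uniqueness checked document-wide, as the ID type requires: the join clashes.
whole_doc = duplicate_ids(root)

print(per_file)                                  # [[], []]
print(whole_doc)                                 # ['1']
print(could_be_id("1"), could_be_id("unit-1"))   # False True
```

In this sketch each <file> is consistent on its own, but the joined document has a clash, which is the situation Yves describes. The last line shows Christian's point: a purely numeric identifier such as "1" would not be allowed as an ID value at all, whereas NMTOKEN would still accept it.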