xliff message

Subject: RE: [xliff] some comments on xliff names
From: "Lieske, Christian" <christian.lieske@sap.com>
To: xliff@lists.oasis-open.org
Date: Fri, 22 Feb 2002 16:35:48 +0100
Hi Eric,

Thanks for putting together the comments on XLIFF names.
If I understood correctly, there is some overlap between
your comments and some of the questions that have already
been raised (see the snippets from the mail I posted on
25 Jan 2002, and the information I got from Yves). Thus,
I will try to address some of your notes by reference to that document,
and I hope that the group will pitch in for the rest.

Best, Christian

-----Original Message-----
From: Friedman, Eric [mailto:eric@eTranslate.com]
Sent: Tuesday, 19 February 2002 19:47
To: xliff@lists.oasis-open.org
Subject: [xliff] some comments on xliff names

While working through the XLIFF DTD, I noticed some general problems that I
think we should address as soon as possible.  This is not a comprehensive
list of all problems; rather it's a summary of some problems for which >1
instance can be found.

1. inconsistent hyphenation:  why is content-type hyphenated but datatype is
not?

CL> 24:<!-- R: naming conventions: for multiwords sometimes mix of uppercase
CL> and lowercase (eg. CodeContent), sometimes hyphenated form (eg. source-language),
CL> sometimes all lowercase (eg. minheight) -->

2. inconsistent use of generic and qualified names:  several elements have
generic "name" attributes; others have qualified names, such as "phase-name"

3. failure to exploit ID datatype for unique attribute values: a number of
elements have "id" attributes which are documented as being unique
identifiers, but the DTD assigns them either a CDATA or a NMTOKEN datatype
instead of an ID type (which a validating parser can check to guarantee
uniqueness).

4. failure to exploit IDREF datatype for references to unique IDs: a number
of elements have attributes (like "phase-name") which are documented as
references to other elements' unique identifiers, but the DTD assigns them
either a CDATA or a NMTOKEN datatype instead of an IDREF type (which some
XML toolkits will auto-magically resolve for the application program).
Personal opinion: the function of attributes of type IDREF is easier to
understand if "ref" is part of their name.  For example,
<abc id="unique">
[...]
<xyz abc-ref="unique">  <!-- it's very clear that the abc-ref attribute
refers to an instance of abc -->
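
To illustrate what an application has to do when a reference is typed as CDATA or NMTOKEN rather than IDREF, here is a minimal sketch of resolving the references by hand. It reuses the abc / xyz / abc-ref names from the example above; the sample document, the "missing" reference and the resolve_refs helper are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical document built from the abc / xyz names used in the example.
DOC = """
<root>
  <abc id="unique"/>
  <xyz abc-ref="unique"/>
  <xyz abc-ref="missing"/>
</root>
"""

def resolve_refs(root, ref_attr="abc-ref"):
    """Pair each referring element with its target, or None if dangling."""
    # Index every element that carries an id attribute.
    by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}
    # Look up each reference; a non-validating parser reports nothing here,
    # so dangling references have to be detected by the application.
    return [(el, by_id.get(el.get(ref_attr)))
            for el in root.iter() if el.get(ref_attr) is not None]

root = ET.fromstring(DOC)
resolved = resolve_refs(root)
for el, target in resolved:
    print(el.tag, el.get("abc-ref"), "->", None if target is None else target.tag)
```

With ID/IDREF declared in the DTD, a validating parser would reject the dangling reference itself; the sketch only shows the bookkeeping that otherwise falls to the application.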

CL> From my understanding, the ID/IDREF might put unwanted restrictions
CL> on the possible phase-names since it must match the XML name production
CL> (ie. "Name ::=  (Letter | '_' | ':') (NameChar)*").
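
The restriction mentioned here can be sketched as a simple check. This is an ASCII-only approximation of the Name production quoted above (the real production admits many more characters), and the candidate phase names are hypothetical.

```python
import re

# ASCII-only approximation of: Name ::= (Letter | '_' | ':') (NameChar)*
# The full XML production also allows many non-ASCII letters and extenders.
XML_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9._:\-]*")

def is_xml_name(value: str) -> bool:
    """Return True if value could be used as an ID/IDREF value (ASCII subset)."""
    return XML_NAME.fullmatch(value) is not None

# Hypothetical phase names, not taken from the XLIFF spec.
candidates = ["edit-1", "2nd-review", "final review", "_qa"]
results = {name: is_xml_name(name) for name in candidates}
for name, ok in results.items():
    print(f"{name!r}: {'ok' if ok else 'not a valid Name'}")
```

Names that start with a digit or contain a space fail the check, which is the kind of restriction an ID/IDREF declaration would impose on phase-name values.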

YS> Furthermore, ID-type attributes have to be unique, and because an XLIFF
YS> doc could join/split several <file> elements, and we don't have a way to
YS> ensure IDs are universally unique (not just within a <file>), it is
YS> possible to make a doc invalid by joining two <file> elements in the
YS> same doc. I guess the issue is that the advantages of using ID/IDREF
YS> do not outweigh the disadvantages.
YS> However, one may consider making all id-type attributes at least NMTOKEN.
YS> CDATA, the current type, might be too 'loose' (it allows, for example,
YS> spaces).
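
The join problem described in this comment can be sketched as follows. The two <file> fragments and their id values are hypothetical, and the duplicate_ids helper only imitates the uniqueness check a validating parser would apply to attributes declared as ID.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Two hypothetical fragments whose ids are unique only within each <file>.
FILE_A = '<file original="a.txt"><trans-unit id="t1"/><trans-unit id="t2"/></file>'
FILE_B = '<file original="b.txt"><trans-unit id="t1"/></file>'

def duplicate_ids(root):
    """Return the id values that occur more than once below root."""
    counts = Counter(el.get("id") for el in root.iter() if el.get("id") is not None)
    return sorted(value for value, n in counts.items() if n > 1)

# Join both <file> elements under a single document element.
joined = ET.Element("xliff")
for src in (FILE_A, FILE_B):
    joined.append(ET.fromstring(src))

print(duplicate_ids(ET.fromstring(FILE_A)))  # each file alone has no clashes
print(duplicate_ids(joined))                 # the clash appears only after joining
```

Each fragment passes on its own, while the joined document would fail validation if id were declared as type ID.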

5. attribute names which are unclear, even in the context of their element.
Example:  the "file" element has an "original" attribute.  It is not at all
obvious that the value of original is supposed to be "the name of the
original file from which the contents of the <file> have been extracted."
Why not "original-name"  or even "extracted-from" ?  Similarly, the meaning
of "category" is just as opaque unless you read the associated definition in
the spec.

CL> 137:<!-- Q: What's the semantics of attribute 'phase-name' for element
CL> 'phase'? I am not really able to distinguish it from attribute
CL> 'process-name'. -->

YS> I guess it allows distinguishing between two different passes
YS> of the same process, for example two edits.

6. embedded "little languages": the "coord" attribute defines a little
language to represent screen coordinates, including a special character for
null values.  Why foist this on the application programmer when XML can do
the job for us with attributes like x-coord, y-coord, etc. ?
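
To make the cost of such a little language concrete, here is a rough sketch comparing the two approaches. It assumes a coord value of four semicolon-separated fields with "#" as the null marker; that format, and the "width" and "height" attribute names, are assumptions for illustration (only x-coord and y-coord are named above).

```python
from typing import Dict, Optional, Tuple

Coords = Tuple[Optional[int], ...]

NULL_MARK = "#"  # assumed null marker; the message does not name the character

def parse_coord(value: str) -> Coords:
    """Parse an assumed 'x;y;cx;cy' string, mapping the null marker to None."""
    parts = value.split(";")
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields, got {len(parts)}: {value!r}")
    return tuple(None if p == NULL_MARK else int(p) for p in parts)

def read_split_attrs(attrs: Dict[str, str]) -> Coords:
    """Read one attribute per value; an absent attribute simply means None."""
    names = ("x-coord", "y-coord", "width", "height")  # last two are hypothetical
    return tuple(int(attrs[n]) if n in attrs else None for n in names)

print(parse_coord("10;20;#;#"))
print(read_split_attrs({"x-coord": "10", "y-coord": "20"}))
```

With separate attributes, the field count, the field order and the null convention no longer need to be agreed on and validated by every application; a missing attribute expresses the null case directly.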

7. Ambiguous parts-of-speech in naming:  the "clone" attribute has values
"yes" or "no".  There are (at least) three different ways to interpret its
meaning:

(a). Is it an imperative as in "yes, this should be cloned" ?
(b). Is it a description of state as in "yes, this is a clone" ?
(c). Or is it a description of an element's capabilities as in "yes, this
element may be cloned" ?

Reading the spec reveals that the answer is (c).  Hence, a better name would
be "cloneable" which cannot be interpreted as either (a) or (b).

8. terseness leading to confusion: "ctype" is unnecessarily opaque.  Would
"content-type" really be so onerous?

CL> 70:<!-- R: I do not see a strong need for using an abbreviated name for
CL> element 'skl' since compared to other elements we will not see many
CL> occurrences of the element. Thus, using a long form like 'skeleton-file'
CL> should not increase file size too much but would increase readability
CL> (we already have elements 'internal-file' and 'external-file'). -->

9. redundancy in attribute names.  The <mrk> element has an attribute
"mtype" which specifies the type of the marker to which it belongs.  Why is
this not simply "type" ?  Or, if you don't buy that, why isn't it
"marker-type" in the same way that the <count> element has "count-type" ?


Eric
