OASIS Mailing List Archives

dita-translation message


Subject: RE: [dita-translation] Changes to documentation of xml:lang and translate attributes


Hi Kevin,

If the authoring tool is set up correctly for the languages the particular
project is to support, then the authoring tool can validate the xml:lang
attribute at authoring time. If the authoring tool is not configured for
this, then IMHO the organization doing the authoring is not using their XML
tools correctly. All modern XML editors can easily be set up to validate an
attribute value, and most also allow you to offer the author a list of
values to choose from. Either way, the authoring tool is providing
additional custom data validation beyond the rules of the DTD.

If you allow your authors to type in any xml:lang value that comes to mind,
then you'll get the mess you talk about. If the person responsible for the
tools sets the tools up correctly for the job at hand, you should never get
invalid attribute values, even where the DTD does not specify a specific
list of values.
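
A minimal sketch of the kind of tool-side check described above, assuming a hypothetical per-project list of configured languages. The names `PROJECT_LANGUAGES` and `validate_project_lang` are illustrative only and do not come from any particular XML editor:

```python
# Hypothetical list of languages this project is configured to support.
PROJECT_LANGUAGES = ("en-GB", "en-US", "fr-FR", "ja-JP", "lv-LV")


def validate_project_lang(value, allowed=PROJECT_LANGUAGES):
    """Return the configured spelling of `value`, or raise ValueError.

    Matching ignores case, so "en-gb" is accepted and returned as "en-GB".
    Anything not configured for the project is rejected, even though the
    DTD itself would accept it.
    """
    by_folded = {tag.lower(): tag for tag in allowed}
    try:
        return by_folded[value.lower()]
    except KeyError:
        raise ValueError(
            f"xml:lang {value!r} is not configured for this project; "
            f"choose one of: {', '.join(allowed)}"
        ) from None
```

An editor could call a check like this when the author commits an attribute value, or simply present the configured list as a pick list so that free-form typing never happens.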


Best Regards,
Gershon

---
Gershon L Joseph
Member, OASIS DITA and DocBook Technical Committees
Director of Technology and Single Sourcing
Tech-Tav Documentation Ltd.

-----Original Message-----
From: Farwell, Kevin [mailto:Kevin.Farwell@lionbridge.com] 
Sent: Friday, March 10, 2006 6:37 PM
To: Andrzej Zydron
Cc: gershon@tech-tav.com; Felix Sasaki; Robert D Anderson; bhertz@sdl.com;
Bryan Schnabel; Charles Pau; Lieske, Christian; Dave A Schell;
dita-translation@lists.oasis-open.org; dpooley@sdl.com; Richard Ishida;
Jennifer Linton; mambrose@sdl.com; patrickk@scriptware.nl;
pcarey@lexmark.com; Reynolds, Peter; rfletcher@sdl.com; Munshi, Sukumar;
tony.jewtushenko@productinnovator.com; Yves Savourel
Subject: RE: [dita-translation] Changes to documentation of xml:lang and
translate attributes

Hi,

I think we're talking about two different things. Every time I bring up a
problem of whether a valid XML file is correct, I am told my premise is
incorrect because the attribute values I suggest do not follow rules
external to the tools that validate the file. I'm not questioning the work
leading up to what constitutes a locale code in an XML file. I'm questioning
a plan that relies on authors to have read and understood that work. I agree
that the absurd locale codes I've offered are not correct, but I contend
once again that if you type them into the value of the xml:lang attribute
your file will be valid (and incorrect).
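
To make the point concrete, here is a rough sketch of what an NMTOKEN check amounts to. It assumes a simplified, ASCII-only approximation of the XML 1.0 Nmtoken production (the real production also admits many non-ASCII name characters), and `is_nmtoken` is an illustrative name:

```python
import re

# Simplified, ASCII-only approximation of XML 1.0 Nmtoken: one or more
# name characters (letters, digits, '.', '-', '_', ':').
NMTOKEN_ASCII = re.compile(r"[A-Za-z0-9._:\-]+")


def is_nmtoken(value):
    """True if `value` would pass an NMTOKEN attribute-type check (ASCII subset)."""
    return NMTOKEN_ASCII.fullmatch(value) is not None


# Candidate xml:lang values taken from this thread.
candidates = ["en-GB", "en-uk", "gb-en", "english_for_the_united_kingdom"]

# Every candidate passes, so a DTD-valid file says nothing about whether
# the locale code is correct.
dtd_verdicts = {value: is_nmtoken(value) for value in candidates}
```

Under this assumption all four values are accepted; only something like a value containing a space would be rejected at the DTD level.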

What I'm trying to get to is whether it's a good idea to have valid XML
files that can contain incorrect information and then rely on an output
process to find those incorrect values. I say it isn't. 

Should you think my suggestion that authors might use incorrect values is
not reasonable, I assure you I'm not just dreaming it up. Do a survey of
your clients or colleagues about various language or country codes.
Some will guess, and some will have such confidence in their guess they will
not consider looking it up. 

Kevin  

-----Original Message-----
From: Andrzej Zydron [mailto:azydron@xml-intl.com]
Sent: Friday, March 10, 2006 7:22 AM
To: Farwell, Kevin
Cc: gershon@tech-tav.com; Felix Sasaki; Robert D Anderson; bhertz@sdl.com;
Bryan Schnabel; Charles Pau; Lieske, Christian; Dave A Schell;
dita-translation@lists.oasis-open.org; dpooley@sdl.com; Richard Ishida;
Jennifer Linton; mambrose@sdl.com; patrickk@scriptware.nl;
pcarey@lexmark.com; Reynolds, Peter; rfletcher@sdl.com; Munshi, Sukumar;
tony.jewtushenko@productinnovator.com; Yves Savourel
Subject: Re: [dita-translation] Changes to documentation of xml:lang and
translate attributes

Hi Kevin,

The reason we should not have a prescriptive list for xml:lang is the
same one that led the W3C, and for that matter RFC 3066 itself, to avoid
one. RFC 3066 is designed as an open-ended notation based on ISO 3166 and
ISO 639. Why doesn't RFC 3066 provide a defined list of values? Because it
would be too restrictive. There is little point in trying to outsmart the
W3C or IANA. They have been through this process many, many times.

There is a clear distinction between what XML can do in terms of validation,
and what externally referenced standards can do. XML 1.0 specifies that the
value of xml:lang is governed by RFC 3066. Therefore
"english_for_the_united_kingdom" is not valid.

In the end you have to provide your own validation for xml:lang based on RFC
3066. This is an XML fact of life.
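
As a rough illustration of what such home-grown validation could look like, the sketch below checks the RFC 3066 tag syntax and then looks up a two-letter second subtag as a country code. The country set is a small illustrative subset rather than the ISO 3166 list, and the function names are hypothetical:

```python
import re

# RFC 3066 syntax: Language-Tag = Primary-subtag *( "-" Subtag )
#   Primary-subtag = 1*8ALPHA
#   Subtag         = 1*8(ALPHA / DIGIT)
RFC3066_SYNTAX = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")

# Illustrative subset only; a real check would load the full ISO 3166 list.
KNOWN_COUNTRIES = {"GB", "US", "LV", "JP", "FR", "DE"}


def is_rfc3066_syntax(value):
    """True if `value` matches the RFC 3066 tag grammar."""
    return RFC3066_SYNTAX.fullmatch(value) is not None


def check_lang_tag(value):
    """Return a list of problems found in `value` (empty if none found)."""
    if not is_rfc3066_syntax(value):
        return ["not a well-formed RFC 3066 tag"]
    problems = []
    subtags = value.split("-")
    # RFC 3066 interprets a two-letter second subtag as an ISO 3166 country
    # code. The lookup is case-insensitive here; enforcing a particular
    # capitalization is left to house style.
    if len(subtags) > 1 and len(subtags[1]) == 2 and subtags[1].isalpha():
        if subtags[1].upper() not in KNOWN_COUNTRIES:
            problems.append(f"unknown country code {subtags[1]!r}")
    return problems
```

With these assumptions, "english_for_the_united_kingdom" fails the syntax check outright, while "en-uk" passes the syntax check but is flagged for its country code. The primary language subtag is not looked up in this sketch; a fuller check would also consult ISO 639.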

Best Regards,

AZ


Farwell, Kevin wrote:
> Hi,
> 
> Let me clarify. I never said the rules are not clear. I said the 
> method for enforcing the rules is not clear. While "en-uk" is not a 
> valid locale according to the rules, it is completely valid according 
> to the DTD. "english_for_the_united_kingdom" is also completely valid.
> My point is not to fix the rules but to apply them. Incidentally, the 
> list of allowed values in the DITA reference features lowercase 
> country codes, which violates the capitalization rule listed below, 
> for what it's worth.
> 
> Using secondary tools to determine whether XML is correct seems risky 
> to me. Validating a file, in that case, guarantees it's valid when 
> tested against a content model but not necessarily that it is valid 
> within a production environment. At the very least, doesn't that create 
> the potential for a false sense of security? Must every file be 
> validated twice by two different methods? I would think that arriving 
> at a valid file should mean that the file can go through the rest of 
> the system with no further work.
> 
> Kevin
> 
> -----Original Message-----
> From: Andrzej Zydron [mailto:azydron@xml-intl.com]
> Sent: Wednesday, March 08, 2006 2:11 PM
> To: gershon@tech-tav.com
> Cc: Farwell, Kevin; 'Felix Sasaki'; 'Robert D Anderson'; 
> bhertz@sdl.com; 'Bryan Schnabel'; 'Charles Pau'; 'Lieske, Christian'; 
> 'Dave A Schell'; dita-translation@lists.oasis-open.org;
> dpooley@sdl.com; 'Richard Ishida'; 'Jennifer Linton'; 
> mambrose@sdl.com; patrickk@scriptware.nl; pcarey@lexmark.com; 
> Reynolds, Peter; rfletcher@sdl.com; Munshi, Sukumar;
> tony.jewtushenko@productinnovator.com; 'Yves Savourel'
> Subject: Re: [dita-translation] Changes to documentation of xml:lang 
> and translate attributes
> 
> Hi Gershon,
> 
> I agree with you. There is little point in setting out a prescriptive 
> list. The implementation guidelines should state that the contents of the 
> xml:lang attribute must follow the rules of Extensible Markup Language
> (XML) 1.0 (Third Edition) section 2.12, which mandates the use of IETF
> RFC 3066. There should not be a need for a full prescriptive list as 
> this would be too restrictive and inflexible. The rules for RFC 3066 
> are well defined and not as free form as Kevin's email suggests.
> 
> The value 'en-uk' in Kevin's example is not a valid RFC 3066 value
> for two reasons:
> 
> 1) 'uk' is not a valid ISO 3166 country code. 'GB' is the ISO 3166 
> country code for the United Kingdom.
> 2) It is in lower case. Country codes must be in upper case.
> 
> I can see no benefit in trying to 'better' the XML standard itself. 
> The only weakness in RFC 3066 is the inability to add script 
> information to the locale as well as regional or variant settings.
> This is not going to be a problem for DITA in the near term. These 
> issues are being addressed in RFC 3066bis, which is still in draft 
> form. RFC 3066bis is backwards compatible with RFC 3066 and should not
> cause a problem for any DITA 1.1 implementation anyway.
> 
> Best Regards,
> 
> AZ
> 
> Gershon L Joseph wrote:
> 
>>If we hard-code it in the DTD, we'll have a hard time keeping the set 
>>of allowable values up-to-date. Also, I've yet to find an accurate, 
>>fully up-to-date list of values on the Web that's not draft or 
>>incomplete. I think it should be up to the implementation to ensure 
>>the value entered is valid, or to offer the user a list of options 
>>customized to the user's needs. I suspect offering a list of about 100
>>values will confuse the user almost as much as leaving them to
>>research it themselves.
>>
>>I don't mind adding a link in the spec documentation to an accurate 
>>list that's always going to be kept updated. I have not found such a 
>>list (I'm sure it exists, but I couldn't find anything valuable via
>>Google).
>>
>>What do others think?
>>
>>
>>Best Regards,
>>Gershon
>>
>>-----Original Message-----
>>From: Farwell, Kevin [mailto:Kevin.Farwell@lionbridge.com]
>>Sent: Wednesday, March 08, 2006 6:58 PM
>>To: gershon@tech-tav.com; Felix Sasaki; Robert D Anderson
>>Cc: bhertz@sdl.com; Bryan Schnabel; Charles Pau; Lieske, Christian; 
>>Dave A Schell; dita-translation@lists.oasis-open.org; dpooley@sdl.com;
>>Richard Ishida; Jennifer Linton; mambrose@sdl.com; 
>>patrickk@scriptware.nl; pcarey@lexmark.com; Reynolds, Peter; 
>>rfletcher@sdl.com; Munshi, Sukumar; 
>>tony.jewtushenko@productinnovator.com; Yves Savourel
>>Subject: RE: [dita-translation] Changes to documentation of xml:lang 
>>and translate attributes
>>
>>Hi,
>>
>>I have a question about the values of the xml:lang attribute. With 
>>phrases like "The allowed xml:lang values..." from the DITA reference 
>>and "This attribute must be set to a language identifier, as
>>defined..." from the email below, I don't understand why the values
>>aren't set in the DTD and the users aren't given a list to pick from
>>instead of a set of rules to follow. As an NMTOKEN, the value of the
>>xml:lang attribute can be anything the user desires and still be
>>valid. If something must be enforced, why leave it to users to
>>enforce it? Why doesn't the content model enforce it?
>>
>>Confusion surrounding the locale codes is fairly easy to understand. 
>>The textual description runs country-language, but the symbol runs 
>>language-country. If a user is trying to remember the symbol for UK 
>>English, gb-en is as likely as en-gb, and even if they remember the 
>>language comes first, why wouldn't UK English be en-uk? Latvian is 
>>lv-lv, so why isn't Japanese ja-ja or jp-jp? If what's "allowed"
>>"must" be in the attribute value, why leave it to chance or leave it 
>>up to users doing research (which, in my opinion, are the same thing)?
>>
>>Kevin
>>
>>-----Original Message-----
>>From: Gershon L Joseph [mailto:gershon@tech-tav.com]
>>Sent: Wednesday, March 08, 2006 8:38 AM
>>To: 'Felix Sasaki'; 'Robert D Anderson'
>>Cc: bhertz@sdl.com; 'Bryan Schnabel'; 'Charles Pau'; 'Lieske, 
>>Christian'; 'Dave A Schell'; dita-translation@lists.oasis-open.org;
>>dpooley@sdl.com; 'Richard Ishida'; 'Jennifer Linton'; 
>>mambrose@sdl.com; patrickk@scriptware.nl; pcarey@lexmark.com; 
>>Reynolds, Peter; rfletcher@sdl.com; Munshi, Sukumar; 
>>tony.jewtushenko@productinnovator.com;
>>'Yves Savourel'
>>Subject: RE: [dita-translation] Changes to documentation of xml:lang 
>>and translate attributes
>>
>>Thank you all for your input. I'm replying to all comments in a single
>>email to make it easier to follow this thread and where we're going...
>>
>>Here are new proposals for the two attributes based on all the 
>>feedback I've received to date, as well as our discussions during
>>Monday's SC meeting.
>>
>>My previous proposal kept the original descriptions in the current 
>>spec as much as possible, and I'm glad I received the reactions I did
>>(e.g. English being the default language -- I felt uneasy about that one
>>too).
>>
>>I took the default values from the spec, which I now see confused 
>>everyone; I've changed them to reflect their usage.
>>
>>PROPOSAL FOR translate ATTRIBUTE:
>>
>>Name: translate
>>
>>Description: Indicates whether the content of the element should be 
>>translated or not. The translate attribute setting applies to the 
>>element on which it is set, and is inherited by all child elements 
>>that do not specify the translate attribute. The translate attribute 
>>does not indicate whether attribute values of the element and its 
>>children should be translated; attribute values should never be 
>>translated. If this attribute is not specified on the document 
>>element, then processors must assume translate="yes".
>>
>>Data Type: yes | no
>>
>>Default Value: Not set
>>
>>Required: #IMPLIED
>>
>>
>>PROPOSAL FOR xml:lang ATTRIBUTE:
>>
>>Name: xml:lang
>>
>>Description: Specifies the language and locale of the element content.
>>The intent declared with xml:lang is considered to apply to all 
>>attributes and content of the element where it is specified, unless 
>>overridden with an instance of xml:lang on another element within that
>>content. When no xml:lang value is supplied, the processor should
>>assume a default value.
>>
>>This attribute must be set to a language identifier, as defined by 
>>IETF RFC 3066 (http://www.ietf.org/rfc/rfc3066.txt) or successor.
>>
>>Data Type: NMTOKEN
>>
>>Default Value: Not set
>>
>>Required: #IMPLIED
>>
>>
>>
>>
>>
> 
> 
> 
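
The inheritance rules in the quoted translate and xml:lang proposals can be sketched roughly as follows. This is only an illustration using Python's standard `xml.etree.ElementTree`; the function name `effective_settings`, the `default_lang` parameter, and the sample topic are hypothetical, and the sketch covers element content only:

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_settings(root, default_lang=None):
    """Return (tag, translate, lang) for each element in document order.

    A value set on an element applies to that element and is inherited by
    descendants that do not set their own. If translate is not specified
    on the document element, "yes" is assumed, as in the proposal.
    """
    results = []

    def walk(elem, translate, lang):
        translate = elem.get("translate", translate)
        lang = elem.get(XML_LANG, lang)
        results.append((elem.tag, translate, lang))
        for child in elem:
            walk(child, translate, lang)

    walk(root, "yes", default_lang)
    return results


# Hypothetical sample topic used only to exercise the sketch.
sample = ET.fromstring(
    '<topic xml:lang="en-GB"><title>Hi</title>'
    '<body translate="no"><p>x</p>'
    '<p translate="yes" xml:lang="fr-FR">y</p></body></topic>'
)
settings = effective_settings(sample)
```

Here the first `p` inherits translate="no" from `body` and xml:lang="en-GB" from `topic`, while the second `p` overrides both.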


-- 


email - azydron@xml-intl.com
smail - c/o Mr. A.Zydron
	PO Box 2167
         Gerrards Cross
         Bucks SL9 8XF
	United Kingdom
Mobile +(44) 7966 477 181
FAX    +(44) 1753 480 465
www - http://www.xml-intl.com










