OASIS Mailing List Archives

 




Subject: RE: Spec Clarification Issue: Characters Allowed in Key and Key Scope Names


I certainly agree that more precision is preferred. I would vote for NMToken plus additional characters (if we really need them) for key and keyscope names.

Thanks and best regards,

--Scott

Scott Hudson
Manager, Technical Writing
Product Training & Documentation
Customer Solutions

Jeppesen  |  Digital Aviation  |  Boeing
55 Inverness Drive East | Englewood, CO 80112 | www.jeppesen.com

-----Original Message-----
From: dita@lists.oasis-open.org [mailto:dita@lists.oasis-open.org] On Behalf Of Eliot Kimber
Sent: Friday, February 26, 2016 7:57 AM
To: dita
Subject: [dita] Spec Clarification Issue: Characters Allowed in Key and Key Scope Names

Jarno Elovirta has raised the question of which specific characters the DITA specification actually allows in key names (and, by extension, key scope names).

The DITA 1.2 specification says:

* Key names consist of characters that are legal in a URI. The case of key names is significant.
* The following characters are prohibited in key names: "{", "}", "[", "]", "/", "#", "?", and whitespace characters.
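As a rough illustration of the second bullet only, here is a minimal Python sketch of a check for the explicitly prohibited characters. The function name `has_prohibited_char` is hypothetical, and using `str.isspace()` as the definition of "whitespace characters" is an assumption; the spec text does not say which whitespace definition applies. The sketch does not attempt to check "legal in a URI".

```python
# Characters the DITA 1.2 text explicitly prohibits in key names.
PROHIBITED = set("{}[]/#?")


def has_prohibited_char(key_name: str) -> bool:
    """Return True if key_name contains a prohibited character or whitespace.

    Assumption: "whitespace" is taken to mean whatever str.isspace() accepts.
    """
    return any(ch in PROHIBITED or ch.isspace() for ch in key_name)


if __name__ == "__main__":
    for name in ["product-name", "a/b", "two words", "user@host"]:
        print(repr(name), has_prohibited_char(name))
```

Note that "@" passes this check, which matters for the NMTOKEN discussion later in this message.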



This statement is unchanged in DITA 1.3 (but has moved to the reference entry for the @keys attribute).


The problem here is that "characters that are legal in a URI" is not as precise as perhaps we thought it was.

In particular, by "legal" do we mean only characters that are allowed in the URI *string* before the URI is processed to resolve any escaped non-ASCII characters, or do we mean any character that may be used in a URI, including characters that must be escaped in the ASCII encoding of a URI?
I suspect we intended the latter meaning, but Jarno has interpreted it as the former, more restrictive meaning.
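To make the two readings concrete, here is a small Python sketch. It assumes the stricter reading means "every character is in the RFC 3986 URI character set (unreserved, reserved, or "%")" and the looser reading means "any character, since it can be percent-encoded as UTF-8". The helper names are hypothetical and the sketch ignores the separately prohibited characters.

```python
import string
from urllib.parse import quote

# RFC 3986 character set for a URI string: unreserved, gen-delims,
# sub-delims, plus "%" for percent-encoded octets.
URI_STRING_CHARS = set(
    string.ascii_letters + string.digits + "-._~" + ":/?#[]@" + "!$&'()*+,;=" + "%"
)


def allowed_in_raw_uri_string(key_name: str) -> bool:
    """Stricter reading: every character may appear literally in a URI string."""
    return all(ch in URI_STRING_CHARS for ch in key_name)


def percent_encoded(key_name: str) -> str:
    """Looser reading: any character is usable once escaped as UTF-8 octets."""
    return quote(key_name, safe="")


if __name__ == "__main__":
    key = "cl\u00e9"  # "cle" with an acute accent on the e (U+00E9)
    print(allowed_in_raw_uri_string(key))  # False under the stricter reading
    print(percent_encoded(key))            # cl%C3%A9
```

Under the stricter reading the accented key is rejected outright; under the looser reading it is acceptable because an ASCII-only escaped form exists.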

There is definitely value in allowing a wide range of characters as keys, e.g., accented characters, characters from Asian and Middle Eastern writing systems, etc.

The primary practical concern is string matching--processors have to be able to reliably compare two key names to determine whether they are the same. When you allow non-ASCII characters generally, you run into issues with characters that the Unicode spec allows to be composed in several different ways, e.g., characters that include or can be used with diacritical marks. The XPath specification has lots of language and infrastructure around this issue (and might provide a short path to a solution if we need one).
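The composition problem can be sketched in a few lines of Python. This assumes a processor chooses to normalize both names to Unicode Normalization Form C before comparing; that choice is purely an assumption for illustration, not something the DITA spec currently requires. The function name `same_key` is hypothetical. Case is left alone because key names are case-significant.

```python
import unicodedata


def same_key(a: str, b: str) -> bool:
    """Compare two key names after normalizing both to NFC.

    Assumption: NFC is the chosen normalization form. No case folding is
    applied, since the case of key names is significant.
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    composed = "caf\u00e9"     # e-acute as one code point (U+00E9)
    decomposed = "cafe\u0301"  # "e" followed by combining acute (U+0301)
    print(composed == decomposed)          # False: code points differ
    print(same_key(composed, decomposed))  # True: equal after NFC
    print(same_key("Cafe", "cafe"))        # False: case is significant
```

Without some rule like this, two key names that render identically can fail to match.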

So we need to clarify what the rules for key names are and publish that clarification in some appropriate way.

A good option might be to use the XML NMTOKEN definition as the basis for key names, as that already allows pretty much every useful Unicode character and disallows the characters we already don't want (https://www.w3.org/TR/REC-xml/#sec-common-syn). The main problem I see with NMTOKEN is that it disallows some characters that the current definition does not explicitly prohibit and that the conservative interpretation of "legal in a URI" allows. For example, "@" and "=" are allowed in URIs but disallowed in NMTOKEN. So that could be a deal breaker.

Of course, we could define key in terms of NMTOKEN plus additional characters. 

The thing Jarno is asking for is a precise definition of the characters allowed, that is, an explicit list of characters and character ranges. It looks to me like NMTOKEN with additions is our fastest route to a precise definition. 
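As a sketch of what such an explicit definition could look like, the Python below transcribes the NameChar ranges from the XML 1.0 (Fifth Edition) productions [4] and [4a] into a regular expression and then widens it. The extra characters "@" and "=" are only a hypothetical choice of additions taken from the example above, and the names `is_nmtoken` and `is_key_name` are made up for illustration; none of this is a decided rule.

```python
import re

# NameStartChar ranges (XML 1.0 production [4]) as a regex class body.
_NAME_START_CHAR = (
    ":A-Z_a-z"
    "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    "\u0370-\u037D\u037F-\u1FFF\u200C-\u200D"
    "\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
# NameChar (production [4a]) adds "-", ".", digits, and a few more ranges.
_NAME_CHAR = _NAME_START_CHAR + r"\-.0-9" + "\u00B7\u0300-\u036F\u203F-\u2040"

# Nmtoken ::= (NameChar)+
NMTOKEN_RE = re.compile(f"[{_NAME_CHAR}]+")

# Hypothetical additions for key names; the actual list is undecided.
EXTRA_CHARS = "@="
KEY_NAME_RE = re.compile(f"[{_NAME_CHAR}{re.escape(EXTRA_CHARS)}]+")


def is_nmtoken(s: str) -> bool:
    """True if s matches the XML Nmtoken production."""
    return NMTOKEN_RE.fullmatch(s) is not None


def is_key_name(s: str) -> bool:
    """True if s is an Nmtoken that may also contain EXTRA_CHARS."""
    return KEY_NAME_RE.fullmatch(s) is not None


if __name__ == "__main__":
    for name in ["product-name", "caf\u00e9", "user@host", "a=b", "a/b", "a b"]:
        print(repr(name), is_nmtoken(name), is_key_name(name))
```

With these assumed additions, "user@host" and "a=b" fail as plain NMTOKENs but pass as key names, while "/" and whitespace stay excluded in both cases.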

Cheers,

Eliot


----
Eliot Kimber, Owner
Contrext, LLC
http://contrext.com



