Subject: Language/Encoding tags for Definitions and Names
RegReper's,

Last week we had a long discussion of "natural language" issues, and on Friday I proposed some changes to the OASIS registry/repository specification to allow "language" and "encoding" declarations for <definition-text> and <name-context>. It seems that I made that proposal prematurely, since I wasn't fully aware that XML has already defined "xml:lang" to be the item I was defining as "language", and has defined "EncodingDecl" and "EncName" to be what I was defining as "encoding". The relevant XML references are:

http://www.w3.org/TR/REC-xml#sec-lang-tag
http://www.w3.org/TR/REC-xml#charencoding
http://www.w3.org/TR/REC-xml#charsets

The XML specification defines "xml:lang" as an attribute that can be referenced in any attribute declaration; however, I can't find a definition for how to reference "EncodingDecl" or "EncName" as an attribute. All of the examples in the XML specification assume that it is used as part of an XML element content. I'll use the term "xml:encoding" below to mean the use of "EncName" as an attribute.

REQUIREMENTS

Suppose a submitting organization submits a <data-element> to a registration authority for registration. What it is really doing is submitting metadata for some item it wants registered in the registry; the item itself is then available in some repository associated with the registry. Most likely the item itself is in a single language, but its definitions, and some of the names by which it can be referenced, may be in different languages, possibly using different encoding schemes.

Right now, a <data-element> must have exactly one <definition-text>, and by default that definition text is in the default language and encoding of the submission. We see a need for the submitting organization to be able to submit multiple definitions and multiple language names for the registered item, and for the registry to maintain these definitions under meaningful language tags.
This is best done by allowing multiplicities greater than 1 for <definition-text> and by allowing predefined language and encoding attributes on <definition-text> and on <name-context>.

Of course, the existing single instance of <definition-text> could be used to hold definitions in multiple languages, each delimited by some XML language tag. But we see a need for these tags to be standardized as part of the structure of a <data-element>, so that the registration authority can more easily maintain them and return the appropriate definition to a user of the repository. By keeping this information as optional attributes of <definition-text> and <name-context>, the registry doesn't have to parse <definition-text> to see what tags might be embedded inside it.

PROPOSAL

1) Allow multiplicities greater than 1 for <definition-text>

In the file "data-element.dtd", in the definition of the element <data-element-concept>, replace "definition-text" by "definition-text+".

2) Allow language and encoding attributes on <definition-text>

In the file "data-element.dtd", just after the definition of the element <definition-text>, add the following attribute specification:

   <!ATTLIST definition-text
             xml:lang      NMTOKEN  #IMPLIED
             xml:encoding  CDATA    #IMPLIED >

3) Allow language and encoding attributes on <name-context>

In the file "data-element.dtd", just after the definition of the element <name-context>, add the following attribute specification:

   <!ATTLIST name-context
             xml:lang      NMTOKEN  #IMPLIED
             xml:encoding  CDATA    #IMPLIED >

4) Semantic Rules for "xml:lang" and "xml:encoding"

Include the following semantic rules someplace in the OASIS RegRep specification:

a) xml:lang is an attribute specified by Language Identifier, definitions [33] through [38] in W3C XML 1.0 (http://www.w3.org/TR/REC-xml#sec-lang-tag).

b) xml:encoding is an attribute specified by the encoding name (EncName) of an Encoding Declaration, definition [81] in W3C XML 1.0 (http://www.w3.org/TR/REC-xml#charencoding).

c) Various XML elements in a <data-element> may include an xml:encoding attribute that references a character encoding not supported by the registry to which it is submitted. In those cases, the XML element containing the unrecognized attribute, and the parent elements in which that element is required, may be ignored by the registry; this may invalidate the submission.

**************************************************************
Len Gallagher                     LGallagher@nist.gov
NIST                              Work: 301-975-3251
Bldg 820  Room 562                Home: 301-424-1928
Gaithersburg, MD 20899-8970 USA   Fax:  301-948-6213
**************************************************************
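
As a rough sketch of how a registry might use the proposed attributes, the Python below reads a simplified <data-element>, indexes each <definition-text> by its xml:lang value, and skips any element whose xml:encoding is not supported, in the spirit of rule 4(c). The sample content, the flat element nesting, the SUPPORTED_ENCODINGS set, and the function name definitions_by_language are all assumptions made for illustration; none of them comes from the OASIS specification or from "data-element.dtd".

```python
import xml.etree.ElementTree as ET

# The "xml" prefix is always bound to this namespace, so a parser reports
# xml:lang as {http://www.w3.org/XML/1998/namespace}lang.
XML_NS = "http://www.w3.org/XML/1998/namespace"
LANG = f"{{{XML_NS}}}lang"
ENCODING = f"{{{XML_NS}}}encoding"  # the proposed (hypothetical) xml:encoding attribute

# Hypothetical list of encodings that this registry accepts.
SUPPORTED_ENCODINGS = {"UTF-8", "ISO-8859-1"}

# Simplified, invented submission; the real DTD nests these elements differently.
SAMPLE = """\
<data-element>
  <definition-text xml:lang="en">Postal code of the delivery address.</definition-text>
  <definition-text xml:lang="fr" xml:encoding="ISO-8859-1">Code postal de l'adresse de livraison.</definition-text>
  <definition-text xml:lang="ja" xml:encoding="X-UNKNOWN">placeholder</definition-text>
</data-element>
"""


def definitions_by_language(xml_text):
    """Return a dict mapping each language tag to its definition text."""
    root = ET.fromstring(xml_text)
    result = {}
    for elem in root.iter("definition-text"):
        encoding = elem.get(ENCODING)
        if encoding is not None and encoding.upper() not in SUPPORTED_ENCODINGS:
            # Rule 4(c): the registry may ignore an element whose
            # encoding it does not recognize.
            continue
        # An empty key stands for the default language of the submission.
        lang = elem.get(LANG, "").lower()
        result[lang] = (elem.text or "").strip()
    return result


definitions = definitions_by_language(SAMPLE)
print(definitions.get("fr"))
```

Because the language tag is an attribute rather than markup embedded in the definition, the lookup never has to inspect the text of <definition-text> itself, which is the point of the proposal.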