
regrep message



Subject: Language/Encoding tags for Definitions and Names



RegRepers,

Last week we had a long discussion of "natural language" issues, and on
Friday I proposed some changes to the OASIS registry/repository
specification in order to allow "language" and "encoding" declarations for
<definition-text> and <name-context>.

It seems that I made that proposal prematurely, since I wasn't fully aware
that XML has already defined "xml:lang" to be the item I was
defining as "language" and has defined "EncodingDecl" and "EncName" to be
what I was defining as "encoding".  The relevant XML references are:

  http://www.w3.org/TR/REC-xml#sec-lang-tag

  http://www.w3.org/TR/REC-xml#charencoding

  http://www.w3.org/TR/REC-xml#charsets

The XML specification defines "xml:lang" as an arbitrary attribute that can
be referenced in any attribute declaration; however, I can't find a
definition for how to reference "EncodingDecl" or "EncName" as an
attribute.  All of the examples in the XML specification assume that it is
used as part of an XML element content.  I'll use the term "xml:encoding"
below to mean the use of "EncName" as an attribute.

REQUIREMENTS

Suppose a submitting organization submits a <data-element> to a
registration authority for registration.  What it is really doing is
submitting metadata for some item it wants to be registered in the
registry; the item itself is then available in some repository associated
with the registry.  Likely, the item itself is in a single language, but
its definitions and some of the names it can be referenced by may be in
different languages, possibly using different encoding schemes.  Right now,
a <data-element> must have exactly one <definition-text>, and by default
that definition text is in the default language and encoding of the
submission.  We see a need for the submitting organization to be able to
submit multiple definitions and multiple language names for the registered
item, and for the registry to maintain these definitions under meaningful
language tags. This is best done by allowing multiplicities greater than 1
for <definition-text> and by allowing predefined language and encoding
attributes on <definition-text> and on <name-context>.

Of course, the existing single instance of <definition-text> could be used
to include multiple language definitions, each delimited by some XML
language tag.  But we see a need for these tags to be standardized as part
of the structure of a <data-element> so that the registration authority can
more easily maintain them and return the appropriate definition to a user
of the repository. By keeping this information as optional attributes for
<definition-text> and <name-context>, the registry doesn't have to try to
parse <definition-text> to see what tags might be embedded inside of it.
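As a rough illustration of why attribute-level tags help, here is a minimal
Python sketch of how a registry might pick a definition by language without
looking inside the definition text. The sample fragment, the function name
pick_definition, and the fallback to an untagged definition are assumptions
made for illustration only; they are not part of the RegRep specification.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the fixed XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical submission fragment. Only the element names <data-element>
# and <definition-text> come from the proposal; the layout is assumed.
SAMPLE = """
<data-element>
  <definition-text xml:lang="en">Date of birth of the person.</definition-text>
  <definition-text xml:lang="fr">Date de naissance de la personne.</definition-text>
  <definition-text>Untagged definition in the submission default language.</definition-text>
</data-element>
"""


def pick_definition(data_element, wanted_lang):
    """Return the text of the first definition tagged with wanted_lang.

    Falls back to the first untagged definition (assumed here to be in the
    submission's default language), or None if there is neither.
    """
    fallback = None
    for definition in data_element.iter("definition-text"):
        lang = definition.get(XML_LANG)
        if lang is not None and lang.lower() == wanted_lang.lower():
            return definition.text
        if lang is None and fallback is None:
            fallback = definition.text
    return fallback


root = ET.fromstring(SAMPLE)
print(pick_definition(root, "fr"))  # the French definition
print(pick_definition(root, "de"))  # no German entry, so the untagged one
```

The lookup only reads attributes, so the definition text itself is never
parsed for embedded language markers.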


PROPOSAL

1) Allow multiplicities greater than 1 for <definition-text>

In the file "data-element.dtd", in the definition of the element
<data-element-concept>, replace "definition-text" by "definition-text+".


2) Allow language and encoding attributes on <definition-text>

In the file "data-element.dtd", just after the definition of the element
<definition-text>, add the following attribute specification:

<!ATTLIST definition-text 
    xml:lang      NMTOKEN   #IMPLIED 
    xml:encoding  CDATA     #IMPLIED
>


3) Allow language and encoding attributes on <name-context>

In the file "data-element.dtd", just after the definition of the element 
<name-context>, add the following attribute specification:

<!ATTLIST name-context 
    xml:lang      NMTOKEN   #IMPLIED 
    xml:encoding  CDATA     #IMPLIED
>


4) Semantic Rules for "xml:lang" and "xml:encoding"

Include the following semantic rules someplace in the OASIS RegRep
specification:

a) xml:lang is an attribute specified by Language Identifier, definitions
[33] through [38] in W3C XML 1.0 (http://www.w3.org/TR/REC-xml#sec-lang-tag).

b) xml:encoding is an attribute specified by the encoding name (EncName) of
an Encoding Declaration, definition [81] in W3C XML 1.0
(http://www.w3.org/TR/REC-xml#charencoding).

c) Various XML elements in a <data-element> may include an xml:encoding
attribute that references a character encoding not supported by the
registry to which it is submitted.  In those cases, the XML element
containing the unrecognized attribute, and the parent elements in which
that element is required, may be ignored by the registry; this may
invalidate the submission.
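
Rule (c) could be checked along the following lines. This is only a sketch
under stated assumptions: it treats "supported by the registry" as "known to
Python's codecs module", which a real registry would replace with its own
list, and the function name unsupported_encodings and the sample fragment
are hypothetical.

```python
import codecs
import xml.etree.ElementTree as ET

# Attributes with the reserved "xml" prefix are reported by ElementTree
# under the fixed XML namespace URI.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"


def unsupported_encodings(data_element):
    """Return (tag, encoding name) pairs whose xml:encoding is not recognized.

    Assumption: an encoding counts as supported when codecs.lookup() can
    resolve its name.
    """
    problems = []
    for elem in data_element.iter():
        enc = elem.get(XML_NS + "encoding")
        if enc is None:
            continue  # no declaration: the submission default applies
        try:
            codecs.lookup(enc)
        except LookupError:
            problems.append((elem.tag, enc))
    return problems


# Hypothetical fragment; "X-NO-SUCH-CHARSET" is a made-up encoding name.
SUBMISSION = """
<data-element>
  <definition-text xml:lang="ja" xml:encoding="Shift_JIS">...</definition-text>
  <name-context xml:encoding="X-NO-SUCH-CHARSET">...</name-context>
</data-element>
"""

flagged = unsupported_encodings(ET.fromstring(SUBMISSION))
print(flagged)
```

Whether a flagged element is ignored or the whole submission is rejected is
a policy decision left to the registry, as rule (c) states.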


**************************************************************
Len Gallagher                             LGallagher@nist.gov
NIST                                      Work: 301-975-3251
Bldg 820  Room 562                        Home: 301-424-1928
Gaithersburg, MD 20899-8970 USA           Fax: 301-948-6213
**************************************************************

