OASIS Mailing List Archives

sca-bindings message



Subject: Everything you never wanted to know about unicode, java symbols, and what we should use instead of "/" in generated WSDL


A bunch of points here:

1) Reality check: we are hoping to combine the "component name" with the service name into an NMTOKEN.  Can this even work without escaping characters?
Yes, in principle.  Both component name and service name are typed as "NCName".  "NC" here stands for "non-colonized": an NCName is an XML Name that contains no colon.  An NMTOKEN is one or more NameChars in any order, whereas a Name also consists of NameChars but must begin with a character from a much shorter list (the name start characters).  Since every character of an NCName is a NameChar, you can simply concatenate the two strings, provided you pick a valid joining string.
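
To make the concatenation argument concrete, here is a minimal sketch in Java.  It approximates the XML character classes with Unicode general categories (in the spirit of Appendix B) rather than the normative character tables, so treat the patterns as an assumption, not a conforming validator.  The class name and the sample names "AccountComponent" and "AccountService" are made up for illustration.

```java
import java.util.regex.Pattern;

final class NmtokenJoin {
    // Category-based approximation of XML name characters; NOT the normative tables.
    private static final String START = "\\p{Ll}\\p{Lu}\\p{Lo}\\p{Lt}\\p{Nl}_";
    private static final String REST = START + "\\p{Mc}\\p{Me}\\p{Mn}\\p{Lm}\\p{Nd}.\\-";

    // NCName: a name start character, then name characters, never a colon.
    static final Pattern NCNAME = Pattern.compile("[" + START + "][" + REST + "]*");
    // NMTOKEN: one or more name characters (colon allowed), no start restriction.
    static final Pattern NMTOKEN = Pattern.compile("[" + REST + ":]+");

    static boolean isNcName(String s) { return NCNAME.matcher(s).matches(); }
    static boolean isNmtoken(String s) { return NMTOKEN.matcher(s).matches(); }

    public static void main(String[] args) {
        String component = "AccountComponent"; // hypothetical component name
        String service = "AccountService";     // hypothetical service name
        for (String joiner : new String[] {"-", ".", "_", "", "/"}) {
            String joined = component + joiner + service;
            System.out.println("joiner \"" + joiner + "\": " + joined
                    + " is NMTOKEN? " + isNmtoken(joined));
        }
    }
}
```

Under this approximation, any joiner made of NameChars keeps the result an NMTOKEN (and, absent a colon, an NCName), while "/" does not.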

2) XML defines character classes - do these differ from Java?
Not for our purposes.  Appendix B of the XML specification notes that Name start characters must be from Unicode character classes Ll, Lu, Lo, Lt, or Nl.  The Java method Character.isJavaIdentifierStart() checks for isLetter() or getType() == LETTER_NUMBER (Nl); it also accepts currency symbols and connecting punctuation.  The major class difference here is that Java includes Lm (modifier letters) as start characters, whereas XML allows Lm only after the first character.
Java appears to allow a variety of additional character classes after the first character, including classes "Mn" and "Mc" (see Character.isJavaIdentifierPart()).
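
For anyone who wants to poke at the start-character differences, a rough sketch follows.  The helper isXmlNameStartByCategory is hypothetical and only tests the Unicode categories listed above (plus "_" and ":"); it ignores the specific exclusions in the XML tables, and Java's Unicode data is newer than what XML 1.0 was written against, so results for obscure characters are only indicative.

```java
final class NameStartComparison {
    private NameStartComparison() {}

    // Hypothetical helper: XML name start test by Unicode category only.
    static boolean isXmlNameStartByCategory(int codePoint) {
        if (codePoint == '_' || codePoint == ':') {
            return true; // explicitly allowed as XML Name start characters
        }
        switch (Character.getType(codePoint)) {
            case Character.LOWERCASE_LETTER: // Ll
            case Character.UPPERCASE_LETTER: // Lu
            case Character.OTHER_LETTER:     // Lo
            case Character.TITLECASE_LETTER: // Lt
            case Character.LETTER_NUMBER:    // Nl
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        // 'A' (Lu), '$' (currency symbol), U+02B0 (Lm, modifier letter small h)
        int[] samples = {'A', '$', 0x02B0};
        for (int cp : samples) {
            System.out.println(String.format("U+%04X java start=%b, xml start (approx)=%b",
                    cp, Character.isJavaIdentifierStart(cp), isXmlNameStartByCategory(cp)));
        }
    }
}
```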

3) Are there specific differences we could exploit between Java and XML?
Yes.  Two characters allowed in XML NMTOKENs but not allowed in an unqualified Java name are "." and "-".  Note: "_" (underscore) characters are allowed in both.
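
The Java side of this claim can be checked directly with Character.isJavaIdentifierPart().  The wrapper class below is just an illustrative name.

```java
final class JoinerCheck {
    private JoinerCheck() {}

    // True if the character could occur inside an unqualified Java identifier,
    // which would make it ambiguous as a joining character.
    static boolean canAppearInJavaName(char c) {
        return Character.isJavaIdentifierPart(c);
    }

    public static void main(String[] args) {
        for (char c : new char[] {'.', '-', '_'}) {
            System.out.println("'" + c + "' can appear in a Java name: "
                    + canAppearInJavaName(c));
        }
    }
}
```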

4) There is no particular requirement that we be able to "reverse engineer" from the generated WSDL to the invokable service(s) and their operations, at least that I'm aware of.  That is, in generating a WSDL, we are implying a generated mapping to the SCA constructs from the resultant WSDL.  All we want to do is facilitate uniqueness of a particular "definitions" name, or of a "binding" name.  Even then, it is not clear that any choice we make will actually prevent all possible theoretical collisions.  As such, any particular combining string we choose is really just for added readability.

Conclusion:
We should pick one of the following:
  • "-" (hyphen): the original suggestion.
  • "." (period): not going to appear in any unqualified Java class names.
  • "_" (underscore): will not be distinct from Java names, thus prevents direct reverse-mappings in some cases.
  • "" (nothing): just concatenate the two names; no reverse mapping possible.
If the TC is still uncomfortable with the use of "-" (hyphen) as the "joining" character between two NCNames, we could instead use ".".
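
As a small illustration of the reverse-mapping point in the options above, the sketch below joins two names and shows how "_" can make two different pairs collide while "." keeps them apart.  The names are hypothetical, and the split only works under the assumption that both names are unqualified Java identifiers; arbitrary NCNames may themselves contain "." or "-", so no joiner prevents every collision.

```java
final class JoinedNames {
    private JoinedNames() {}

    // Concatenate component and service names around the chosen joiner.
    static String join(String component, String joiner, String service) {
        return component + joiner + service;
    }

    // Split at the first occurrence of the joiner; returns null if it is absent.
    // Only unambiguous when the joiner cannot occur inside either name.
    static String[] split(String joined, String joiner) {
        int idx = joined.indexOf(joiner);
        if (joiner.isEmpty() || idx < 0) {
            return null;
        }
        return new String[] {joined.substring(0, idx), joined.substring(idx + joiner.length())};
    }

    public static void main(String[] args) {
        // Two different (component, service) pairs, hypothetical names.
        System.out.println(join("a_b", "_", "c") + " vs " + join("a", "_", "b_c"));
        System.out.println(join("a_b", ".", "c") + " vs " + join("a", ".", "b_c"));
    }
}
```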

(Note the above analysis does imply, for the Java TC folks, that they may need to account for an escaping algorithm that converts valid Java characters in a Java name into valid NCName characters.  Notably, "$" is not allowed in NCNames, and that character is readily available even to writers working purely in English.  As noted, Java also allows for a variety of other character classes not allowed by XML.)
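
Purely as a strawman for what such an escaping algorithm might look like (nothing here has been proposed to or agreed by either TC), the sketch below replaces any character outside a conservative "safe" set with "_xHHHH_", where HHHH is the hex code point.  The safe set (letters, digits, ".", "-", "_") is my own simplification, not the exact XML NameChar tables, and the scheme is not reversible if a name already contains text that looks like an escape.

```java
final class NcNameEscaper {
    private NcNameEscaper() {}

    // Conservative assumption: letters and '_' may start a name.
    static boolean isSafeStart(int cp) {
        return Character.isLetter(cp) || cp == '_';
    }

    // Conservative assumption: letters, digits, '.', '-' and '_' may follow.
    static boolean isSafePart(int cp) {
        return Character.isLetterOrDigit(cp) || cp == '.' || cp == '-' || cp == '_';
    }

    // Hypothetical escape format "_xHHHH_"; it begins with '_' so it is also
    // usable in the first position of the result.
    static String escape(String javaName) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < javaName.length()) {
            int cp = javaName.codePointAt(i);
            boolean safe = (i == 0) ? isSafeStart(cp) : isSafePart(cp);
            if (safe) {
                out.appendCodePoint(cp);
            } else {
                out.append(String.format("_x%04X_", cp));
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "Outer$Inner" is a made-up nested-class style name.
        System.out.println(escape("Outer$Inner"));
    }
}
```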

-Eric.


