Subject: Everything you never wanted to know about Unicode, Java symbols, and what we should use instead of "/" in generated WSDL
A bunch of points here:
1) Reality check - we are hoping to combine the "component name" with the service name into an NMTOKEN. Can this even work without escaping characters?
Yes, in principle. Both component name and service name are typed as "NCName". "NC" in this case stands for "no colon": an NCName is an XML Name that contains no ":" character. An NMTOKEN is simply one or more NameChars, whereas a Name also consists of NameChars but restricts its first character to a much shorter list. Every character of an NCName is therefore a valid NMTOKEN character. So, provided you pick a joining string made of NameChars, you can just concatenate these strings together.
2) XML defines character classes - do these differ from Java?
Not for our purposes. Appendix B of the XML specification notes that Name start characters must be from Unicode character classes Ll, Lu, Lo, Lt, or Nl. The Java method Character.isJavaIdentifierStart() accepts a character if isLetter() is true or getType() returns LETTER_NUMBER (Nl); it also accepts currency symbols and connecting punctuation. The major class difference here is that Java's isLetter() includes Lm (modifier letters), whereas XML does not allow Lm as a Name start character (it allows Lm only in later positions).
Java also allows a variety of additional character classes after the first character, including "Mn" and "Mc" (see Character.isJavaIdentifierPart()).
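To make the class comparison concrete, here is a minimal sketch that checks a few sample characters against both rules. The class and method names are made up for illustration, and the XML side is only an approximation: it tests the Unicode general categories quoted from Appendix B and ignores the extra exclusions in that appendix, as well as the fact that the Name production permits "_" separately.

```java
final class NameStartClasses {
    // Approximation (assumption): true when the code point falls in one of the
    // general categories XML 1.0 Appendix B lists for Name start characters.
    static boolean inXmlNameStartCategory(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.LOWERCASE_LETTER:  // Ll
            case Character.UPPERCASE_LETTER:  // Lu
            case Character.OTHER_LETTER:      // Lo
            case Character.TITLECASE_LETTER:  // Lt
            case Character.LETTER_NUMBER:     // Nl
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        // 'a' (Ll), U+02B0 MODIFIER LETTER SMALL H (Lm), '$' (Sc)
        int[] samples = { 'a', 0x02B0, '$' };
        for (int cp : samples) {
            System.out.printf("U+%04X javaStart=%b xmlStartCategory=%b%n",
                cp, Character.isJavaIdentifierStart(cp), inXmlNameStartCategory(cp));
        }
    }
}
```

The modifier letter and "$" are the interesting cases: Java accepts both at the start of an identifier, while neither falls in a listed XML start category.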
3) Are there specific differences we could exploit between Java and XML?
Yes. Two characters allowed in XML NMTOKENs but not allowed in an unqualified Java name are "." and "-". Note: "_" (underscore) characters are allowed in both.
4) There is no particular requirement, at least that I'm aware of, that we be able to "reverse engineer" from the generated WSDL to the invokable service(s) and their operations. That is, in generating a WSDL, we are not implying a generated mapping back to the SCA constructs from the resultant WSDL. All we want to do is facilitate uniqueness of a particular "definitions" name, or of a "binding" name. Even then, it is not clear that any choice we make will actually prevent all possible theoretical collisions. As such, any particular combining string we choose is really just for added readability.
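As a rough illustration of this point, the sketch below joins two sample names with each candidate separator and reports whether that separator could ever occur inside an unqualified Java identifier. The helper names and the sample component and service names are hypothetical, not taken from any specification.

```java
final class BindingNameJoin {
    // Hypothetical helper: concatenates two NCNames around a separator character.
    static String join(String componentName, String serviceName, char separator) {
        return componentName + separator + serviceName;
    }

    // True when the character cannot occur anywhere in an unqualified Java identifier.
    static boolean absentFromJavaIdentifiers(char c) {
        return !Character.isJavaIdentifierStart(c) && !Character.isJavaIdentifierPart(c);
    }

    public static void main(String[] args) {
        for (char sep : new char[] { '.', '-', '_' }) {
            System.out.println(join("MyComponent", "MyService", sep)
                + "  separator absent from Java identifiers: "
                + absentFromJavaIdentifiers(sep));
        }
    }
}
```

A separator that is absent from Java identifiers can only have come from the joining step when the names themselves are Java-derived, which is what makes "." and "-" stand out from "_".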
We should pick one of the following:
(Note: the above analysis does imply, for the Java TC folks, that they may need to account for an escaping algorithm that converts valid Java characters in a Java name into valid NCName characters. Notably, "$" is not allowed in NCNames, and that character is readily typed even by English-language developers. As noted, Java also allows a variety of other character classes not allowed by XML.)
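For the Java TC folks, one possible shape for such an escaping step is sketched below. Everything here is an assumption for illustration: the "_xHHHH_" escape form, the class name, and the method names are invented, and the validity test only approximates NCName by Unicode general category (Ll, Lu, Lo, Lt, Nl at the start; additionally Mc, Me, Mn, Lm, Nd afterwards), ignoring the extra exclusions in Appendix B. It also does not deal with names that already contain something that looks like an escape sequence.

```java
final class NcNameEscapeSketch {
    // Approximate XML Name start test by general category (assumption, see lead-in).
    static boolean isStartCategory(int cp) {
        switch (Character.getType(cp)) {
            case Character.LOWERCASE_LETTER:
            case Character.UPPERCASE_LETTER:
            case Character.OTHER_LETTER:
            case Character.TITLECASE_LETTER:
            case Character.LETTER_NUMBER:
                return true;
            default:
                return false;
        }
    }

    // Approximate test for characters allowed after the first position.
    static boolean isPartCategory(int cp) {
        switch (Character.getType(cp)) {
            case Character.COMBINING_SPACING_MARK: // Mc
            case Character.ENCLOSING_MARK:         // Me
            case Character.NON_SPACING_MARK:       // Mn
            case Character.MODIFIER_LETTER:        // Lm
            case Character.DECIMAL_DIGIT_NUMBER:   // Nd
                return true;
            default:
                return isStartCategory(cp);
        }
    }

    // Hypothetical scheme: keep acceptable code points, and replace every other
    // code point with "_x" + its hex value (at least four digits) + "_".
    static String escape(String javaName) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < javaName.length(); ) {
            int cp = javaName.codePointAt(i);
            boolean first = (i == 0);
            boolean ok = cp == '_' || (first ? isStartCategory(cp) : isPartCategory(cp));
            if (ok) {
                out.appendCodePoint(cp);
            } else {
                out.append(String.format("_x%04X_", cp));
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("Outer$Inner")); // prints Outer_x0024_Inner
    }
}
```

Because every escape begins with "_", an escaped first character still yields a name that starts with a permitted character.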