[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [Elist Home]
Subject: More thoughts on canonical representation / NC equivalence
NC can be considered as a "black box" that receives (URI,local) pair and returns Yes/No. Now consider how can we decide that two NCs are equal: One brute force approach is to feed all possible (URI,local) pairs to NCs and see if their answers matches. Of course, there are infinite number of (URI,local) pairs, so you can't do this. However, since every NC consists of finite number of basic "rules", if you select "probe points" nicely, then you *might* be able to tell the equality (or subset relationship or whatever) by checking the answers of probe points only. So now the question is how to find those nice probe points. Consider the following NC X: <choice ns="foo"> <name> a </name> <name> b </name> </choice> Only namespace URI "foo" and local names a and b are appeared. So it is highly unlikely that X accepts (html,p) while it rejects (html,b). "html" is just one of "other namespaces" for this NC. This gives you a hint: you need a special symbol that represents "other namespaces" and "other local names". This symbol is called MAGIC in jjc's algorithm. Algorithm "possibleNames" is the algorithm that computes probe points for the given NC. Probe points for the above X will be { (foo,a),(foo,b) } So if you have another NC Y and its probe points, then by merging probe points and checking both by merged points, you can tell the various relationship of two NCs. That's what jjc did. Now go back to probe points. Since they are just probe points, it is somewhat useless until you probe NC by using them. Then why not annotate the set by the result of probing? { (foo,a)=>OK,(foo,b)=>OK } name pairs not in this probe set are guaranteed to be "NG" (NC doesn't accept them). Sometimes, this annotation reveals that the probe points set is bigger than necessary. For example, <choice> <nsName ns="foo" /> <anyName/> </choice> Probing points: { (MAGIC,MAGIC), (foo,a) } Annotated PP: { (MAGIC,MAGIC)=>OK, (foo,MAGIC)=>OK } If (MAGIC,MAGIC) is OK and (foo,MAGIC) is OK, then there is no need to probe (foo,MAGIC) anymore. "foo" is just one of "other namespaces". { (MAGIC,MAGIC)=>OK } There are several other rules to make the set smaller. Now you see that annotated probe points are not just probe points -- rather, it is NC itself! This leads to the concept of "canonical representation". This annotated set looks like uniquely representing a NC, doesn't it. In fact, it does. One of the goodness of this canonical representation is that you can construct XML of name class from this canonical representation. Moreover, constructed XML is always simple. -- Kohsuke KAWAGUCHI +1 650 786 0721 Sun Microsystems kohsuke.kawaguchi@eng.sun.com
[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [Elist Home]
Powered by eList eXpress LLC