Subject: Re: [xtm-wg] An XTM test suite

* Steven R. Newcomb
|
| I don't understand how it can be simpler than the existing XTM
| syntax.

It has to be since:

 - there will be no mergeMap element (they have all been merged in)

 - subjectIdentity will have no topicRef child (would have caused a
   merge)

 - instanceOf, scope, parameters and roleSpec will not have
   resourceRef or subjectIndicatorRef children (will be replaced by
   topicRef elements)

There are probably more simplifications that I have forgotten right
now.

| It looks to me as though it must have more element types (such as
| ones that make topic namespaces redundantly explicit),

What are topic namespaces?

| and that the element types that do correspond (in some sense) to XTM
| element types will necessarily have different semantics, as well.
|
| For example, the Conceptual Model clearly establishes that, under
| the covers, an occurrence is really a topic-occurrence association.
| What does this mean for the "canonical output" form?

I believe that it should have no consequences. Representing the
relationships between topics and their occurrences as associations is
not required by the specification, and so there is no need to test if
processors actually do this.

| I believe that we must output a topic-occurrence association (note
| that I did *not* say <association>, I said "association").

Why?

* Lars Marius Garshol
|
| This is close to it, yes. The idea is that in this syntax, any two
| topic maps that are logically equivalent will have the exact same
| serialized representation.

* Steven R. Newcomb
|
| It's a good idea, if we can make it work.

I'm glad to hear you think so. :-)

* Lars Marius Garshol
|
| A canonical XTM document must
|
| - be UTF-8-encoded

* Steven R. Newcomb
|
| Why this particular encoding? What does character encoding have to
| do with it, as long as the mappings between character encodings are
| unambiguous and explicit?

Because the canonical format is easier to use if the output is
guaranteed to be byte-by-byte identical.
UTF-8 is the perfect choice for this, since it can represent all
Unicode characters directly and since it is readable even with tools
that are not Unicode-aware.

* Lars Marius Garshol
|
| - have all elements (topic, association, baseName, topicRef etc) in
|   a specific order, probably based on the lexical order of IDs and
|   names

* Steven R. Newcomb
|
| I don't see how this can work, unless we want to straitjacket the
| order in which <topicMap> elements and their contents are scanned
| and processed, and force all applications to keep a record of that
| order, even though that order has no significance.

The idea is not to reproduce the original input order, but to impose
_a_ specific order. If there is no specified order, there is no hope
that the output from different processors will be identical, either.

| This is a very unappealing prospect: to require applications to keep
| track of nonsignificant information, incurring significant overhead
| just so their conformance to the Spec can be verified.

I agree. It has to be a goal for the canonical format to avoid this.

| The unique identifiers (IDs) of elements found in the
| content of <topicMap> elements cannot serve as the
| basis for imposing a canonical order, either.
|
| * First of all, many (perhaps most?) of the elements
|   that demand the existence of topics in the
|   application-internal representation are #IMPLIED, so
|   we won't have IDs for all of them. What do we do
|   with the ones that don't have IDs?
|
| * Secondly, when we're merging multiple XTM documents,
|   the IDs of the elements aren't necessarily unique.
|   What do we do when two topics have the same ID?

Good points. That means two things: that we can't use IDs, and that
the canonical spec must specify how to assign IDs to all topics.

* Lars Marius Garshol
|
| - have all attributes in a specific order (and
|   possibly conform to the canonical XML specification)

* Steven R. Newcomb
|
| OK. (Why only "possibly"? Making everything totally
| deterministic is the whole point of this exercise.)

I say "possibly" because I haven't really thought it through. The spec
would either have to say that all canonical XTMs must conform to the
canonical XML spec, or leave it out entirely.

* Lars Marius Garshol
|
| - have only normalized URIs

* Steven R. Newcomb
|
| What constitutes "normalization" of URIs?

I think we will have to specify it, but at least:

 - case normalization of scheme and host names
 - removal of default port numbers
 - deterministic %-escaping and absolutization

| We must not create a conformance requirement that prevents
| application builders from competing on the basis of the amount of
| intelligence that is brought to bear on the question of whether two
| URIs actually refer, ultimately, to one and the same resource.

I agree that this should be a goal, and I think it is achievable. It
may mean, however, that test cases will have to be constructed so
that such extra intelligence does not cause merges that would
otherwise not happen.

| One way to handle this is to support a user's ability to "dumb down"
| the URI-comparison processing to some specified level, just for
| purposes of outputting a canonical form simply for establishing
| conformance to the Spec in all other Spec-required respects.

I thought about that, too, but producing test cases that do not make
this an issue may be easier to achieve. In this case the input is to
some extent controlled, which makes things somewhat easier.

| This remark leads me to believe that you are thinking in terms of
| using some version of the XTM syntax as the canonical output syntax,
| as if XTM syntax were somehow the same thing as this canonical
| output idea.

Well, yes, that was the idea. Since we already have a serialization
syntax for these constructs, it seemed natural to build on it and
modify no more than necessary.
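To make the three minimum URI normalization rules listed above a little more concrete, here is a minimal Python sketch. It assumes RFC 3986-style syntactic normalization; the default-port table, the set of characters left unescaped in paths, the `normalize_uri` and `normalize_escapes` names and the "empty path means /" rule are all illustrative assumptions, not anything the XTM spec or this thread defines.

```python
import re
import string
from typing import Optional
from urllib.parse import quote, urljoin, urlsplit, urlunsplit

# Assumed default-port table; a real canonical-XTM spec would have to
# enumerate the schemes it cares about.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}

# RFC 3986 "unreserved" characters: %-escapes of these are decoded.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
PCT_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")

# Characters left as-is when escaping a path ("%" keeps existing escapes).
PATH_SAFE = "/:@!$&'()*+,;=%"


def normalize_escapes(component: str) -> str:
    """Decode escapes of unreserved characters; uppercase all other escapes."""
    def fix(match):
        char = chr(int(match.group(1), 16))
        return char if char in UNRESERVED else "%" + match.group(1).upper()
    return PCT_ESCAPE.sub(fix, component)


def normalize_uri(uri: str, base: Optional[str] = None) -> str:
    # Absolutization: resolve a relative reference against its base URI
    # (urljoin also removes "." and ".." segments from the merged path).
    if base is not None:
        uri = urljoin(base, uri)
    parts = urlsplit(uri)

    # Case normalization: scheme and host names are case-insensitive.
    scheme = parts.scheme.lower()
    host = parts.hostname or ""  # .hostname is already lowercased
    if ":" in host:
        host = f"[{host}]"  # put the brackets back on IPv6 literals

    # Removal of default port numbers.
    port = parts.port
    if port is None or port == DEFAULT_PORTS.get(scheme):
        netloc = host
    else:
        netloc = f"{host}:{port}"

    # Keep any userinfo ("user@") exactly as written.
    userinfo, at, _ = parts.netloc.rpartition("@")
    if at:
        netloc = f"{userinfo}@{netloc}"

    # Deterministic %-escaping: escape raw unsafe characters, then
    # normalize the escapes that are already there.
    path = normalize_escapes(quote(parts.path, safe=PATH_SAFE))
    if not path and netloc:
        path = "/"  # assumption: an empty path on a host means "/"
    query = normalize_escapes(parts.query)
    fragment = normalize_escapes(parts.fragment)
    return urlunsplit((scheme, netloc, path, query, fragment))
```

For example, `normalize_uri("HTTP://Example.COM:80/a%7e")` yields `http://example.com/a~`. Everything here is purely syntactic (no URI is ever dereferenced), so two processors applying the same rules should produce the same string; any extra intelligence about whether two URIs identify the same resource would sit on top of this, which is why test cases might need to avoid depending on it.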

| * It would be very bad if there were any confusion
|   whatsoever about whether a particular XML element or
|   document is expressed in XTM syntax or in our
|   canonical output syntax. The best way to avoid such
|   confusion is to avoid having element type names in
|   common between the two syntaxes.

Hmm. This could be achieved by using different namespaces, I guess.

| * Having element type names in common will greatly
|   diminish our (the XTM Authoring Group's) ability to
|   communicate clearly and unambiguously among
|   ourselves. When we say "<topic>", we really must be
|   disciplined in meaning only what that string
|   (<topic>) means at input time, because the
|   corresponding construct that appears in canonical
|   output is not exactly the same kind of thing (for one
|   example of why this is true, see the discussion of
|   topic-occurrence associations, above). If we don't
|   establish these distinctions in our discussions, we
|   will misunderstand each other, and our productivity
|   as a group will be diminished.

I find it difficult to imagine any possibility for confusion here.
Anyone saying <topic> and meaning the topic element in the canonical
syntax will just have to make that clear from the context, since this
will most likely be a rare occurrence. And even if confusion were to
occur I don't see how it could become very serious.

| * Having element type names in common will muddle our
|   thinking as individuals. We must not allow ourselves
|   to make unconscious assumptions about the nature of
|   processed topic map information. The structure of
|   the canonical output must reflect precisely the
|   abstract structure of the application-internal form
|   of topic map information, as it will be defined by
|   the Authoring Group. The syntactic structure of the
|   input documents is irrelevant, and pretending that it
|   is somehow relevant will only blind and confuse us.

I strongly agree with everything you say in this point, except for the
first sentence, and I don't really see how it is connected to the rest
of your paragraph. How will using nearly the same syntax for
serialization and canonicalization muddle our thinking?

* Lars Marius Garshol
|
| This I don't follow. You seem to imply here that something more than
| what I propose above is needed. My problem is that I have a release
| schedule to meet and must act very quickly indeed. So if something
| radically more complex is needed I would prefer to do this first,
| and then that as a second stage.

* Steven R. Newcomb
|
| OK. In order to walk in a particular direction, we must move by
| steps. I would only ask that each of us tries to be objective about
| technical decisions. That means trying not to make technical
| decisions on the basis of our own individual business objectives,
| but rather on the basis of how best to develop the industry as a
| whole. The only thing that competitors can be expected to agree
| about is how to make the industry grow (and even that much is a
| minor miracle). I hope there won't be too many conflicts among us,
| and that the resolution of the conflicts can be navigated in a way
| that doesn't bruise anyone economically. Taking well-considered
| steps *together* is a good way to do that.

I agree with all of this. It was silly of me to raise the subject at
all. Please forget it.

| BTW, I'm voting "Yes" on XTM 1.0, although I have grave misgivings
| about Annex F, which I find misleading -- not so much by what it
| says, but by what it doesn't say.

What is it you feel it should say that it does not? Of course, it lacks
an object model and so necessarily is only a shadow of what it ought
to be, but given that, it is acceptable, I think.

--Lars M.

To Post a message, send it to: xtm-wg@eGroups.com
To Unsubscribe, send a blank message to: xtm-wg-unsubscribe@eGroups.com