Subject: Re: [ubl-lcsc] XSD data types - was Re: [ubl-lcsc] Re: UBL 0.81 CCT draft-9-mod
On Mon, 8 Sep 2003, Tim McGrath wrote:

>> thanks, i just wanted to know that you recognised the problem. your
>> arguments add strength to the idea that it is very subjective about when
>> we need to restrict content formats.

I'm collecting more mods to add to draft-9-mod-2. Would you want the attribute types for IdentifierType to be changed to "xsd:normalizedString" instead? That also raises the question of whether the same should be done to CodeType's attributes. But then CodeType's content type is xsd:token....

>> we need to consider the following...
>>
>> 1. w.r.t. systems architectures. what happens if the data validation of
>> the schema parser fails? typically, it is the logic of the processing
>> application that knows how to deal with this - so why not let it do the
>> validation.

I agree that applications have to perform all the work of checking the validity and correctness of data. They face the end-user, and so have to take on any validation that has not already been done in the lower layers. As a crude extension of this to a somewhat extreme end, if the lower layers did not provide the TCP network functions, the application would even have to build in time-outs, resends, CRC error checks, etc. to simulate what TCP already does at its level to transmit UBL files.

It's also rather subjective whether the application should do all the work of syntax checking. The application certainly has to do semantic checking, but if the syntax *could* (not must) be pre-filtered by the lower layers, namely the schema validator, then it could focus more on semantics. As a side benefit, schema validators themselves can be checked against published specs, so validations that rely on schema validators enjoy an agreed-upon specification of how that validation is to be done.
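For concreteness, here is a minimal sketch of what the IdentifierType/CodeType change could look like. The type and attribute names below are illustrative only and do not claim to match the actual declarations in CCT.xsd:

```xml
<!-- Illustrative sketch only; names are hypothetical, not from CCT.xsd -->
<xsd:complexType name="IdentifierType">
  <xsd:simpleContent>
    <xsd:extension base="xsd:normalizedString">
      <!-- xsd:normalizedString replaces tabs/CR/LF with spaces,
           but does not collapse or trim runs of spaces -->
      <xsd:attribute name="identificationSchemeID"
                     type="xsd:normalizedString"/>
    </xsd:extension>
  </xsd:simpleContent>
</xsd:complexType>

<xsd:complexType name="CodeType">
  <xsd:simpleContent>
    <!-- xsd:token additionally collapses internal whitespace
         and strips leading/trailing spaces -->
    <xsd:extension base="xsd:token">
      <xsd:attribute name="codeListID" type="xsd:normalizedString"/>
    </xsd:extension>
  </xsd:simpleContent>
</xsd:complexType>
```

The practical difference is the built-in whiteSpace handling: xsd:string preserves content verbatim, xsd:normalizedString maps tabs and line breaks to spaces, and xsd:token further collapses and trims whitespace, which is why mixing the two across content and attributes is worth deciding deliberately.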
But from UBL's point of view, I think we should look at what benefits there are in terms of precise data format specification and information interoperability (in terms of interpretation), rather than doing something solely to assist applications, since applications will do whatever is necessary to "patch up" the leakages anyway.

>> 2. do we need to make these decisions up-front? could not specific
>> implementors extend our schemas to add the facets necessary for this
>> type of validation if they wanted it?

True, I think that would most likely be the case. I asked myself whether we expect users to be able to use CCT.xsd out-of-the-box. My answer is not a convincing yes. Take "NumericType" and "PercentType" for instance. They are both exactly the same! So the only contribution each makes for the user is its name, which embodies some semantics about the nature of the values. But users might not find, for example, PercentType immediately useful, since there is no specification of whether a contained value of "0.30" means 0.30% or 30%. Similarly, NumericType is based on an abstract numerical space, so concerns of physical representation, such as very large quantities, negative values, precision level, rounding modes, etc., that may or may not apply in specific contexts are not taken care of by NumericType alone. So if users always have to extend the types in CCT, they could just as well build their own local types directly on XSD's simple types.

My take is that each CCT type should be as specific to its purpose as possible, because the thought that goes into each type's specificity is then embedded into the type's design, ensuring that instance data used in the same context flows back and forth between writers and readers in the same expected shapes (data ranges) and sizes (bit lengths).

>> 3. if we accept this is a valid tactic then we should revisit the use of
>> XSD dateTime as well.
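On point 2 above, XSD does let an implementor derive by restriction and add facets locally. A hedged sketch, assuming a `cct` prefix bound to the CCT namespace and that PercentType is (or exposes) a simple type derived from xsd:decimal; the facet values are arbitrary examples, not a recommendation:

```xml
<!-- Illustrative only: a local restriction an implementor might define -->
<xsd:simpleType name="MyPercentType">
  <xsd:restriction base="cct:PercentType">
    <xsd:minInclusive value="0"/>
    <xsd:maxInclusive value="100"/>  <!-- pins the 0.30%-vs-30% reading -->
    <xsd:fractionDigits value="2"/>  <!-- pins precision -->
  </xsd:restriction>
</xsd:simpleType>
```

The same pattern would pin down NumericType's range and precision where a context demands it, which is the kind of decision that otherwise gets deferred to every consuming application.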
Or perhaps all of them :)

>> Maybe we are trying to be too smart here. the critical requirement for
>> UBL is for semantic clarity, not presentation of data.

While I agree with you on the first half, I'm not sure it should come at the expense of data representation. Besides, it is sometimes not clear where the boundary is. For example, if a quantity is -1, is it:

- a semantic error (e.g. we cannot permit a delivery of -1 piece of an item), or
- a syntax error (an invalid representation, possibly due to a programming bug), or
- no error (it is OK because the sender is indicating that he owes the recipient one piece of the item)?

I'd think a suitably tight data representation can help reduce semantic ambiguity. In the same example, if we had specified the type for quantity to be non-negative integers, then there would be no semantic doubt that the value -1 is an error, a syntactic one in this case, which is an easier thing to check once we have pinned down the data representation.

Best Regards,
Chin Chee-Kai
SoftML
Tel: +65-6820-2979
Fax: +65-6743-7875
Email: cheekai@SoftML.Net
http://SoftML.Net/