Subject: Re: [unitsml] Identity of units
I agree that this kind of issue needs to be resolved before 1.0 is released. Here are my thoughts.

The work IUPAC did on developing the InChI string has done a lot to show the importance of having a unique identifier for something that can be represented in a number of ways. Can we not do the same thing for units? I think so - consider the following as a representation of the newton:

m[+01]kg[+01]s[-02]K[+00]A[+00]mol[+00]cd[+00]

The format lists the base unit symbols in a fixed order, with the power of each unit in square brackets. You could certainly do a shorthand version of this by omitting the last four units, but it's not so bad having them present.

Still, additional information is needed to distinguish between similar units. This can be done by adding conversion factors in exponent form:

m[+01]kg[+00]s[-01] (m/s)
m[+01]kg[+00](6.0E+01)s[-01] (m/min)
m[+01]kg[+00](3.6E+03)s[-01] (m/hr)

Finally, in rare circumstances you can have units that look "the same" but are in fact very different. Density (g/mL) is very different from concentration (g/mL), so an additional "context" word would be added after a special character (I used # as it is equivalent to an anchor on an HTML page):

(1E-06)m[-03](1E-03)kg[+01]s[+00]K[+00]A[+00]mol[+00]cd[+00]#conc
(1E-06)m[-03](1E-03)kg[+01]s[+00]K[+00]A[+00]mol[+00]cd[+00]#density

OK, so maybe this would be useful, but how to implement it? Well, any validator used to work with OM or UnitsML would need to convert the representations to this common format. Rather than code this on a case-by-case basis, it would be much better to have a web service that would take in both formats and send back the unique string format above. Of course, it could also have a compare feature and return yes/no depending on whether they are the same.

Thoughts?

Stuart Chalk, Ph.D.
Associate Professor of Chemistry
Department of Chemistry and Physics
University of North Florida
1 UNF Drive
Jacksonville, FL 32224 USA
P: 904-620-1938
F: 904-620-1989
E: schalk@unf.edu

On Apr 17, 2009, at 4:07 PM, Martin S. Weber wrote:

I've had a talk with Robert today about a point that was brought up by Prof James Davenport (and also forwarded by him to the OM-3 ML), which is the point of deciding about identity of units.

To refresh your memories: what I had suggested to the OpenMath people is to have <OMFOREIGN> elements wrap up UnitsML unit vocabulary and put this into an OM Content Dictionary. To go one step further, they could even use XInclude or XPointer to pull in these definitions by URL from the UnitsDB, but that is beyond the scope of the TC :-)

What James then brought up was, in my interpretation, a scenario like the following: Something is using OpenMath or MathML and refers to an OpenMath content dictionary unit. Something else is using (one of the possible ways to embed) UnitsML to mark up units of measure related to some formulae or numerals. Now the question of identity arises, i.e., are the two using the same units?

If, e.g., both parties were instead simply referencing the UnitsDB, the answer would be simple: if the URLs and the GET request are the same, then, obviously, they are talking about the same unit. For other types of referencing we could still look at the unit's xml:id. But that only works as long as we are talking about the same dictionary of units. When OM and UnitsML units are combined, the ids are likely to be different: one top-level xml:id for the <OMFOREIGN> definition, one for the unit on the other side. The wrapped-up UnitsML markup within the <OMFOREIGN> -could- carry the same xml:id as the one in UnitsDB, but from UnitsML 1.0 to UnitsDB being the canonical UnitsML source of units there's still "some" way to go. It is thus likely that the UnitsML content will be different.
So how do we decide if two units are the same? We have a lot of optional information which can be left out, so we can only rely on that to a certain extent. In talking with Robert, though, I think we've found a practical way to determine identity of units, by an inductive process:

1) Different representations of the same seven SI base units are identical.
2) Identity of derived units is determined by their contained root units.

To realize 2) above, we simply follow all the <ExternalRootUnit> mentions and recursively collect the <EnumeratedRootUnit> mentions(*) to build up a list of the base units. At some point there should be no more <ExternalRootUnit>s to collect, and then we can decide whether two derived units are the same.

Now about 1): We don't have to care about that if people stick to using the <EnumeratedRootUnit>(*). But it is likely that, at least for some period of time while there is no canonical data source for UnitsML markup (aka UnitsDB), there will be competing UnitsML markups of the legal, definitive definitions of the seven SI base units. So to some extent we also have to worry about when to decide that two UnitsML representations of, e.g., the "metre" (en-UK :) are identical.

We have considered requiring, via the guidelines, that <UnitDefinition>s be available that ultimately refer to the BIPM normative legal definition of the metre (or meter, as you people call it) etc. But for that we'd need the BIPM to have stable identifiers for different versions of the metre etc., which we still have to find out. It's also not that easy in the light of updated fundamental physical constants: are we referencing the old metre by default? Or an updated one? Always the latest? etc.

So some food for thought: How do we decide if two UnitsML marked-up units are "identical"? What should we REQUIRE(**) in the guidelines to enable a UnitsML processor to decide about identity?
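As a rough illustration of step 2), the recursive collection of base units could be sketched as follows. This is only a sketch under stated assumptions: the `UNITS` table is a hypothetical in-memory stand-in for parsed UnitsML (it is not the UnitsML schema), the names `base_exponents` and `same_unit` are made up for this example, and prefixes and conversion factors are ignored entirely.

```python
# Hypothetical stand-in for parsed UnitsML markup: each unit id maps to a list
# of (kind, name, power) mentions, where kind is "enumerated" (an SI base unit)
# or "external" (a reference to another, non-base unit).
UNITS = {
    "newton": [("enumerated", "kilogram", 1),
               ("enumerated", "metre", 1),
               ("enumerated", "second", -2)],
    "joule": [("external", "newton", 1),
              ("enumerated", "metre", 1)],
    "newton_metre": [("enumerated", "kilogram", 1),
                     ("enumerated", "metre", 2),
                     ("enumerated", "second", -2)],
}


def base_exponents(unit_id, units, power=1, seen=()):
    """Follow external mentions recursively and sum base-unit exponents."""
    if unit_id in seen:
        raise ValueError(f"cyclic unit definition involving {unit_id!r}")
    totals = {}
    for kind, name, p in units[unit_id]:
        if kind == "enumerated":
            contribution = {name: p * power}
        else:
            # External mention: expand the referenced unit, scaled by its power.
            contribution = base_exponents(name, units, p * power,
                                          seen + (unit_id,))
        for base, exponent in contribution.items():
            totals[base] = totals.get(base, 0) + exponent
    # Drop base units whose exponents cancelled out to zero.
    return {base: e for base, e in totals.items() if e != 0}


def same_unit(a, b, units):
    """Two units count as identical here if their base exponents match."""
    return base_exponents(a, units) == base_exponents(b, units)
```

With this table, `same_unit("joule", "newton_metre", UNITS)` is true, because both reduce to the same base-unit exponents. That shows the limit of a purely dimensional comparison: units that mean different things in context can still come out as "identical".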
IMO this is a question we have to solve before we can deliver "1.0", as it's very likely to be asked by implementors (heck, your early adopter #1 -- me -- is stumbling over it. Help! :)

-Martin

(*) The guidelines should mention that base root units SHOULD(**) or even MUST be referred to via <EnumeratedRootUnit>, and <ExternalRootUnit> SHOULD be used if referring to another non-base unit ("or else"!)

(**) in RFC 2119 parlance
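The unique string format proposed in the reply at the top of this message (base unit symbols in a fixed order, signed two-digit powers in square brackets, an optional conversion factor in parentheses before a symbol, and an optional context word after #) could be generated along these lines. This is a minimal sketch, not a reference implementation: the function names and argument layout are assumptions made for the example, and conversion factors are passed through as ready-made strings because the message does not fix a number format for them.

```python
# Fixed symbol order taken from the example strings in the message.
ORDER = ("m", "kg", "s", "K", "A", "mol", "cd")


def canonical(exponents, factors=None, context=None):
    """Build the proposed identifier string from base-unit exponents.

    exponents: dict mapping a base unit symbol to an integer power
    factors:   optional dict mapping a symbol to a factor string, e.g. "1E-03"
    context:   optional word appended after '#', e.g. "density"
    """
    factors = factors or {}
    unknown = (set(exponents) | set(factors)) - set(ORDER)
    if unknown:
        raise ValueError(f"not SI base unit symbols: {sorted(unknown)}")
    parts = []
    for symbol in ORDER:
        factor = f"({factors[symbol]})" if symbol in factors else ""
        # "+03d" gives a signed, zero-padded power such as +01, -02 or +00.
        parts.append(f"{factor}{symbol}[{exponents.get(symbol, 0):+03d}]")
    text = "".join(parts)
    return f"{text}#{context}" if context else text


def same_identifier(a, b):
    """The yes/no compare feature: plain equality of the canonical strings."""
    return a == b
```

For example, `canonical({"m": 1, "kg": 1, "s": -2})` reproduces the string given for the newton, and passing `context="density"` or `context="conc"` together with the g/mL factors yields two identifiers that differ only after the #.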