OASIS Mailing List Archives

 



unitsml message



Subject: Re: [unitsml] Identity of units


Stuart et al.,

I've attached a short write-up that I've provided to Barry Taylor and Ambler Thompson to present to the CCU (Consultative Committee for Units of the BIPM) in May. This was done so that, at a minimum, the CCU would be aware of the OASIS UnitsML TC. Hopefully, someone there will be interested in joining the TC.

The snippets of an example UnitsML instance document at the end of the write-up include various XML IDs for cross-referencing data between the snippets. FYI, XML Spy does not allow the attribute @xml:id to contain the characters "^", "[", or "(", all of which I tried when creating the unique IDs. Any comments on the write-up are welcome.
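
The XML Spy behaviour is consistent with the rule that an xml:id value must be an NCName, which excludes "^", "[", and "(". A minimal sketch of a pre-check follows; it is an assumption-laden simplification that only accepts the ASCII subset of NCName (the real production also allows many non-ASCII letters), and the function name is_ascii_ncname is made up for illustration.

```python
import re

# Simplified, ASCII-only approximation of the XML NCName production:
# a letter or underscore, followed by letters, digits, '.', '-' or '_'.
# The full NCName rule also permits many non-ASCII characters.
_ASCII_NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_ascii_ncname(candidate: str) -> bool:
    """Return True if candidate is usable as an xml:id (ASCII subset only)."""
    return _ASCII_NCNAME.fullmatch(candidate) is not None


if __name__ == "__main__":
    for value in ["m[+01]", "kg^2", "m(s)", "U_m.s-1"]:
        print(value, is_ascii_ncname(value))
```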

Bob

At 10:12 AM 4/21/2009, Stuart James Chalk wrote:
I agree that this kind of issue needs to be resolved before 1.0 is released. Here are my thoughts.

The work that was done on development of the InChI string by IUPAC has done a lot to show the importance
of having a unique identifier for something that can be represented in a number of ways.  Can we not do the same
thing for units?  I think so. Consider the following as a representation of the newton:

m[+01]kg[+01]s[-02]K[+00]A[+00]mol[+00]cd[+00]

The format uses the base-unit symbols in a fixed order, with the power of each unit in square brackets.  You could certainly
write a shorthand version of this by omitting the last four units, but it's not so bad having them present. Still, additional
information is needed to distinguish between similar units; this can be done by adding conversion factors in exponent form.

m[+01]kg[+00]s[-01] (m/s)
m[+01]kg[+00](6.0E+01)s[-01] (m/min)
m[+01]kg[+00](3.6E+03)s[-01] (m/hr)

Finally, in rare circumstances you can have units that look "the same" but are in fact very different.  Density (g/mL) is very different
from concentration (g/mL), so an additional "context" word would be added after a special character (I used # as it is equivalent to an
anchor on an HTML page).

(1E-06)m[-03](1E-03)kg[+01]s[+00]K[+00]A[+00]mol[+00]cd[+00]#conc
(1E-06)m[-03](1E-03)kg[+01]s[+00]K[+00]A[+00]mol[+00]cd[+00]#density
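
A minimal sketch of how such a string could be assembled, assuming the symbol order shown in the examples above. The function name canonical_unit_string and its parameters are hypothetical; conversion factors are passed as already-formatted strings because the examples do not fix a single number format.

```python
# Fixed symbol order taken from the examples in this message.
BASE_ORDER = ["m", "kg", "s", "K", "A", "mol", "cd"]


def canonical_unit_string(exponents, factors=None, context=None):
    """Build the proposed identifier string.

    exponents: dict mapping base symbol -> integer power (missing means 0)
    factors:   dict mapping base symbol -> pre-formatted factor, e.g. "1E-03"
    context:   optional word appended after '#', e.g. "conc"
    """
    factors = factors or {}
    parts = []
    for symbol in BASE_ORDER:
        if symbol in factors:
            parts.append(f"({factors[symbol]})")
        # '+03d' gives a signed, zero-padded power such as +01 or -02.
        parts.append(f"{symbol}[{exponents.get(symbol, 0):+03d}]")
    text = "".join(parts)
    return f"{text}#{context}" if context else text


if __name__ == "__main__":
    print(canonical_unit_string({"m": 1, "kg": 1, "s": -2}))
    print(canonical_unit_string({"m": -3, "kg": 1},
                                factors={"m": "1E-06", "kg": "1E-03"},
                                context="conc"))
```

Run as a script, the two calls reproduce the newton string and the "#conc" string given above.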

OK, so maybe this would be useful, but how do we implement it?  Well, any validator used to work with OM or UnitsML
would need to convert the representations to this common format.  Rather than code this on a case-by-case basis, it
would be much better to have a web service that takes in either format and sends back the unique string format above.
Of course, it could also have a compare feature that returns yes/no depending on whether two units are the same.
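
The compare feature could be sketched roughly as follows, working directly on the string format above rather than on OM or UnitsML input. This is only an illustration under stated assumptions: the names parse_unit_string and same_unit are invented here, terms with a zero power are ignored so that the shorthand form compares equal to the full form, factors are compared as floats, and the "#context" word is treated as part of the identity.

```python
import re

# One term: optional "(factor)", a base symbol, and a signed power in brackets.
_TERM = re.compile(r"(?:\(([^)]+)\))?(mol|kg|cd|m|s|K|A)\[([+-]\d+)\]")
# Whole string: one or more terms, then an optional "#context" word.
_WHOLE = re.compile(
    r"(?:(?:\([^)]+\))?(?:mol|kg|cd|m|s|K|A)\[[+-]\d+\])+(?:#(\w+))?"
)


def parse_unit_string(text):
    """Return ({symbol: (factor, power)}, context) for a unit string."""
    whole = _WHOLE.fullmatch(text)
    if whole is None:
        raise ValueError(f"not a recognised unit string: {text!r}")
    terms = {}
    for factor, symbol, power in _TERM.findall(text):
        if int(power) == 0:
            continue  # zero powers carry no information (shorthand form)
        terms[symbol] = (float(factor) if factor else 1.0, int(power))
    return terms, whole.group(1)


def same_unit(a, b):
    """Yes/no comparison of two unit strings, including the context word."""
    return parse_unit_string(a) == parse_unit_string(b)


if __name__ == "__main__":
    print(same_unit("m[+01]kg[+00]s[-01]", "m[+01]s[-01]"))
    print(same_unit("m[+01]s[-01]", "m[+01](6.0E+01)s[-01]"))
```

Comparing factors as floats is a simplification; a real service would need an agreed number format or tolerance.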

Thoughts?

Stuart Chalk, Ph.D.
Associate Professor of Chemistry
Department of Chemistry and Physics
University of North Florida
1 UNF Drive
Jacksonville, FL 32224 USA
P: 904-620-1938
F: 904-620-1989
E: schalk@unf.edu



On Apr 17, 2009, at 4:07 PM, Martin S. Weber wrote:

I've had a talk with Robert today about a point that was brought up by
Prof James Davenport (and also transported by him to the OM-3 ML), which
is the point of deciding about identity of units.

To refresh your memories: what I had suggested to the OpenMath
people is to have <OMFOREIGN> elements wrap up UnitsML unit vocabulary
and put this into an OM Content Dictionary. To go one step further, they
could even use XInclude or XPointer to pull in these definitions by URL
from the UnitsDB, but that is beyond the scope of the TC :-)

What James then brought up was, to my interpretation, a scenario like
the following: Something is using openmath or mathml and refers to an
openmath content dictionary unit. Something else is using (one of the
possible ways to) embed(ded) UnitsML to mark up units of measure related
to some formulae or numerals. Now the question of identity arises, i.e.,
are the two using the same units?

If, e.g., both parties were instead simply referencing the UnitsDB,
the answer would be simple: if the URLs and the GET request are the
same, then, obviously, they are talking about the same unit. For other
types of referencing we could still look at the unit's xml:id. But that
only works as long as we are talking about the same dictionary of units. In
the event of a combination of OM and UnitsML units, the ids are likely to
be different: one top-level xml:id for the <OMFOREIGN> definition,
one for the unit on the other side. The wrapped-up UnitsML markup within
the OMFOREIGN -could- carry the same xml:id as the one in UnitsDB, but
from UnitsML 1.0 to UnitsDB being the canonical UnitsML source of units
there's still "some" way to go. It is thus likely that the UnitsML
content will be different.

So how do we decide if two units are the same? We have a lot of optional
information which can be left out, so we can only rely on that to a
certain extent. In talking with Robert, though, I think we've realised
a practical way to determine identity of units, by an inductive process:

1) Different representations of the same seven SI base units are identical.
2) Identity of derived units is determined by their contained root units.

To realize 2) above, we simply follow all the <ExternalRootUnit>
mentions, and recursively collect the <EnumeratedRootUnit> mentions(*)
to build up a list of the base units. At some point there should be no
more <ExternalRootUnit>s to collect, and then we can decide whether
two derived units are the same.
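
A rough sketch of that recursive collection, using a made-up in-memory model rather than real UnitsML parsing: each unit is a list of (kind, reference, power) tuples, where kind "enumerated" stands in for an <EnumeratedRootUnit> naming a base unit and "external" stands in for an <ExternalRootUnit> pointing at another unit. The data layout, the function name base_exponents, and the cycle check are all assumptions for illustration.

```python
from collections import Counter


def base_exponents(unit_id, units, _seen=()):
    """Resolve a unit to {base symbol: power} by following external references.

    units maps a unit id to a list of (kind, ref, power) tuples (hypothetical
    model): kind is "enumerated" for a base unit or "external" for a reference
    to another entry in units.
    """
    if unit_id in _seen:
        raise ValueError(f"circular unit definition at {unit_id!r}")
    total = Counter()
    for kind, ref, power in units[unit_id]:
        if kind == "enumerated":
            total[ref] += power
        else:
            # Powers of the referenced unit are scaled by the outer power.
            inner = base_exponents(ref, units, _seen + (unit_id,))
            for base, inner_power in inner.items():
                total[base] += inner_power * power
    # Drop bases that cancel out so equal units compare equal.
    return {base: p for base, p in total.items() if p != 0}


if __name__ == "__main__":
    units = {
        "newton": [("enumerated", "kg", 1), ("enumerated", "m", 1),
                   ("enumerated", "s", -2)],
        "joule_a": [("external", "newton", 1), ("enumerated", "m", 1)],
        "joule_b": [("enumerated", "kg", 1), ("enumerated", "m", 2),
                    ("enumerated", "s", -2)],
    }
    print(base_exponents("joule_a", units) == base_exponents("joule_b", units))
```

Note that this only compares base-unit powers; it does not account for prefixes or conversion factors, nor for units that share base units but differ in meaning.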

Now about 1): We don't have to care about that if people stick to using
the <EnumeratedRootUnit>(*). But it is likely that, at least for some
period of time where there exists no canonical data source for UnitsML
markup (aka UnitsDB), there will be competing UnitsML markups of the
legal, definitive definitions of the seven SI base units. So to some
extent we also have to worry about when to decide that two UnitsML marked-up
representations of e.g. the "metre" (en-UK :) are identical. We have
considered requiring, via the guidelines, that there are <UnitDefinition>s
available that ultimately refer to the BIPM normative legal definition
of the metre (or meter, as you people call it) etc., but for that we'd
need the BIPM to have stable identifiers for different versions of the
metre etc., which we still have to find out. Also, it's not as easy in
the light of updated fundamental physical constants: are we referencing
the old metre by default? Or an updated one? Always the latest? etc.

So some food for thought: How to decide if two unitsml marked up units
are "identical"? What should we REQUIRE (**) in the guidelines to enable
a UnitsML processor to decide about identity? IMO this is a question we
have to solve before we can deliver "1.0" as it's very likely to be
asked by implementors (heck your early adopter #1 -- me -- is stumbling
over it. Help!:)

-Martin

(*) The guidelines should mention that base root units SHOULD(**) or
even MUST be referred to via <EnumeratedRootUnit>, and
<ExternalRootUnit> SHOULD be used if referring to another non-base
unit ("or else"!)
(**) in RFC 2119 parlance


Development_of_UnitsML.doc


