Version 3.0 Comments

I apologize in advance for not knowing how to use “change tracker”. Because of that failing on my part, I will simply identify (by line number) and/or reproduce in these comments the portions of the “OASIS Extensible Name and Address Language (xNAL) Version 3.0, Working Committee Draft 01, 04 January 2005” to which I refer. Also, please forgive me for taking so long to make these comments (my system was down / being reconstructed, after which I had a lot of “catch up” to do in other areas). Hopefully, despite that delay, they can still be considered.

BRIEFLY –

I commend several aspects of the proposed standards (see “General Comments” under “DETAILS”).
I suggest that some possible improvements be considered (the possible need for and use of each is discussed under “Specific Comments” under “DETAILS”) –

A “raw” punctuation element;
Alternate ordering of basic or advanced elements in the schema (possibly already allowed);
The possible need for “address line” and/or “address line position” attributes for address elements (giving the “place” in which an address element does or should appear in a line and “address block” … again, if that is not already allowed in some way);
The control of assignment of type name and type values (“State” for “AdministrativeArea type”, for instance) is unclear (to me at least) and, at any rate, sometimes problematic – so it may warrant some discussion;
One minor notational clarification point.

DETAILS –

General Comments :

The relationship between xNAL and various applications is well laid out and supported (lines 436-462, page 19 for example).
The examples of address variability and the need for the OASIS standard to support that in both “basic” and “advanced” ways are excellent.
The tack of reducing complexity and variability by adopting a “flat” / relational structure in version 3.0 (versus the attempted hierarchical structure of version 2.0) is, in my opinion, an appropriate one – well suited to the “real world” complexity of name and address objects, which makes hierarchical encoding highly problematic (and changeable!).

Specific Comments :

The strategy of encoding punctuation (or symbols) as attributes (see lines 573-592, page 26) may be workable in many situations, but IS dependent on “raw” parsing and/or reference data aided parsing (and matching) capabilities, “quirks”, and goals (postal standard or some other format standard, for instance). This is probably recognized in the statements from lines 1004-1011, page 40 (bolded and italicized emphasis mine) –

Though xNAL is not specifically designed to format name and address data, it ensures that it captures all the original data that will enable users to format the names and addresses. In xNAL, attributes play a major role in helping users to format names and addresses as they define the type of data that has been defined. For example, a name field could end with a period or could use a comma as a separator field. Some systems may store this and others may not. This is where users should have a clearly defined set of business rules to capture name and address data into xNAL and this is specific to individual requirements.

I am ASSUMING here that the ability to format name and address data INCLUDES the ability to reconstruct an XML encoded address (however parsed) back into the format in which it originally appeared (was submitted or extracted from). My possible confusion, and/or a possible “hole” in the proposed standard, concerns how the standard ensures the capture of all original data (for formatting or reconstruction) and how that data will be stored –

Lines 548-592, page 26, make excellent and valid points demonstrating the variable use and meaning of punctuation. However, being able to appropriately encode that use and “meaning” presupposes certain capabilities or knowledge (often country specific!) of a parser. Specifically with regard to the given example “12-14”, all of the following are possible parser outputs (possibly with country specific “knowledge”, but even more so without country specific “knowledge” … both, however, ALSO dependent on other contents of a string containing that AND whether or not reference data and matching have “aided” the parsing!) -

“12-14” being taken as just a street number / identifier (whether it really should mean “12 to 14” or not).
“12-14” being taken as a street number / identifier AND a unit number / identifier in EITHER ORDER.
“12” being taken as a street number / identifier, “-” being considered the same as a space, and “14” POSSIBLY being taken (again, depending on other preceding or following string contents!) as a street NAME (a deficient representation of “14th”).
“-” being ignored totally (dropped) and the number / identifier being taken as “1214”.
Given sufficient reference data and matching capabilities, “12”, “14”, and/or “1214” being identified as the correct post-match parsed number / identifier (“and” in the case of a multi-match; “or” where one number is resolved to a reference DB “delivery point”). Note that this could occur even when “raw” parsing would ASSUME either number to be a possible unit number, because reference data could indicate that no unit was appropriate for the matched street number (the process might then drop the other number rather than post-match parsing it into a “possible but unmatched” unit number / id field).
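To make the ambiguity above concrete, here is a minimal Python sketch that lists the “raw” readings a parser could produce for a token like “12-14” when it has no country knowledge and no reference data. The field names (street_number, unit, and so on) and the function raw_readings are hypothetical illustrations, not names from the xNAL draft, and the reference-data case from the last item is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    label: str
    fields: tuple  # (field name, value) pairs in their original order


def raw_readings(token: str, sep: str = "-"):
    """List plausible raw readings of a token such as "12-14".

    Field names are illustrative only; they are not xNAL element names.
    """
    left, _, right = token.partition(sep)
    if not right:
        # No separator found: there is only one way to read the token.
        return [Reading("single identifier", (("street_number", token),))]
    return [
        Reading("one identifier", (("street_number", token),)),
        Reading("range", (("number_from", left), ("number_to", right))),
        Reading("number then unit", (("street_number", left), ("unit", right))),
        Reading("unit then number", (("unit", left), ("street_number", right))),
        Reading("number then street name",
                (("street_number", left), ("street_name", right))),
        Reading("separator dropped", (("street_number", left + right),)),
    ]


for reading in raw_readings("12-14"):
    print(reading.label, dict(reading.fields))
```

In every one of these readings except the first, the dash itself has no obvious “home” element, which is the difficulty discussed next.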

Where a parser (“home grown” or vendor provided) does not have those capabilities or knowledge, the XML standard should still probably be able to accommodate punctuation encoding. That is –

The parser may not be able to identify / indicate punctuation as an attribute of some specific address element.
There may be some allowed or irresolvable vagueness as to which address element the punctuation should “belong” as an attribute. For instance, in “12-14” where “12” and “14” are considered individual address elements (whether that be for street number / id “12” and unit number / id “14” OR beginning and ending street numbers / ids!) – to which of those elements should the dash / “-” be “attached” as an attribute?

While definitely not as complex as the overall hierarchical encoding seen in version 2.0 (and avoided by version 3.0), this conundrum is somewhat similar, and some “flat” avoidance mechanism might also be appropriate. Specifically, one should probably just have a “raw” punctuation element (rather than only an attribute), so that punctuation can be captured without having to assign it to a specific address element.
The need for a “raw” punctuation element can, perhaps, be further seen in the example given and encoded on lines 790-862, pages 33-34.

Note that SOME of the punctuation gets preserved as an attribute (the “Extension separator”, for example) but other punctuation (the commas BETWEEN address elements) does NOT.
This lack of preservation would mean that the original example could not be reconstructed from the encoding.
One might ASSUME that some OUTSIDE “standard” (postal or otherwise) and formatting facility could / would be involved in reconstructing the example so that the dropped punctuation would “reappear”.

I do NOT believe this is a “good” assumption, nor should it be required / counted on for use of the proposed standard.
At any rate, the example given might be the preferred (and already matched / corrected) version of an “address block”. In which case, the punctuation should definitely be preserved (and reconstruct-able) RATHER than requiring additional processing (re-matching or complex address formatting) to do so!

NOTE : One might then say that a user of the standard should just employ the “basic” structure for encoding the example. However, even that (in the example) “drops” the comma between city and state. Of course, one might just use the basic structure to encode only “address lines” (without separating out the city and state into elements). HOWEVER, a user may wish to store or transmit the address with SOME separation of elements (the city and state in the “basic” example) OR in the advanced form (for any one of a number of valid reasons); BUT that user may STILL want to be able to reconstruct the “standard” address block from it using simpler and more efficient processing methods than re-matching or full / complex address formatting.
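The reconstruction argument can be sketched in a few lines of Python. This is an illustration only: the address string is invented (it is not the example from the draft), and the token kinds “text”, “space”, and “punct” are hypothetical stand-ins for address elements and a “raw” punctuation element. The point is that keeping punctuation as its own token makes reconstruction a simple concatenation, while dropping it makes the original unrecoverable without outside formatting rules.

```python
import re


def tokenize(line: str):
    """Split a line into (kind, value) tokens, keeping every character."""
    tokens = []
    # Words, whitespace runs, and single punctuation characters.
    for part in re.findall(r"\w+|\s+|[^\w\s]", line):
        if part.isspace():
            kind = "space"
        elif re.fullmatch(r"\w+", part):
            kind = "text"
        else:
            kind = "punct"  # the hypothetical "raw" punctuation element
        tokens.append((kind, part))
    return tokens


def rebuild(tokens):
    """Reconstruct the line by concatenating token values in order."""
    return "".join(value for _, value in tokens)


original = "12-14 Main St, Springfield, IL"  # invented example
tokens = tokenize(original)

# With punctuation kept as tokens, the round trip is exact.
round_trip = rebuild(tokens)

# With punctuation dropped, the original cannot be recovered by concatenation.
without_punct = rebuild([t for t in tokens if t[0] != "punct"])

print(round_trip == original)
print(without_punct)
```

Nothing here requires re-matching or a postal formatting facility, which is the kind of “simpler and more efficient processing” the NOTE above has in mind.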

The previously mentioned document examples show a hierarchical presentation of address elements - at least from top to bottom / based on the importance / “level” of the elements. Though the standard avoids the complications of attempting hierarchical nesting, the shown top to bottom order of element presentation should probably not be a requirement (implied by example or otherwise) of the standard.

Indeed, that ordering may not represent any requirement.

That is, the elements COULD be ordered from top to bottom in any way a user chooses – so that the elements would appear in some “standard” or conventional order with regard to address lines or an “address block”.
If this is true, then that should either be stated or examples provided to make that clear.

However, if the shown order is part of the standard (or even if it is “preferred”), then that would in turn require some re-formatting or reconstruction mechanism outside of the standard to place those elements appropriately into address lines / an “address block”.

I would suggest that the standard should NOT place this additional requirement on its users.
If users have already gone to the work / processing to put the address elements in an order appropriate for the way they need them presented in an address block, then the standard should allow them to preserve that.

With regard to being able to construct (or reconstruct) an address (address line OR address block) from encoded address elements, one might need “line in address block” and/or “position in line” attributes for an element.

The latter (position in line) might NOT be needed assuming the standard allows users to order (from top to bottom) the address elements (see previous point). On the other hand, if they wish to do “top to bottom” hierarchy presentation, they may ALSO then wish to have such an attribute to show the alternate address line position of an element as well.
However, the former (line in address block) could be needed where for whatever reasons (some standard, some restriction in the number of address lines, etc.) the line on which certain elements should appear needs to be “known” / encoded for address line or address block construction / reconstruction.
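As a rough sketch of how such attributes could be used, the Python below rebuilds an address block from elements that are stored in a “top to bottom” hierarchical order but carry hypothetical “line” and “position” values. The attribute names, the element type labels, and the sample address are all assumptions made for illustration; none of them is taken from the draft.

```python
from collections import defaultdict

# Elements listed in hierarchical order (largest area first), each carrying
# hypothetical "line" and "position" attributes for block reconstruction.
elements = [
    {"type": "Country", "value": "USA", "line": 3, "position": 1},
    {"type": "AdministrativeArea", "value": "IL", "line": 2, "position": 2},
    {"type": "Locality", "value": "Springfield", "line": 2, "position": 1},
    {"type": "Thoroughfare", "value": "Main Street", "line": 1, "position": 2},
    {"type": "PremisesNumber", "value": "12", "line": 1, "position": 1},
]


def build_block(elems):
    """Group elements by line, order them by position, and join the lines."""
    lines = defaultdict(list)
    for elem in elems:
        lines[elem["line"]].append(elem)
    rendered = []
    for line_no in sorted(lines):
        ordered = sorted(lines[line_no], key=lambda e: e["position"])
        rendered.append(" ".join(e["value"] for e in ordered))
    return "\n".join(rendered)


print(build_block(elements))
```

If the standard lets users store elements in address-block order to begin with, the “position” value becomes redundant, but the “line” value would still be needed to know where line breaks fall.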

On line 836 of page 34, “<AdministrativeArea type="State">” appears. It may be that “AdministrativeArea” as a type name is defined / controlled elsewhere (UML, wherever – please excuse my ignorance and/or lack of successful “digging”). Perhaps the type value “State” is also defined / controlled somewhere. Whether or not both of those are, some discussion of the way in which users should implement (or find) those might be appropriate. Specifically –

Whether or not “State” subsumes “Province”?
Whether BOTH “State” and “Province” might be valid type values?
Whether (somehow?) one should or would know that “State” and “Province” (at least between the USA and Canada, respectively) are roughly equivalent / “at the same level”.
Whether (somehow?) the type value “County”, for instance, should be known (and encoded!) to be at a “level” “below” the type value “State” (for the USA, at least)?
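To show what users would have to agree on if the standard stays silent on these questions, here is a small Python sketch of a shared lookup table for type values. The table contents, the numeric levels, and the function names are purely hypothetical; nothing in the draft (as far as I found) defines them, which is the concern.

```python
# Hypothetical shared table: (country code, type value) -> level,
# where 1 is the first subdivision below the country.
ADMIN_LEVELS = {
    ("US", "State"): 1,
    ("US", "County"): 2,
    ("CA", "Province"): 1,
}


def level(country, type_value):
    """Return the agreed level for a type value, or None if unknown."""
    return ADMIN_LEVELS.get((country, type_value))


def same_level(a, b):
    """True if both (country, type value) pairs are known and equivalent."""
    level_a, level_b = level(*a), level(*b)
    return level_a is not None and level_a == level_b


def is_below(a, b):
    """True if a is known to sit at a lower level than b."""
    level_a, level_b = level(*a), level(*b)
    return level_a is not None and level_b is not None and level_a > level_b


print(same_level(("US", "State"), ("CA", "Province")))
print(is_below(("US", "County"), ("US", "State")))
```

Two parties that each build such a table independently, with different type values or levels, would not be able to exchange or compare addresses reliably.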

One might consider this a NON-STRUCTURAL issue. But it will quickly become an IMPLEMENTATION AND INTEROPERABILITY issue for users of the schema structure. For that reason, it might bear some thought / discussion in the standard (or the standard should point one elsewhere where that occurs / is covered). The above example is by no means unusual and it is one of the simpler ones surrounding issues / problems with appropriate and usable “typing” (type names AND values).

At lines 1215-1223 on page 51, the indications for cardinality are provided. However, in the diagrams on the following pages an asterisk (“*”) appears to take the place of “many” (as in “0..*” meaning “0..many”). Maybe this equivalence is noted somewhere, but I missed it? If not, it might need to be.

Thank you,

David Putman
