Risk Analysis of XML Schema Features

UBL Schema subcommittee (Eduardo Gutentag, chair)
12 August 2001

Features and Risks

Each entry in the Risk column is one of none, low, high, unacceptable, or uncertain. We didn't assess the importance of each feature, just the risk of using it. A given level of risk can stem from several factors, such as interoperability issues, extensibility issues, and tool deployment.

Feature	Risk	Comments
Target namespaces	high	huge interoperability and comprehensibility problems; hard to mitigate risks
Wildcards	high	useful for publishing flexibility in catalog applications, but we might be concerned about the ability of foreign-namespace material to be a Trojan horse and, e.g., disable a base semantic; we may want to use it advisedly and ensure that only specific namespaces get in
Globally defined elements	none	Necessary and appropriate
Locally defined elements	high
Occurrence (n,m)	none	it's essential for business documents
Mixed content	high	can be confusing to application designers, and so we should guide them not to use it except in cases where "free text" is needed (typically publishing applications) and, in those cases, to be aware of considerations such as whitespace
Attributes	none
Global attributes	low	they seem okay, but people need to be aware of the prefixing requirements
Defaulted and fixed attribute values	uncertain	different processing scenarios (e.g., multipurpose large validation suite vs. small single-purpose tool) seem to favor different choices on this; relying on documentation for essential business info is a concern, but so is the fact that documents parsed in the absence of their schema are interpreted differently than when parsed in the schema's presence; note that RELAX NG doesn't have this feature but that XSLT could replace it
Simple types	low	we need to keep our eye on the few ambiguities, and define a profile (e.g., either always use UTC or always define a time zone) and/or define types that replace some of the built-in types (e.g. dates and times), though the latter adds to the risk because there won't be widespread implementations
Anonymous complex types	low	use only when not intended for reuse
Named complex types	low	use with caution
Complex type abstractness	low	critical for xsi:type, but we're concerned about usage parameters
Complex type extension	low
Complex type restriction	high
Substitution groups
Attribute groups	low	they're just a macro feature, and thus are to be avoided when reuse of types is desired
Model groups	low	same as attribute groups
Keys in general	high	the simple type "ID" is risky because it must be an XML NAME, and references to keys might as well be URI references because the references often come from outside
XPointer (used in key references done as URI refs)	high	not well supported; we may have to define a profile
Scoped keys	high	ditto
Multipart keys	high	ditto; in addition, it's not transformable into other schema languages
Uniqueness constraint	uncertain	it's highly desirable for business documents, but we're uncertain about its deployment in tools
Notations	unacceptable
Annotations	low	we need to define a profile for how to use this, so that arbitrary application info isn't added
Application info	unacceptable	this is designed to add a layer of semantics that could mess up our intended semantics
Processing instructions in schemas	high	ditto
Processing instructions in documents	uncertain	has the potential for Trojan horses (especially if programming code is included), but do we need to provide some kind of escape hatch to account for real life? and anyway, we can't control (through XML parsers) whether people use them; we could say that processors that handle UBL documents may/must ignore PIs
xml:lang	uncertain	its valid values are not enumerable; if we use this rather than create our own attribute, we would probably want to restrict its values somehow; however, this is a schema design issue and not a risk assessment issue
xml:space	uncertain
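
The Wildcards entry suggests ensuring that only specific namespaces get in. A minimal sketch of such a check on the receiving side, assuming Python's standard xml.etree.ElementTree module; the namespace URIs (urn:example:...) and the helper names are made up for illustration and are not actual UBL identifiers:

```python
import xml.etree.ElementTree as ET

# Hypothetical allow list; these URIs are placeholders, not real UBL namespaces.
ALLOWED_NAMESPACES = {"urn:example:ubl", "urn:example:catalog-ext"}


def namespace_of(tag):
    """Return the namespace URI of an ElementTree tag such as '{uri}local'."""
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return ""  # the element is in no namespace


def foreign_elements(xml_text, allowed=ALLOWED_NAMESPACES):
    """List the tags of all elements whose namespace is not on the allow list."""
    root = ET.fromstring(xml_text)
    return [el.tag for el in root.iter() if namespace_of(el.tag) not in allowed]


doc = (
    '<o:Order xmlns:o="urn:example:ubl" xmlns:x="urn:example:unknown">'
    "<o:Line/><x:Script/>"
    "</o:Order>"
)
print(foreign_elements(doc))
```

A receiving application could reject or quarantine a document for which this list is non-empty. In the schema itself, the analogous control would be listing explicit namespaces on the wildcard rather than allowing any namespace.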
