Subject: Re: [office] ISO 14977 EBNF grammar
Patrick Durusau:
> So, I take it that the technical issue (as opposed to aesthetics, etc.)
> is the lack of support for character and negated ranges?

Correct. Quick clarification: It's negated _character_ ranges that ISO doesn't support. Other kinds of negation work, I believe.

> When you say "lack of support" I assume you mean that character and
> negative ranges are not predefined? Yes?

No. It doesn't have a range operator at all; all it allows is listing alternatives.

> Which is different than saying ISO/IEC 14977 cannot define character and negated
> ranges. Yes?

It's not that it CAN'T do it; the problem is that there is no built-in range operator, and thus you must enumerate every instance. That becomes insane when you have to support international use via Unicode/ISO 10646 characters; there is NO way we'll enumerate them all.

I think an example will clarify why the ISO spec doesn't work well for defining certain kinds of data formats with international characters. Here's how you can define "digits 1 through 9" in W3C's notation:

    digits1to9 ::= [1-9]

Here's how you have to do it in ISO - by explicit enumeration of each possibility, one by one:

    digits1to9 = "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"

Notice how painful it is when there are only 9 characters. Because we want to support internationalized characters (such as sheet names in ANY language), enumerating all international characters except a few is, um, absurd. If you go beyond the BMP (plane 0), you're talking hundreds of thousands of characters in the enumeration. W3C's notation, in contrast, can say things like [^$] to say "any character but a $". Nice and clean.

> Err, then you say it lacks "regular expression" support? But as above,
> that is simply a question of defining the support that we want/need. Yes?

Um, sorry, what I mean by "regular expression" is specifically the usual "character range operator" built into all regex languages that I know of.
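
To make the size of the problem concrete, here is a minimal Python sketch. It is not part of either notation, and the function names iso_enumeration and negated_class_size are made up for illustration. It expands a W3C-style range into the explicit alternation that the ISO notation needs, and counts how many code points an enumeration of a negated class such as [^$] would have to list. The count is only an upper bound over the whole Unicode code space (it includes unassigned code points and surrogates), and the rule is printed without a terminator to mirror the example above.

```python
import sys


def iso_enumeration(name, first, last):
    """Expand a W3C-style range [first-last] into an explicit alternation.

    Hypothetical helper for illustration only; it quotes each character
    naively and does not handle characters that need escaping.
    """
    chars = [chr(cp) for cp in range(ord(first), ord(last) + 1)]
    return name + " = " + " | ".join('"%s"' % c for c in chars)


def negated_class_size(excluded):
    """Count the code points an enumeration of [^excluded] would list.

    Assumption: every code point from 0 to sys.maxunicode counts,
    including unassigned ones and surrogates, so this is an upper bound.
    """
    return (sys.maxunicode + 1) - len(set(excluded))


if __name__ == "__main__":
    # The nine-character case from the message, written out in full.
    print(iso_enumeration("digits1to9", "1", "9"))
    # How many alternatives "any character but a $" would need.
    print(negated_class_size("$"))
```

Even under looser assumptions (say, counting only assigned characters), the second number stays far too large to write out by hand, which is the point of the [^$] comparison.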
There is no range operator mechanism, as far as I can tell, in ISO's format. If I missed it, please let me know.

We could extend ISO's BNF in a nonstandard way, but then what's the point of using the standard? Better to use a standard that has the needed capabilities built in.

ISO's BNF format isn't horrific, and it's quite suitable for a lot of constrained languages. Many formats FORBID arbitrary international characters, and for them it's probably okay. But for the formula spec, where international characters ARE allowed, that was a problem. Besides, it's a little ugly :-).

> I note that the use of the XML BNF starts with Chapter 5 of the formula
> work. I would think it would be better to use a BNF to define the
> primitives up to that point so as to avoid ambiguity running up to
> Chapter 5.

That's not a bad idea. I don't know how long that will take to do.

Robert Weir:
> Although we are not a W3C standard, ODF certainly has a "family
> resemblance" to them, based on our use of so many other W3C standards,
> such as XML, XLink, MathML, XForms, etc. So defining our syntax using
> their conventions is a reasonable thing.
> On the other hand, in other cases we have not used W3C standards and instead
> used standards from ISO. For example, our use of RELAX NG rather than
> XML Schema.
> Either choice is defensible, I believe, and can lead to a clear,
> unambiguous syntactic definition.

Agreed. In the OpenFormula case I still think the W3C format is the best choice, and the ISO format is suboptimal.

We could switch to the ISO BNF for OpenFormula if it were desperately necessary. We could work around ISO's lack of a range operator by removing the formal specification of characters in the spec and using informal text instead. The spec would be less clear because of all the unnecessary punctuation required by ISO's format, and what's worse, we would change something formally specified into something only specified by prose. I don't like that trade at all; I prefer that specifications be specified using formal (machine-processable) languages as much as possible unless it just can't be made clear that way. There's less chance of misinterpretation when it's spec'ed in a formal language.

--- David A. Wheeler