OASIS Mailing List Archives

office message


Subject: Re: [office] ISO 14977 EBNF grammar


Patrick Durusau:
> So, I take it that the technical issue (as opposed to aesthetics, etc.)
> is the lack of support for character and negated ranges?

Correct.  Quick clarification: It's negated _character_ ranges that
ISO doesn't support.  Other kinds of negation work, I believe.

> When you say "lack of support" I assume you mean that character and
> negative ranges are not predefined? Yes?

No.  It doesn't have a range operator at all; all it allows is listing alternatives.

>Which is different than saying ISO/IEC 14977 cannot define character and negated
>ranges. Yes?

It's not that it CAN'T do it, the problem is that there is no built-in
range operator, and thus you must enumerate every instance.
That becomes insane when you have to support international use via
Unicode/ISO 10646 characters; there is NO way we'll enumerate them all.

I think an example will clarify why the ISO spec doesn't work well for
defining certain kinds of data formats with international characters.

Here's how you can define "digits 1 through 9" in W3C's notation:
 digits1to9 ::= [1-9]

Here's how you have to do it in ISO - by explicit enumeration of
each possibility, one by one:
 digits1to9 = "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9";

Notice how painful it is when there are only 9 characters.
Because we want to support internationalized characters (such as
sheet names in ANY language), enumerating all international characters except
a few is, um, absurd.  If you go beyond the BMP (plane 0), you're talking hundreds of
thousands of characters in the enumeration.
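
To put a rough number on that, here is a small Python sketch.  It is not
part of either standard: the helper names iso_enumeration and count_all_but
are made up for illustration, and the count assumes "any character" means
every Unicode code point except the surrogates, which is an upper bound
rather than a count of assigned characters.

```python
def iso_enumeration(name, first, last):
    """Expand the range first..last into an ISO 14977 style rule.

    Hypothetical helper for illustration only; it ignores the quoting
    problems that arise for characters such as the double quote itself.
    """
    alternatives = ['"%s"' % chr(cp) for cp in range(ord(first), ord(last) + 1)]
    # ISO 14977 rules end with a terminator symbol, written here as ";".
    return "%s = %s;" % (name, " | ".join(alternatives))


def count_all_but(excluded):
    """Count the alternatives needed for "any character but `excluded`".

    Assumption: the universe is every code point U+0000..U+10FFFF except
    the surrogate block U+D800..U+DFFF.
    """
    total = 0x110000 - (0xDFFF - 0xD800 + 1)
    return total - len(set(excluded))


print(iso_enumeration("digits1to9", "1", "9"))
print(count_all_but("$"))  # number of alternatives an enumeration would need
```

Under those assumptions the second call reports a little over 1.1 million
alternatives for a single "anything but $" rule.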

W3C's notation, in contrast, can say things like [^$] to say "any character but a $".
Nice and clean.
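
For anyone who wants to try the two bracket forms, Python's re module
happens to accept the same character-class syntax.  This is only a sketch
of the behaviour being described, not a parser for the W3C notation, and
the variable names are made up for illustration.

```python
import re

# Assumption: Python's regex character classes stand in for the W3C
# bracket notation here; the two notations agree on these simple cases.
digits1to9 = re.compile(r"[1-9]")   # range: any one of 1 through 9
not_dollar = re.compile(r"[^$]")    # negated class: any character but $

print(bool(digits1to9.fullmatch("7")))           # True
print(bool(digits1to9.fullmatch("0")))           # False
print(bool(not_dollar.fullmatch("\u00e9")))      # True, a non-ASCII letter
print(bool(not_dollar.fullmatch("\U00010400")))  # True, beyond the BMP
print(bool(not_dollar.fullmatch("$")))           # False
```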

> Err, then you say it lacks "regular expression" support? But as above,
> that is simply a question of defining the support that we want/need. Yes?

Um, sorry, what I mean by "regular expression" is specifically the usual
"character range operator" built into all regex languages that I know of.
There is no range operator mechanism, as far as I can tell, in ISO's format.
If I missed it, please let me know.

We could extend ISO's BNF in a nonstandard way, but then what's the
point of using the standard? Better to use a standard that has the
needed capabilities built in.

ISO's BNF format isn't horrific, and it's quite suitable for a lot of constrained
languages.  Many formats FORBID arbitrary international characters, and for
them it's probably okay.  But for the formula spec, where international
characters ARE allowed, that was a problem.  Besides, it's a little ugly :-).

>I note that the use of the XML BNF starts with Chapter 5 of the formula
>work. I would think it would be better to use a BNF to define the
>primitives up to that point so as to avoid ambiguity running up to
>Chapter 5.

That's not a bad idea.  I don't know how long that will take to do.


Robert Weir:
>Although we are not a W3C standard, ODF certainly has a "family
>resemblance" to them, based on our use of so many other W3C standards,
>such as XML, XLink, MathML, XForms, etc.  So defining our syntax using
>their conventions is a reasonable thing.
>On the other hand, in other cases we have not used W3C standards and instead
>used standards from ISO.  For example, our use of RELAX NG rather than
>XML Schema.

>Either choice is defensible, I believe, and can lead to a clear,
>unambiguous syntactic definition.

Agree.  In the OpenFormula case I still think the W3C format is the best
choice, and the ISO format is suboptimal.  We could switch to the ISO BNF for
OpenFormula if it were desperately necessary. We could work around ISO's
lack of a range operator by removing the formal specification of characters in the spec
and using informal text instead.  The spec would be less clear because of all
the unnecessary punctuation required by ISO's format, and what's worse, we
would change something formally specified into something only specified by prose.
I don't like that trade at all; I prefer that specifications be specified using formal
(machine-processable) languages as much as possible unless it just can't
be made clear that way.  There's less chance of misinterpretation
when it's spec'ed in a formal language.

--- David A. Wheeler

