Subject: Re: [office] ISO 14977 EBNF grammar
David,

Sigh, I don't know how I get involved in this sort of issue. ;-)

OK, the author of ISO 14977 stopped publishing several years ago and I wasn't able to quickly find an email contact for him. The author is R. S. Scowen, if you are curious about that sort of thing, and he wrote a paper on EBNF that may be helpful:

Extended BNF - A Generic Base Standard
http://www.cl.cam.ac.uk/~mgk25/iso-14977-paper.pdf

I have written to one of his younger colleagues who has a webpage about the EBNF grammar, http://www.cl.cam.ac.uk/~mgk25/iso-ebnf.html, but he is on paternity leave, so it may be a while before we hear from him, assuming my post gets past his spam filter. I tried not to say anything about $millions of dollars, various African countries, etc., in the first few lines of my post anyway. ;-)

After reading ISO 14977 again (actually more than once), it seems to me that we are straining without just cause. True, I think it requires us to define what we mean by Unicode character, but having done so (I suggest we simply copy the Unicode definition), all we need do is supply a defined start and end for a sequence. I think that works, at least for the range issue.

As for negation, isn't that the same as excluding a range? I realize there may be deeper issues about negation, but as far as parsing goes, doesn't exclusion fit the bill?

Just some quick thoughts. I will try to return to the issue later this week and/or get guidance from real experts on it.

Hope you are having a great day!

Patrick

David A. Wheeler wrote:
> Patrick Durusau:
>
>> So, I take it that the technical issue (as opposed to aesthetics, etc.)
>> is the lack of support for character and negated ranges?
>>
>
> Correct. Quick clarification: It's negated _character_ ranges that
> ISO doesn't support. Other kinds of negation work, I believe.
>
>
>> When you say "lack of support" I assume you mean that character and
>> negative ranges are not predefined? Yes?
>>
>
> No.
> It doesn't have a range operator at all; all it allows is listing alternatives.
>
>
>> Which is different from saying ISO/IEC 14977 cannot define character and negated
>> ranges. Yes?
>>
>
> It's not that it CAN'T do it; the problem is that there is no built-in
> range operator, and thus you must enumerate every instance.
> That becomes insane when you have to support international use via
> Unicode/ISO 10646 characters; there is NO way we'll enumerate them all.
>
> I think an example will clarify why the ISO spec doesn't work well for
> defining certain kinds of data formats with international characters.
>
> Here's how you can define "digits 1 through 9" in W3C's notation:
> digits1to9 ::= [1-9]
>
> Here's how you have to do it in ISO - by explicit enumeration of
> each possibility, one by one:
> digits1to9 = "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
>
> Notice how painful it is with only 9 characters.
> Because we want to support internationalized characters (such as
> sheet names in ANY language), enumerating all international characters except
> a few is, um, absurd. If you go beyond the BMP (plane 0), you're talking hundreds of
> thousands of characters in the enumeration.
>
> W3C's notation, in contrast, can say things like [^$] to mean "any character but a $".
> Nice and clean.
>
>
>> Err, then you say it lacks "regular expression" support? But as above,
>> that is simply a question of defining the support that we want/need. Yes?
>>
>
> Um, sorry, what I mean by "regular expression" is specifically the usual
> "character range operator" built into all regex languages that I know of.
> There is no range operator mechanism, as far as I can tell, in ISO's format.
> If I missed it, please let me know.
>
> We could extend ISO's BNF in a nonstandard way, but then what's the
> point of using the standard? Better to use a standard that has the
> needed capabilities built in.
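To make the digits1to9 comparison concrete, here is a minimal Python sketch; it is not taken from either standard or any existing tool, and the helper names `iso_enumeration` and `negated_class_size` are hypothetical. The first helper writes out an ISO 14977-style rule that enumerates every character in a code point range, closing the rule with the `;` terminator ISO 14977 rules take. The second counts the alternatives an enumeration of "any character but $" would need, assuming "character" means any Unicode scalar value (U+0000 to U+10FFFF minus the 2,048 surrogates). That count includes unassigned code points, so it is an upper bound on an enumeration of assigned characters only. Python's `re` module stands in for the W3C-style character classes.

```python
import re

# Hypothetical helper: write an ISO 14977-style rule that lists every
# character in an inclusive code point range as a separate terminal string.
def iso_enumeration(name: str, lo: str, hi: str) -> str:
    terms = []
    for cp in range(ord(lo), ord(hi) + 1):
        ch = chr(cp)
        # A terminal cannot contain its own quote mark, so quote " with '.
        quote = "'" if ch == '"' else '"'
        terms.append(f"{quote}{ch}{quote}")
    # Alternatives are separated by "|" and the rule ends with ";".
    return f"{name} = {' | '.join(terms)} ;"

# Hypothetical helper: how many alternatives an enumeration of
# "any character except these" needs, counting every Unicode scalar value
# (U+0000..U+10FFFF minus the 2,048 surrogates), assigned or not.
def negated_class_size(excluded: str) -> int:
    scalar_values = 0x110000 - (0xE000 - 0xD800)
    return scalar_values - len(set(excluded))

# The W3C-style rule is one short character class...
print("digits1to9 ::= [1-9]")
# ...while the ISO-style rule must spell out each alternative.
print(iso_enumeration("digits1to9", "1", "9"))
# Alternatives needed to enumerate W3C's [^$]: 1112063
print(negated_class_size("$"))

# Python's re module has the same range and negated-range classes.
assert re.fullmatch(r"[1-9]", "7")
assert not re.fullmatch(r"[1-9]", "0")
assert re.fullmatch(r"[^$]", "\u20ac")  # EURO SIGN is "not a $"
assert not re.fullmatch(r"[^$]", "$")
```

The W3C-style class stays the same size however wide the range is, while the enumeration grows with the range, which is the asymmetry described in the message.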
>
> ISO's BNF format isn't horrific, and it's quite suitable for a lot of constrained
> languages. Many formats FORBID arbitrary international characters, and for
> them it's probably okay. But for the formula spec, where international
> characters ARE allowed, that was a problem. Besides, it's a little ugly :-).
>
>
>> I note that the use of the XML BNF starts with Chapter 5 of the formula
>> work. I would think it would be better to use a BNF to define the
>> primitives up to that point so as to avoid ambiguity running up to
>> Chapter 5.
>>
>
> That's not a bad idea. I don't know how long that will take to do.
>
>
> Robert Weir:
>
>> Although we are not a W3C standard, ODF certainly has a "family
>> resemblance" to them, based on our use of so many other W3C standards,
>> such as XML, XLink, MathML, XForms, etc. So defining our syntax using
>> their conventions is a reasonable thing.
>> On the other hand, in other cases we have not used W3C standards and instead
>> used standards from ISO. For example, our use of RELAX NG rather than
>> XML Schema.
>>
>
>
>> Either choice is defensible, I believe, and can lead to a clear,
>> unambiguous syntactic definition.
>>
>
> Agree. In the OpenFormula case I still think the W3C format is the best
> choice, and the ISO format is suboptimal. We could switch to the ISO BNF for
> OpenFormula if it were desperately necessary. We could work around ISO's
> lack of a range operator by removing the formal specification of characters from the spec
> and using informal text instead. The spec would be less clear because of all
> the unnecessary punctuation required by ISO's format, and, what's worse, we
> would change something formally specified into something only specified by prose.
> I don't like that trade at all; I prefer that specifications be specified using formal
> (machine-processable) languages as much as possible unless it just can't
> be made clear that way.
> There's less chance of mis-interpretation
> when it's spec'ed in a formal language.
>
> --- David A. Wheeler
>
> ---------------------------------------------------------------------
> To unsubscribe from this mail list, you must leave the OASIS TC that
> generates this mail. You may a link to this group and all your TCs in OASIS
> at:
> https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php
>
>
>

--
Patrick Durusau
patrick@durusau.net
Chair, V1 - US TAG to JTC 1/SC 34
Convener, JTC 1/SC 34/WG 3 (Topic Maps)
Editor, OpenDocument Format TC (OASIS), Project Editor ISO/IEC 26300
Co-Editor, ISO/IEC 13250-1, 13250-5 (Topic Maps)
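Patrick's question of whether exclusion can stand in for negation can also be sketched in Python. ISO 14977 does have an exception operator (`A - B`, "an A that is not a B"), so a rule such as `any but dollar = ? Unicode character ? - "$" ;` reads much like W3C's `[^$]`, provided the base set is defined somewhere. The sketch below is a hypothetical model, not an implementation of either notation: it treats a character set as a membership predicate so the base set never has to be enumerated. `unicode_scalar` plays the role of a special sequence such as `? Unicode character ?` (a construct ISO 14977 leaves undefined), and `code_point_range` models the "defined start and end for a sequence" Patrick proposes, which is not a built-in ISO 14977 operator. All names here are illustrative assumptions.

```python
from typing import Callable

# Hypothetical model: a character set is a membership predicate, so the
# base set never has to be written out character by character.
CharSet = Callable[[str], bool]

def unicode_scalar(ch: str) -> bool:
    """Stand-in for a special sequence like ? Unicode character ?:
    true for any single Unicode scalar value (surrogates excluded)."""
    return len(ch) == 1 and not (0xD800 <= ord(ch) <= 0xDFFF)

def code_point_range(lo: str, hi: str) -> CharSet:
    """Hypothetical range with a defined start and end (inclusive)."""
    return lambda ch: len(ch) == 1 and ord(lo) <= ord(ch) <= ord(hi)

def terminals(*chars: str) -> CharSet:
    """Plain ISO-style enumeration: "a" | "b" | ..."""
    allowed = frozenset(chars)
    return lambda ch: ch in allowed

def except_(base: CharSet, excluded: CharSet) -> CharSet:
    """ISO 14977 exception `base - excluded`, read as set difference."""
    return lambda ch: base(ch) and not excluded(ch)

# any but dollar = ? Unicode character ? - "$" ;   (W3C: [^$])
any_but_dollar = except_(unicode_scalar, terminals("$"))
# Range with a defined start and end   (W3C: [1-9])
digits1to9 = code_point_range("1", "9")

for ch in ["\u20ac", "$", "7", "0"]:
    # ascii() keeps the output printable on any console encoding.
    print(ascii(ch), any_but_dollar(ch), digits1to9(ch))
```

The exclusion itself is simple; the open question in the thread is the base set, which in standard ISO 14977 must be either an enumeration or a special sequence whose meaning rests on accompanying prose.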