OASIS Mailing List Archives

 






Subject: Re: [office] ISO 14977 EBNF grammar


David,

Sigh, I don't know how I get involved in this sort of issue. ;-)

OK, the author of ISO 14977 stopped publishing several years ago, and I
wasn't able to find an email contact for him quickly.

The author is R. S. Scowen, if you are curious about that sort of thing,
and he wrote a paper on EBNF that may be helpful:

Extended BNF - A Generic Base Standard
http://www.cl.cam.ac.uk/~mgk25/iso-14977-paper.pdf

I have written to one of his younger colleagues, who has a webpage
discussing the EBNF grammar,
http://www.cl.cam.ac.uk/~mgk25/iso-ebnf.html, but he is on paternity
leave, so it may be a while before we hear from him, assuming my post
gets past his spam filter. I tried not to say anything about $millions of
dollars, various African countries, etc., in the first few lines of my
post anyway. ;-)

After reading ISO 14977 again (actually more than once), it seems to me
that we are straining without just cause. True, I think it requires us
to define what we mean by "Unicode character", but having done so (I
suggest we simply copy the Unicode definition), all we need to do is
supply a defined start and end for a sequence. I think that works, at
least for the range issue.
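A minimal sketch of that suggestion, assuming "Unicode character" is taken to mean a Unicode scalar value (U+0000 to U+10FFFF, minus the surrogates U+D800 to U+DFFF) and that a range is nothing more than an inclusive start and end code point. `CharRange`, `UNICODE_SCALAR_RANGES`, and `is_unicode_char` are hypothetical names used only for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharRange:
    """An inclusive range of code points with a defined start and end."""
    start: int
    end: int

    def __post_init__(self) -> None:
        if self.start > self.end:
            raise ValueError("range start must not exceed range end")

    def contains(self, ch: str) -> bool:
        # Membership is a simple comparison against the two endpoints.
        return self.start <= ord(ch) <= self.end


# Assumed definition of "Unicode character": any Unicode scalar value,
# i.e. U+0000..U+10FFFF with the surrogate block U+D800..U+DFFF removed.
UNICODE_SCALAR_RANGES = (
    CharRange(0x0000, 0xD7FF),
    CharRange(0xE000, 0x10FFFF),
)


def is_unicode_char(ch: str) -> bool:
    """True if ch is exactly one Unicode scalar value."""
    return len(ch) == 1 and any(r.contains(ch) for r in UNICODE_SCALAR_RANGES)


digits1to9 = CharRange(ord("1"), ord("9"))

print(digits1to9.contains("5"))       # True
print(digits1to9.contains("0"))       # False
print(is_unicode_char("\U0001F600"))  # True (outside the BMP)
print(is_unicode_char("\ud800"))      # False (lone surrogate)
```

Once the universe of characters is pinned down, a range costs two numbers instead of an enumeration, whatever its size.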

As for negation, isn't that the same as excluding a range? I realize
there may be deeper issues about negation, but as far as parsing goes,
doesn't exclusion fit the bill?
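A minimal sketch of negation as exclusion, under the same assumption that the universe is the Unicode scalar values and that ranges are inclusive (start, end) code-point pairs; `DOMAIN`, `subtract`, and `matches` are hypothetical names. Removing a range from a defined universe always leaves a short list of ordinary ranges, so a negated range such as W3C's `[^$]` can be restated with start/end ranges plus exclusion. (In ISO 14977 itself, the exception symbol `-` is the operator for this kind of exclusion.)

```python
# Hypothetical sketch: "any character except X" as set subtraction over
# inclusive code-point ranges. Requires Python 3.9+ for tuple[int, int].
Range = tuple[int, int]

# Assumed universe: the Unicode scalar values (surrogates excluded).
DOMAIN: list[Range] = [(0x0000, 0xD7FF), (0xE000, 0x10FFFF)]


def subtract(domain: list[Range], excluded: list[Range]) -> list[Range]:
    """Return the parts of `domain` not covered by any range in `excluded`."""
    result: list[Range] = []
    for lo, hi in domain:
        pieces = [(lo, hi)]
        for ex_lo, ex_hi in excluded:
            remaining: list[Range] = []
            for p_lo, p_hi in pieces:
                if ex_hi < p_lo or ex_lo > p_hi:
                    remaining.append((p_lo, p_hi))  # no overlap: keep whole piece
                    continue
                if p_lo < ex_lo:
                    remaining.append((p_lo, ex_lo - 1))  # part below the exclusion
                if ex_hi < p_hi:
                    remaining.append((ex_hi + 1, p_hi))  # part above the exclusion
            pieces = remaining
        result.extend(pieces)
    return result


def matches(ranges: list[Range], ch: str) -> bool:
    """True if the single character ch falls in any of the ranges."""
    return any(lo <= ord(ch) <= hi for lo, hi in ranges)


# W3C's [^$], restated as "any Unicode character except $":
not_dollar = subtract(DOMAIN, [(ord("$"), ord("$"))])
print([(hex(lo), hex(hi)) for lo, hi in not_dollar])
# [('0x0', '0x23'), ('0x25', '0xd7ff'), ('0xe000', '0x10ffff')]
print(matches(not_dollar, "€"), matches(not_dollar, "$"))  # True False
```

For parsing purposes the negated class and the exclusion accept exactly the same characters; the only prerequisite is the agreed definition of the universe being subtracted from.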

Just some quick thoughts. I will try to return to the issue later this 
week and/or get guidance from real experts on it.

Hope you are having a great day!

Patrick

David A. Wheeler wrote:
> Patrick Durusau:
>   
>> So, I take it that the technical issue (as opposed to aesthetics, etc.)
>> is the lack of support for character and negated ranges?
>>     
>
> Correct.  Quick clarification: It's negated _character_ ranges that
> ISO doesn't support.  Other kinds of negation work, I believe.
>
>   
>> When you say "lack of support," I assume you mean that character and
>> negated ranges are not predefined? Yes?
>>     
>
> No.  It doesn't have a range operator at all; all it allows is listing alternatives.
>
>   
>> Which is not the same as saying ISO/IEC 14977 cannot define character
>> and negated ranges. Yes?
>>     
>
> It's not that it CAN'T do it; the problem is that there is no built-in
> range operator, and thus you must enumerate every instance.
> That becomes insane when you have to support international use via
> Unicode/ISO 10646 characters; there is NO way we'll enumerate them all.
>
> I think an example will clarify why the ISO spec doesn't work well for
> defining certain kinds of data formats with international characters.
>
> Here's how you can define "digits 1 through 9" in W3C's notation:
>  digits1to9 ::= [1-9]
>
> Here's how you have to do it in ISO - by explicit enumeration of
> each possibility, one by one:
>  digits1to9 = "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
>
> Notice how painful it is when there are only 9 characters.
> Because we want to support internationalized characters (such as
> sheet names in ANY language), enumerating all international characters except
> a few is, um, absurd.  If you go beyond the BMP (Plane 0), you're talking about
> hundreds of thousands of characters in the enumeration.
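A small sketch of the arithmetic behind this, assuming "character" means a Unicode scalar value. `iso_alternatives` and `quote` are hypothetical helpers that mechanically expand a W3C-style class such as `[1-9]` into the ISO 14977 enumeration (adding the `;` terminator that ISO 14977 rules end with); the counts then show how long such an enumeration becomes:

```python
import unicodedata


def quote(c: str) -> str:
    """Quote one character as an ISO 14977 terminal string.
    Either quote style is allowed, so use the one that c is not."""
    return f"'{c}'" if c == '"' else f'"{c}"'


def iso_alternatives(name: str, first: str, last: str) -> str:
    """Hypothetical helper: expand the W3C-style class [first-last] into an
    ISO 14977 rule listing every character as a separate alternative."""
    chars = (chr(cp) for cp in range(ord(first), ord(last) + 1))
    return f"{name} = " + " | ".join(quote(c) for c in chars) + " ;"


print(iso_alternatives("digits1to9", "1", "9"))
# digits1to9 = "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;

# Scale of the problem, counting Unicode scalar values
# (code points excluding the 2048 surrogates U+D800..U+DFFF).
SURROGATES = 0xDFFF - 0xD800 + 1
bmp_scalars = 0x10000 - SURROGATES   # Basic Multilingual Plane (Plane 0)
all_scalars = 0x110000 - SURROGATES  # all 17 planes
print(bmp_scalars, all_scalars)      # 63488 1112064

# Code points the installed Unicode database does not mark as unassigned
# (category "Cn"); the exact figure depends on the Python version.
assigned = sum(
    1
    for cp in range(0x110000)
    if not 0xD800 <= cp <= 0xDFFF and unicodedata.category(chr(cp)) != "Cn"
)
print(assigned)
```

Even restricted to the BMP, a generated rule of this kind would run to tens of thousands of alternatives.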
>
> W3C's notation, in contrast, can say things like [^$] to say "any character but a $".
> Nice and clean.
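For comparison, Python's `re` module supports the same negated-class idea; this is only an analogue of W3C's EBNF notation, not that notation itself. A minimal check that `[^$]` accepts characters from several scripts, including one outside the BMP, while rejecting `$`:

```python
import re

# Negated character class: exactly one character that is not "$".
# Inside [...], "$" is an ordinary literal, not an end-of-line anchor.
not_dollar = re.compile(r"[^$]")

samples = ["A", "é", "表", "\U0001F600", "$"]
for ch in samples:
    matched = not_dollar.fullmatch(ch) is not None
    print(f"U+{ord(ch):04X} {matched}")
# Every sample prints True except U+0024 ("$"), which prints False.
```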
>
>   
>> Err, then you say it lacks "regular expression" support? But as above,
>> that is simply a question of defining the support that we want/need. Yes?
>>     
>
> Um, sorry, what I mean by "regular expression" is specifically the usual
> "character range operator" built into all regex languages that I know of.
> There is no range operator mechanism, as far as I can tell, in ISO's format.
> If I missed it, please let me know.
>
> We could extend ISO's BNF in a nonstandard way, but then what's the
> point of using the standard? Better to use a standard that has the
> needed capabilities built in.
>
> ISO's BNF format isn't horrific, and it's quite suitable for a lot of constrained
> languages.  Many formats FORBID arbitrary international characters, and for
> them it's probably okay.  But for the formula spec, where international
> characters ARE allowed, that was a problem.  Besides, it's a little ugly :-).
>
>   
>> I note that the use of the XML BNF starts with Chapter 5 of the formula
>> work. I would think it would be better to use a BNF to define the
>> primitives up to that point so as to avoid ambiguity running up to
>> Chapter 5.
>>     
>
> That's not a bad idea.  I don't know how long that will take to do.
>
>
> Robert Weir:
>   
>> Although we are not a W3C standard, ODF certainly has a "family
>> resemblance" to them, based on our use of so many other W3C standards,
>> such as XML, XLink, MathML, XForms, etc.  So defining our syntax using
>> their conventions is a reasonable thing.
>> On other hand, in other cases we have not used W3C standards and instead
>> used standards from ISO.  For example, our use of RELAX NG rather than
>> XML Schema.
>>     
>
>   
>> Either choice is defensible, I believe, and can lead to a clear,
>> unambiguous syntactic definition.
>>     
>
> Agree.  In the OpenFormula case I still think the W3C format is the best
> choice, and the ISO format is suboptimal.  We could switch to the ISO BNF for
> OpenFormula if it were desperately necessary. We could work around ISO's
> lack of a range operator by removing the formal specification of characters in the spec
> and using informal text instead.  The spec would be less clear because of all
> the unnecessary punctuation required by ISO's format, and what's worse, we
> would change something formally specified into something only specified by prose.
> I don't like that trade at all; I prefer that specifications be specified using formal
> (machine-processable) languages as much as possible unless it just can't
> be made clear that way.  There's less chance of misinterpretation
> when it's spec'ed in a formal language.
>
> --- David A. Wheeler
>
>   

-- 
Patrick Durusau
patrick@durusau.net
Chair, V1 - US TAG to JTC 1/SC 34
Convener, JTC 1/SC 34/WG 3 (Topic Maps)
Editor, OpenDocument Format TC (OASIS), Project Editor ISO/IEC 26300
Co-Editor, ISO/IEC 13250-1, 13250-5 (Topic Maps)


