OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

relax-ng message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [Elist Home]


Subject: Re: Explosion


James Clark wrote:

> I know you want to restrict <oneOrMore> patterns that contain attributes and
> there may be good arguments for this, but the argument you are advancing
> here is not a good argument.

No, I have already presented my argument in a different thread.  I wrote:

> In general, when an <oneOrMore> element contains both <attribute> and
> <ref>, it does not have a GNF.

Without a GNF, I do not think the equivalence and subset relationship 
of patterns can be tested.  I also think that descripitiptive power 
beyond the class of regular languages is excessive.

This thread is merely an observation about your implementation.
I think that some of your arguments are objectionable.

> 1. Memoization is fundamental to JTREX.  JTREX does extensive memoization
> already for elements and character data.  It would have terrible performance
> if it didn't.  The reason that JTREX's performance with attributes is worse
> than its performance with elements is that it does memoization for elements
> (in most cases) but doesn't do any for attributes.

Other than TREX, I have seen two implementations of binarization.  Both 
do memoization.  I agree that memoization is fundamental.

However, suppose that JTREX does not do memoization.  Then, when we use 
elements only, do we have very slow linear behaviour or exponential behaviour?
Elements are easier since the order of elements is significant.   Most 
branches will soon become deadends.  But attributes are more difficult 
since the order is not significant.  Branches do not become deadends.

> 2. The exponential behaviour you have observed with attributes arises even
> without oneOrMore.  You can get it with a large group just as easily.

Initially, such a large group will exhibit the exponential behavior, though it 
will eventually behave like a constant function.
 
> 3.  In the cases you suggested there was in fact no explosion in the number
> of states (distinct patterns); the increase in the number of states was in
> fact linear in the number of attributes. The cases that are really
> problematic are those that do cause an explosion in the number of states and
> hence cannot be solved by memoization.

Here is such a case.  6 attributes were good enough for this case.

1) 4 attributes

C:\TREX>trex explode1.trex explode.xml
Elapsed time 830 milliseconds

2) 5 attributes

C:\TREX>trex explode1.trex explode.xml
Error at URL "file:/C:/TREX/explode.xml", l
ed attributes missing
Elapsed time 7420 milliseconds

3) 6 attributes

C:\TREX>trex explode1.trex explode.xml
Elapsed time 102050 milliseconds

<element name="foo">
  <oneOrMore>
  <oneOrMore>
  <oneOrMore>
  <oneOrMore>
  <oneOrMore>
  <oneOrMore>
  <oneOrMore>
  <oneOrMore>
    <attribute>
      <anyName/>
    </attribute>
    <optional><element name="foo"><empty/></element></optional>
    <attribute>
      <anyName/>
    </attribute>
  </oneOrMore>
    <optional><element name="foo"><empty/></element></optional>
  </oneOrMore>
    <optional><element name="foo"><empty/></element></optional>
  </oneOrMore>
    <optional><element name="foo"><empty/></element></optional>
  </oneOrMore>
    <optional><element name="foo"><empty/></element></optional>
  </oneOrMore>
    <optional><element name="foo"><empty/></element></optional>
  </oneOrMore>
    <optional><element name="foo"><empty/></element></optional>
  </oneOrMore>
    <optional><element name="foo"><empty/></element></optional>
  </oneOrMore>
</element>

Cheers,

Makoto


[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [Elist Home]


Powered by eList eXpress LLC