OASIS Mailing List Archives

wsn message



Subject: Re: [wsn] Analysis of filtering


Part of the problem is that there are two ways of viewing "state" here.  I believe your definition of "stateless" is "depending only on the content of single messages."  Under that definition, dmhFilter is definitely stateful.  I was thinking of a filter as a function on a stream (or sequence, if you prefer) of messages, and in that sense it's just a function.  BTW, a more plausible dmhFilter would be one that eliminates duplicates (a la UNIX "uniq"), or adjacent messages that are "too similar".

If a filter is simply a boolean function as you describe, and multiple filters in a <filter> clause are anded together, then their order doesn't matter.  Strictly speaking, this is because boolean "and" is commutative, not because the filters commute.
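As a minimal sketch of that point, assuming filters are pure per-message predicates (the predicate names and the message shape below are made up for illustration), every ordering of the anded filters selects the same messages:

```python
from itertools import permutations

# Hypothetical per-message predicates; names and message shape are assumptions.
def topic_is_stock(msg):
    return msg["topic"] == "stock"

def contains_foo(msg):
    return "foo" in msg["body"]

def under_limit(msg):
    return len(msg["body"]) < 20

def accept(msg, filters):
    """A message passes only if every predicate accepts it (boolean "and")."""
    return all(f(msg) for f in filters)

messages = [
    {"topic": "stock", "body": "foo 1"},
    {"topic": "stock", "body": "bar 2"},
    {"topic": "weather", "body": "foo 3"},
]

filters = [topic_is_stock, contains_foo, under_limit]

# Collect the selected bodies for every possible ordering of the predicates.
results = {
    tuple(m["body"] for m in messages if accept(m, order))
    for order in permutations(filters)
}
print(results)  # {('foo 1',)}
```

The set has a single element because "and" over side-effect-free predicates cannot observe the order in which they were tried.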

My question is, is there a compelling reason to call out boolean filtering specially?  I haven't made up my mind on this yet.  On the one hand, it's a well-circumscribed domain with easily-defined semantics.  On the other hand, it's evidently not considered adequate, and from a user perspective, it's hard to see why this sort of filtering should have any special status.

Put another way, there are three possible notions of "filtering" running around:
  • Selecting messages, in or out, in isolation (topic, selector, precondition).
  • Transforming message content (not-a-filter)
  • Transforming message streams (uniq, every other, ...?)
Simple filtering is a special case of the other two, and the second option is a special case of the third.  So far, we have definite use cases for the first two.  The problem is, even the second is enough to introduce sticky problems of evaluation order.  Even if we quarantine content transformation to a separate part of the Subscribe message, the question still arises: when is it applied?  If we mandate that it applies only before or only after filtering, on what basis do we mandate that?
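To make the evaluation-order question concrete, here is a small sketch with a hypothetical selector and a hypothetical content transformation (neither comes from the spec). Applying the transformation before or after filtering gives different notifications:

```python
def selector(msg):
    # Hypothetical selector: keep messages whose body mentions "secret".
    return "secret" in msg

def redact(msg):
    # Hypothetical content transformation: mask the word "secret".
    return msg.replace("secret", "*****")

stream = ["secret plan", "lunch menu", "secret code"]

# Filter on the original content, then transform what survives.
filter_then_transform = [redact(m) for m in stream if selector(m)]

# Transform first, then filter on the transformed content.
transform_then_filter = [m for m in map(redact, stream) if selector(m)]

print(filter_then_transform)  # ['***** plan', '***** code']
print(transform_then_filter)  # []
```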

Peter Niblett wrote:



Maybe I am missing something, but if we say that filters are functions
returning a boolean and which are allowed to take as input only

i)  The message itself
ii) The topic (if there is one)
iii) Other NP state associated with the situation - this state being
constant for all the filters in the list

and these filters must have no side-effects, then commutativity is
assured. The problem with dmhFilter occurs if its implementation a) updates
some state - the number of messages that it has processed -  and b) uses
that state as part of its decision. In other words for a given message m,
dmhFilter(m) sometimes returns true and sometimes false.

If instead you say that the NP itself updates its count of messages
produced, and dmhFilter operates on that count (thus obeying my rule iii),
then I think it is commutative. Consider this:

selector, message contains foo
dmhFilter, every other message *** This means every EVEN message produced
by the NP ****
and the stream of notifications is
foo 1; bar 2; foo 3; bar 4; foo 5; bar 6; foo 7; bar 8.

Then you get nothing at all whichever order the NP chooses (selector
eliminates the bars, dmhFilter eliminates the odd numbered ones - in this
example the foos).
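The example above can be sketched as follows, assuming each notification carries the sequence number assigned by the NP (the tuple representation and function names are illustrative only):

```python
# The notification stream from the example: (payload, NP-assigned sequence number).
stream = [("foo", 1), ("bar", 2), ("foo", 3), ("bar", 4),
          ("foo", 5), ("bar", 6), ("foo", 7), ("bar", 8)]

def selector(msg):
    payload, _seq = msg
    return "foo" in payload

def dmh_filter(msg):
    # Reads the NP's own count (rule iii); it keeps no state of its own.
    _payload, seq = msg
    return seq % 2 == 0

def run(filters):
    """Deliver a message only if every filter in the list accepts it."""
    return [m for m in stream if all(f(m) for f in filters)]

print(run([selector, dmh_filter]))  # []
print(run([dmh_filter, selector]))  # []
```

Both orders deliver nothing, because each filter's verdict depends only on the message and on NP state that the other filter cannot change.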

We thus get the determinism required for interoperation, and
NotificationProducers are free to execute the filters in whatever order
they choose - allowing them to optimise performance by adjusting the order
if they wish - without the need to add more complex composition into core
WSN.

Do we really need to allow people to introduce filters that access state
which can be modified by other filters in the list?

People may want to introduce special filters, such as Sanjay's Sarbanes
Oxley, that are themselves composed of non-commutative rules (subfilters).
The inventor of such a filter is of course free to define a nesting syntax
or ordering constraint for the content of these filters.



Peter Niblett



                                                                           
From: David Hull <dmh@tibco.com>
Date: 24/11/2004 19:43
Cc: wsn@lists.oasis-open.org
Subject: Re: [wsn] Analysis of filtering




I think this may be another case where we should define the semantics in
terms of some naive evaluation algorithm, with the understanding that the
implementation doesn't have to be naive.  The one that springs to mind is
just to evaluate each filter, in order.  If order matters, the user can
control it directly.  If it doesn't, the NP can take that into account in
its optimizations.

For example, the built-in filters commute, so if the filter consists only
of built-in filters, the NP knows it can re-arrange them however it likes
for performance.

On the other hand, if dmhFilter doesn't commute (or the NP can't easily
figure out whether it does or not), then the NP will have to drop back to a
particular order.

Even then, it has some freedom.  If the filter looks like

<filter>
    <topic/>
    <selector/>
    <dmhFilter/>
    <precondition/>
</filter>

it can still evaluate topic and selector in either order, even if it knows
nothing at all about dmhFilter.
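A rough sketch of the naive in-order semantics, treating each filter as a function on a message stream. The every-other behaviour for dmhFilter, the message fields, and the predicates are all assumptions made for illustration:

```python
from functools import reduce

def predicate_filter(pred):
    """Lift a per-message predicate to a function on a message stream."""
    return lambda stream: [m for m in stream if pred(m)]

def every_other(stream):
    """Stand-in for dmhFilter: keep every second message it receives."""
    return list(stream)[1::2]

def evaluate(filters, stream):
    """Naive reference semantics: apply each filter in the order written."""
    return reduce(lambda s, f: f(s), filters, list(stream))

# Hypothetical messages and predicates, chosen only for illustration.
topic = predicate_filter(lambda m: m["topic"] == "t1")
selector = predicate_filter(lambda m: "foo" in m["body"])
precondition = predicate_filter(lambda m: m["id"] < 8)

stream = [{"id": i, "topic": "t1" if i % 3 else "t2",
           "body": "foo" if i % 2 == 0 else "bar"} for i in range(1, 11)]

as_written = evaluate([topic, selector, every_other, precondition], stream)
swapped = evaluate([selector, topic, every_other, precondition], stream)
moved = evaluate([topic, every_other, selector, precondition], stream)

print([m["id"] for m in as_written])  # [4]
print([m["id"] for m in swapped])     # [4]
print([m["id"] for m in moved])       # [2]
```

Swapping topic and selector leaves the result unchanged, so an NP could reorder them freely. Moving the dmhFilter stand-in across selector changes the result, which is why its position has to be preserved when nothing else is known about it.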

It may well be that in the corner cases, where there is a non-standard
filter and an NP that doesn't know much about it, we lose performance.
(Even then, why is the NP accepting a filter it knows nothing about?)  But
we needn't lose it in the usual case, even if we specify in-order
evaluation semantics.  The NP can do whatever it wants as long as the
results are equivalent to the specified semantics (we might want to say
that specifically).

Patil, Sanjay wrote:

      Where do we draw the line? Defining nesting may not be so hard for
      us, but the specification is not exactly aimed at ourselves  :-)

      There are all sorts of possibilities for composing filters ranging
      from defining simple commutative filters (which is what we should
      include in the spec, IMHO), all the way to someone deploying entire
      business rules in the form of filters. One may possibly compose rules
      for detecting when a particular IT event qualifies as a Sarbanes
      Oxley material event and put these rules in the form of filters in
      the Subscription! What is the complexity of such a rules language - I
      don't know. Do we want to explicitly prohibit such usage - I don't
      think so either.

      So basically, if we bake in simple filters and simple rules for their
      composition in the core spec, we will have a simple and fairly useful
      spec for the majority of the use cases and we will still leave the door
      open for unlimited extensibility for those advanced users out there.

      Just my 2 cents.

      Thanks,
      Sanjay
       -----Original Message-----
       From: David Hull [mailto:dmh@tibco.com]
       Sent: Wednesday, Nov 24, 2004 7:57 AM
       Cc: wsn@lists.oasis-open.org
       Subject: Re: [wsn] Analysis of filtering

       I'm not convinced that nested structure is any more complex than
       coming up with implicit rules for evaluation.  Evaluating nested
       expressions only becomes difficult when you start to deal with
       variable parameters and recursive evaluation, neither of which I'm
       proposing.  I'm talking about expressions equivalent in form to

       1 +  2 + (3 * 4)

       not

       a + b + (b*c where b = 4) where a = 1, b = 2, c = 3

       or

       f(3) where f(0) = 1 and f(n+1) = (n+1)*f(n).

       We're XML experts, right?  Does nesting really scare us? :-)

       I'm not going to go to the mat on this, but I think it's worth
       considering.  IMHO much harm has been done by trying to invent
       specialized evaluation rules in order to make syntax "nicer".

       Patil, Sanjay wrote:

             +1

             Requirement of commutativity (or any particular positioning)
             on the open content filters is too strict.

             Since we allow open content for filters, I think we are
             expected to remain open about the composition of filters also.
             At the same time, defining the nesting/composition rules for
             filters in BaseN would pull in a lot of additional complexity,
             and I don't know if we want to go there either.

             Perhaps we could have the BaseN spec
             - Define the composition rules for the three filters defined
             by BaseN (commutative, etc.)
             - Allow open content filters to also utilize the composition
             rules defined by BaseN (a compositionRule attribute on the top
             level filter element with BaseN specific default value)
             - Insert language to allow implementations (or other
             specifications) to define and use new composition rules.

             Hopefully this will provide support for interoperable filters
             out of the box (spec defined) for the majority of use cases
             and still allow extensibility for the complex ones (80-20).

             Thanks,
             Sanjay
             -----Original Message-----
             From: David Hull [mailto:dmh@tibco.com]
             Sent: Tuesday, Nov 23, 2004 13:40 PM
             To: Steve Graham
             Cc: wsn@lists.oasis-open.org
             Subject: Re: [wsn] Analysis of filtering

             Steve Graham wrote:

                   > For example, suppose dmhFilter has a "suppress all but
                   every Nth message" option.
                   > This seems like a perfectly reasonable, if arbitrary,
                   filter on its own.  But it
                   > won't commute with anything.
                   Then whoever specifies the dmhFilter MUST state that the
                   dmhFilter MUST be placed
                   last in the list of expressions to be evaluated.
             The point was that such a filter could legitimately be used
             either before or after some other filter, with differing but
             equally legitimate effects.  Restricting it to some particular
             position just de-legitimizes one of those effects, for no
             apparent reason.

             As far as I can tell, the requirement of commutativity is too
             strict.
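For illustration only, here is a sketch of an "every Nth message" filter that counts the messages reaching it (one possible reading of the dmhFilter option; the names are hypothetical). Placed before or after a selector, it yields different but equally well-defined results:

```python
def select_foo(stream):
    """Selector: keep messages that start with "foo"."""
    return [m for m in stream if m.startswith("foo")]

def every_nth(n):
    """Keep only every n-th message of whatever stream reaches this filter."""
    return lambda stream: stream[n - 1::n]

stream = ["foo 1", "bar 2", "foo 3", "bar 4", "foo 5", "bar 6", "foo 7", "bar 8"]
every_3rd = every_nth(3)

# Selector first: every third "foo" message.
selector_first = every_3rd(select_foo(stream))

# Every-Nth first: the "foo" messages among every third notification.
nth_first = select_foo(every_3rd(stream))

print(selector_first)  # ['foo 5']
print(nth_first)       # ['foo 3']
```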





  


