
wsia message


Subject: RE: [wsrp] RE: [wsia] [wsrp-interfaces] Draft spec v0.5

The algorithm you described, Eilon, is (roughly) the Knuth-Morris-Pratt string searching algorithm, which looks at every character in the searched string. Carsten was suggesting that the current token in the spec would make it possible to use the Boyer-Moore algorithm, which skips some characters in the searched string (http://www.cs.utexas.edu/users/moore/best-ideas/string-searching/index.html has an interesting comparison of the two algorithms from Moore himself).
However, I don't quite understand your arguments for the token, Carsten.  You said:
    If we allow UNICODE characters as the input set, this table becomes incredibly large (too large to be useful ;-)
Since the markup is internationalizable and embedded in XML, aren't we forced to have Unicode characters as the input set?  I don't see how making the search string all ASCII helps here. 
Second, I would assume that most implementations of WSRP would feed the SOAP message to an XML parser, so  we wouldn't have to do any more searching if we used an XML element to denote consumer rewrite URLs.  With a token, most implementations would parse the SOAP message twice: once for the XML DOM, and once to find the tokens. 
Third, even if an implementation doesn't use an XML parser, I don't understand why an XML element couldn't be found by the Boyer-Moore algorithm as well.  Perhaps you are arguing that the token you proposed is most likely to approach O(n/m) because of the rarity of the string, but that seems like pretty implementation-specific performance tuning for the spec.
Finally, I'm skeptical of trading spec clarity for performance in this case.  It seems to me that token parsing for URL rewriting is not going to be the limiting factor when sending a SOAP XML payload over HTTP.
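To illustrate the second point above, here is a minimal Python sketch of the single-pass idea: if the rewrite marker were a real XML element, the parser that already reads the message would hand it back without any extra string search. The namespace URI `urn:example:wsia` and the sample document are assumptions made up for this example; the element and attribute names follow the `wsia:rewriteURL` example quoted further down the thread, not anything the draft spec defines.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, chosen only for this sketch.
WSIA_NS = "urn:example:wsia"

def find_rewrite_elements(xml_text):
    """Return the attributes of every rewriteURL element found by the parser."""
    root = ET.fromstring(xml_text)
    # ElementTree addresses namespaced tags as "{uri}localname".
    return [dict(el.attrib) for el in root.iter("{%s}rewriteURL" % WSIA_NS)]

sample = (
    '<markup xmlns:wsia="urn:example:wsia">Please click '
    '<wsia:rewriteURL type="Action" navigationState="somestate"/>'
    ' to continue</markup>'
)
links = find_rewrite_elements(sample)
```

The trade-off is that the markup then has to be well-formed XML around the marker, which is exactly the constraint discussed elsewhere in this thread.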
-----Original Message-----
From: Eilon Reshef [mailto:eilon.reshef@webcollage.com]
Sent: Friday, September 06, 2002 9:04 AM
To: 'Carsten Leue'
Cc: wsia@lists.oasis-open.org; 'WSRP (E-mail)'; wsrp-interfaces@lists.oasis-open.org
Subject: RE: [wsrp] RE: [wsia] [wsrp-interfaces] Draft spec v0.5

I would suspect that the fastest algorithm in this case would be a set of "if" statements, e.g.,
for (int i = 0; i < length - 10; i++)
  if (text[i] == 'X')
    if (text[i + 1] == 'g')
       if (text[i + 2] == ...)
         return something;
This should work with practically any token that doesn't have repeating characters.
In any case, I do agree with Sasha that the current token looks more like a typing mistake than anything else, and we should consider something that makes a bit more sense to developers, like an easily articulated prefix (wsia/wsrp), a pre-defined character (such as Sasha suggested) or a pre-defined tag (e.g., <consumer-link>).
-----Original Message-----
From: Carsten Leue [mailto:CLEUE@de.ibm.com]
Sent: Friday, September 06, 2002 5:17 AM
To: Sasha Aickin
Cc: Rich Thompson; wsia@lists.oasis-open.org; WSRP (E-mail); wsrp-interfaces@lists.oasis-open.org
Subject: [wsrp] RE: [wsia] [wsrp-interfaces] Draft spec v0.5

Hi Sasha.

1. I think that the current intent is to return the markup as a string. In
this case it would need to be XML encoded (your first example). I also see
this as preferable to requiring the markup to be XML conformant.
2. When looking for an appropriate URL rewriting marker I decided to use
plain ASCII characters for the following reason: the fastest way to locate a
token to my knowledge is the Boyer-Moore algorithm if the token consists of
improbable characters (O(N/M), see the summary I sent out earlier). This
algorithm implies (at least any implementation I know of) that the parser
holds a table that contains the number of characters the algorithm can skip
per character to consider. If we allow UNICODE characters as the input set,
this table becomes incredibly large (too large to be useful ;-).
If I think again then maybe it makes sense to replace this map by a sparse
map implementation. Do you have experience with such an approach?
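
A sparse skip table of the kind asked about in point 2 could look like the following Python sketch. It uses the Horspool simplification of Boyer-Moore (bad-character rule only), not the full algorithm, and stores shifts in a dictionary keyed only by the characters that occur in the token, so the table size depends on the token length rather than on the size of the input alphabet. The function names are hypothetical; the token is the one from the draft.

```python
def build_skip_table(pattern):
    """Sparse bad-character table: one entry per distinct pattern character."""
    m = len(pattern)
    # For every character except the last, record the distance from its
    # rightmost occurrence to the end of the pattern.
    return {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}

def horspool_find(text, pattern):
    """Return the index of the first match of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    skip = build_skip_table(pattern)
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            return pos
        # Shift by the entry for the text character aligned with the
        # pattern's last position; characters not in the table (for
        # example any non-ASCII character) allow a full shift of m.
        pos += skip.get(text[pos + m - 1], m)
    return -1

TOKEN = "wsia:QXqKYZJVUWj7G"
page = 'Grüße, 世界! <a href="wsia:QXqKYZJVUWj7G?x=1">'
index = horspool_find(page, TOKEN)
```

With this layout the input set can be full Unicode while the table holds at most one entry per token character.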

Best regards
Carsten Leue

Dr. Carsten Leue
Dept.8288, IBM Laboratory Böblingen, Germany
Tel.: +49-7031-16-4603, Fax: +49-7031-16-4401

From:     Sasha Aickin <AlexanderA@plumtree.com>
Sent:     09/04/2002 11:02 PM
To:       Rich Thompson/Watson/IBM@IBMUS, wsia@lists.oasis-open.org, wsrp-interfaces@lists.oasis-open.org, "WSRP (E-mail)" <wsrp@lists.oasis-open.org>
cc:
Subject:  RE: [wsia] [wsrp-interfaces] Draft spec v0.5


I have a question about getMarkup.  Do we expect the markup sent back from
the method to be XML-encoded or not?  That is, let's say we want to return
the markup "Hello, <strong>World</strong>".  Does the SOAP response look
like:

<markup xsi:type="xsd:string">Hello, &lt;strong&gt;World&lt;/strong&gt;</markup>

or like:

<markup xsi:type="markup">Hello, <strong>World</strong></markup>

As currently specified, only the former possibility is allowed, since
markup is specified with type xsd:string.  I think this is great, because
it will allow us to send non-XML conformant documents like HTML.  Further,
if we do decide that markup will be sent as an XML-encoded string, then we
could use XML elements to specify Consumer side URL rewriting, like so:

<markup xsi:type="xsd:string">Please click &lt;a href=&quot;<wsia:rewriteURL
type="Action" navigationState="somestate"/>&quot;

If we don't decide to have Consumer URL rewriting use XML elements, though, I
think there's another solution that might be more effective than the
"wsia:QXqKYZJVUWj7G" token.  We could use Unicode characters from one of
the "private use" areas of Unicode (U+E000-U+F8FF, U+F0000-U+FFFFD, or
U+100000-U+10FFFD).  These characters are specifically designed for private
use by applications, and you can easily put them into XML with character
entities (e.g. &#xE123;).  Further, since ampersand has an escape character
in XML, you can specify when you send portlet content whether you are
sending the actual control character or text that should display as the
control character (e.g. &amp;#xE123; vs. &#xE123;).
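
As a rough illustration of the private-use idea, the Python sketch below parses two small fragments and looks for the marker in the decoded text. The marker value U+E123 is just the example character from the paragraph above, and the helper name and sample fragments are made up for this sketch. It relies on the standard behavior that an XML parser turns `&#xE123;` into the character itself, while `&amp;#xE123;` decodes to the literal seven-character text.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Example marker taken from the text above; any private-use code point would do.
MARKER = "\uE123"

def marker_positions(xml_fragment):
    """Parse the fragment and return the indexes of the marker in its text."""
    text = ET.fromstring(xml_fragment).text or ""
    return [i for i, ch in enumerate(text) if ch == MARKER]

# Character reference: the parser delivers the control character itself.
control = marker_positions("<markup>Go to &#xE123;next</markup>")
# Escaped ampersand: the parser delivers display text, not the marker.
literal = marker_positions("<markup>Type &amp;#xE123; here</markup>")
# Unicode general category "Co" identifies private-use code points.
category = unicodedata.category(MARKER)
```

Because the marker is a single character, locating it needs only a per-character comparison and no multi-character token search.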


-----Original Message-----
From: Rich Thompson [mailto:richt2@us.ibm.com]
Sent: Friday, August 30, 2002 10:19 AM
To: wsia@lists.oasis-open.org; wsrp-interfaces@lists.oasis-open.org
Subject: [wsia] [wsrp-interfaces] Draft spec v0.5

The doc file has now grown to 11MB ....

(See attached file: Draft Spec v0.5.ZIP)

To subscribe or unsubscribe from this elist use the subscription
manager: <http://lists.oasis-open.org/ob/adm.pl>
