Subject: Re: [xacml] regex in the spec
From: Seth Proctor <Seth.Proctor@Sun.COM>
To: Bill Parducci <bill.parducci@overxeer.com>
Date: Wed, 05 May 2004 12:24:45 -0400

On Tue, 2004-05-04 at 16:56, Bill Parducci wrote:
> ok, i see what you are using. the problem i have here is that xml schema uses a 
> *subset* of an externally referenced regular expression definition (Unicode 
> Regular Expression Guidelines, Level 1) that does not meet the needs of a 
> general regular expression mechanism (which is why xquery has provided for 
> additions to the xml schema regex syntax to achieve its goals).

I don't think it matters what XMLSchema uses. If we're defining a
schema, and we're putting a regular expression in that schema, then we
use the XMLSchema format. Period. This way people can use validating
parsers to automatically verify the contents of their instance
documents. Or am I missing the point here?

> > As for the specifics of 
> > the pattern, I don't care too much about how we form it. I used the 
> > current string because it's the most common way to phrase something like 
> > this in XMLSchema, therefore I believe it will be the most accessible.
> 
> for you maybe ;o) the xml schema spec provides examples and definitions that use 
> the syntax i proposed (e.g. the sections on Lexical & Canonical Representations 
> and the concept of patterns themselves). this is not to suggest that we must do 
> this (the same spec has an example using the more advanced pattern string 
> notation), but that there is no mention of a preference one way or the other.

I'm not talking about a preference in the spec. I'm talking about a
preference that I see in the real world, time and time again. Also,
using \d+ rather than [0-9]+ gives you a lot of things for free like
correct unicode handling. Like I said before, however, I'm not too
worried about this specific detail.
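
To make the Unicode point concrete, here is a minimal sketch. It assumes
Python's re module as a stand-in for a schema validator (XMLSchema engines
are a different implementation, though both treat \d as "any Unicode
decimal digit"). The names ASCII_DIGITS, UNICODE_DIGITS and matches are
made up for this illustration.

```python
import re

# Two ways to say "one or more digits".
ASCII_DIGITS = re.compile(r"[0-9]+")   # only the ten ASCII digits
UNICODE_DIGITS = re.compile(r"\d+")    # any Unicode decimal digit (for str patterns)


def matches(pattern, text):
    """Return True if the whole string matches the compiled pattern."""
    return pattern.fullmatch(text) is not None


# Arabic-Indic digits one, two, three (U+0661..U+0663).
arabic_indic = "\u0661\u0662\u0663"

print(matches(ASCII_DIGITS, "123"), matches(UNICODE_DIGITS, "123"))
print(matches(ASCII_DIGITS, arabic_indic), matches(UNICODE_DIGITS, arabic_indic))
```

Whether accepting non-ASCII digits in a version number is desirable is a
separate question; the sketch only shows that the two spellings are not
equivalent.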

> i also maintain that the expanded numeric notation ('[0-9]+' vs '\d')works on 
> ANY system that supports regular expressions. again, a preferential position 
> rather than strict adherence to the specification referenced, but not without 
> merit; an optimum solution should allow for either.

First of all, most regexp systems people use today support \d. Second,
this pattern isn't being written for anything except the schema
validation engine. It's a trivial pattern, and if you need to support it
for some reason in your code, then you're probably going to do your own
handling. Finally, we should think about who will actually look at this
pattern and care how it's formed: my claim is that I provided the
correct form for our audience.

> > The idea here is _not_ just to provide a direct match. You should look 
> > at the text I added to go with this that explains exactly what this 
> > pattern is used for. This is a very simple wildcard that only lets you 
> > form a few patterns. Here we wouldn't want to use a full-featured regexp 
> > language, since we only want people to say one of a few things:
> > 
> >   1.2.4
> >   1.+
> >   1.*.4
> >   1.2.*
> 
> > which all match the string "1.2.4.". I know of no existing language we 
> > could reference that only provides these limited options.
> 
> i don't understand this. the syntax you provided for version number:
> 
>   (\d+\.)*\d+
> 
> allows the following:
> 
>    1
>    1.2
>    1.2.3.4.5.6.7.8
>    (num.num.num...)
> 
> why not rely simply upon the same regular expression reference that was used to 
> construct it?

Bill, have you read the text that I submitted? It explains the goal of
these two strings. One pattern defines a version number. Another defines
a simple matching expression. The idea is that in a reference, you can
say "match this number" or "any version less than this version." The
second pattern defines how you form a matching string.

> what are we protecting people from?

We're not protecting people from anything. We're just making sure that
you can only use numbers, and that you can only do a few clear
operations which preserve a notion of equality, less than, and greater
than for version numbers. If we just said "use the following
full-featured regexp language" then we could never get these properties.
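
As a rough illustration of why a numbers-only format keeps ordering
well defined, the sketch below validates a version string against the
(\d+\.)*\d+ pattern quoted above and turns it into a tuple of integers.
The function name parse_version is hypothetical, and comparing the tuples
component by component is an assumption about how "less than" would be
defined, not something the proposal text quoted here spells out.

```python
import re

# The version-number pattern quoted earlier in this thread.
VERSION = re.compile(r"(\d+\.)*\d+")


def parse_version(text):
    """Validate a dotted version string and return it as a tuple of ints."""
    if VERSION.fullmatch(text) is None:
        raise ValueError("not a version number: %r" % (text,))
    return tuple(int(part) for part in text.split("."))


# Tuples compare component by component, so 1.10 sorts after 1.9,
# which a plain string comparison would get wrong.
print(parse_version("1.9") < parse_version("1.10"))
print("1.9" < "1.10")
```

With an arbitrary regular expression in place of a version number there
is no comparable way to derive an ordering.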

> if you can only write version information in the form:
> 
> +
> +.+
> +.+
> ...
> 
> who cares how verbose the matching expression is (so long as it conforms to some 
> common general definition)?

I think you're missing the whole point of this feature. You write a
version number as "1.2.4" and then in a reference you say "all versions
after 1.+". That's what the two patterns are providing. If you could use
any arbitrary regexp string, then you couldn't talk about less than or
greater than, just equal to.
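
A matcher for this kind of limited wildcard can be sketched in a few
lines. The semantics below are an assumption read off the examples in this
thread (1.2.4, 1.+, 1.*.4 and 1.2.* all matching "1.2.4"): '*' stands for
exactly one number, and '+' may appear only as the last component and
stands for one or more remaining numbers. The name version_matches is
hypothetical, and the sketch assumes the pattern is already well formed.

```python
def version_matches(pattern, version):
    """Return True if a dotted version satisfies a simple wildcard pattern.

    Assumed semantics: '*' matches exactly one numeric component; '+' is
    only valid as the final component and matches one or more components.
    """
    want = pattern.split(".")
    have = version.split(".")
    for i, token in enumerate(want):
        if token == "+":
            # '+' swallows everything that is left, but needs at least one number.
            return i == len(want) - 1 and len(have) > i
        if i >= len(have):
            return False  # pattern is longer than the version
        if token != "*" and int(token) != int(have[i]):
            return False  # literal component differs
    # No trailing '+', so the lengths must agree exactly.
    return len(want) == len(have)


for p in ("1.2.4", "1.+", "1.*.4", "1.2.*", "2.+", "1.2"):
    print(p, version_matches(p, "1.2.4"))
```

Because the pattern is just a dotted list of numbers and two wildcard
symbols, it can be taken apart with a split rather than a regexp engine.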

> > The pattern strings are written in XMLSchema, and they express 
> > very specific, very (intentionally) limited meaning that I provided 
> > clear text to explain.
> 
> again, i suggest that this imposes restrictions that are not tangibly 
> beneficial. while you may find the 'intentional' limitation comforting, it may 
> require that others perform an additional operation to match this format, even 
> though those systems write and read regular expressions that are 100% compliant with the 
> xml schema specification you are citing.

If my above comments haven't convinced you already, then I'm not going
to make any more progress here :)

>  > Do you have something in mind that you think would be more
>  > appropriate here?
> 
> two things:
> 
> 1. i would prefer that we directly reference the regular expression semantics 
> that the current version of xml schema *refers* to. this would add a reference 
> into our spec:
> 
> Unicode Regular Expression Guidelines
>    Mark Davis. Unicode Regular Expression Guidelines, 1988. Available at:
>    http://www.unicode.org/unicode/reports/tr18/
> 
> OR
> 
> reference the xquery regular expression [xml schema] superset so as to be 
> consistent with our functions:
> 
> Regular Expression Syntax
>    W3. XQuery 1.0 and XPath 2.0 Functions and Operators
>    http://www.w3.org/TR/xpath-functions/#regex-syntax
> 
> either would require a simple statement on the use of regular expressions in 
> *general* (i will write it if the idea is acceptable). given the high level of 
> dependency upon xquery in our spec, perhaps the latter, while limited--and fluid 
> based on its dependence upon xml schema, which itself depends upon the reference 
> above--would offer the better solution.

I don't see what this gains us, other than adding another reference.
Besides, we already have a direct reference to the second item you cite.

> 2. remove restrictions on the syntax of the matching patterns for version 
> comparisons. the definition of the version number provides a consistent numeric 
> structure and how an implementer decides to check that it matches should only be 
> constrained by the format of the data itself (and the use of explicit regular 
> expression semantics).

First of all, I'll refer to my previous comment about making it easy to
talk about comparisons on version strings. Second, I've done a lot of
thinking about how this feature will get used, and I've talked with a
lot of people who want this kind of feature. In practice, I don't think
that most people will use a version pattern to do a direct match on a
string. Instead, I think they'll do something like what we do with
targets today, decomposing version numbers and using them to index
policies. As such, a simple string that a programmer can interpret on
their own is key. If we use a standard regexp system, no one
will be able to simply decompose version information to find applicable
policies.
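
One way the decomposition idea might look in code is sketched below.
Everything here is hypothetical: the PolicyIndex class, the choice to
bucket policies by their first version component, and the wildcard
semantics ('*' for exactly one number, a trailing '+' for one or more)
are assumptions for illustration, not part of the proposal.

```python
from collections import defaultdict


def _matches(tokens, parts):
    """Check a split wildcard pattern against a tuple of version numbers."""
    for i, token in enumerate(tokens):
        if token == "+":
            return i == len(tokens) - 1 and len(parts) > i
        if i >= len(parts) or (token != "*" and int(token) != parts[i]):
            return False
    return len(tokens) == len(parts)


class PolicyIndex:
    """Index policies by the first component of their version number."""

    def __init__(self):
        self._by_major = defaultdict(list)

    def add(self, version, policy):
        parts = tuple(int(p) for p in version.split("."))
        self._by_major[parts[0]].append((parts, policy))

    def find(self, pattern):
        tokens = pattern.split(".")
        if tokens[0] in ("*", "+"):
            buckets = list(self._by_major.values())  # wildcard: scan every bucket
        else:
            buckets = [self._by_major.get(int(tokens[0]), [])]  # literal: one bucket
        return [policy
                for bucket in buckets
                for parts, policy in bucket
                if _matches(tokens, parts)]


index = PolicyIndex()
index.add("1.2.4", "policy-A")
index.add("1.3", "policy-B")
index.add("2.0", "policy-C")
print(index.find("1.+"))
```

The lookup narrows the search using the literal leading component before
any matching happens, which is only possible because the pattern can be
split into plain parts.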

> i suspect that we will be seeing the use of regular expressions more frequently 
> as the spec matures and i believe that version will likely end up setting a 
> precedent for how this is done in the future. i would prefer that we not get in 
> the habit of further subsetting the syntax of the matching patterns for reasons 
> of flexibility and effort.

I don't think that this one feature has anything to do with how we may
or may not do more regexp in the future. More to the point, I've seen no
evidence that we'll want to add more regexp features to the spec (if we
want this, why hasn't someone proposed it?). I proposed one very
specific feature, with an intentionally simple matching rule designed so
that programmers can decompose the pattern to find policies. This is not
designed to be a general pattern-matching feature, and does not need to
have that complexity.


seth