Subject: Re: [xacml] regex in the spec
On Tue, 2004-05-04 at 16:56, Bill Parducci wrote:
> ok, i see what you are using. the problem i have here is that xml schema uses a
> *subset* of an externally referenced regular expression definition (Unicode
> Regular Expression Guidelines, Level 1) that does not meet the needs of a
> general regular expression mechanism (which is why xquery has provided for
> additions to the xml schema regex syntax to achieve its goals).

I don't think it matters what XMLSchema uses. If we're defining a schema, and we're putting a regular expression in that schema, then we use the XMLSchema format. Period. This way people can use validating parsers to automatically verify the contents of their instance documents. Or am I missing the point here?

> > As for the specifics of
> > the pattern, I don't care too much about how we form it. I used the
> > current string because it's the most common way to phrase something like
> > this in XMLSchema, therefore I believe it will be the most accessible.
>
> for you maybe ;o) the xml schema spec provides examples and definitions that use
> the syntax i proposed (e.g. the sections on Lexical & Canonical Representations
> and the concept of patterns themselves). this is not to suggest that we must do
> this (the same spec has an example using the more advanced pattern string
> notation), but that there is no mention of a preference one way or the other.

I'm not talking about a preference in the spec. I'm talking about a preference that I see in the real world, time and time again. Also, using \d+ rather than [0-9]+ gives you a lot of things for free, like correct Unicode handling. Like I said before, however, I'm not too worried about this specific detail.

> i also maintain that the expanded numeric notation ('[0-9]+' vs '\d')works on
> ANY system that supports regular expressions. again, a preferential position
> rather than strict adherence to the specification referenced, but not without
> merit; an optimum solution should allow for either.

First of all, most regexp systems people use today support \d. Second, this pattern isn't being written for anything except the schema validation engine. It's a trivial pattern, and if you need to support it for some reason in your code, then you're probably going to do your own handling. Finally, we should think about who will actually look at this pattern and care how it's formed: my claim is that I provided the correct form for our audience.

> > The idea here is _not_ just to provide a direct match. You should look
> > at the text I added to go with this that explains exactly what this
> > pattern is used for. This is a very simple wildcard that only lets you
> > form a few patterns. Here we wouldn't want to use a full-featured regexp
> > language, since we only want people to say one of a few things:
> >
> > 1.2.4
> > 1.+
> > 1.*.4
> > 1.2.*
> >
> > which all match the string "1.2.4". I know of no existing language we
> > could reference that only provides these limited options.
>
> i don't understand this. the syntax you provided for version number:
>
> (\d+\.)*\d+
>
> allows the following:
>
> 1
> 1.2
> 1.2.3.4.5.6.7.8
> (num.num.num...)
>
> why not rely simply upon the same regular expression reference that was used to
> construct it?

Bill, have you read the text that I submitted? It explains the goal of these two strings. One pattern defines a version number. Another defines a simple matching expression. The idea is that in a reference, you can say "match this number" or "any version less than this version." The second pattern defines how you form a matching string.

> what are we protecting people from?

We're not protecting people from anything. We're just making sure that you can only use numbers, and that you can only do a few clear operations which preserve a notion of equality, less than, and greater than for version numbers. If we just said "use the following full-featured regexp language" then we could never get these properties.
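
The two patterns discussed above can be exercised outside a schema validator. The following Python sketch, which is not part of any XACML draft, validates version numbers with the quoted pattern (\d+\.)*\d+, compares them as integer tuples, and implements the limited wildcard matching from the four examples. The match-pattern regex ((\d+|\*)\.)*(\d+|\*|\+) and the wildcard semantics ("*" matches exactly one component, a trailing "+" matches zero or more remaining components) are assumptions inferred from those examples; parse_version and version_matches are hypothetical names.

```python
import re

# The version-number pattern quoted in the thread. XML Schema patterns are
# implicitly anchored at both ends, so re.fullmatch reproduces that behavior.
# In a str pattern, Python's \d matches any Unicode decimal digit (category Nd),
# similar to XML Schema's \d; [0-9] matches only the ASCII digits.
VERSION_RE = re.compile(r"(\d+\.)*\d+")

# ASSUMED match-pattern syntax, inferred from the examples 1.2.4, 1.+, 1.*.4, 1.2.*:
# '*' may stand for any single component, '+' may only appear at the end.
MATCH_RE = re.compile(r"((\d+|\*)\.)*(\d+|\*|\+)")


def parse_version(text: str) -> tuple[int, ...]:
    """Validate a version string and split it into integer components."""
    if VERSION_RE.fullmatch(text) is None:
        raise ValueError(f"not a version number: {text!r}")
    return tuple(int(part) for part in text.split("."))


def version_matches(pattern: str, version: str) -> bool:
    """Hypothetical matcher for the limited wildcard syntax (assumed semantics)."""
    if MATCH_RE.fullmatch(pattern) is None:
        raise ValueError(f"not a match pattern: {pattern!r}")
    have = parse_version(version)
    for i, part in enumerate(pattern.split(".")):
        if part == "+":
            # Assumed: a trailing '+' accepts zero or more remaining components.
            return True
        if i >= len(have):
            return False  # pattern is longer than the version
        if part != "*" and int(part) != have[i]:
            return False  # literal component differs
    # Without a trailing '+', the version must have exactly as many components.
    return len(pattern.split(".")) == len(have)
```

Because parse_version returns integer tuples, parse_version("1.10") > parse_version("1.9") is True, whereas a lexical comparison of the raw strings puts "1.10" before "1.9". That ordering is the property a general-purpose regex cannot express on its own.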

> if you can only write version information in the form:
>
> +
> +.+
> +.+
> ...
>
> who cares how verbose the matching expression is (so long as it conforms to some
> common general definition)?

I think you're missing the whole point of this feature. You write a version number as "1.2.4" and then in a reference you say "all versions after 1.+". That's what the two patterns are providing. If you could use any arbitrary regexp string, then you couldn't talk about less than or greater than, just equal to.

> > The pattern strings are written in XMLSchema, and they express
> > very specific, very (intentionally) limited meaning that I provided
> > clear text to explain.
>
> again, i suggest that this imposes restrictions that are not tangibly
> beneficial. while you may find the 'intentional' limitation comforting, it may
> require that others perform an additional operation to match this format, even
> though those systems write and read regular expressions that are 100% with the
> xml schema specification you are citing.

If my above comments haven't convinced you already, then I'm not going to make any more progress here :)

> > Do you have something in mind that you think would be more
> > appropriate here?
>
> two things:
>
> 1. i would prefer that we directly reference the regular expression semantics
> that the current version of xml schema *refers* to. this would add a reference
> into our spec:
>
> Unicode Regular Expression Guidelines
> Mark Davis. Unicode Regular Expression Guidelines, 1988. Available at:
> http://www.unicode.org/unicode/reports/tr18/
>
> OR
>
> reference the xquery regular expression [xml schema] superset so as to be
> consistent with our functions:
>
> Regular Expression Syntax
> W3. XQuery 1.0 and XPath 2.0 Functions and Operators
> http://www.w3.org/TR/xpath-functions/#regex-syntax
>
> either would require a simple statement on the use of regular expressions in
> *general* (i will write it if the idea is acceptable).
> given the high level of
> dependency upon xquery in our spec, perhaps the latter, while limited--and fluid
> based on its dependence upon xml schema, which itself depends upon the reference
> above--would offer the better solution.

I don't see what this gains us, other than adding another reference. Besides, we already have a direct reference to the second item you cite.

> 2. remove restrictions on the syntax of the matching patterns for version
> comparisons. the definition of the version number provides a consistent numeric
> structure and how an implementer decides to check that it matches should only be
> constrained by the format of the data itself (and the use of explicit regular
> expression semantics).

First of all, I'll refer to my previous comment about making it easy to talk about comparisons on version strings. Second, I've done a lot of thinking about how this feature will get used, and I've talked with a lot of people who want this kind of feature. In practice, I don't think that most people will use a version pattern to do a direct match on a string. Instead, I think they'll do something like what we do with targets today, decomposing version numbers and using them to index policies. As such, a simple string that a programmer can interpret on their own is key. If we use a standard regexp system, no one will be able to simply decompose version information to find applicable policies.

> i suspect that we will be seeing the use of regular expressions more frequently
> as the spec matures and i believe that version will likely end up setting a
> precedent for how this is done in the future. i would prefer that we not get in
> the habit of further subsetting the syntax of the matching patterns for reasons
> of flexibility and effort.

I don't think that this one feature has anything to do with how we may or may not do more regexp in the future.
More to the point, I've seen no evidence that we'll want to add more regexp features to the spec (if we want this, why hasn't someone proposed it?). I proposed one very specific feature, with an intentionally simple matching rule designed so that programmers can decompose the pattern to find policies. This is not designed to be a general pattern-matching feature, and does not need to have that complexity.

seth
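
The argument that programmers will decompose version information to index policies, rather than hand the pattern to a regex engine, can be illustrated with a small trie keyed by version components. This is a hypothetical Python sketch, not an API from any XACML draft: VersionIndex, its methods, and the policy ids are invented for illustration, and it assumes that "*" matches exactly one component and a trailing "+" matches zero or more remaining components.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class _Node:
    # Child nodes keyed by the next integer version component.
    children: dict[int, _Node] = field(default_factory=dict)
    # Ids of policies whose version ends exactly at this node.
    policies: list[str] = field(default_factory=list)


class VersionIndex:
    """Hypothetical index of policy ids keyed by decomposed version numbers."""

    def __init__(self) -> None:
        self._root = _Node()

    def add(self, version: str, policy_id: str) -> None:
        node = self._root
        for part in version.split("."):
            node = node.children.setdefault(int(part), _Node())
        node.policies.append(policy_id)

    def find(self, pattern: str) -> list[str]:
        # Walk the trie one pattern component at a time; no regex engine needed.
        out: list[str] = []
        self._walk(self._root, pattern.split("."), out)
        return out

    def _walk(self, node: _Node, parts: list[str], out: list[str]) -> None:
        if not parts:
            out.extend(node.policies)  # pattern fully consumed: exact length
            return
        head, rest = parts[0], parts[1:]
        if head == "+":
            self._collect(node, out)  # assumed: zero or more remaining components
        elif head == "*":
            for child in node.children.values():  # any single component
                self._walk(child, rest, out)
        else:
            child = node.children.get(int(head))  # literal component
            if child is not None:
                self._walk(child, rest, out)

    def _collect(self, node: _Node, out: list[str]) -> None:
        # Gather every policy stored at or below this node.
        out.extend(node.policies)
        for child in node.children.values():
            self._collect(child, out)
```

For example, after adding "1.2.4" and "1.3.4", the query "1.*.4" returns both policies by visiting only the children of node 1, which is the kind of lookup a free-form regex pattern would not allow without scanning every stored version.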