
sarif message


Subject: Important subtlety in region definition (long)

Please read to the end, where I pose two questions.


The new change draft for Issue #93, “Problems with regions”, addresses concerns raised by Jim and Luke, and incorporates some nice design suggestions from Jim.


The spec now says that a single region object can represent both a “text region” (a contiguous sequence of characters) and a “binary region” (a contiguous sequence of bytes), using separate sets of properties. And the spec says that if a region object represents both a text region and a binary region, then the text-related properties and the binary-related properties must represent exactly the same range of bytes.


The spec does not allow you to specify a region by a mixture of text- and binary-related properties. For example, consider a UTF-16 file with no BOM and contents "abcde\r\n", and consider a region that includes the characters "bcd". The spec allows you to represent this region in many ways, such as:


Text-related line/column properties:


{ "startLine": 1, "startColumn": 2, "endColumn": 5 }


Text-related offset/length properties:


{ "charOffset": 1, "charLength": 3 }


A mixture of text-related line/column and offset/length properties:


{ "startLine": 1, "startColumn": 2, "charLength": 3 }


Binary-related offset/length properties:


{ "byteOffset": 2, "byteLength": 6 }
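To make the equivalence of these four representations concrete, here is a minimal Python sketch that maps each one to a byte range in the example file. The helper names (`char_offset_of`, `to_byte_range`) are hypothetical and not part of the spec, and little-endian byte order is an assumption, since the example only says "UTF-16 with no BOM". The fixed two-bytes-per-character conversion holds only because every character in this file is a single UTF-16 code unit.

```python
# Illustrative sketch; helper names are hypothetical, not from the SARIF spec.
CONTENT = "abcde\r\n"
ENCODED = CONTENT.encode("utf-16-le")  # UTF-16, no BOM; little-endian is an assumption
BYTES_PER_CHAR = 2  # valid here: every character is a single UTF-16 code unit


def char_offset_of(text, line, column):
    """Return the 0-based character offset of a 1-based (line, column) position."""
    lines = text.splitlines(keepends=True)
    return sum(len(l) for l in lines[: line - 1]) + (column - 1)


def to_byte_range(char_offset, char_length):
    """Convert a character (offset, length) pair to a byte (offset, length) pair."""
    return char_offset * BYTES_PER_CHAR, char_length * BYTES_PER_CHAR


# { "startLine": 1, "startColumn": 2, "endColumn": 5 }  (endColumn is exclusive)
from_line_col = to_byte_range(char_offset_of(CONTENT, 1, 2), 5 - 2)

# { "charOffset": 1, "charLength": 3 }
from_offset_len = to_byte_range(1, 3)

# { "startLine": 1, "startColumn": 2, "charLength": 3 }
from_mixed_text = to_byte_range(char_offset_of(CONTENT, 1, 2), 3)

# { "byteOffset": 2, "byteLength": 6 }
from_bytes = (2, 6)

ranges = [from_line_col, from_offset_len, from_mixed_text, from_bytes]
all_agree = all(r == (2, 6) for r in ranges)

offset, length = from_bytes
selected = ENCODED[offset : offset + length].decode("utf-16-le")
print(all_agree, selected)  # prints: True bcd
```

All four forms resolve to bytes 2 through 7, which decode to "bcd".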


But the spec does not allow this:


{ "startLine": 1, "byteOffset": 2, "byteLength": 6 }  # INVALID


I could have written the spec to allow this, but I chose not to, for simplicity. The spec already has paragraphs of text and dozens of examples illustrating valid combinations of the text-related properties alone. I judged that language attempting to enumerate all legal combinations of text-related and binary-related properties would be too difficult to write, and too difficult for implementers to understand and implement correctly. Instead, I required each set of properties to stand alone, and the two sets to be consistent with each other.


As the spec stands, that mixed region above would be equivalent to:


{
  "startLine": 1,
  "startColumn": 1,             // Missing startColumn defaults to 1.
  "endLine": 1,                 // Missing endLine defaults to startLine.
  "endColumn": 6,               // Missing endColumn defaults to (length of endLine) + 1, exclusive of newline sequence.

  "byteOffset": 2,
  "byteLength": 6
}


… and now the text-related properties and the byte-related properties represent different byte ranges.
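The mismatch can be checked mechanically. The following Python sketch applies the defaulting rules described above to the mixed region and compares the byte range implied by the text-related properties with the one given by the binary-related properties. The function names are hypothetical, the sketch handles single-line regions only, and it assumes two bytes per character, which holds for this UTF-16 example but not in general.

```python
# Illustrative sketch; function names are hypothetical, not from the SARIF spec.
CONTENT = "abcde\r\n"
BYTES_PER_CHAR = 2  # assumption: every character is a single UTF-16 code unit


def resolve_text_defaults(region, text):
    """Fill in missing text-related properties using the defaults described above."""
    lines = text.splitlines()  # newline sequences are excluded from line lengths
    start_line = region["startLine"]
    end_line = region.get("endLine", start_line)
    return {
        "startLine": start_line,
        "startColumn": region.get("startColumn", 1),
        "endLine": end_line,
        "endColumn": region.get("endColumn", len(lines[end_line - 1]) + 1),
    }


def text_byte_range(resolved, text):
    """Byte (offset, length) covered by a fully specified single-line text region."""
    lines = text.splitlines(keepends=True)
    line_start = sum(len(l) for l in lines[: resolved["startLine"] - 1])
    char_offset = line_start + resolved["startColumn"] - 1
    char_length = resolved["endColumn"] - resolved["startColumn"]
    return char_offset * BYTES_PER_CHAR, char_length * BYTES_PER_CHAR


mixed = {"startLine": 1, "byteOffset": 2, "byteLength": 6}
resolved = resolve_text_defaults(mixed, CONTENT)
text_range = text_byte_range(resolved, CONTENT)
binary_range = (mixed["byteOffset"], mixed["byteLength"])
consistent = text_range == binary_range

print(text_range, binary_range, consistent)  # prints: (0, 10) (2, 6) False
```

The text-related properties cover the whole line "abcde" (bytes 0 through 9), while the binary-related properties cover only "bcd" (bytes 2 through 7), so a consistency check of this kind would reject the region.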


My two questions are:


  1. Do you agree with my proposal to treat the text-related properties and binary-related properties separately?

  2. If so, should I state that explicitly, and give this example?


*sigh* Having written all this, I guess the answer to #2 has to be “Yes” if the answer to #1 is “Yes”.



