Subject: Re: [office-formula] Whitespace fixups
Hi David,

On Saturday, 2008-06-14 12:20:09 -0400, David A. Wheeler wrote:

> In particular, if there is an embedded newline, do we use \n or \r?
> It explicitly forbids CRLF, but one place says "\n" and the other
> says "\r". I propose that it be \n (I believe that's what we meant),
> but exclusively use Unicode code points so that it's unambiguous.

Seconded, use "\n" line feed, written as the Unicode code point U+000A.

> [... draft ...]
> Applications shall consider the following characters as whitespace
> characters: space (U+0020), tab (U+0009), newline (U+000A), and
> carriage return (U+000D).

Should we add U+00A0 NO-BREAK SPACE to this list? I don't think we'd
need the other no-break space characters though.

> An embedded line break shall be represented by a single newline
> character (U+000A), not by a carriage return-linefeed pair. When
> embedded in an XML document the newline character is typically
> represented as “&#x0A;”.

Using the term "typically" (remember the TC discussion) might need
clarification on what, when, why, which, ... I suggest avoiding it and
simply removing the last sentence, leaving only the definition to use a
newline character.

So, if CR and LF are both considered whitespace characters, which I
second, and an embedded line break shall be written as LF, which I
second as well, this actually means that an application must accept a
CRLF combination when reading a document, but must not write CRLF back
to a document and must transform it to LF instead.

What about a single CR? Should it be treated as a newline and written
back as LF? Or should it be preserved, if the application supports that?

Note: Applications not capable of preserving all whitespace characters
should be allowed to omit them when writing a document.

  Eike

-- 
Automatic string conversions considered dangerous. They are the GOTO
statements of spreadsheets.
 --Robert Weir on the OpenDocument formula subcommittee's list.
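
To illustrate the read/write rule discussed in this message, here is a minimal Python sketch. It is not part of the draft: the function names, the include_nbsp flag and the convert_lone_cr switch are hypothetical, and the switch exists only because the treatment of a single CR is still an open question here. Likewise, U+00A0 is only a proposal, so it is off by default.

```python
# Whitespace characters listed in the draft wording under discussion.
DRAFT_WHITESPACE = {"\u0020", "\u0009", "\u000A", "\u000D"}

# U+00A0 NO-BREAK SPACE is only proposed in this thread, not agreed.
PROPOSED_EXTRA = {"\u00A0"}


def is_formula_whitespace(ch, include_nbsp=False):
    """Return True if ch counts as whitespace under the draft wording."""
    if ch in DRAFT_WHITESPACE:
        return True
    return include_nbsp and ch in PROPOSED_EXTRA


def normalize_line_breaks(text, convert_lone_cr=True):
    """Rewrite CRLF as a single LF (U+000A) before writing a document.

    convert_lone_cr is a hypothetical switch: whether a lone CR becomes
    LF or is preserved is the open question, so both are supported.
    """
    # Handle CRLF pairs first so they collapse to exactly one LF.
    text = text.replace("\r\n", "\n")
    if convert_lone_cr:
        text = text.replace("\r", "\n")
    return text
```

A reader would accept CR, LF and CRLF as whitespace via is_formula_whitespace, and a writer would pass formula text through normalize_line_breaks so that no CRLF pair is ever written back.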