OASIS Mailing List Archives

relax-ng message


Subject: RE: [relax-ng] Issue: literal newlines in strings


Right now Trang will serialize the following literal string

[ a:documentation [ "The following
element is new
in version 0.3" ] ]

to

<a:documentation>The following
element is new
in version 0.3</a:documentation>

I would have thought that this literal would undergo some form of
normalization, and be serialized as

<a:documentation>The following element is new in version
0.3</a:documentation>

I'd like to be able to force a newline in output by using an escaped
newline; otherwise, I'd like things to be normalized.
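To make the request concrete, here is a rough sketch in Python of the normalization I have in mind. It is not what Trang does today. The function name is made up, the \x{...} form is the escape from James's proposal, and trimming leading and trailing whitespace is my own assumption.

```python
import re

# Assumed escape syntax: \x{HEX}, as in the \x proposal.
_ESCAPE = re.compile(r"\\x\{([0-9A-Fa-f]+)\}")

def normalize_literal(raw: str) -> str:
    """Collapse literal whitespace, then decode escapes.

    Whitespace is collapsed before escapes are decoded, so a newline
    written as \\x{A} survives while a literal newline becomes a space.
    """
    collapsed = " ".join(raw.split())
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), collapsed)

doc = normalize_literal("The following\nelement is new\nin version 0.3")
forced = normalize_literal("line one\\x{A}line\n  two")
```

With this ordering, only the escaped form puts a newline in the output.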

Mike
-----Original Message-----
From: James Clark [mailto:jjc@jclark.com]
Sent: Monday, April 08, 2002 10:24 PM
To: RELAX NG Mailing List
Subject: [relax-ng] Issue: literal newlines in strings


The compact syntax currently allows newlines in strings.  For example,

"ab
cd"

is legal and is equivalent to "ab\x{A}cd" (assuming we accept the \x
proposal).
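
As an illustration only, the equivalence can be sketched in Python. The decoder below is hypothetical and assumes the \x{HEX} form from the proposal; it is not taken from any existing implementation.

```python
import re

# Assumed escape syntax: \x{HEX}, hexadecimal code point in braces.
ESCAPE = re.compile(r"\\x\{([0-9A-Fa-f]+)\}")

def decode_x_escapes(text: str) -> str:
    """Replace each \\x{HEX} escape with the character it names."""
    return ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

with_literal_newline = "ab\ncd"                  # newline typed inside the string
with_escape = decode_x_escapes(r"ab\x{A}cd")     # newline written as an escape
```

Both values are the same five-character string.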

Many modern programming languages don't allow literal newlines in strings.
For example, in Java, you cannot write

"ab
cd"

Rather you have to write "ab\ncd".  One reason is to support better error
reporting/recovery.  If you allow literal newlines in string literals, then
a missing closing delimiter may only be detected many lines further on.
Another reason is that allowing newlines in strings makes it harder to
provide editing support: you can't detect whether a particular point is in a
string without parsing from the beginning of the document.
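
Both points can be seen in a toy scanner, sketched here in Python. It handles only double-quoted strings and ignores comments and the other quoting forms of the compact syntax, so it is an illustration under those assumptions. The function names are invented.

```python
def first_error_line(source: str, allow_newlines: bool):
    """Return the 1-based line where a string error is detected, or None."""
    in_string = False
    line = 1
    for ch in source:
        if ch == '"':
            in_string = not in_string
        elif ch == "\n":
            if in_string and not allow_newlines:
                return line  # string still open at the end of its own line
            line += 1
    # With newlines allowed, an open string is only noticed at end of input.
    return line if in_string else None

def in_string_on_line(line_text: str, column: int) -> bool:
    """Valid only if strings cannot span lines: inspect the current line alone."""
    return line_text[:column].count('"') % 2 == 1

# The closing quote is missing on line 1.
source = 'a = "one\nb = "two"\nc = "three"\nd = "four"'
```

For this input the error is reported on line 1 when newlines are prohibited and on line 4 when they are allowed. The second function is the editing case: it needs no text before the current line.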

I'm not sure whether this is sufficient justification to prohibit literal
newlines in strings, but I think it's worth considering.

Note that if we did prohibit them, the parsing phase that handles \x escapes
would need to generate not simply a sequence of Unicode characters, but
rather a sequence of Unicode characters and line terminators.  (This is easy
to implement, since the line terminator can be represented internally by one
of the Unicode code points that XML disallows.)
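
A minimal sketch of that phase in Python, assuming U+0000 as the internal marker (XML 1.0 does not allow it as a character) and the \x{HEX} form from the proposal. The names are hypothetical.

```python
import re

# Assumption: U+0000 stands in for a line terminator. Any code point that
# XML disallows would serve equally well.
LINE_TERMINATOR = "\u0000"

_NEWLINE = re.compile(r"\r\n|\r|\n")
_ESCAPE = re.compile(r"\\x\{([0-9A-Fa-f]+)\}")

def preprocess(source: str) -> str:
    """Mark literal line terminators first, then decode \\x escapes."""
    marked = _NEWLINE.sub(LINE_TERMINATOR, source)
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), marked)

def string_is_legal(body: str) -> bool:
    """A string body is rejected only if it contains a literal line terminator."""
    return LINE_TERMINATOR not in body

escaped = preprocess(r"ab\x{A}cd")   # contains a real U+000A
literal = preprocess("ab\ncd")       # contains the marker instead
```

Because literal terminators are marked before escapes are decoded, a newline produced by \x{A} stays distinguishable from one typed in the source. The sketch does not guard against an escape that itself names the marker code point.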

James


