office message

Subject: [OASIS Issue Tracker] Issue Comment Edited: (OFFICE-3356) PublicComment: Please standardize "Line Start Prohibition Rule"
From: OASIS Issues Tracker <workgroup_mailer@lists.oasis-open.org>
To: office@lists.oasis-open.org
Date: Mon, 4 Oct 2010 15:14:15 -0400 (EDT)

    [ http://tools.oasis-open.org/issues/browse/OFFICE-3356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=21839#action_21839 ] 

Dennis Hamilton edited comment on OFFICE-3356 at 10/4/10 3:12 PM:
------------------------------------------------------------------

[Corrected two items]

This seems silly to me.  

 1. OpenFormula deals with Unicode code points.  Do you worry about BiDi start and stop codes in strings too?  I can stick all manner of non-spacing and gluing codes into a string in OpenFormula and I would simply expect them to be carried around, counted as code points, etc.  There are elaborate character-composition provisions in Unicode that involve multiple code points and there is no way to normalize them to a single code point.  I expect that OpenFormula will simply let such strings be.  I see no reason to adjust OpenFormula string handling because I inserted a literal format effector of any kind.  

 2. Finally, if there is a ZERO WIDTH NO-BREAK SPACE, the semantics are perfectly clear for a layout processor that recognizes the code (and ignores it otherwise).  If there is some sort of out-of-band special arrangement for inferring its own NO-BREAK conditions, all it has to do is honor any that are explicitly there, whether on its list or not.  (The Unicode specification is pretty clear about all of this.)  Whether it retains them on editing and even allows them to be inserted manually is a different story, although it would be difficult to prevent it if there is any kind of full-up Unicode support available to users.

I don't think anyone worries about &nbsp; (U+00A0) screwing up this stuff.  I assume that it would be honored.  It is in HTML, after all :).

 3. Finally, I must point out that [XML1.0] explicitly includes the codes I listed, and many more, in its syntax for Char in [XML1.0] section 2.2 and they are *not* listed in the ones to be discouraged in the Note of that section.  So they are already available for ODF consumers to have to deal with.  They are presumably usable anywhere there is an RNG <text /> pattern, along with NEL (NEXT LINE, U+0085), LS (LINE SEPARATOR, U+2028), and PS (PARAGRAPH SEPARATOR, U+2029).

4. It is interesting that all of these codes except U+0085 are allowed in [RFC3987] IRIs too.
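
Points 1, 3 and 4 can be checked mechanically. The following is a minimal sketch, not anything taken from ODF or OpenFormula: the function names and the NAMED table are made up for illustration, and the ranges are my reading of the Char production and its Note in [XML1.0] section 2.2 (Fifth Edition) and of the ucschar production in [RFC3987]. Only the code points named in this comment are covered, since the earlier list is not reproduced here.

```python
import unicodedata

# Code points named in the comment; labels are informal, for display only.
NAMED = {
    0x00A0: "NO-BREAK SPACE",
    0xFEFF: "ZERO WIDTH NO-BREAK SPACE",
    0x0085: "NEL",
    0x2028: "LS",
    0x2029: "PS",
}


def is_xml_char(cp: int) -> bool:
    """Char production of XML 1.0 section 2.2 (as I read it)."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml_discouraged(cp: int) -> bool:
    """Ranges the Note in XML 1.0 section 2.2 asks authors to avoid."""
    return (
        0x7F <= cp <= 0x84
        or 0x86 <= cp <= 0x9F
        or 0xFDD0 <= cp <= 0xFDEF
        # Last two code points of every supplementary plane.
        or (cp >= 0x1FFFE and (cp & 0xFFFF) >= 0xFFFE)
    )


def is_iri_ucschar(cp: int) -> bool:
    """ucschar production of RFC 3987 (as I read it)."""
    if 0xA0 <= cp <= 0xD7FF or 0xF900 <= cp <= 0xFDCF or 0xFDF0 <= cp <= 0xFFEF:
        return True
    plane, offset = cp >> 16, cp & 0xFFFF
    if 1 <= plane <= 13:
        return offset <= 0xFFFD
    if plane == 14:  # this plane starts at U+E1000 in the grammar
        return 0x1000 <= offset <= 0xFFFD
    return False


def report() -> dict:
    """Map each named code point to (Char, discouraged, ucschar)."""
    return {
        cp: (is_xml_char(cp), is_xml_discouraged(cp), is_iri_ucschar(cp))
        for cp in NAMED
    }


for cp, (xml_ok, discouraged, iri_ok) in report().items():
    print(f"U+{cp:04X} {NAMED[cp]:<26} Char={xml_ok} "
          f"discouraged={discouraged} ucschar={iri_ok}")

# Point 1: a format effector inside a string is just one more code point,
# and NFC normalization leaves it alone.
s = "a\ufeffb"
print(len(s), unicodedata.normalize("NFC", s) == s)

# q + COMBINING TILDE has no precomposed form, so it stays two code points.
print(len(unicodedata.normalize("NFC", "q\u0303")))
```

Under these ranges all five code points match Char and none is in the discouraged set; U+0085 is the only one that falls outside ucschar.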

      was (Author: orcmid):
    This seems silly to me.  

 1. OpenFormula deals with Unicode code points.  Do you worry about BiDi start and stop codes in strings too?  I can stick all manner of non-spacing and gluing codes into a string in OpenFormula and I would simply expect them to be carried around, counted as code points, etc.  There are elaborate character-composition provisions in Unicode that involve multiple code points and there is no way to normalize them to a single code point.  I expect that OpenFormula will simply let such strings be.  I see no reason to adjust OpenFormula string handling because I inserted a literal format effector of any kind.  

 2. Finally, if there is a ZERO WIDTH NO-BREAK SPACE, the semantics are perfectly clear for a layout processor that recognizes the code (and ignores it otherwise).  If there is some sort of out-of-band special arrangement for inferring its own NO-BREAK conditions, all it has to do is honor any that are explicitly there, whether on its list or not.  (The Unicode specification is pretty clear about all of this.)  Whether it retains them or allows them to be inserted manually is a different story.

I don't think anyone worries about &nbps; (U+00A0) screwing up this stuff.  

 3. Finally, I must point out that [XML1.0] explicitly includes the codes I listed, and many more, in its syntax for Char in [XML1.0] section 2.2 and they are *not* listed in the ones to be discouraged in the Note of that section.  So they are already available for ODF consumers to have to deal with.  They are presumably usable anywhere there is an RNG <text /> pattern, along with NEL (NEXT LINE, U+0085), LS (LINE SEPARATOR, U+2028), and PS (PARAGRAPH SEPARATOR, U+2029).

4. It is interesting that all of these codes except U+0083 are allowed in [RFC3987] IRIs too.
  
> Public Comment: Please standardize "Line Start Prohibition Rule"
> ----------------------------------------------------------------
>
>                 Key: OFFICE-3356
>                 URL: http://tools.oasis-open.org/issues/browse/OFFICE-3356
>             Project: OASIS Open Document Format for Office Applications (OpenDocument) TC
>          Issue Type: Bug
>          Components: Needs Discussion, Public Review, Text
>    Affects Versions: ODF 1.2 CD 05
>            Reporter: Robert Weir 
>            Priority: Blocker
>             Fix For: ODF 1.2 CD 06
>
>
> Copied from office-comment list
> Original author: "MURATA Makoto (FAMILY Given)" <eb2m-mrt@asahi-net.or.jp> 
> Original date: 16 Aug 2010 09:49:25 -0000
> Original URL: http://lists.oasis-open.org/archives/office-comment/201008/msg00004.html
> Text copied for convenience in searching JIRA (note that Unicode may be garbled):
> """
> Open Office appears to support "line-start prohibition rule", and even
> allows document authors to specify which character is prohibited.
> This rule is considered very important in Japan, China, Taiwan, and
> Korea.
> http://www.w3.org/TR/2009/NOTE-jlreq-20090604/#en-subheading2_1_7
> If you explicitly specify which character is prohibited, the following
> is generated by OO.o as part of settings.xml.
>       <config:config-item-map-indexed config:name="ForbiddenCharacters">
>         <config:config-item-map-entry>
>           <config:config-item config:name="Language" config:type="string">ja</config:config-item>
>           <config:config-item config:name="Country" config:type="string">JP</config:config-item>
>           <config:config-item config:name="Variant" config:type="string"/>
>           <config:config-item config:name="BeginLine" config:type="string"
>             >!%),.:;?]}¢°’”‰′″℃、。々〉》」』】〕ぁぃぅぇぉっゃゅょゎ゛゜ゝゞァィゥェォッャュョヮヵヶ・ーヽヾ！％），．：；？］｝｡｣､･ｧｨｩｪｫｬｭｮｯｰﾞﾟ¢</config:config-item>
>           <config:config-item config:name="EndLine" config:type="string"
>             >$([¥{£¥‘“〈《「『【〔＄（［｛｢£￥</config:config-item>
>         </config:config-item-map-entry>
> However, their semantics is not at all described in ODF 1.0.  It is not 
> described in the current draft of ODF 1.2 either.  Please standardize
> them.  Otherwise, ODF 1.2 does not contribute to interoperability 
> of Japanese text.
> """

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://tools.oasis-open.org/issues/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira