
cam message



Subject: RE: [ubl-dev] SV: [ubl] Re: [ubl-dev] Datatype Methodology RE: [ubl-dev] SBS and Restricted Data Types


Steve, 

Right now the only way I'm aware of to control this is through the XML
prolog, by declaring the encoding (UTF-8, etc.).
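
To illustrate the limits of that approach, here is a minimal Python sketch (the sample documents and variable names are made up for illustration). The encoding declaration only tells the parser how to decode the bytes; it does not restrict which characters may appear, so a UTF-8 document carrying a musical symbol parses without complaint.

```python
import xml.etree.ElementTree as ET

# Illustrative documents only: each declares its encoding in the XML prolog.
utf8_doc = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<desc>Clef: \U0001D11E</desc>'
).encode("utf-8")

latin1_doc = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    '<desc>Price: \u00a3100</desc>'
).encode("latin-1")

# The parser reads the declaration and decodes the bytes accordingly.
utf8_text = ET.fromstring(utf8_doc).text
latin1_text = ET.fromstring(latin1_doc).text

# Both parse cleanly; nothing in the prolog rejects the musical symbol.
print(repr(utf8_text))
print(repr(latin1_text))
```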

Like Bryan, we have found this problematic in production. File
attachments and file names are one area where people can create a
filename on one O/S that then cannot be processed or causes problems
elsewhere, especially when persisting into the backend database (e.g.
Oracle) or when opening file handles.

The only way we have addressed this to date is to issue manual
guidelines to submitters. Because these characters can cause issues at
various levels of processing, failures can occur before or after
the CAM step ;-)

It's a good thought though: adding the ability to filter on character
codes via an exclusion table mechanism would flag the problem, e.g.
"invalid character code found in element <dataitem123>". That would
then call for a predicate such as applyCharacterFilter(/XPath/, filtername).
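
As a rough sketch of what such a predicate might do (this is not CAM syntax or an existing CAM feature; the function name, filter name, and table layout below are assumptions), the following Python uses the code point ranges Bryan quotes further down as the exclusion table and reports each offending character by element.

```python
import xml.etree.ElementTree as ET

# Hypothetical exclusion table, keyed by filter name. The ranges are the
# ones quoted from the W3C note later in this thread.
EXCLUDED_RANGES = {
    "unsuitable-for-markup": [
        (0x202A, 0x202E),    # BIDI embedding controls
        (0x206A, 0x206F),    # deprecated formatting characters
        (0xFEFF, 0xFEFF),    # BOM / ZWNBSP inside content
        (0xFFF9, 0xFFFC),    # interlinear annotation, object replacement
        (0x1D173, 0x1D17A),  # scoping for musical notation
        (0xE0000, 0xE007F),  # language tag code points
    ],
}

def apply_character_filter(root, path, filter_name):
    """Return one message per excluded character found in the text of
    the elements matched by path (ElementTree's limited XPath subset)."""
    ranges = EXCLUDED_RANGES[filter_name]
    problems = []
    for elem in root.iterfind(path):
        for ch in elem.text or "":
            cp = ord(ch)
            if any(lo <= cp <= hi for lo, hi in ranges):
                problems.append(
                    f"invalid character code U+{cp:04X} "
                    f"found in element <{elem.tag}>"
                )
    return problems

# Usage: a right-to-left override (U+202E) hidden in a data item.
doc = ET.fromstring(
    "<order><dataitem123>abc\u202Edef</dataitem123><note>fine</note></order>"
)
print(apply_character_filter(doc, ".//dataitem123", "unsuitable-for-markup"))
```

This sketch only inspects each matched element's direct text; attributes and mixed content would need the same check.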

DW


 -------- Original Message --------
Subject: Re: [ubl-dev] SV: [ubl] Re: [ubl-dev] Datatype Methodology RE:
[ubl-dev] SBS and  Restricted Data Types
From: stephen.green@systml.co.uk
Date: Tue, May 09, 2006 5:55 am
To: ubl-dev@lists.oasis-open.org, ubl@lists.oasis-open.org

Bryan, All,

This raises an interesting point. There is surely an important need
to specify in a trading agreement the character set to be used in
the documents. I wonder whether even CAM has this :-) After all, should
my application have to be able to support musical notation or
hieroglyphics in a product description? Maybe there should be a way to
specify a subset of a character set too (especially if it is Unicode we
are talking about). I bet many have had problems when a character
decodes to two characters in certain systems (e.g. the GBP sign, £):
not good for translation to fixed width and/or EDI.
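
One common way this happens, shown as a small Python sketch (the variable names are illustrative, and this is only one possible cause): the pound sign occupies two bytes in UTF-8, so a system that reads those bytes as ISO-8859-1 sees two characters, and a fixed-width field sized in single-byte characters is thrown off by one.

```python
pound = "\u00a3"  # the GBP sign

# UTF-8 encodes U+00A3 as two bytes: 0xC2 0xA3.
utf8_bytes = pound.encode("utf-8")

# A receiver that assumes ISO-8859-1 decodes each byte as its own character.
misread = utf8_bytes.decode("latin-1")

# In ISO-8859-1 itself the sign is a single byte, 0xA3.
latin1_bytes = pound.encode("latin-1")

print(len(pound), len(utf8_bytes), len(misread), len(latin1_bytes))
print(repr(misread))
```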

All the best

Steve



Quoting Bryan  Rasmussen <BRS@itst.dk>:

> I agree with not setting string length restrictions, I think it would be nice
> to have string length minimums or constraints to require some content in an
> element if the element is required, but it's not a big thing for me.
>
> Another thing though would be restricting characters that are not needed, as
> per the recommendations in http://www.w3.org/TR/unicode-xml/#Suitable
>
> I think what should be restricted is (from document):
>
> U+202A .. U+202E BIDI embedding controls
> (LRE, RLE, LRO, RLO, PDF) Strongly discouraged in [HTML 4.0]
> U+206A .. U+206B Activate/Inhibit Symmetric swapping Deprecated in Unicode
> U+206C .. U+206D Activate/Inhibit Arabic form shaping Deprecated in Unicode
> U+206E .. U+206F Activate/Inhibit National digit shapes Deprecated in Unicode
>
> U+FFF9 .. U+FFFB Interlinear annotation characters Use ruby markup [Ruby]
> U+FEFF Byte order mark / ZWNBSP Use only as byte order mark. Use U+2060 Word
> Joiner instead of using U+FEFF as ZWNBSP
> U+FFFC Object replacement character Use markup
> U+1D173 .. U+1D17A Scoping for Musical Notation Use an appropriate markup
> language
> U+E0000 .. U+E007F Language Tag codepoints
>
> I don't want to restrict the use of line feeds etc. as is recommended in the
> aforementioned document.
>
> Cheers,
> Bryan Rasmussen
>



