
Subject: [OASIS Issue Tracker] (ODATA-1348) CSDL MaxLength is ill-defined

Evan Ireland created ODATA-1348:

             Summary: CSDL MaxLength is ill-defined
                 Key: ODATA-1348
                 URL: https://issues.oasis-open.org/browse/ODATA-1348
             Project: OASIS Open Data Protocol (OData) TC
          Issue Type: Bug
          Components: CSDL JSON, CSDL XML
    Affects Versions: V4.0_OS
            Reporter: Evan Ireland

7.2.2 MaxLength

  "A positive integer value specifying the maximum length of a binary, stream or string value. For binary or stream values this is the octet length of the binary data, for string values it is the character length."

What does "character" mean here? (The Unicode specifications do not define "character" in any normative text.)

3.3 Primitive Types

"Edm.String Sequence of UTF-8 characters"

If we combine 7.2.2 and 3.3, we might reasonably infer that MaxLength is the maximum valid length of a String value in UTF-8 encoding.

Is this what the spec intended, in which case 7.2.2 should be clarified? Or was 7.2.2 intended to refer to UTF-16 code units or Unicode code points?

See also: https://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries

Why does any of this matter? Consider a client that wants to create an offline cache of data from a server (in a database where columns need a specified maximum length). Or consider some other intermediary that wants to allocate space for a buffer (e.g. malloc MaxLength+1 for a buffer to hold a Property value in a C program). It is important for such apps to be able to determine how much space to set aside to avoid accidental truncation of values.
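
To make the buffer-sizing concern concrete, here is a rough sketch of how the worst-case UTF-8 buffer size changes with the reading of MaxLength. The three interpretation names and the helper utf8_buffer_size are hypothetical, chosen only for illustration; the spec text quoted above does not select any of them.

```python
# Worst-case number of UTF-8 octets that one counted unit can require.
# The interpretation names are assumptions for illustration only.
WORST_CASE_UTF8_OCTETS_PER_UNIT = {
    "utf8_octets": 1,       # MaxLength already counts octets
    "utf16_code_units": 3,  # a BMP code point: 1 UTF-16 unit, up to 3 octets
    "code_points": 4,       # a supplementary code point: 4 octets
}


def utf8_buffer_size(max_length: int, interpretation: str) -> int:
    """Octets needed to hold any conforming value as UTF-8, plus a NUL terminator."""
    return max_length * WORST_CASE_UTF8_OCTETS_PER_UNIT[interpretation] + 1


if __name__ == "__main__":
    for name in WORST_CASE_UTF8_OCTETS_PER_UNIT:
        print(name, utf8_buffer_size(10, name))
    # Ten euro signs are 10 UTF-16 code units but 30 UTF-8 octets.
    print(len(("\u20ac" * 10).encode("utf-8")))
    # Ten emoji are 10 code points but 40 UTF-8 octets.
    print(len(("\U0001F600" * 10).encode("utf-8")))
```

Under these assumptions, a MaxLength of 10 could call for an 11, 31 or 41 octet buffer. If MaxLength were instead read as a count of grapheme clusters, no fixed multiplier would bound the size.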

Additionally, for any client or other agent wishing to validate a Property value against MaxLength, it makes a huge difference whether length is counted in UTF-8 code units (octets), UTF-16 code units, or Unicode code points.
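
As an illustration of how much the validation outcome can differ, the following sketch measures one string in each of the three units and checks it against the same MaxLength. The helpers length_in and fits, and the unit names, are hypothetical and not part of any OData library.

```python
def length_in(value: str, unit: str) -> int:
    """Length of value under one candidate reading of MaxLength."""
    if unit == "utf8_octets":
        return len(value.encode("utf-8"))
    if unit == "utf16_code_units":
        # Each UTF-16 code unit is 2 octets; "utf-16-le" emits no BOM.
        return len(value.encode("utf-16-le")) // 2
    if unit == "code_points":
        # Python 3 strings are sequences of Unicode code points.
        return len(value)
    raise ValueError(f"unknown unit: {unit}")


def fits(value: str, max_length: int, unit: str) -> bool:
    """True if value satisfies MaxLength when length is counted in unit."""
    return length_in(value, unit) <= max_length


if __name__ == "__main__":
    sample = "na\u00efve \U0001F600"  # "naive" with a diaeresis, a space, one emoji
    for unit in ("utf8_octets", "utf16_code_units", "code_points"):
        print(unit, length_in(sample, unit), fits(sample, 7, unit))
```

With MaxLength = 7, this sample passes when counted in code points (7) but fails when counted in UTF-16 code units (8) or UTF-8 octets (11).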

