OASIS Mailing List Archives

xliff-omos message



Subject: RE: [xliff-omos] A few notes in JSON and inline content


Hi all,

 

I think some of the confusion comes from the example.

 

Ryan’s example is:

 

Alert: <h1>[productname name="Acme Phone"] now available in stores.</h1> Get it today!

 

But, to me, it’s not one that can be easily used for two reasons:

 

-   [productname name="Acme Phone"] is not standard HTML, and while we all understand how this should be extracted (as a sub-flow), it would be better to use an equivalent “normal” HTML construct like an alt attribute in an img element, so any HTML filter can generate the XLIFF without being side-tracked on how the special code is handled.

-   Also <h1> is a structural code (i.e. not part of the “inline codes” as defined by HTML5 itself (phrasing content)), but the extraction representation provided treats it as inline.

 

We could use a standard HTML snippet that can illustrate a similar extraction, but using a “normal” HTML filter:

 

Alert: <strong><img src='' alt="Acme Phone"> now available in stores.</strong> Get it today!

 

This would give us an XLIFF similar to this:

 

<?xml version="1.0"?>
<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0" srcLang="en-us" trgLang="fr-fr">
<file id="f1" original="/C:/Users/ysavourel/Desktop/test.html">
  <unit id="tu2" type="okp:alt">
   <segment>
    <source>Acme Phone</source>
   </segment>
  </unit>
  <unit id="tu1">
   <originalData>
    <data id="d1">&lt;strong></data>
    <data id="d2">[#$dp1]</data>
    <data id="d3">&lt;/strong></data>
   </originalData>
   <segment>
    <source>Alert: <pc id="1" canCopy="no" canDelete="no" dataRefEnd="d3" dataRefStart="d1"><ph id="2" canCopy="no" canDelete="no" subFlows="tu2" dataRef="d2"/> now available in stores.</pc> Get it today!</source>
   </segment>
  </unit>
</file>
</xliff>

 

And a possible JSON output like this (for the whole document):

 

-   Obviously some of the fields that have default values may be omitted (if we decide so).

-   Some fields are just arbitrary choices we would have to decide on (e.g. xmlspace:false would probably become preserveWS:false or xmlspace:'default').

 

 

{
     "version": "2.0",
     "srcLang": "en-us",
     "xmlspace": false,
     "trgLang": "fr-fr",
     "files": [{
           "id": "f1",
           "original": "\/C:\/Users\/ysavourel\/Desktop\/test.html",
           "canResegment": true,
           "xmlspace": false,
           "srcDir": "auto",
           "trgDir": "auto",
           "items": [{
                "kind": "unit",
                "id": "tu2",
                "type": "okp:alt",
                "xmlspace": false,
                "srcDir": "auto",
                "trgDir": "auto",
                "parts": [{
                     "seg": true,
                     "canResegment": true,
                     "source": ["Acme Phone"]
                }]
           }, {
                "kind": "unit",
                "id": "tu1",
                "xmlspace": false,
                "srcDir": "auto",
                "trgDir": "auto",
                "parts": [{
                     "seg": true,
                     "canResegment": true,
                     "source": ["Alert: ", {
                           "kind": 0,
                           "id": "1",
                           "data": "<strong>",
                           "canCopy": false,
                           "canDelete": false
                     }, {
                           "kind": 2,
                           "id": "2",
                           "data": "[#$dp1]",
                           "subFlows": "tu2",
                           "canCopy": false,
                           "canDelete": false
                     }, " now available in stores.", {
                           "kind": 1,
                           "id": "1",
                           "data": "<\/strong>",
                           "canCopy": false,
                           "canDelete": false
                     }, " Get it today!"]
                }]
           }]
     }]
}
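
As a quick check that nothing is lost in this linear form, here is a minimal Python sketch that rebuilds the original HTML fragment from a "source" array like the one of tu1 above. The helper name rebuild is only illustrative, and the field names ("kind", "data") are the arbitrary ones used in this example, not something we have agreed on.

```python
import json

# The "source" array of unit tu1 from the example above (illustrative field names).
SOURCE_JSON = r'''
["Alert: ",
 {"kind": 0, "id": "1", "data": "<strong>", "canCopy": false, "canDelete": false},
 {"kind": 2, "id": "2", "data": "[#$dp1]", "subFlows": "tu2",
  "canCopy": false, "canDelete": false},
 " now available in stores.",
 {"kind": 1, "id": "1", "data": "<\/strong>", "canCopy": false, "canDelete": false},
 " Get it today!"]
'''

def rebuild(source):
    """Concatenate the text spans and the original data of each inline code."""
    parts = []
    for item in source:
        if isinstance(item, str):
            parts.append(item)                  # plain text span
        else:
            parts.append(item.get("data", ""))  # inline code: use its original data
    return "".join(parts)

print(rebuild(json.loads(SOURCE_JSON)))
```

Because the array is read strictly left to right, the same loop would work for paired and standalone codes alike.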

 

Cheers,

-yves

 

 

 

 

 

From: Chase Tingley [mailto:chase@spartansoftwareinc.com]
Sent: Tuesday, February 16, 2016 12:45 PM
To: Yves Savourel <ysavourel@enlaso.com>
Cc: XLIFF OMOS TC <xliff-omos@lists.oasis-open.org>
Subject: Re: [xliff-omos] A few notes in JSON and inline content

 

A small thing that's been bothering me about the last few messages in this thread is that there's something strange about Okapi's behavior that makes it not the greatest starting point as an example.  "Acme Phone" is a subflow, and Yves's snippet identifies it as such, but units extracted by Okapi don't actually take advantage of that fact.  (Instead, the filter splinters the source text, which I think makes things more confusing.)

 

A more natural XLIFF representation would look something like this:

 

  <unit id="tu1">
   <originalData>
    <data id="d1">[#$dp1]</data>
    <data id="d2">&lt;h1></data>
    <data id="d3">&lt;/h1></data>
   </originalData>
   <segment>
    <source>Alert: <pc id="1" dataRefStart="d2" dataRefEnd="d3"><ph id="2" canCopy="no" canDelete="no" subFlows="tu3" dataRef="d1"/> now available in stores.</pc></source>
   </segment>
  </unit>
  <unit id="tu2">
   <segment>
    <source>Get it today!</source>
   </segment>
  </unit>
  <unit id="tu3" type="okp:alt">
   <segment>
    <source>Acme Phone</source>
   </segment>
  </unit>

 

The main difference for the purposes of discussion is that the <h1> tag can now be represented as a well-formed code.  However, I think Yves's point stands regarding how the sc/ec case would be handled in the model Ryan is proposing.

 

One comment regarding something Ryan said:

> the example below that shows text + internal tags + text illustrates how the “text” portions are on top-level but shouldn’t have duplicate keys.

 

I'm not sure I'm understanding this correctly, but in the example that follows, the top-level text spans are items in a JSON array rather than members of an object, and arrays have no uniqueness restriction on their items (e.g. [ "a", "a", "a" ] is valid).
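
To illustrate the difference, here is a small Python sketch. It uses the standard json module, which is just one parser; other libraries may treat duplicate names differently. The helper name reject_duplicates is made up for this example.

```python
import json

# Repeated items in an array are all preserved.
items = json.loads('["a", "a", "a"]')
print(items)

# Repeated names in an object are not: Python's json module keeps the last one.
obj = json.loads('{"plainText": "Text in", "plainText": "format."}')
print(obj)

# object_pairs_hook receives every name/value pair, so duplicates can be detected.
def reject_duplicates(pairs):
    names = [name for name, _ in pairs]
    dupes = sorted({n for n in names if names.count(n) > 1})
    if dupes:
        raise ValueError("duplicate names: " + ", ".join(dupes))
    return dict(pairs)

try:
    json.loads('{"plainText": "Text in", "plainText": "format."}',
               object_pairs_hook=reject_duplicates)
except ValueError as err:
    print(err)
```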

 

ct

 

 

On Wed, Feb 10, 2016 at 7:09 PM, Yves Savourel <ysavourel@enlaso.com> wrote:

Hi all,

The thing that bothers me a bit with such nested content is that an innerContent does not correspond to anything at the XLIFF level: It’s just an arbitrary choice of the object model to represent content between paired codes as an object.

Also, what would be the representation with overlapping inline codes?

For example:

<unit id="u1">
<originalData>
<data id="d1">[C1]</data>
<data id="d2">[C2]</data>
<data id="d3">[/C1]</data>
<data id="d4">[/C2]</data>
</originalData>
<segment>
<source><sc id="c1" dataRef="d1"/>text1 <sc id="c2" dataRef="d2"/>text2 <ec startRef="c1" dataRef="d3"/>text3 <ec startRef="c2" dataRef="d4"/></source>
</segment>
</unit>

In a “linear” representation the source content would give something like this:

[{
        "kind": 0,
        "id": "c1",
        "data": "[C1]"
}, "text1 ", {
        "kind": 0,
        "id": "c2",
        "data": "[C2]"
}, "text2 ", {
        "kind": 1,
        "id": "c1",
        "data": "[\/C1]"
}, "text3 ", {
        "kind": 1,
        "id": "c2",
        "data": "[\/C2]"
}]

What would be the “nested” representation?
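
A minimal sketch of why this matters, assuming the linear form above with kind 0 for an opening code and kind 1 for a closing code: a simple stack check tells whether a sequence could be rewritten as nested objects at all. The function name is_well_nested is only illustrative.

```python
def is_well_nested(content):
    """Return True if every closing code (kind 1) closes the most recently
    opened code (kind 0), i.e. the codes could be written as nested pairs."""
    open_ids = []
    for item in content:
        if isinstance(item, str):
            continue                      # plain text does not affect nesting
        if item["kind"] == 0:
            open_ids.append(item["id"])
        elif item["kind"] == 1:
            if not open_ids or open_ids[-1] != item["id"]:
                return False              # closes something other than the innermost code
            open_ids.pop()
    return not open_ids                   # anything left open is not a pair

overlapping = [
    {"kind": 0, "id": "c1", "data": "[C1]"}, "text1 ",
    {"kind": 0, "id": "c2", "data": "[C2]"}, "text2 ",
    {"kind": 1, "id": "c1", "data": "[/C1]"}, "text3 ",
    {"kind": 1, "id": "c2", "data": "[/C2]"},
]
paired = [
    "Text in ", {"kind": 0, "id": "1", "data": "<b>"}, "bold",
    {"kind": 1, "id": "1", "data": "</b>"}, " format.",
    {"kind": 2, "id": "2", "data": "<br>"},
]
print(is_well_nested(overlapping))  # False
print(is_well_nested(paired))       # True
```

The linear array can carry both cases unchanged, whereas a nested object model would need some extra convention for the overlapping one.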


Cheers,
-yves




From: xliff-omos@lists.oasis-open.org [mailto:xliff-omos@lists.oasis-open.org] On Behalf Of Ryan King
Sent: Tuesday, February 9, 2016 10:45 AM
To: xliff-omos@lists.oasis-open.org
Subject: FW: [xliff-omos] A few notes in JSON and inline content


Hi all,

Regarding our discussion on the call this morning, the example below that shows text + internal tags + text illustrates how the “text” portions are on top-level but shouldn’t have duplicate keys.
“Alert” and “ Get it today” for example. However, codeEnd can be duplicated since they are embedded.

["Alert: ", {
                       "id": "1",
                       "codeStart": "<h1>",
                       "innerContent": {
                                   "id": "2",
                                   "codeStart": "[productname=\"",
                                   "content": "Acme Phone",
                                   "codeEnd": "\"]"
                       },
                       "content": " now available in stores",
                       "codeEnd": "</h1>"
           },
           " Get it today"
]
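
For what it is worth, here is a rough Python sketch of how a reader could turn this nested form back into text. It assumes the reading order codeStart, innerContent, content, codeEnd, which the example implies but does not state, and the function name flatten is only illustrative.

```python
def flatten(node):
    """Rebuild the text of one item of the nested notation."""
    if isinstance(node, str):
        return node
    return (node.get("codeStart", "")
            + flatten(node.get("innerContent", ""))  # nested code, if any
            + node.get("content", "")                 # text after the nested code
            + node.get("codeEnd", ""))

nested = ["Alert: ", {
    "id": "1",
    "codeStart": "<h1>",
    "innerContent": {
        "id": "2",
        "codeStart": "[productname=\"",
        "content": "Acme Phone",
        "codeEnd": "\"]",
    },
    "content": " now available in stores",
    "codeEnd": "</h1>",
}, " Get it today"]

text = "".join(flatten(item) for item in nested)
print(text)
```

With that fixed order, text that comes before the nested code inside the same spanning code has no obvious place to go, which may be something to settle.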

Thanks,
Ryan

From: xliff-omos@lists.oasis-open.org [mailto:xliff-omos@lists.oasis-open.org] On Behalf Of Ryan King
Sent: Tuesday, January 26, 2016 12:18 AM
To: Phil Ritchie <Phil.Ritchie@vistatec.com>; Yves Savourel <ysavourel@enlaso.com>; xliff-omos@lists.oasis-open.org
Subject: RE: [xliff-omos] A few notes in JSON and inline content

Thanks Yves and Phil for steering us in the right direction and correcting my suggestion in favor of not just standards but also best practices!

Let’s go one step further. What if we have text preceding and following innerContent? We might get something like this:

<originalData>
<data id="d1">&lt;h1></data>
<data id="d2">&lt;/h1></data>
<data id="d3">&lt;br/></data>
<data id="d4">[productname name="</data>
<data id="d5">"]</data>
</originalData>
<!--
Alert: <h1>[productname name="Acme Phone"] now available in stores.</h1> Get it today!
-->
<source>Alert: <pc id="1" dataRefStart="d1" dataRefEnd="d2"><pc id="2" dataRefStart="d4" dataRefEnd="d5">Acme Phone</pc> now available in stores.</pc> Get it today!</source>

["Alert: ", {
                        "id": "1",
                        "codeStart": "<h1>",
                        "codeEnd": "</h1>",
                        "innerContent": {
                                    "id": "2",
                                    "codeStart": "[productname=\"",
                                    "codeEnd": "\"]",
                                    "innerContent": "Acme Phone"
                        }
            },
            " now available in stores.",
            " Get it today!"
]

How would I know, though, where to place </h1>? So how about something even more explicit:

["Alert: ", {
                       "id": "1",
                       "codeStart": "<h1>",
                       "innerContent": {
                                   "id": "2",
                                   "codeStart": "[productname=\"",
                                   "content": "Acme Phone",
                                   "codeEnd": "\"]"
                       },
                       "content": " now available in stores",
                       "codeEnd": "</h1>"
           },
           " Get it today"
]

Thanks,
Ryan

From: xliff-omos@lists.oasis-open.org [mailto:xliff-omos@lists.oasis-open.org] On Behalf Of Phil Ritchie
Sent: Friday, January 15, 2016 3:48 PM
To: Yves Savourel <ysavourel@enlaso.com>; xliff-omos@lists.oasis-open.org
Subject: RE: [xliff-omos] A few notes in JSON and inline content


All

I came across the same issue as Yves with Ryan's notation, namely duplicate keys. The Newtonsoft Json library used by most C# developers strips off the first plaintext field.

From playing with both, nested inline tags are also difficult to handle in Ryan's notation.

I did find Ryan's more intuitive to read though.

We have quite a lot of content that contains custom placeholders with translatable attributes. In XLIFF, right or wrong, we encode as:

<originalData>
<data id="d1">&lt;b></data>
<data id="d2">&lt;/b></data>
<data id="d3">&lt;br/></data>
<data id="d4">[productname name="</data>
<data id="d5">"]</data>
</originalData>
<!--
<h1>[productname name="Acme Phone"] now available in stores</h1>
-->
<source>
<pc id="1" dataRefStart="d1" dataRefEnd="d2">
<pc id="2" dataRefStart="d4" dataRefEnd="d5">Acme Phone</pc> now available in stores
</pc>
</source>

Making a hybrid of Yves and Ryan's notation I get:

[
{
"id": "1",
"codeStart": "<h1>",
"codeEnd": "</h1>",
"innerContent": {
"id": "2",
"codeStart": "[productname=\"",
"codeEnd": "\"]",
"innerContent": "Acme Phone"
}
},
" now available in stores"
]

Phil

> -----Original Message-----
> From: xliff-omos@lists.oasis-open.org [mailto:xliff-omos@lists.oasis-
> open.org] On Behalf Of Yves Savourel
> Sent: 15 January 2016 22:03
> To: xliff-omos@lists.oasis-open.org
> Subject: RE: [xliff-omos] A few notes in JSON and inline content
>
> Hi Ryan, all,
>
> I don't think it would be a good idea to use a notation where the names of
> the objects correspond to their type, like in your example.
>
> We would have duplicates (like "plainText" twice in your example) and it is
> likely to cause trouble.
>
> The JSON specification does not say anything explicit about uniqueness of
> the names. But RFC 7159 (The JSON Data Interchange Format:
> http://tools.ietf.org/html/rfc7159#section-4) says "The names within an
> object SHOULD be unique". And usually one is better off treating a SHOULD
> like a MUST unless there are very good reasons to do otherwise.
>
> The main issue with duplicated names is that many implementations of
> JSON readers use some kind of Map, Hash, or Dictionary class that does
> not support duplicate keys.
>
> See also the discussion here:
> http://stackoverflow.com/questions/21832701/does-json-syntax-allow-
> duplicate-keys-in-an-object. The consensus seems to be that, while strictly
> speaking JSON does not forbid duplicate names, it is a really good
> idea to keep these names unique for all kinds of very valid reasons.
>
> Cheers,
> -yves
>
>
> -----Original Message-----
> From: Ryan King [mailto:ryanki@microsoft.com]
> Sent: Friday, January 15, 2016 1:39 PM
> To: Ryan King <ryanki@microsoft.com>; Yves Savourel
> <ysavourel@enlaso.com>; xliff-omos@lists.oasis-open.org
> Subject: RE: [xliff-omos] A few notes in JSON and inline content
>
> Sorry, I should also ask the question of why original codes need a type, or
> kind, as you indicate below. Couldn't they just be distinct objects in the
> model? Maybe there is a nuance I am missing, though.
>
> For example:
> standaloneCode
> spanningCode
> spanningCodeStart
> spanningCodeEnd
> etc.
>
> Thanks,
> Ryan
>
> -----Original Message-----
> From: xliff-omos@lists.oasis-open.org [mailto:xliff-omos@lists.oasis-
> open.org] On Behalf Of Ryan King
> Sent: Friday, January 15, 2016 10:12 AM
> To: Yves Savourel <ysavourel@enlaso.com>; xliff-omos@lists.oasis-open.org
> Subject: RE: [xliff-omos] A few notes in JSON and inline content
>
> Hi Yves,
>
> Thanks for getting the ball rolling. I absolutely agree with you that we should
> just start defining the JSON representation.
> Starting at inline and working out is good, as well. Once we agree on inline
> representation, the rest is easier. So I'm all for this approach. I do see the
> representation of your sample a bit differently, however. In the MS OM, the
> sample would be represented as an array of four objects: PlainText,
> SpanningCode, PlainText, StandaloneCode - just using those object names as
> examples, you could serialize it to something like this:
>
> {
> "plainText": "Text in",
> "spanningCode": {
> "id": "1",
> "codeStart": "<b>",
> "codeEnd": "</b>",
> "innerText": "bold"
> },
> "plainText": "format.",
> "standaloneCode": {
> "id": "2",
> "code": "<br>"
> }
> }
>
> Thanks,
> Ryan
>
> -----Original Message-----
> From: xliff-omos@lists.oasis-open.org [mailto:xliff-omos@lists.oasis-
> open.org] On Behalf Of Yves Savourel
> Sent: Thursday, January 14, 2016 7:13 PM
> To: xliff-omos@lists.oasis-open.org
> Subject: [xliff-omos] A few notes in JSON and inline content
>
> Hi all,
>
> We have to start somewhere, so maybe a good place is a simple inline
> content. It's one of the most tricky parts to serialize in a common way
> because the internal representation of such content is likely to be different
> in the various implementations depending on how the overall document is
> stored (e.g. DOM, DB, memory, etc.) and also on what the implementation
> goals are (e.g. do matching, be the back-end of an editor, etc.).
>
> A possible effective representation would be the simplest. It may not fit
> exactly the underlying object of all implementations, but it should be
> relatively easy to generate and parse by all.
>
> Such content is simply an array of objects. So for example, if we have this
> object (here in XLIFF so everyone can relate to it):
>
> <originalData>
> <data id='d1'>&lt;b></data>
> <data id='d2'>&lt;/b></data>
> <data id='d3'>&lt;br></data>
> </originalData>
> ...
> <source>Text in <pc id='1' dataRefStart='d1' dataRefEnd='d2'>bold</pc>
> format.<ph id='2' dataRef='d3'/></source>
>
> The JSON representation could be something like this:
>
> [ "Text in ",
> {
> "kind":0,
> "id":"1",
> "data":"<b>"
> },
> "bold",
> {
> "kind":1,
> "id":"1",
> "data":"<\/b>"
> },
> " format.",
> {
> "kind":2,
> "id":"2",
> "data":"<br>"
> }
> ]
>
> The array has 6 objects: 3 strings, which correspond to the spans of plain text,
> and 3 objects corresponding to the inline tags. The objects would have a
> relatively identical structure. The "kind" field (trying to keep "type" for the
> XLIFF-type) would indicate if the object is an opening code (0), a closing
> code (1), a standalone code (2, as in the example above), an opening marker (4)
> or a closing marker (5).
>
> We would have also some rules:
>
> - The fields that have values equal to the default value MAY be omitted in
> the JSON string.
> - The fields within the objects would have no prescribed order.
> - The fields common to both the opening and closing codes (e.g. id, type,
> etc.) would be represented once only: in the opening code.
> If there is no opening code (i.e. there is an isolated closing code) the fields
> would be represented in the closing code.
>
> This is just one possible representation.
> I'm sure others have better ideas and suggestions.
>
> Cheers,
> -yves
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe from this mail list, you must leave the OASIS TC that
> generates this mail. Follow this link to all your TCs in OASIS
> at:
> https://www.oasis-
> open.org/apps/org/workgroup/portal/my_workgroups.php
>
Phil Ritchie | Chief Technology Officer | VistaTEC
VistaTEC House, 700 South Circular Road, Kilmainham, Dublin 8, Ireland.
Tel: +353 1 416 8000
Email: Phil.Ritchie@vistatec.com | www.vistatec.com | ISO 9001 | EN 15038

Expert Leadership in Global Content Solutions

