Package | Description |
---|---|
org.oasisopen.xliff.om.v1 | XLIFF Object Model Interfaces |
Prototype of an object model and API for XLIFF.
At this time, this documentation COVERS ONLY THE INLINE CONTENT of the document (what occurs within the <source> and <target> elements).
It is only a DRAFT PROTOTYPE work item and HAS NO OFFICIAL STANDING of any kind.
One of the main challenges in representing parsed XLIFF content is dealing with the inline elements.
The following describes one possible object model.
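Req-1: It should be possible to apply common string operations (e.g. toUpperCase()) to the content without undue complexity or side-effects.
In the coded text, each inline tag is replaced by two characters. The first character indicates the kind of tag:
- opening code (<pc> or <sc/>)
- closing code (</pc> or <ec/>)
- standalone code (<ph/>)
- opening marker (<mrk> or <sm/>)
- closing marker (</mrk> or <em/>)
The two characters together constitute a tag reference. Their values are combined into a single integer key, unique within a unit: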
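// Encoding: combine the two characters of a tag reference into one key
int key = ((char1 << 16) | char2);
// Decoding: recover the two characters from the key
char char1 = (char)(key >> 16);
char char2 = (char)key;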
The key can then serve as a unique hash key to access the object that holds the information about the given tag.
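A small runnable sketch of the key computation, checking that it reproduces the two keys used in the tag store example on this page. The class and method names (TagKeyDemo, toKey, kindOf, indexCharOf) are illustrative only and are not part of the org.oasisopen.xliff.om.v1 API. The last line rests on an assumption this page does not confirm: that the second character ranges from U+E110 to U+F8FF (the end of the BMP Private Use Area), which would give 6128 values per kind, matching the per-unit limit of 6128 mentioned on this page.

```java
public class TagKeyDemo {

    // Combine the two characters of a tag reference into one int key.
    // The char is promoted to int, so kinds >= U+8000 yield negative keys.
    static int toKey(char char1, char char2) {
        return (char1 << 16) | char2;
    }

    // Recover the first character (the tag kind) from a key.
    static char kindOf(int key) {
        return (char) (key >> 16);
    }

    // Recover the second character from a key.
    static char indexCharOf(int key) {
        return (char) key;
    }

    public static void main(String[] args) {
        int open = toKey('\uE101', '\uE110');
        int close = toKey('\uE102', '\uE110');
        System.out.println(open);   // -519970544
        System.out.println(close);  // -519905008

        // Round trip: both characters come back unchanged.
        System.out.println(kindOf(open) == '\uE101' && indexCharOf(open) == '\uE110'); // true

        // Assumed range U+E110..U+F8FF for the second character.
        System.out.println(0xF8FF - 0xE110 + 1); // 6128
    }
}
```

The arithmetic shift in kindOf sign-extends negative keys, but the cast to char keeps only the low 16 bits, so the kind character is still recovered correctly.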
This representation has the following characteristics:
- The text can be manipulated with standard string methods such as toLowerCase(), toUpperCase(), etc. (Req-1).
- A unit can hold up to 6128 <pc> or <sc/>, 6128 </pc> or <ec/>, 6128 <ph/>, 6128 <mrk> or <sm/>, and 6128 </mrk> or <em/> (Req-7).
For example, the following HTML content (one character per cell):
T | e | x | t |   | < | b | > | b | o | l | d | < | / | b | > | . |
Is represented as:
A) The following coded text (one character per cell):
T | e | x | t |   | U+E101 | U+E110 | b | o | l | d | U+E102 | U+E110 | . |
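Where U+E101 marks an opening tag, U+E102 marks a closing tag, and U+E110 is the second character of each tag reference, shared here by the two parts of the <b>...</b> pair.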
B) The following store of Tag objects, hashed on the tag reference keys:
Key | Tag object |
---|---|
-519970544 | Tag object for "<b>" |
-519905008 | Tag object for "</b>" |
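A sketch of the coded text and tag store for the example above, assuming the kind characters U+E101 (opening) and U+E102 (closing) that appear in the JSON example on this page and the second character U+E110. For brevity the store maps each key to the original tag data as a String rather than to a full Tag object, and the names CodedTextDemo and expand are hypothetical. It also illustrates Req-1: toUpperCase() changes the text but leaves the private-use tag characters alone, because they have no case mapping, so the tags can be restored unchanged.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class CodedTextDemo {
    // Kind characters as used in the JSON example on this page.
    static final char OPENING = '\uE101';
    static final char CLOSING = '\uE102';

    // Same key computation as described above.
    static int toKey(char kind, char index) {
        return (kind << 16) | index;
    }

    // Replace each two-character tag reference with the data held in the store.
    static String expand(String coded, Map<Integer, String> store) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < coded.length(); i++) {
            char c = coded.charAt(i);
            if (i + 1 < coded.length()) {
                String data = store.get(toKey(c, coded.charAt(i + 1)));
                if (data != null) {
                    sb.append(data);
                    i++; // skip the second character of the tag reference
                    continue;
                }
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        char index = '\uE110'; // shared by both parts of the <b>...</b> pair
        // A) coded text for: Text <b>bold</b>.
        String coded = "Text " + OPENING + index + "bold" + CLOSING + index + ".";
        // B) store of tag data hashed on the tag reference keys
        Map<Integer, String> store = new LinkedHashMap<>();
        store.put(toKey(OPENING, index), "<b>");
        store.put(toKey(CLOSING, index), "</b>");

        System.out.println(coded.length()); // 14
        System.out.println(store.keySet()); // [-519970544, -519905008]

        // Req-1: a plain string method works directly on the coded text.
        String upper = coded.toUpperCase(Locale.ROOT);
        System.out.println(expand(upper, store)); // TEXT <b>BOLD</b>.
    }
}
```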
The inline content is represented by several objects. Each object is either a basic type (String, int, boolean) or an object represented by an interface, such as IContent or ITag.
In JSON, the IContent object is represented as an array that alternates strings and ITag objects. If the content has no tags, it is represented as an array containing a single string. This serialization leaves the object model some freedom in how tags are associated with their positions in the text. It also means that the key values of serialized content may differ when it is read back. The field names should be short (the format is meant to be read only by machines and developers, not end-users).
Example of serialization (characters are escaped for readability):
XLIFF serialization:
<originalData>
<data id='d1'>&lt;b></data>
<data id='d2'>&lt;/b></data>
<data id='d3'>&lt;br></data>
</originalData>
...
<source>Text in <pc id='1' dataRefStart='d1' dataRefEnd='d2'>bold</pc> format.<ph id='2' dataRef='d3'/></source>
JSON serialization:
[ "Text in ", { "kind":"\uE101", "id":"1", "data":"<b>" }, "bold", { "kind":"\uE102", "id":"1", "data":"<\/b>" }, " format.", { "kind":"\uE103", "id":"2", "data":"<br>" } ]
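A minimal sketch of the JSON serialization, assuming a hypothetical Tag class with kind, id and data fields and the same key scheme as above. It walks the coded text, emits each run of text as a JSON string and each tag reference as an object, and escapes "/" and non-ASCII characters the way the example does. Using U+E110 as the second character of every tag reference is an assumption; this page does not say which value the <ph/> receives. The names JsonContentDemo, quote and toJson are illustrative only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class JsonContentDemo {
    // Hypothetical tag holder: kind character, XLIFF id and original data.
    static final class Tag {
        final char kind;
        final String id;
        final String data;

        Tag(char kind, String id, String data) {
            this.kind = kind;
            this.id = id;
            this.data = data;
        }
    }

    // Same key scheme as described above.
    static int toKey(char kind, char index) {
        return (kind << 16) | index;
    }

    // JSON string literal: backslash-escapes quotes, backslashes and "/",
    // and writes control and non-ASCII characters as six-character escapes.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\' || c == '/') {
                sb.append('\\').append(c);
            } else if (c < 0x20 || c > 0x7E) {
                sb.append(String.format("\\u%04X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Emit the content as an array alternating text runs and tag objects.
    static String toJson(String coded, Map<Integer, Tag> store) {
        List<String> items = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < coded.length(); i++) {
            Tag tag = (i + 1 < coded.length())
                    ? store.get(toKey(coded.charAt(i), coded.charAt(i + 1)))
                    : null;
            if (tag == null) {
                text.append(coded.charAt(i));
                continue;
            }
            if (text.length() > 0) {
                items.add(quote(text.toString()));
                text.setLength(0);
            }
            items.add("{ \"kind\":" + quote(String.valueOf(tag.kind))
                    + ", \"id\":" + quote(tag.id)
                    + ", \"data\":" + quote(tag.data) + " }");
            i++; // skip the second character of the tag reference
        }
        // Trailing text; content without tags still yields an array of one string.
        if (text.length() > 0 || items.isEmpty()) {
            items.add(quote(text.toString()));
        }
        return "[ " + String.join(", ", items) + " ]";
    }

    public static void main(String[] args) {
        String coded = "Text in \uE101\uE110bold\uE102\uE110 format.\uE103\uE110";
        Map<Integer, Tag> store = new HashMap<>();
        store.put(toKey('\uE101', '\uE110'), new Tag('\uE101', "1", "<b>"));
        store.put(toKey('\uE102', '\uE110'), new Tag('\uE102', "1", "</b>"));
        store.put(toKey('\uE103', '\uE110'), new Tag('\uE103', "2", "<br>"));
        // Prints the same JSON line as the serialization example above.
        System.out.println(toJson(coded, store));
    }
}
```

Note that the key values never appear in the output: a reader rebuilding the coded text from this JSON is free to pick its own second characters, which is why keys may change across a write/read cycle.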
TODO: more examples