OASIS Mailing List Archives

 



xliff message



Subject: RE: [xliff] Simplified XLIFF element tree


Hi,

Of course, a <para> in DocBook or a <p> in HTML often contains several sentences, and you would want several segments. You can segment the <para> or <p> at text extraction time and put each segment in its own <trans-unit>.
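
A minimal Python sketch of segmenting at extraction time follows. The helper names (segment, paragraph_to_group), the naive sentence-splitting rule and the sample paragraph are assumptions made for illustration only; they are not part of XLIFF or of any tool mentioned in this thread. The sketch handles plain text and ignores inline markup and namespaces.

```python
import re
import xml.etree.ElementTree as ET


def segment(text):
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def paragraph_to_group(text, group_id, first_unit_id=1):
    """Build a <group> holding one <trans-unit> per segment of the paragraph."""
    group = ET.Element("group", {"id": str(group_id)})
    for offset, seg in enumerate(segment(text)):
        unit = ET.SubElement(
            group,
            "trans-unit",
            {"id": f"t{first_unit_id + offset}", "resname": "p"},
        )
        # Each segment gets its own <source>; <target> is added later by the translator.
        ET.SubElement(unit, "source").text = seg
    return group


# Hypothetical paragraph text, as it might be extracted from one <para> or <p>.
para = (
    "The application will check the values. "
    "If these are empty, the application will log out."
)
group = paragraph_to_group(para, group_id=4, first_unit_id=5)
print(ET.tostring(group, encoding="unicode"))
```

The <group> records that the units came from the same paragraph, while every <trans-unit> still carries exactly one segment.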

Adding spanning tags in <source> to indicate segmentation is a very bad idea. Each segment must have its own <source>/<target> pair that can be exported later to create a translation memory or to feed machine translation systems. If you use a spanning mechanism inside <source>, you will have multiple segments in source and target, and the number of source fragments may not match the number of target fragments; that is very bad for TM/MT support and not XSLT-friendly at all.
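
To make the TM export argument concrete, here is a small Python sketch. The function names (export_pairs, span_counts), the sample fragments and the "[xx]" placeholder targets are assumptions for illustration; the fragments omit the XLIFF namespace for brevity, and the spanned example uses <mrk mtype="seg"> as the span annotation.

```python
import xml.etree.ElementTree as ET

# One segment per <trans-unit>: hypothetical sample with placeholder targets.
PRE_SEGMENTED = """
<group id="4">
  <trans-unit id="t6">
    <source>Check the <g id="i6">UserName</g> value.</source>
    <target>[xx] Check the <g id="i6">UserName</g> value.</target>
  </trans-unit>
  <trans-unit id="t7">
    <source>If these are empty, the application will log out.</source>
    <target>[xx] If these are empty, the application will log out.</target>
  </trans-unit>
</group>
"""

# Spans inside <source>/<target>: the translator merged two sentences into one.
SPANNED = """
<trans-unit id="t1">
  <source><mrk mtype="seg" mid="1">First sentence.</mrk> <mrk mtype="seg" mid="2">Second sentence.</mrk></source>
  <target><mrk mtype="seg" mid="1">[xx] Both sentences merged.</mrk></target>
</trans-unit>
"""


def export_pairs(root):
    """Return (source, target) plain-text pairs, one per translated <trans-unit>."""
    pairs = []
    for unit in root.iter("trans-unit"):
        source, target = unit.find("source"), unit.find("target")
        if source is None or target is None:
            continue  # untranslated units contribute nothing to the TM
        # itertext() flattens inline elements such as <g> into plain text.
        pairs.append(("".join(source.itertext()), "".join(target.itertext())))
    return pairs


def span_counts(unit):
    """Count <mrk> spans directly under <source> and <target> of one unit."""
    return (
        len(unit.find("source").findall("mrk")),
        len(unit.find("target").findall("mrk")),
    )


pairs = export_pairs(ET.fromstring(PRE_SEGMENTED))
counts = span_counts(ET.fromstring(SPANNED))
print(pairs)
print(counts)
```

With pre-segmented units the export is a plain loop over <trans-unit> elements. In the spanned case the counts differ, so there is no one-to-one pairing of source and target fragments to export.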

Regards,
Rodolfo
--
Rodolfo M. Raya   <rmraya@maxprograms.com>
Maxprograms      http://www.maxprograms.com


> -----Original Message-----
> From: Asgeir Frimannsson [mailto:asgeirf@redhat.com]
> Sent: Monday, August 23, 2010 8:21 AM
> To: xliff
> Subject: Re: [xliff] Simplified XLIFF element tree
> 
> Hi all,
> 
> My initial thoughts on the subject of segmentation:
> 
> 1) There will be times when the "extraction unit" (e.g. a DocBook
> <para> or an HTML <p>) is not at the segment level.
> 2) The segmentation process should typically belong to the 'translation
> domain', not the 'extraction domain', although implementors may choose to
> add segmentation in the extraction process. This is particularly important
> where there is an n-to-m relationship between segments in source and
> translation.
> 3) <seg-source> is not an ideal solution. Annotating segments using <mrk> in
> <source>, or introducing some other span-annotation mechanism is my
> preferred solution.
> 
> Perhaps what is lacking in the standard is a clear way to maintain "extraction
> units" vs "translation units". In pre-2.0, the <trans-unit> is typically referred
> to as the unit of extraction, but perhaps there is a case for a finer-grained
> trans-unit that is a subset of an extraction unit, which can maintain its own
> state, annotations, TM/MT suggestions, etc.?
> 
> 
> cheers,
> asgeir
> 
> ----- "Andrzej Zydron" <azydron@xtm-intl.com> wrote:
> 
> > Hi Everyone,
> >
> > I completely agree. We always pre-segment, so our <trans-unit> elements
> > each hold a segment. We use the <group> element to signify the higher
> > level at which segmentation has taken place, e.g.:
> >
> > <group id="4">
> >   <trans-unit id="t5" resname="p" translate="yes" xml:space="default">
> >     <source>When a user logs out of the <g id="i2">XTM</g> Client, the Client clears the <g id="i3">UserName</g> and <g id="i4">Password</g> property of the application.</source>
> >     <target>When a user logs out of the <g id="i2">XTM</g> Client, the Client clears the <g id="i3">UserName</g> and <g id="i4">Password</g> property of the application.</target>
> >   </trans-unit>
> >   <trans-unit id="t6" resname="p" translate="yes" xml:space="default">
> >     <source>The application will respond to the associated <g id="i5">PasswordChange</g> event by checking the values of the <g id="i6">UserName</g> and <g id="i7">Password</g>.</source>
> >     <target>The application will respond to the associated <g id="i5">PasswordChange</g> event by checking the values of the <g id="i6">UserName</g> and <g id="i7">Password</g>.</target>
> >   </trans-unit>
> >   <trans-unit id="t7" resname="p" translate="yes" xml:space="default">
> >     <source>If these are empty, the application will log out.</source>
> >     <target>If these are empty, the application will log out.</target>
> >   </trans-unit>
> > </group>
> >
> > Best Regards,
> >
> > AZ
> >
> > On 23/08/2010 10:49, Rodolfo M. Raya wrote:
> > > Hi,
> > >
> > > I like the idea of further simplification. The elements that Yves
> > > removed can be left out.
> > >
> > > Segmentation information was optional in XLIFF 1.2 and will continue
> > > to be optional. The <part> element added by Yves should not be part of
> > > the basic tree. And, as I see it, any segmentation info will never be
> > > inside <source> or <target>; it will be at a higher level, preferably
> > > outside <trans-unit>.
> > >
> > > Regards,
> > > Rodolfo
> > > --
> > > Rodolfo M. Raya   <rmraya@maxprograms.com>
> > > Maxprograms      http://www.maxprograms.com
> > >
> > >> -----Original Message-----
> > >> From: Yves Savourel [mailto:ysavourel@translate.com]
> > >> Sent: Monday, August 23, 2010 12:15 AM
> > >> To: xliff@lists.oasis-open.org
> > >> Subject: RE: [xliff] Simplified XLIFF element tree
> > >>
> > >> Hi,
> > >>
> > >> For the core/minimal XLIFF I would go for an even simpler model than
> > >> Rodolfo's:
> > >>
> > >> -- I think segmentation is too important to not be part of the core
> > >> structure of XLIFF, and the representation as extra info like it is the
> > >> case in 1.2 (because of the need to be backward compatible with 1.1) is
> > >> not adequate. It does not mean a file must always be segmented, just that
> > >> representing a segmented content must be simple and the processing of
> > >> segmented vs un-segmented content will be seamless. For simplicity I've
> > >> represented this as <part> in the tree below, but it would be whatever
> > >> structure we would end up with.
> > >>
> > >> -- I wouldn't put all the alt-trans data in the core. It corresponds to
> > >> specific features that are not core to represent extracted text.
> > >>
> > >> -- I would select only essential parts for the core, and therefore not
> > >> include the skeleton since it's not something essential for the extracted
> > >> data (i.e. one can merge without using the XLIFF <skel> data, and <skel>
> > >> is really tool-specific anyway).
> > >>
> > >> -- I would not reduce the XLIFF namespace to the core. Just declare as
> > >> core a subset of element/attributes of the namespace.
> > >>
> > >>
> > >> <xliff version1>1
> > >> |
> > >> +---<file original1 source-language1 datatype1>+
> > >>       |
> > >>       +---<body>1
> > >>            |
> > >>            +---<group id1 resname? restype?>*
> > >>            |    |
> > >>            |    +--- [trans-unit]*
> > >>            |
> > >>            +---<trans-unit id1 resname? restype?>*
> > >>                 |
> > >>                 +---<source>1
> > >>                 |    |
> > >>                 |    +---<part id?>+
> > >>                 |         |
> > >>                 |         +--- [inline markup]*
> > >>                 |
> > >>                 +---<target>?
> > >>                      |
> > >>                      +---<part id?>+
> > >>                           |
> > >>                           +--- [inline markup]*
> > >>
> > >>
> > >> Cheers,
> > >> -ys
> > >>
> > >>
> > >>
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe from this mail list, you must leave the OASIS TC that
> > >> generates this mail.  Follow this link to all your TCs in OASIS at:
> > >> https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php
> > >
> > >
> > >
> > >
> >
> > --
> > email - azydron@xtm-intl.com
> > smail - c/o Mr. A.Zydron
> > 	PO Box 2167
> >          Gerrards Cross
> >          Bucks SL9 8XF
> > 	United Kingdom
> > Mobile +(44) 7966 477 181
> > FAX    +(44) 1753 480 465
> > www - http://www.xtm-intl.com
> >
> >
> >
> >
> 



