Subject: Re: [xliff-comment] XLIFF vs. PO vs. Trolltech
Hello Asgeir, *,

thanks for the replies.

On Saturday 17 May 2008 06:39:32 Asgeir Frimannsson wrote:
> On Saturday 17 May 2008 03:05:11 am Oswald Buddenhagen wrote:
> > Trolltech is looking into implementing/improving XLIFF support in Qt's
> > Linguist tool chain. Interoperability with PO files is an item, too.
> > This is what I've come up with. Please sanity-check it, so we don't set
> > a faulty de-facto standard in case we go for it. ;)
>
> In terms of PO interoperability and the representation of TS in PO, it
> would probably be wise to discuss this on the GNU gettext mailinglist
> (bug-gnu-gettext@gnu.org) see http://savannah.gnu.org/projects/gettext/ .

OK, on CC now.

> Also, the Translate Toolkit (translate.sf.net) have some existing ts<->po
> converters but I'm not sure what the status of these are.

Somewhat rudimentary, it seems after a quick test.

> > - The PO representation guide says that everything should be put into one
> >   <file> element and PO references should be represented as <context
> >   context-type="sourcefile">. This is in accordance with the XLIFF spec
> >   (see "sourcefile" value doc). However, that means that if I create an
> >   .xlf file directly from sources I get a different representation than if
> >   I create a .po file and convert it to .xlf later. I find this
> >   inconsistency not justified, so I think I would opt for the "native"
> >   representation with multiple <file> elements. Only if the PO message has
> >   additional references to other files, sourcefile contexts would be used.
>
> The main issue with representing this as multiple <file> elements is that
> in XLIFF, there is no concept of meta-data above the <file> level.

Right. I just mapped the .po file header to a message with an empty source
coming from a file with no name, i.e., basically doing what .po does.
This is sort of hacky, but OTOH it requires no special support from tools,
so I expect less trouble from this approach than some more or less arbitrary
other mapping.

> We used
> a single <file> element for representing a PO, as a PO is a single file.
> If e.g. gettext implemented support natively for XLIFF, the data model would
> be very different, as the source would be a set of source-files with
> extracted translatable text, rather than a single resource file.

This is basically what I proposed, right?

> (this might be a bit Qt/Trolltech specific from here:)
>
> From what I understand from your mail you are trying to accomplish
> something like
>
> # generates a single .xlf for the project with multiple <file> elements
> lupdate -xlf myproject.pro
>
> # generates a single .po for the project
> lupdate -po myproject.pro
>
> # generates a single .ts for the project
> lupdate -ts myproject.pro
>
> So you are saying that if you take the PO generated above and create an
> XLIFF from it using the representation guide, it will be different from the
> XLIFF created by lupdate directly?

Yes.

> If so, I don't see anything wrong with
> that, as they are technically representing two rather different
> data-models.

Yes ... however, one of our aims is having lossless conversion between the
formats (*) for smooth integration into existing systems (and to simplify
internal testing :). This should happen as naturally as possible, without
introducing magic meta data unless unavoidable.

(*) OK, so converting from XLIFF to something else and back to XLIFF is not
going to work losslessly, but you get the idea. :)

> As a side-note: In some of my work, I've found it more beneficial to
> represent PO files as a hierarchy of <group> elements based on the PO
> references rather than the flat structure we have defined in the PO
> representation guide. This structure gives a much better contextual
> hierarchy for both translators and processing tools.
> This approach takes
> more processing though, as you have inter-trans-unit references, and the PO
> would have to be fully read before starting to write the XLIFF file.
> However, you might find this
> representation closer to what you're trying to accomplish,

Yes.

> although I'm not sure how it matches with the ts <context> element.

That's fine - .ts contexts are basically nested into files (well, actually,
it is not unlikely to have the same context both in a .ui file and in the
associated .cpp file, but that's not really a tragedy).

> PO:
> #:src/MyDialog.cpp:23 src/MyOtherDialog.cpp:12
> msgid "Hello World"
> msgstr ""
>
> XLIFF representation:
> <group restype='x-directory' resname='src'>
>   <group restype='x-file' resname='MyDialog.cpp'>
>     <trans-unit id='1'>
>       <source>Hello World</source>
>     </trans-unit>
>   </group>
>   <group restype='x-file' resname='MyOtherDialog.cpp'>
>     <trans-unit id='2' translate='no'>
>       <source><ph id='x' xid='1'/></source>
>     </trans-unit>
>   </group>
> </group>

Hmm, this approach didn't occur to me, as it basically contradicts the
expected usage of <file> elements, no? Something to change for XLIFF 2.0?

> > - Gettext's new msgctxt keyword was brought up before. Incidentally, the
> >   <comment> element in Qt's own .ts files maps pretty well to it. There
> >   is no standardized mapping for .xlf yet, though. I would pick up a
> >   previously suggested approach and do it like that:
> >
> >   <trans-unit>
> >     <source>foobar</source>
> >     <target>irgendwas</target>
> >     <context-group purpose="match information">
> >       <context context-type="x-gettext-msgctxt"
> >                match-mandatory="yes">some context info</context>
> >     </context-group>
> >   </trans-unit>
> >
> >   For plural forms, the context would be attached to the plural group.
> >   The exact value for purpose= is not clear to me - the values suggested
> >   seem to refer to TM only. I think I would simply skip the purpose ...
>
> Translator editors can e.g.
> display the context to the translator only
> if 'purpose' is set to 'information', and hide it otherwise.

Oh, right - I misread the spec. So "information" is definitely correct.

> Similarly, a
> TM processor can choose to perform additional 'context matching' based on
> the 'match' purpose-value. This would e.g. be useful if you had two
> identical translation units, but with different contexts, and the TM
> processor could automatically match better based on these.

Yes, except that I need it to apply not only to the TM processor, but also
to the tool that generates the output for the translator library in the
program. I suppose it won't hurt if I slightly stretch the definition for
the linguist tools, but it seems to me that something formally approved
would be cleaner.

> > - .ts files know a <context> element. I consider it stronger than
> >   msgctxt: it is not optional; every message is in a context. Therefore I
> >   would map it to nested groups:
> >
> >   <group restype="x-trolltech-ts-context">
> >     <context-group purpose="match information">
> >       <context context-type="x-trolltech-ts-context"
> >                match-mandatory="yes">the context</context>
> >     </context-group>
> >     <trans-unit .../>
> >   </group>
> >
> >   FWIW, the mapping to PO would be via a magic extracted comment:
> >   #. ts:context <the context>
>
> This sounds sensible to me.

Good.

> > - As the repr. guide says, .po files do not encode the (target) language.
> >   Therefore I would add an X-Language: header to the initial msgstr. It
> >   would be implanted and extracted during conversion. When converting from
> >   an .xlf file which does not have a first message that seems to be a .po
> >   file header, a message would be generated and marked with
> >   X-Virgin-Header:; if this header is found on converting back, the message
> >   would be zapped.
>
> Not sure I understand the use-case for this.

That's again for the lossless conversion.
Simply because .ts needs the target
language for the same purpose that .po uses the "Plural-Forms:" header -
unfortunately, no unambiguous reverse mapping is possible.

> > - Gettext's #| msgid (previous source in fuzzy translation) would be
> >   mapped to <alt-trans> elements as suggested on this list before: Each
> >   previous source is tacked onto a current source. If more previous sources
> >   than current sources exist (plural to singular "downgrade"), the source
> >   gets two alt-trans elements, the second one with an empty target marked
> >   with restype="x-dummy".
> > - Gettext's #| msgctxt would get mapped just like msgctxt, only that the
> >   context-type would be x-gettext-previous-msgctxt.
> > - Contrary to the guide, I would store obsolete messages, marking the
> >   <trans-unit> resp. the containing plural <group> with translate="no".
> >   I see no harm in doing this and it yields a more faithful conversion.
> >   The messages would go into a <file> with the imaginary original name
> >   Obsolete_PO_entries.
>
> I'm not sure if we really need to go to this extent. I guess it's more a
> design-question if XLIFF was really meant to be a replacement for all
> features that a format supports, rather than an extraction-format. E.g.
> obsolete entries in PO is a way of storing translation that was used in
> previous versions of the project, but are no longer used (however they may
> pop up in later versions of the project, that's why they are stored). XLIFF
> was not intended to be a storage container for these (I guess TMs replace
> this functionality), and I'm not sure if trying to mold XLIFF into such a
> storage container would break processing tools etc (wrong statistics, word
> counts, file counts etc).

Good point. But we need it for the lossless roundtrips again. :)
Luckily, lupdate has an option -noobsolete already - I guess adding that to
the anticipated lconvert would not be exceedingly hard. :-)

> > - The guide does not specify how to map fuzzy plurals.
> >   I guess one should
> >   require approval of all <trans-unit>s in the <group> for non-fuzziness.
>
> Yes, this is a design-limitation of the current XLIFF specification. This
> approach sounds reasonable to me.

OK

Regards,

--
Oswald Buddenhagen
Trolltech GmbH
Rudower Chaussee 13
12489 Berlin
Germany
Fon: +49 (030) 6392 3255
Fax: +49 (030) 6392 3256
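
For illustration only, the fuzzy-plural rule discussed last (a plural <group> counts as non-fuzzy only if all of its <trans-unit>s are approved) could be sketched roughly like this. The sample markup, the restype value and the function name are assumptions made up for the sketch; none of this is taken from lupdate, lconvert or any existing converter.

```python
import xml.etree.ElementTree as ET

# Hypothetical plural group; ids, restype and strings are invented for the sketch.
SAMPLE = """\
<group restype="x-gettext-plurals">
  <trans-unit id="1[0]" approved="yes">
    <source>%d file</source>
    <target>%d Datei</target>
  </trans-unit>
  <trans-unit id="1[1]">
    <source>%d files</source>
    <target>%d Dateien</target>
  </trans-unit>
</group>
"""


def plural_group_is_fuzzy(group):
    """Return True unless every <trans-unit> directly inside the group is approved."""
    units = group.findall("trans-unit")
    # A missing approved attribute is treated like approved="no".
    return not all(unit.get("approved") == "yes" for unit in units)


group = ET.fromstring(SAMPLE)
print(plural_group_is_fuzzy(group))
```

Here the second form lacks approved="yes", so the whole message would be written back to PO as fuzzy. Note that a group without any <trans-unit> children comes out as non-fuzzy in this sketch; a real converter may want to handle that case differently.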