Subject: Re: [docbook-apps] Capturing phrase books and dictionaries
Lech Rzedzicki wrote:
> We're trying to keep our markup close to DB5 but we also want to
> tighten the schema a bit further.
> One area we're particularly struggling with is phrase books and
> dictionaries. This was originally modelled using TEI and reflects the
> actual structure quite well.
> The problem we have is that both in the original language portion
> (form) and in the target language explanation (sense) we need to
> allow many optional elements such as example, pronunciation, often
> multiple times (as there can be many forms or senses or many examples
> for each sense or form), gradually this led us to a very complex and
> loose model which also doesn't maintain the relationship between the
> original and translation too well.
>
> I was wondering if any of you have any experience dealing with similar
> content and whether you could share your experience and schemas?

We work a lot with XML-based bilingual dictionaries (not phrase books, although they may be similar). I think the bottom line is: don't use DocBook for dictionaries (at least not for the body of the dictionary, i.e. all the entries). It just isn't the same kind of structure.

TEI-encoded dictionaries tend to reflect the structure of the print dictionary from which the electronic form was derived. That has a couple of advantages:

1) It's easy(-er) to convert from the print form to the electronic form, and to go back later and make sure you did it right.
2) It makes producing a new print copy of the dictionary that looks like the original print dictionary easy(-er).

It also has some disadvantages:

1) Unless you're working with a bunch of similar dictionaries from a single publisher, you're likely to wind up with a large number of schemas (or DTDs), one for each dictionary, and that can be hard to manage.
2) The large number of schemas in (1) also means that you probably have to write a different CSS (or whatever you use) for each one.
3) You're limited to a single presentation form, i.e. it is difficult to display a root-based dictionary as a stem-based dictionary.

What we (and probably most people who work with multiple electronic dictionaries) do instead is use a generic lexicon schema. This flattens the overall structure of a typical print dictionary (e.g. subentries become entries on their own); the original structure is instead represented by xrefs (so a sub-entry and a minor entry both have pointers back to the main entry). One can then postpone until run-time decisions like root-based vs. stem-based presentation, or whether a given minor entry is displayed as a sub-entry or as an entry on its own (and perhaps alphabetized on its own, if that's relevant to the electronic display). The run-time decisions are then implemented using one of two (or several) style sheets.

More than that about this approach (as opposed to doing something with dictionaries inside DocBook) probably doesn't belong on this list. Fortunately there are lexicography mailing lists, e.g. the Lexicography list (see http://linguistlist.org/lists/get-lists.cfm).

-- 
Mike Maxwell
What good is a universe without somebody around to look at it?
--Robert Dicke, Princeton physicist
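
P.S. For anyone who finds the flattening idea above easier to follow in code, here is a rough sketch in Python. It is not our actual schema or tooling: the Entry class, its field names (id, headword, gloss, main) and the sample words are all made up for illustration. The only point is that every entry is stored on its own, a sub-entry carries an xref back to its main entry, and the choice between a root-based (nested) and a stem-based (flat) display is made at render time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    # All field names here are hypothetical, chosen only for this sketch.
    id: str
    headword: str
    gloss: str
    main: Optional[str] = None  # xref: id of the main entry, if this was a sub-entry in print


# Flattened lexicon: former sub-entries are entries in their own right.
LEXICON = [
    Entry("e1", "write", "to form letters on a surface"),
    Entry("e2", "writer", "a person who writes", main="e1"),
    Entry("e3", "rewrite", "to write again", main="e1"),
    Entry("e4", "book", "a bound set of pages"),
]


def render_flat(lexicon):
    """Stem-based view: every entry stands alone, alphabetized by headword."""
    ordered = sorted(lexicon, key=lambda e: e.headword)
    return [f"{e.headword}: {e.gloss}" for e in ordered]


def render_nested(lexicon):
    """Root-based view: entries with an xref are indented under their main entry."""
    children = {}
    for e in lexicon:
        if e.main is not None:
            children.setdefault(e.main, []).append(e)

    lines = []
    mains = sorted((e for e in lexicon if e.main is None), key=lambda e: e.headword)
    for e in mains:
        lines.append(f"{e.headword}: {e.gloss}")
        for sub in sorted(children.get(e.id, []), key=lambda s: s.headword):
            lines.append(f"    {sub.headword}: {sub.gloss}")
    return lines


if __name__ == "__main__":
    print("\n".join(render_flat(LEXICON)))
    print()
    print("\n".join(render_nested(LEXICON)))
```

The two render functions play the role of the two style sheets: the stored data is the same, and only the presentation changes. In a real XML setup the same split would presumably be done with the generic lexicon schema plus one stylesheet per view.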