OASIS Mailing List Archives



docbook-apps message


Subject: Re: [docbook-apps] Apostrophe in docbook document

Hi Ron

On 26/01/10 20:42, Ron Catterall wrote:

> Hi Dave
> Not sure why I got into this, but I'll push it along a bit.
> XML was designed to allow the storage of formatted text in a human and
> machine readable state.
> When a human does the reading (of the XML text) he can see the &apos; or
> &rsquo; character in context and guess pretty accurately whether it is an
> indication of a missing character, a genitive marker or a closing quote.
> So far I am with you all the way - it doesn't matter in English.

And when the formatted output is presented to the human,
which Unicode code point is used is rarely material.

> Now look at machine reading:
> Imagine a linguist wanting to search some text to count
> 1. The use of contractions (e.g. "isn't" versus "is not"). He wants to
> find, list and count all contractions. His text editor or little Perl
> script (he doesn't know regex) looks for &rsquo; and finds what he wants
> corrupted by lots of extraneous closing quotes and genitive markers.
> The three logically different functions are represented by the same code.
> 2. ditto except that this time he wants to find quoted strings
> 3. ditto but this time his interest is in the grammar and he is
> searching for genitives
> 4. why he might want to distinguish between singular and plural
> genitives is beyond me. But he might.

My initial reaction is: who the heck is going to mark this up
accurately, with enough knowledge of English and Unicode to do
a good job of it? Someone in Edinburgh perhaps? http://www.ling.ed.ac.uk/
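
To make Ron's search scenario concrete, here is a minimal Python sketch (the sample sentence and the find_rsquo helper are made up for illustration). It assumes the text has already been parsed so that &rsquo; has become U+2019. A plain search then returns the contraction, the genitive and the closing quote as indistinguishable hits:

```python
import re

RSQUO = "\u2019"  # RIGHT SINGLE QUOTATION MARK, the character &rsquo; stands for

# Hypothetical sample: one contraction, one genitive, one closing quote,
# all written with the same code point.
sample = (
    "It isn\u2019t Ron\u2019s fault, he said, "
    "\u2018one symbol will handle all\u2019."
)

def find_rsquo(text, width=6):
    """Return (offset, context) for every U+2019 found in text."""
    hits = []
    for m in re.finditer(RSQUO, text):
        start = max(0, m.start() - width)
        hits.append((m.start(), text[start:m.end() + width]))
    return hits

hits = find_rsquo(sample)
for offset, context in hits:
    # Nothing in the code point itself says which of the three uses this is;
    # only the surrounding context (or explicit markup) can tell them apart.
    print(offset, repr(context))
```

All three uses come back from the one search, so any further sorting has to rely on context heuristics or on markup such as <quote>.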

> I guess I just don't like one symbol with three meanings. Imagine this
> in your code, you don't need = == and EQ, one symbol will handle all.

Yep. I'm doing that to please the compiler writer, I guess.

> The problem of course is not a DocBook problem, it is in the Unicode
> tables (and the linguist would probably be using TEI anyway, but it's
> not a TEI problem either)

Your proposal is a solution, not a problem, Ron :-)

> In my case all my quotes in XML tags are typed on the keyboard (#x27),
> all my text quotes are <quote>, and all my apostrophe marks and genitives
> are &apos;, so a simple global edit puts all to rights for me - now that
> I know to use &rsquo;

Suggestion: if you're using Linux, look into keyboard mappings
and use, perhaps, your numeric keypad to generate this 'suite' of
characters for you with a single keypress. Just a thought.
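
The global edit Ron describes could be sketched roughly as below. This is only an illustration under his stated conventions: literal ' (#x27) appears only as an attribute delimiter and every textual apostrophe is written as &apos;. The function name and the sample string are hypothetical. It works on the raw source text rather than through an XML parser, since a parser would already have expanded &apos;. Note that &rsquo; is not one of XML's five predefined entities, so it needs a DTD that declares it (the DocBook DTD pulls in the ISO entity sets); passing "&#x2019;" as the replacement avoids that dependency.

```python
def apos_to_rsquo(xml_source: str, replacement: str = "&rsquo;") -> str:
    """Replace every &apos; entity reference in raw XML source text.

    Literal ' characters (e.g. attribute delimiters) are left untouched,
    because only the exact string "&apos;" is matched.
    """
    return xml_source.replace("&apos;", replacement)

# Hypothetical fragment following Ron's conventions.
before = (
    "<para role='note'>Ron&apos;s text isn&apos;t "
    "<quote>quoted</quote> here.</para>"
)
after = apos_to_rsquo(before)
print(after)

# Variant that does not depend on a DTD declaring rsquo.
after_numeric = apos_to_rsquo(before, "&#x2019;")
print(after_numeric)
```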



Dave Pawson
