
docbook-apps message


Subject: Re: [docbook-apps] Apostrophe in docbook document

Thanks for pointing me to the Unicode Standard 5.2 - I'll convert &apos; 
to &rsquo;.

Hi Dave

Not sure why I got into this, but I'll push it along a bit.

XML was designed to allow the storage of formatted text in a human and 
machine readable state.

When a human does the reading (of the XML text) he can see the &apos; or 
&rsquo; character in context and guess pretty accurately whether it is an 
indication of a missing character, a genitive marker or a closing quote. 
So far I am with you all the way - it doesn't matter in English.

Now look at machine reading:
Imagine a linguist wanting to search some text to count
1. The use of contractions (e.g. isn't versus is not).  He wants 
to find, list and count all contractions.  His text editor or little Perl 
script (he doesn't know regex) looks for &rsquo; and finds what he wants 
corrupted by lots of extraneous closing quotes and genitive markers. 
The three logically different functions are represented by the same code.
2. ditto except that this time he wants to find quoted strings
3. ditto but this time his interest is in the grammar and he is 
searching for genitives
4. why he might want to distinguish between singular and plural 
genitives is beyond me.  But he might.
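
To make the linguist's problem concrete, here is a minimal sketch in Python (not the Perl he would have used). The sample sentence and the helper name find_rsquo_words are made up for illustration; the only fact relied on is that &rsquo; resolves to U+2019. A plain search for that one code point returns a contraction, two genitives and a closing quote, with nothing to tell them apart.

```python
RSQUO = "\u2019"  # RIGHT SINGLE QUOTATION MARK, the character behind &rsquo;


def find_rsquo_words(text):
    """Return every whitespace-delimited token that contains U+2019.

    Hypothetical helper: this is the naive search, with no attempt
    to classify what the mark is doing in each token.
    """
    return [token for token in text.split() if RSQUO in token]


# Invented sample: one contraction, one singular genitive,
# one plural genitive, and one closing quote.
sample = (
    "It isn\u2019t Ron\u2019s fault, he said, "
    "\u2018the dogs\u2019 bowls are empty\u2019."
)

hits = find_rsquo_words(sample)
for token in hits:
    print(token)
```

All four hits carry the same code point, so any separation into contractions, genitives and quotes has to come from context or from markup such as <quote>, not from the character itself.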

I guess I just don't like one symbol with three meanings.  Imagine this 
in your code: you don't need =, == and EQ; one symbol will handle all.

The problem of course is not a DocBook problem, it is in the Unicode tables 
(and the linguist would probably be using TEI anyway, but it's not a TEI 
problem either).

In my case all my quotes in XML tags are typed on the keyboard (#x27), all 
my text quotes are <quote>, and all my apostrophe marks and genitives are 
&apos;, so a simple global edit puts all to rights for me - now that I 
know to use &rsquo;.
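
A minimal sketch of that global edit, assuming the conventions described above really do hold throughout the file: attribute quotes are literal #x27 characters, quotations use <quote>, and &apos; appears only as an apostrophe or genitive in running text. The function name apos_to_rsquo and the sample line are invented for illustration, and the result only parses if the document's DTD declares &rsquo;.

```python
def apos_to_rsquo(xml_source: str) -> str:
    """Replace every &apos; entity reference with &rsquo;.

    Assumption: &apos; is never used inside attribute values or
    CDATA sections, so a plain textual replace is safe. Literal
    ' characters (#x27) around attributes are left untouched.
    """
    return xml_source.replace("&apos;", "&rsquo;")


# Invented sample line following the conventions described above.
before = "<para lang='en'>Ron&apos;s text <quote>isn&apos;t</quote> done</para>"
after = apos_to_rsquo(before)
print(after)
```

Working on the raw source rather than on a parsed tree is deliberate here: a parser would resolve &apos; to #x27, and the distinction between attribute quotes and text apostrophes would be lost before the edit could use it.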


Dave Pawson wrote:
> On 26/01/10 00:53, Ron Catterall wrote:
> Beg to differ Ron, English appears not to require more than one?
> Is it simply for your search needs?
> The only different one in your previous list is the prime symbol, U+2032.
> The remainder should be the same.
> regards

Ron Catterall Ph.D. D.Sc.
