OASIS Mailing List Archives


docbook-apps message



Subject: Re: [docbook-apps] Apostrophe in docbook document


On Tue, 26 Jan 2010 14:42:34 -0600, Ron Catterall <ron@catterall.net>
wrote:
> Imagine a linguist wanting to search some text to count
> ...
> The problem of course is not a Docbook problem, it is in the UTF tables 

The problem is with neither, it is with the linguist :-).  (I can say
that, because I'm a linguist.)

All seriousness aside, using corpora for linguistics requires more than
looking for certain Unicode characters, which may not be used consistently
anyway (and especially in a case like this, where the characters--if they
were distinct Unicode characters--would doubtless be confused).  

Distinguishing between quotes and apostrophes requires some fairly complex
methods.  There are rules of thumb that often work, but they will break on
certain cases.  Corpus linguists become familiar with where these things
break, and construct work-arounds accordingly, or hand-tag recalcitrant
cases.
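The "rules of thumb" point can be made concrete with a minimal sketch. This is my own illustration, not an established tool. It labels each ASCII apostrophe or U+2019 purely by whether letters sit on either side of it. The function name `classify` and its three rules are hypothetical.

```python
# Treat the ASCII apostrophe and U+2019 (right single quotation mark) alike,
# since real text uses both for apostrophes and closing quotes.
QUOTE_CHARS = "'\u2019"


def classify(text: str) -> list[tuple[int, str]]:
    """Label each apostrophe-like character by naive rules of thumb.

    Hypothetical heuristic, for illustration only:
      - letter on both sides     -> "apostrophe"  (don't, O'Brien)
      - letter only on the right -> "open-quote"  ('hello)
      - letter only on the left  -> "close-quote" (hello')
      - no letter on either side -> "unknown"
    """
    labels = []
    for i, ch in enumerate(text):
        if ch not in QUOTE_CHARS:
            continue
        left = i > 0 and text[i - 1].isalpha()
        right = i + 1 < len(text) and text[i + 1].isalpha()
        if left and right:
            label = "apostrophe"
        elif right:
            label = "open-quote"
        elif left:
            label = "close-quote"
        else:
            label = "unknown"
        labels.append((i, label))
    return labels


print(classify("He said 'hi' twice"))   # correct: open and close quote
print(classify("'Tis the dogs' bowl"))  # both labels wrong
```

The rules get `don't` and `'hi'` right. They call the elided 'Tis an opening quote, and they call the plural possessive in `dogs'` a closing quote. Those are the cases that end up needing work-arounds or hand-tagging.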

If you really want an interesting problem, go for distinguishing among the
uses of the ASCII period!
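A similar naive sketch shows why the period is harder still. This is again a hypothetical illustration. The function `classify_periods` and the tiny `ABBREVIATIONS` set are assumptions, and a real system would need far more than this. The sketch separates decimal points, ellipses, abbreviations and sentence ends.

```python
# Hypothetical, deliberately tiny abbreviation list (lowercase, no period).
ABBREVIATIONS = {"dr", "mr", "mrs", "etc", "vs"}


def classify_periods(text: str) -> list[tuple[int, str]]:
    """Label each ASCII period using simple rules of thumb (illustration only)."""
    labels = []
    i = 0
    while i < len(text):
        if text[i] != ".":
            i += 1
            continue
        # Three periods in a row: treat as one ellipsis.
        if text.startswith("...", i):
            labels.append((i, "ellipsis"))
            i += 3
            continue
        prev_digit = i > 0 and text[i - 1].isdigit()
        next_digit = i + 1 < len(text) and text[i + 1].isdigit()
        if prev_digit and next_digit:
            label = "decimal"
        else:
            # Collect the alphabetic word immediately before the period.
            start = i
            while start > 0 and text[start - 1].isalpha():
                start -= 1
            word = text[start:i].lower()
            label = "abbreviation" if word in ABBREVIATIONS else "sentence-end"
        labels.append((i, label))
        i += 1
    return labels


print(classify_periods("Pi is 3.14. Dr. Smith agrees..."))
```

The sketch already fails on "U.S.", where it calls the first period a sentence end. It is also blind to an abbreviation that ends a sentence, as in "...and so on, etc. Then we left."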

   Mike Maxwell

