Subject: Re: [docbook-apps] Apostrophe in docbook document
On Tue, 26 Jan 2010 14:42:34 -0600, Ron Catterall <ron@catterall.net> wrote:
> Imagine a linguist wanting to search some text to count
> ...
> The problem of course is not a Docbook problem, it is in the UTF tables

The problem is with neither; it is with the linguist :-). (I can say that, because I'm a linguist.)

All seriousness aside, using corpora for linguistics requires more than looking for certain Unicode characters, which may not be used consistently anyway (and especially in a case like this, where the characters--if they were distinct Unicode characters--would doubtless be confused).

Distinguishing between quotes and apostrophes requires some fairly complex methods. There are rules of thumb that often work, but they will break on certain cases. Corpus linguists become familiar with where these things break, and construct work-arounds accordingly, or hand-tag recalcitrant cases.

If you really want an interesting problem, go for distinguishing among the uses of the ASCII period!

Mike Maxwell
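
To make the "rules of thumb" point concrete, here is a minimal sketch of one such rule in Python. It is only an illustration and not a method described in this thread: the function name classify_marks and the rule itself (a mark with a letter on both sides is an apostrophe, anything else is a quote) are assumptions chosen for brevity.

```python
import re


def classify_marks(text):
    """Label each ASCII ' in text as "apostrophe" or "quote".

    Hypothetical rule of thumb (an assumption for illustration):
    a mark with a letter immediately on both sides is an apostrophe;
    every other mark is treated as a quotation mark.
    """
    labels = []
    for match in re.finditer("'", text):
        i = match.start()
        # Treat the start and end of the text as whitespace.
        prev_char = text[i - 1] if i > 0 else " "
        next_char = text[i + 1] if i + 1 < len(text) else " "
        if prev_char.isalpha() and next_char.isalpha():
            labels.append((i, "apostrophe"))
        else:
            labels.append((i, "quote"))
    return labels


# Cases where the rule works:
print(classify_marks("don't"))
print(classify_marks("He said 'hello'"))

# Cases where it breaks: both marks below are apostrophes,
# but the rule labels them as quotes.
print(classify_marks("the dogs' bowls"))
print(classify_marks("'tis"))
```

The last two calls show the kind of breakage mentioned above: plural possessives and word-initial elisions defeat the rule, so they would need a work-around or hand-tagging.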