Subject: Accents, characters, and unusual punctuation in CAP
- From: "Ham, Gary A" <hamg@BATTELLE.ORG>
- To: cap-list@lists.incident.com, emergency-msg@lists.oasis-open.org, dm-open-sig@list.dmi-services.org
- Date: Thu, 21 Dec 2006 15:45:51 -0500
A question for those who use and implement CAP messaging, particularly those using it in implementations where the text data might be in a non-English language.
We recently came upon an issue regarding character sets and language:
Certain data was being processed in our internal Java system as UTF-8 for languages that need at least UTF-16 to handle. As a result, characters with accents common in Spanish or French caused processing exceptions. Since Java uses Unicode internally, the fix to allow accented characters is not hard: you just need to set a value in a couple of places in the code.
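As an illustration only, and not a diagnosis of the system described above: one common way accented characters produce exceptions in Java is when bytes written in one charset are decoded strictly as another. UTF-8 itself can represent accented Latin characters, so the sketch below assumes the incoming bytes were ISO-8859-1 (an assumption, not something stated in this message). The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class CharsetDecodeSketch {

    // Decode strictly: malformed or unmappable input raises an exception
    // instead of being silently replaced with U+FFFD.
    static String decodeStrict(byte[] bytes, Charset charset)
            throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    // True if the bytes are a valid sequence in the given charset.
    static boolean decodesCleanly(byte[] bytes, Charset charset) {
        try {
            decodeStrict(bytes, charset);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // "Quebec" with an e-acute, written as an escape so the example
        // does not depend on the source file's own encoding.
        String place = "Qu\u00e9bec";
        byte[] latin1 = place.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = place.getBytes(StandardCharsets.UTF_8);

        // Latin-1 bytes are not valid UTF-8: the lone 0xE9 byte is rejected.
        System.out.println(decodesCleanly(latin1, StandardCharsets.UTF_8)); // false
        // The same text encoded as UTF-8 decodes without trouble.
        System.out.println(decodesCleanly(utf8, StandardCharsets.UTF_8));   // true
        // Naming the charset that was actually used recovers the text.
        System.out.println(
            decodeStrict(latin1, StandardCharsets.ISO_8859_1).equals(place)); // true
    }
}
```

The point of the sketch is that the charset is named explicitly at every byte-to-text boundary rather than left to a platform default.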
But it brings up a bigger question. The language tag in the info block can be used to validate, or determine how to read, the Unicode data in CAP messages written in languages that use non-Roman characters or unusual accents on Roman characters. This would make translation on the receiving end much simpler and more consistent. But how about mixed information? The simple example is Spanish or French place names in English text, where the accenting is not recognized. A certain laxness in processing can handle that for the most part. The more challenging case is something typical in Japan, for example, where mixing character sets in written communication is quite common. Japanese written in Roman letters but using some Japanese characters is one example. Another example is text in Japanese characters in which a non-Japanese place name is written in its native character set instead of, or as well as, its katakana (Japanese characters used for foreign words) representation. I suspect that this might be the case in other languages as well.
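To make the mixed-text case concrete, here is a minimal Java sketch that lists which Unicode scripts appear in a piece of text, using the standard `Character.UnicodeScript` API. It is offered as one possible building block for any per-language check, not as a recommendation that CAP processors should validate this way. The class and method names are hypothetical.

```java
import java.util.EnumSet;
import java.util.Set;

public class ScriptInventorySketch {

    // Collect the Unicode scripts used in a string. COMMON (spaces, digits,
    // most punctuation), INHERITED (combining marks) and UNKNOWN
    // (unassigned code points) are skipped because they do not
    // identify a writing system.
    static Set<Character.UnicodeScript> scriptsIn(String text) {
        Set<Character.UnicodeScript> found =
                EnumSet.noneOf(Character.UnicodeScript.class);
        text.codePoints().forEach(cp -> {
            Character.UnicodeScript script = Character.UnicodeScript.of(cp);
            if (script != Character.UnicodeScript.COMMON
                    && script != Character.UnicodeScript.INHERITED
                    && script != Character.UnicodeScript.UNKNOWN) {
                found.add(script);
            }
        });
        return found;
    }

    public static void main(String[] args) {
        // A Spanish place name: accented, but still a single script (LATIN).
        System.out.println(scriptsIn("San Jos\u00e9"));

        // "Tokyo" in Roman letters, kanji, and katakana in one string:
        // the result contains LATIN, HAN and KATAKANA.
        System.out.println(
            scriptsIn("Tokyo \u6771\u4eac \u30c8\u30a6\u30ad\u30e7\u30a6"));
    }
}
```

A receiver could compare this inventory with the scripts it expects for the declared language and log mismatches rather than reject the message, which is in keeping with the "certain laxness" mentioned above.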
The question: should we validate info block content by language? Should we even process text content by language? Or is it just a translation problem on either end, to be left to user systems? (It may not be trivial.)
Respectfully,
Gary A. Ham
Battelle Memorial Institute
External Systems Interoperability Coordinator
Open Platform for Emergency Networks
Disaster Management e-Gov Initiative
Office for Interoperability and Compatibility
Science and Technology
Department of Homeland Security
540-288-5611 (office)
703-869-6241 (cell)
"You would be surprised what you can accomplish when you do not care who gets the credit." - Harry S. Truman