
emergency-msg message


Subject: RE: [emergency-msg] Accents, characters, and unusual punctuation in CAP

+1 for Spanish.

At 1:12 PM -0600 12/26/06, Aymond, Patti wrote:
>I find that mixing languages is quite common. 
>Many non-English words are used so frequently 
>that they are accepted as de facto English 
>(e.g. du jour, hors d'oeuvre, déjà vu, etc.). 
>In my part of the world, as in Canada, this 
>is even more common because of the concentration 
>of a French-speaking population. We certainly 
>wouldn't want alerts & notifications to fail to go 
>out because such words were included in the 
>Patti Iles Aymond, PhD
>Senior Scientist, Research & Development
>Innovative Emergency Management, Inc.
>Managing Risk in a Complex World
>8555 United Plaza Blvd.   Suite 100
>Baton Rouge, LA 70809
>(225) 952-8228 (phone)
>(225) 952-8122 (fax)
>From: Ham, Gary A [mailto:hamg@BATTELLE.ORG]
>Sent: Thursday, December 21, 2006 2:46 PM
>To: cap-list@lists.incident.com; 
>Subject: [emergency-msg] Accents, characters, and unusual punctuation in CAP
>Question for those who use and implement CAP 
>messaging; particularly those using it for 
>implementations where the text data might be in 
>a non-English language.
>We recently came upon an issue regarding character sets and language:
>Certain data was being processed in our 
>internal Java system as UTF-8 for languages that 
>need at least UTF-16 to handle. As a result, 
>characters with accents common in Spanish or 
>French triggered processing exceptions.  Since 
>Java uses Unicode internally, the fix to allow 
>accented characters is not hard. You just need 
>to set a value in a couple of places in the code.
>But... it brings up a bigger question.  The 
>language tag in the info block can be used to 
>validate/determine how to read the data in 
>Unicode in CAP messages written in languages 
>that use non-Roman characters or unusual accents 
>on Roman characters. This would make translation 
>on the receiving end much simpler and more 
>consistent. But, how about mixed 
>information?  The simple example is Spanish or 
>French place names in English where the 
>accenting is not recognized.  A certain laxness 
>in processing can handle that for the most part. 
>The more challenging case is something typical 
>in Japan, for example, where the mixed use of 
>character sets in written communication is quite 
>common.  Japanese writing in Roman letters but 
>using some Japanese characters is one example. 
>Another example is text in Japanese characters 
>except that a non-Japanese place name is written 
>in its native character set instead of, or as 
>well as, its katakana (Japanese characters used 
>for foreign words) representation.  I suspect 
>that it might be the case in other languages as well.
>Question: should we validate info block content 
>by language? Should we even process text content 
>by language?  Or, is it just a translation 
>problem on either end to be left to user 
>systems?  (It may not be trivial.)
>Gary A. Ham
>Battelle Memorial Institute
>External Systems Interoperability Coordinator
>Open Platform for Emergency Networks
>Disaster Management e-Gov Initiative
>Office for Interoperability and Compatibility
>Science and Technology
>Department of Homeland Security
>540-288-5611 (office)
>703-869-6241 (cell)
>"You would be surprised what you can accomplish 
>when you do not care who gets the credit." - 
>Harry S. Truman

Rex Brooks
President, CEO
Starbourne Communications Design
GeoAddress: 1361-A Addison
Berkeley, CA 94702
Tel: 510-849-2309
