Subject: RE: [emergency-msg] Accents, characters, and unusual punctuation in CAP
+1 para Español.

At 1:12 PM -0600 12/26/06, Aymond, Patti wrote:
>I find that mixed languages is quite common. There are many
>non-English words that are used so frequently that they are accepted
>as de facto English (e.g. du jour, hors d'oeuvre, déjà vu, etc.). In
>my part of the world, as in Canada, this is even more common with the
>concentration of a French-speaking population. We certainly wouldn't
>want alerts & notifications to not go out because such words were
>included in the message.
>
>IMHO,
>
>Patti
>
>Patti Iles Aymond, PhD
>Senior Scientist, Research & Development
>Innovative Emergency Management, Inc.
>Managing Risk in a Complex World
>8555 United Plaza Blvd. Suite 100
>Baton Rouge, LA 70809
>(225) 952-8228 (phone)
>(225) 952-8122 (fax)
>
>From: Ham, Gary A [mailto:hamg@BATTELLE.ORG]
>Sent: Thursday, December 21, 2006 2:46 PM
>To: cap-list@lists.incident.com;
>emergency-msg@lists.oasis-open.org;
>dm-open-sig@list.dmi-services.org
>Subject: [emergency-msg] Accents, characters, and unusual punctuation in CAP
>
>Question for those who use and implement CAP messaging, particularly
>those using it for implementations where the text data might be in a
>non-English language.
>
>We recently came upon an issue regarding character sets and language:
>
>Certain data was being processed in our internal Java system as UTF-8
>for languages that need at least UTF-16 to handle. This caused
>characters with accents common in Spanish or French to cause
>processing exceptions. Since Java uses Unicode internally, the fix to
>allow accented characters is not hard. You just need to set a value
>in a couple of places in the code.
>
>But... It brings up a bigger question. The language tag in the info
>block can be used to validate/determine how to read the data in
>Unicode in CAP messages written in languages that use non-Roman
>characters or unusual accents on Roman characters. This would make
>translation on the receiving end much simpler and more consistent.
>But how about mixed information? The simple example is Spanish or
>French place names in English where the accenting is not recognized.
>A certain laxness in processing can handle that for the most part.
>The more challenging case is something typical in Japan, for example,
>where the mixed use of character sets in written communication is
>quite common. Japanese writing in Roman letters, but using some
>Japanese characters, is one example. Another example is text in
>Japanese characters except that a non-Japanese place name is written
>in its native character set instead of, or as well as, its katakana
>(Japanese characters used for foreign words) representation. I
>suspect that this might be the case in other languages as well.
>
>Question: should we validate info block content by language? Should
>we even process text content by language? Or is it just a translation
>problem on either end, to be left to user systems? (It may not be
>trivial.)
>
>Respectfully,
>
>Gary A. Ham
>Battelle Memorial Institute
>External Systems Interoperability Coordinator
>Open Platform for Emergency Networks
>Disaster Management e-Gov Initiative
>Office for Interoperability and Compatibility
>Science and Technology
>Department of Homeland Security
>540-288-5611 (office)
>703-869-6241 (cell)
>"You would be surprised what you can accomplish when you do not care
>who gets the credit." - Harry S. Truman
>
>IEM CONFIDENTIAL INFORMATION PLEASE READ OUR NOTICE:
><http://www.iem.com/e_mail_confidentiality_notice.html>

--
Rex Brooks
President, CEO
Starbourne Communications Design
GeoAddress: 1361-A Addison
Berkeley, CA 94702
Tel: 510-849-2309
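
For anyone hitting the same accented-character exceptions: the thread does not say which values had to be set, so the following is only a minimal sketch of one common approach, not the actual fix that was applied. It assumes the text arrives as raw bytes and that the message is encoded as UTF-8, which can represent accented Roman letters and Japanese characters alike. The class and method names (CapTextDecoding, decodeUtf8Strict, encodeUtf8) are hypothetical; only the java.nio.charset calls are standard Java.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of any CAP library.
public class CapTextDecoding {

    // Decode bytes as UTF-8 explicitly instead of relying on the
    // platform default charset. Malformed input is reported as an
    // exception rather than silently replaced with U+FFFD.
    public static String decodeUtf8Strict(byte[] bytes)
            throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    // Encode text as UTF-8 explicitly, again avoiding the platform default.
    public static byte[] encodeUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }
}
```

With this approach the same code path handles English text with French or Spanish words and Japanese text with foreign place names, since decoding does not depend on the language tag. Bytes produced in another encoding (for example ISO-8859-1) would be rejected with a CharacterCodingException, which a receiving system could catch and report instead of dropping the alert.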