

Subject: [OASIS Issue Tracker] (ODATA-1354) Add support for SOUNDEX expressions


Michael Pizzo created ODATA-1354:
------------------------------------

             Summary: Add support for SOUNDEX expressions
                 Key: ODATA-1354
                 URL: https://issues.oasis-open.org/browse/ODATA-1354
             Project: OASIS Open Data Protocol (OData) TC
          Issue Type: Bug
          Components: Protocol, URL Conventions
    Affects Versions: V4.02_WD01
         Environment: Proposed
            Reporter: Michael Pizzo
             Fix For: V4.02_WD01


Introduction
This is a proposal to introduce SOUNDEX functionality to the Open Data Protocol (OData). The feature relies on the Soundex phonetic algorithm, which indexes strings by sound, as pronounced in English.

The goal is for homophones to be encoded to the same representation so that they can be matched despite minor differences in spelling, and for that matching to be exposed through RESTful OData API calls.

This proposal walks through the details of the feature and its potential implementation in OData.

Algorithm
SOUNDEX converts an alphanumeric string to a four-character code based on how the string sounds when spoken. The first character of the code is the first character of the input expression, converted to upper case. The second through fourth characters of the code are digits that represent the remaining letters of the expression. The letters A, E, I, O, U, H, W, and Y are ignored unless they are the first letter of the string. Zeroes are appended if necessary to produce a four-character code.

For example, the names Michelle and Michael both return a SOUNDEX value of M240, while David returns a SOUNDEX value of D130. By this measure, Michael is a closer phonetic match to Michelle than David is.

Rules
SOUNDEX follows the coding rules of the U.S. National Archives and Records Administration (NARA), which are as follows:
1.	Coding consists of a letter followed by three numerals.  Examples: L123, M240, S160.
2.	The first letter of a surname is not coded; it is retained as the initial letter.
3.	A, E, I, O, U, Y, W, and H are not coded.
4.	Double letters are coded as one letter (as in Lloyd).
5.	Prefixes to surnames like "van", "Von", "Di", "de", "le", "D", "dela" or "du" are sometimes disregarded in coding.
6.	Code the letters that follow the initial letter as three digits, padding with 0 at the end if needed.
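
Rule 5 could be handled by coding a surname both with and without its prefix. The following is only a rough sketch of that idea. The function name surname_variants is hypothetical, and the requirement that a prefix be followed by a space or an apostrophe is an assumption made here, not part of the proposal.

```python
# Prefixes listed in rule 5, lower-cased for comparison.
# Longer prefixes come first so "dela" is tried before "de" and "d".
PREFIXES = ("dela", "van", "von", "di", "de", "le", "du", "d")


def surname_variants(surname: str) -> list[str]:
    """Return the surname as given, plus a prefix-free form when one applies.

    Assumption (not from the proposal): a prefix only counts when it is
    followed by a space or an apostrophe, so "Dean" is not split into
    "De" + "an".
    """
    variants = [surname]
    lowered = surname.lower()
    for prefix in PREFIXES:
        if lowered.startswith(prefix) and len(surname) > len(prefix):
            separator = surname[len(prefix)]
            if separator in " '":
                rest = surname[len(prefix) + 1:].strip()
                if rest:
                    variants.append(rest)
                break
    return variants
```

Each returned variant would then be coded separately, so that a lookup can match either form.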

The SOUNDEX system is based on the coding guide in the following table:
Number	Represents the Letters
1	B, F, P, V
2	C, G, J, K, Q, S, X, Z
3	D, T
4	L
5	M, N
6	R
Not Coded	A, E, I, O, U, Y, W, H
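
The rules and table above can be sketched in Python as follows. This is a minimal illustration, not a normative definition for OData. It rests on several assumptions that the proposal does not state: non-letter characters are skipped, an empty input yields an empty string, adjacent letters that share a digit (not only identical double letters) are coded once, and every uncoded letter, including H and W, separates repeated digits. Prefix handling (rule 5) is left out.

```python
# Letter groups copied from the coding table.
CODE_GROUPS = {
    "1": "BFPV",
    "2": "CGJKQSXZ",
    "3": "DT",
    "4": "L",
    "5": "MN",
    "6": "R",
}
LETTER_TO_DIGIT = {
    letter: digit
    for digit, letters in CODE_GROUPS.items()
    for letter in letters
}


def soundex(text: str) -> str:
    """Return a four-character code: initial letter plus three digits."""
    # Assumption: only A-Z letters take part; everything else is skipped.
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""  # Assumption: nothing to encode.

    first = letters[0]
    digits = []
    # Track the digit of the previous letter so adjacent letters with the
    # same digit are coded once (rule 4, extended to same-digit neighbours).
    previous = LETTER_TO_DIGIT.get(first, "")
    for letter in letters[1:]:
        digit = LETTER_TO_DIGIT.get(letter, "")  # "" for uncoded letters
        if digit and digit != previous:
            digits.append(digit)
        previous = digit

    # Pad with zeroes and truncate to one letter plus three digits.
    return (first + "".join(digits) + "000")[:4]
```

Under these assumptions, soundex("Michael") and soundex("Michelle") both return "M240", and soundex("David") returns "D130", which agrees with the example in the Algorithm section.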




--
This message was sent by Atlassian Jira
(v8.3.3#803004)

