Arabic Name Search
Arabic Name Search™ is Celatro's
pattern matching plug-in designed for the retrieval of Arabic personal names.
Matching Arabic names is a notoriously difficult task for a
variety of reasons. Arabic names can be morphologically very complex, and can,
in addition to the name itself (i.e. the root),
contain prefixes, articles and suffixes. Some of these non-root name
constituents create variations of the same name, whereas others uniquely
identify one name as opposed to another. For example, "al Zarqawi" and
"Zarqawi" could refer to the same person, although the former contains the
article "al" whereas the latter does not. On the other hand, "Abd-ur-Rachman"
and "Ab-ur-Rachman" are different names, and therefore probably do not refer to
the same person; despite having the same root ("Rachman"), they contain
different prefixes, "abd" ("servant of") versus "abu" ("father of"), and it is
the prefix that distinguishes them. It
is therefore very important to have a retrieval mechanism which will only
return names that have the same semantic value as the query. Celatro's Arabic
Name Search has built-in knowledge about the semantic value of all name
constituents, and uses this knowledge to create the correct match list for a
given query.
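The distinction between constituents that merely create variants (articles) and constituents that identify a different name (prefixes) can be illustrated with a minimal sketch. This is not Celatro's implementation: the lookup tables, the function names parse_name and same_name, and the treatment of "ab" as a spelling of "abu" are all assumptions made for illustration.

```python
import re

# Hypothetical, deliberately tiny lookup tables (illustration only).
ARTICLES = {"al", "el", "ul", "ur"}
# Maps a prefix spelling to a canonical prefix; "ab" -> "abu" is an assumption.
PREFIXES = {"abd": "abd", "abu": "abu", "ab": "abu"}


def parse_name(name):
    """Split one name into (canonical prefix or None, root).

    Articles are dropped because they only create variants of the same name.
    A prefix is kept because it distinguishes one name from another.
    """
    tokens = [t for t in re.split(r"[\s\-]+", name.lower()) if t]
    prefix = None
    root_parts = []
    for tok in tokens:
        if tok in ARTICLES:
            continue
        # Only a token that appears before the root can be a prefix.
        if prefix is None and not root_parts and tok in PREFIXES:
            prefix = PREFIXES[tok]
            continue
        root_parts.append(tok)
    return prefix, " ".join(root_parts)


def same_name(a, b):
    """Two names match only if both the prefix and the root agree."""
    return parse_name(a) == parse_name(b)


print(same_name("al Zarqawi", "Zarqawi"))
print(same_name("Abd-ur-Rachman", "Ab-ur-Rachman"))
```

Under these assumptions the first pair is treated as the same name, because the two spellings differ only by an article, while the second pair is rejected, because the roots agree but the prefixes do not. A real system would also need to recognize spelling variants of each constituent, which this sketch does not attempt.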
An additional problem facing Arabic name retrieval is the huge
number of possible name variations in the original language. Some of these
variations are a product of spoken language contractions (e.g. "Nur-al-Din",
"Nureddine"; "Abd-al-Salam", "Abdussalam"), and others stem from different
pronunciations of the same name in different dialects of Arabic (e.g.
"Muhammad", "Emhemmed"). The number of name variations is further enlarged by
transliterations from the Arabic into Roman script. Not only do different
languages (e.g. French and English) have different ways to transliterate the
same Arabic name, they also frequently have very vague transliteration rules,
which yield many variations even within the same language (e.g. "Gadafi",
"Kaddafi", "Ghaddafi", "Quadhafi", etc. in English).
Arabic Name Search™ takes all the above issues into
account. Its sophisticated parsing technologies identify Arabic name constituents
(prefixes, articles, roots, and suffixes), irrespective of spelling
variations. Each
constituent is treated differently, depending on its relative
semantic importance. This enables Arabic Name Search to produce accurate results
even when operating on incomplete names. If presented with a full
name that contains more than one component, it not only parses out
the constituents, but also assigns them to an individual component
(e.g., the components of "Mohammad Khayr-ud-Dene Al Arussi" are
"Mohammad", "Khayr-ud-Dene", "Al Arussi"). This enables the
retrieval of lower-scoring partial matches (e.g. "Mohammad Al
Arussi"). Since many Arabic names have a large number of components, not all of which are
necessary to identify an individual, this approach results in a very
high accuracy rate, especially when compared to other techniques
which operate by generating large lists of name variations from a
given Arabic name, and then trying to find an exact match within the
generated list.
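The component-based approach described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the product's algorithm: the article list, the function names split_components and match_score, and the scoring formula (shared components divided by the component count of the longer name) are all hypothetical, and components are compared by exact spelling only.

```python
# Hypothetical article list (illustration only).
ARTICLES = {"al", "el"}


def split_components(full_name):
    """Split a full name into components, keeping an article with the
    token that follows it (so "Al Arussi" stays one component)."""
    components = []
    pending = []
    for tok in full_name.lower().split():
        pending.append(tok)
        if tok not in ARTICLES:
            components.append(" ".join(pending))
            pending = []
    if pending:  # a trailing article with nothing after it
        components.append(" ".join(pending))
    return components


def match_score(query, candidate):
    """Score in [0, 1]: shared components over the longer name's count.

    A candidate that omits some components still matches, but scores lower.
    """
    q = split_components(query)
    c = split_components(candidate)
    if not q or not c:
        return 0.0
    shared = sum(1 for comp in c if comp in q)
    return shared / max(len(q), len(c))


full = "Mohammad Khayr-ud-Dene Al Arussi"
print(split_components(full))
print(match_score(full, full))
print(match_score(full, "Mohammad Al Arussi"))
```

With these assumptions the full name splits into the three components given in the text, an identical name scores 1.0, and the partial name "Mohammad Al Arussi" shares two of three components and so is returned with a lower score rather than being discarded.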