A Little Bit of Background
One of the best things about our world is its diversity. We see it everywhere: in clothing, in customs, and even in our names. If we drill down a little deeper, we can see that these differences extend to how we structure and write our names! However, differences in alphabets and name structures can sometimes cause difficulties for industries like ours.
Machine Learning Specifically for Chinese…
As humans, we can handle differences in name structure and alphabet relatively easily. We understand cultural differences and different name orders, and we know which nicknames map to which full names. Unfortunately, these same differences can make a name translation engine’s life very hard. How do you teach a machine that Xiaoming can be written in many ways in Simplified Chinese and that it must pick the correct version? There is no simple answer, but GDC’s Intelligent Name Translation Engine (INTE) works to remedy this issue by applying machine learning to each transaction.
Chinese is written with a logographic script rather than an alphabet, meaning each character represents a word or a meaningful part of a word. The same syllable can carry different meanings depending on its tone, and many characters share the same pronunciation, so which character is meant has to be learned by memorization. The 26-letter Latin script is a phonetic alphabet, meaning each letter represents a sound and these sounds are strung together to create meaningful words. Thus, direct translations are not always equivalent; the Latin alphabet simply cannot accommodate all the sounds and variations of Simplified Chinese. To summarize, Chinese to Latin translations are fairly simple, but reversing the translation is difficult because the distinct tones, and with them the identity of the original characters, are lost.
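To make the one-way nature of the problem concrete, here is a minimal Python sketch. It is a toy, not how the INTE works: the `PINYIN` table is a tiny hand-written lookup, and the function names are made up for illustration. It shows that going from Chinese to Latin gives one answer, while going back from "Xiaoming" gives several candidates.

```python
from collections import defaultdict

# Hand-written toy table: character -> (pinyin syllable, tone number).
# Illustrative only; a real system would need a full pronunciation dictionary.
PINYIN = {
    "小": ("xiao", 3),
    "晓": ("xiao", 3),
    "孝": ("xiao", 4),
    "明": ("ming", 2),
    "鸣": ("ming", 2),
}


def romanize(name: str) -> str:
    """Chinese -> Latin: look up each character and drop its tone."""
    return "".join(PINYIN[ch][0] for ch in name).capitalize()


def build_reverse_index(names):
    """Latin -> Chinese: group every known name under its romanized spelling."""
    index = defaultdict(list)
    for name in names:
        index[romanize(name)].append(name)
    return index


known_names = ["小明", "晓明", "晓鸣", "孝明"]
reverse_index = build_reverse_index(known_names)

print(romanize("晓明"))            # a single spelling
print(reverse_index["Xiaoming"])   # every known name that shares that spelling
```

Some of these candidates differ by tone and others are exact homophones, so even a spelling with tone marks would not fully settle which characters were meant.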
Knowing this, GDC fed the INTE over a million verified Chinese and Latin associations. With this as the initial training set, the proprietary machine learning model was then given additional data attributes and retrained each time it was run, creating a “feedback loop”. This feedback loop is especially important because the INTE keeps learning from each verified match, refining and improving its results every time it runs.
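The INTE's model is proprietary, so the following is only a rough sketch of what a feedback loop can look like, with a plain frequency counter standing in for the real model. The `CandidateRanker` class and its methods are hypothetical names invented for this example.

```python
from collections import Counter


class CandidateRanker:
    """Toy stand-in for a learned model: ranks candidates by verified-match counts."""

    def __init__(self, seed_pairs):
        # seed_pairs: (latin_name, chinese_name) pairs that were already verified.
        self.counts = {}
        for latin, chinese in seed_pairs:
            self.record_verified_match(latin, chinese)

    def record_verified_match(self, latin, chinese):
        """Feedback step: every verified match nudges future rankings."""
        self.counts.setdefault(latin, Counter())[chinese] += 1

    def rank(self, latin):
        """Return (candidate, probability) pairs, most probable first."""
        counter = self.counts.get(latin, Counter())
        total = sum(counter.values())
        return [(name, n / total) for name, n in counter.most_common()]


seed = [("Xiaoming", "小明"), ("Xiaoming", "小明"), ("Xiaoming", "晓明")]
ranker = CandidateRanker(seed)
before = ranker.rank("Xiaoming")

# Two new verified matches arrive and are fed back into the ranker.
ranker.record_verified_match("Xiaoming", "晓明")
ranker.record_verified_match("Xiaoming", "晓明")
after = ranker.rank("Xiaoming")

print(before[0][0], after[0][0])
```

In this toy run the top candidate changes once the new verified matches are recorded, which is the basic idea behind learning from each verification.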
The INTE runs in the background of all Chinese verifications, seamlessly boosting name translation capabilities. Just keep using your GDC integration as you were before. If you don’t have one… we should talk. 😊
When GDC receives a request that needs translation, it passes the information in the name fields (and only the name fields!) to the INTE. The INTE compiles a list of name variations, each with its probability of being the correct match. It picks the most probable name and sends it to the data provider, which verifies it alongside the other input fields such as date of birth, telephone number, or national ID. The final results are passed back to you based on your match rules and voila! Your life just got much easier.
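The flow described above can be sketched in a few lines of Python. Everything here is assumed for illustration: the `Request` fields, the candidate list with its probabilities, the provider record, and the function names are all made up, and none of it reflects GDC's actual API or data.

```python
from dataclasses import dataclass


@dataclass
class Request:
    given_name: str
    family_name: str
    date_of_birth: str
    national_id: str


# Hypothetical engine output: Latin name -> [(Chinese candidate, probability)].
CANDIDATES = {
    ("Xiaoming", "Wang"): [("王小明", 0.6), ("王晓明", 0.3), ("王孝明", 0.1)],
}

# Hypothetical data-provider records keyed by national ID.
PROVIDER_RECORDS = {
    "ID-001": {"name": "王小明", "date_of_birth": "1990-01-01"},
}


def engine_candidates(given_name, family_name):
    """The engine sees the name fields and nothing else."""
    return CANDIDATES.get((given_name, family_name), [])


def provider_verify(name, date_of_birth, national_id):
    """The provider checks the translated name together with the other fields."""
    record = PROVIDER_RECORDS.get(national_id)
    return {
        "name": record is not None and record["name"] == name,
        "date_of_birth": record is not None
        and record["date_of_birth"] == date_of_birth,
    }


def verify(request, match_rule=("name", "date_of_birth")):
    """Pick the most probable name, verify it, then apply the match rule."""
    candidates = engine_candidates(request.given_name, request.family_name)
    if not candidates:
        return False
    best_name, _ = max(candidates, key=lambda pair: pair[1])
    field_matches = provider_verify(
        best_name, request.date_of_birth, request.national_id
    )
    return all(field_matches[field] for field in match_rule)


request = Request("Xiaoming", "Wang", "1990-01-01", "ID-001")
result = verify(request)
print(result)
```

Note that `engine_candidates` receives only the two name fields, while the date of birth and national ID go straight to the provider step, mirroring the separation described in the text.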