Computer Interpreter for Translating Written Sinhala to Sinhala Sign Language

Sinhala Sign Language (SSL) is the preferred medium of communication in the Sinhala-speaking deaf community. A conversational sign in SSL represents a Sinhala word or phrase, whereas a fingerspelling sign represents a character of the Sinhala alphabet. A Sinhala word has different word forms based on the tense and case structure of the language. In the early stages of SSL evolution, a single SSL sign represented a base Sinhala word and all of its word forms, which deprived deaf people of essential word form information. To solve this problem, the SSL research community has introduced a technique of fingerspelling a few characters of the intended word form as a prefix and/or suffix around the sign of the base word. This research paper presents a database-driven translator that implements this technique by translating written Sinhala sentences into SSL sequences, giving the SSL research community an opportunity to popularize the technique among SSL users. Moreover, a 3D avatar built earlier for animating spoken Sinhala in SSL is used to animate the translated SSL sequences. A test sample of 100 sentences and a word form database of 500+ entries matching 50 conversational SSL signs (30 nouns and 20 verbs) were prepared to test the translator. The database-driven search mechanism translated all 100 test sentences correctly, and six SSL users, including three professional SSL interpreters, achieved an average sentence identification rate of 70.83%.


Introduction
Sinhala and Tamil are the two official languages of Sri Lanka. However, the hearing-impaired Sinhala community uses Sinhala Sign Language (SSL) as its preferred medium of communication. Several deaf schools across Sri Lanka use SSL to teach the regular Sinhala curriculum to schoolchildren. Since the wider Sinhala community lacks knowledge of SSL, the services of a sign interpreter are required to communicate with a Sinhala deaf person. Because only a handful of professional SSL interpreters are available in Sri Lanka, deaf people face various obstacles in their day-to-day interactions with the Sinhala-speaking community.
Traditional SSL does not have sufficient vocabulary or grammar rules to accommodate the complex grammar of written Sinhala. Written Sinhala has 9 cases and 3 tenses, along with masculine, feminine and question forms. In contrast, SSL may have a single sign to represent all such word forms of a particular Sinhala noun or verb, with a few exceptions for past tense verbs and the feminine form of nouns. Therefore, computerized Sinhala to SSL translators (Liyanapathirana, 2014; Kulaveerasingam, Wellage, Samarawickrama, Perera & Yasas, 2014; Punchimudiyanse & Meegama, 2015a) replace all word forms of a Sinhala word with the common SSL sign, in the same way human SSL interpreters do. However, this translation technique does not convey the exact meaning of written Sinhala sentences to a deaf person because cases and tenses are lost, and this is a serious problem that contributes to low pass rates among deaf schoolchildren in school examinations.
The SSL community has a different signing technique to solve the written Sinhala to SSL translation problem. This technique is used in supportive SSL textbooks (Siyambalagoda et al., 2004) that match the regular school curriculum. It combines a prefix, the sign for the base word and a suffix to visualize the exact Sinhala word form in SSL. For example, to sign the Sinhala word තාත්තාගේ (meaning: father's), the signer uses the sign for the base word තාත්තා (father) and follows it with the two fingerspelling signs "ේ, ඒ" to show the deaf person that the word is තාත්තා + ගේ (තාත්තාගේ).
This research paper proposes a database-driven written Sinhala to SSL translator, which emulates this technique alongside the traditional Sinhala to SSL translation technique. The translator is tested using a 3D avatar system (Punchimudiyanse & Meegama, 2015a) developed for spoken Sinhala to SSL translation.

Sinhala Sign Language
SSL gestures are classified as conversational and fingerspelling gestures. Conversational gestures are preferred by the SSL community because they represent a word or an entire phrase, as shown in figure 1(a) and figure 1(b). Character-by-character representation (fingerspelling) is used to show Sinhala words that do not have SSL signs, such as a person's name. The fingerspelling alphabet of SSL (Dias & Dias, 1996; Stone, 2007) contains 60 fingerspelling signs representing the 60 characters of the modern Sinhala alphabet (NIE SL, 1989) and a fingerspelling sign for the ඥ character in the Unicode Sinhala standard (Unicode.org, 2016). Fingerspelling in American Sign Language (ASL) and British Sign Language (BSL) is a straightforward task of using the individual character signs of an English word. For example, the word "JOHN" can be shown with the four fingerspelling signs "J", "O", "H", "N" of ASL or BSL, as shown in figure 2(a). However, SSL fingerspelling uses the phonetic pronunciation of a word to construct the fingerspelling sequence. For example, the same word "JOHN" is written in Sinhala as "ජෝන්" and its phonetic pronunciation is "ජ්, ඕ, න්", which can be fingerspelled using three SSL fingerspelling signs, as shown in figure 2(b).

Techniques for Sign Language Synthesis and Animation
Sign languages differ from one another, as shown in the "Spread the Sign" web portal (European Sign Language Center, 2012). Several works on translating a spoken language into the respective sign language exist for small sign vocabularies or specialized domains. Moreover, a large body of research on sign language synthesis has been carried out for ASL and for the sign languages of the European Union.
The most popular technique in translation and sign synthesis is word-to-word mapping of signs to match the words in a sentence. If a sign does not exist for a particular word, the word is fingerspelled (Stone, 2007). The second technique uses parallel corpora, where one corpus contains words/phrases of a spoken language and the other contains sequences of signs matching the first corpus. Natural language processing (NLP) and machine translation techniques are used in parallel-corpus-based systems to generate translated sign sequences from an input sentence (Almeida, Coheur, & Candeias, 2015; De Martino et al., 2016; Efthimiou et al., 2009; Kouremenos, Fotinea, Efthimiou, & Ntalianis, 2010; Othman & Jemni, 2011).
The word-to-word technique is sufficient for the traditional Sinhala to SSL translation of most Sinhala words that have an SSL sign, because all word forms of a Sinhala word are replaced with the base SSL sign. However, word-to-word sequencing in sign language translation may not convey the exact meaning of an input sentence (Elliott, Glauert, Kennaway, Marshall & Safar, 2008; Huenerfauth, 2008; Ong & Ranganath, 2005; Marshall & Safar, 2005). This is also the case in the context of this research, because signing the word forms of Sinhala requires introducing fingerspelling signs into the conversational SSL sequence.
Three techniques are used in sign synthesis and animation. The first technique joins word-by-word videos of human interpreters (Solina, Krapez, Jaklic & Komac, 2000); refinements to this technique that provide smooth transitions between videos are suggested in (Chiu, Wu, Su, & Cheng, 2007; Chuang, Wu, & Chen, 2006).
The second technique joins motion-captured animation sequences of signs to animate 3D avatars (Awad, Courty, Duarte, Le Naour & Gibet, 2009; Cox et al., 2002). The markup language SiGML (Elliott, Glauert, Kennaway, & Parsons, 2001) is used to define signs from a common set of motion-captured hand postures and their transitions in the "eSIGN" project (eSIGN project, 2002), which went on to translate government web pages into British, Dutch and German sign languages. The Japanese text to Japanese Sign Language (JSL) translation system (Kaneko, Hamaguchi, Doke & Inoue, 2010) used TV program Making Language (TVML) (T2VLab, NHK Science and Technical Research Laboratories, 2011) to embed Biovision Hierarchy (BVH) based motion-captured sign sequences into television programs that can be played in the Microsoft DirectX based TVML player.
The third technique also uses animated 3D avatars, which show the postures of a sign as key frames, while the movements between postures are calculated mathematically (Almeida, 2015; Davidson, 2006; Efthimiou et al., 2009). Huenerfauth (2006) divided a sign posture into several bone channels and animated them independently to achieve the collective movement between two postures. In this research, a 3D avatar utilizing the third technique (Punchimudiyanse & Meegama, 2015a) is used to test the translator.

Methodology
The main goal of this research is to process Unicode Sinhala sentences and translate them into SSL sequences that mix the traditional and written Sinhala signing techniques for word forms. For that purpose, a synonym database and a word form database are proposed, and an algorithm is designed to perform the written Sinhala to SSL translation. An existing 3D avatar is used to test the translation.

Building synonym database for the translator
A Sinhala word that has an SSL sign may have many Sinhala synonyms, any of which can appear in an input sentence. In the traditional Sinhala to SSL translation, any word form of a synonym is replaced with the sign of the base word. For this purpose, a synonym database is built for the Sinhala base words that have conversational signs defined in the sign database. The synonym database uses the following entry format to denote the base word and its synonyms.
#baseword,Synonym1,..,SynonymN,<carriage return>
An entry in the synonym database looks like:
#අම්මා,මව,මවුන්,මාතා,මාතෘ,මෑණියෝ,මම්මා,..,අම්මි,<carriage return>
However, the traditional translation technique is used only when a word form of a synonym/base word does not have a specialized SSL signing sequence. The next section describes the types of those word forms and their respective SSL sequences for Sinhala nouns and verbs.
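As an illustration, the entry format above can be read into an in-memory lookup table that maps every synonym to its base word. The following Python sketch is a minimal example, not the authors' Visual Basic .NET implementation; the function name `parse_synonym_db` is hypothetical, and it assumes one entry per line with the base word immediately after the `#`.

```python
def parse_synonym_db(text: str) -> dict[str, str]:
    """Map every synonym (and the base word itself) to its base word.

    Assumes one entry per line in the form '#base,syn1,syn2,...,synN,'.
    Trailing commas, blank lines and lines without '#' are ignored.
    """
    synonym_to_base: dict[str, str] = {}
    for line in text.splitlines():          # also handles \r\n line endings
        line = line.strip()
        if not line.startswith("#"):
            continue                        # skip blank or malformed lines
        fields = [f.strip() for f in line[1:].split(",") if f.strip()]
        if not fields:
            continue
        base = fields[0]                    # the word right after '#'
        for word in fields:                 # the base word maps to itself
            synonym_to_base[word] = base
    return synonym_to_base


# Sample entries taken from the examples in the text (elided synonyms omitted).
SAMPLE = "#අම්මා,මව,මවුන්,මාතා,මාතෘ,\r\n#තාත්තා,පියා,\r\n"
db = parse_synonym_db(SAMPLE)
print(db["මව"])  # අම්මා
```

A dictionary gives constant-time lookups once the file is loaded, which is one possible alternative to searching the raw text file for a synonym and then scanning backward for `#`.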

Building Word form database for the translator
There are nine cases (විභක්ති) in Sinhala for nouns, which create nine word forms each for singular and plural nouns. For example, the different cases of the singular noun තාත්තා (meaning: father) are given in Table 1, together with the SSL sequence for each case of the noun "father". It should be noted that the suffix in the SSL sequence for a particular case is not uniform across all Sinhala nouns. For example, in the ආලපන (vocative) case, the suffix තත් is used with the noun තාත්තා to make it තාත්තත්, whereas the words අයියා and මාමා take the suffixes තේ and තේ to make them අයිතේ and මාතේ.

The situation becomes more complex when a sentence contains a word form of a synonym of a noun that has an SSL sign, instead of the noun itself. For example, the word පියාගෙන් is a word form of පියා, a synonym of the noun තාත්තා. According to the new signing technique, it has the SSL translation "පි", "තාත්තා", "ගෙන්", which combines a prefix, the sign for the base word and a suffix. Here, the prefix "පි" denotes පියා, the synonym of තාත්තා, and the suffix "ගෙන්" denotes the case.
When a plural noun is translated, the sign for "ගොඩක්" (meaning: more than one) has to be introduced into the SSL sequence. A few examples of singular and plural synonyms with different cases are given in Table 2.
A verb in Sinhala has more variations than a noun. It can have singular and plural forms, three tenses, three persons, and masculine, feminine and question forms, as shown in Table 3 for the verb යනවා (meaning: go). Table 3 also shows how a past tense verb is signed using the sign of the present tense verb followed by the sign "finished". For example, the verb ගියා (meaning: went) becomes ගියා ඉවරයි. Furthermore, when a first-, second- or third-person ending is attached to a past tense verb, the last few characters of the verb should also be fingerspelled after the sign.
As with nouns, if a synonym of the base verb appears in a sentence, a suitable prefix and suffix are added to the SSL sequence to denote the synonym and its word form. For example, the verb ගමන් කළෝය is a past tense, plural, third-person synonym word form of the base word යනවා. The SSL sequence for this synonym verb could be derived as ග යනවා කළෝය or ග යනවා ඉවරයි ළෝය. These two translation variations for the word form ගමන් කළෝය indicate that it is difficult to set a single rule to determine the SSL sequences of multi-word word forms. To store all such SSL sequences for the different word forms of a word, the following entry format is used in preparing a scalable word form database as a text file:
Synonym,prefix segment <space> base word <space> suffix segment#<carriage return>
An entry in the word form database looks like:
නිවසට,නි ගෙදර ට#<carriage return>
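A minimal Python sketch of reading word form entries follows. It assumes that each entry ends with `#`, that the word form is everything before the first comma (so a multi-word word form keeps its internal space), and that the SSL sequence is the space-separated list of prefix, base word and suffix segments after it. `parse_word_form_db` is a hypothetical name, not part of the described system.

```python
def parse_word_form_db(text: str) -> dict[str, list[str]]:
    """Map each word form to its SSL sequence (prefix, base word, suffix segments)."""
    word_forms: dict[str, list[str]] = {}
    for line in text.splitlines():          # also handles \r\n line endings
        line = line.strip()
        if not line.endswith("#") or "," not in line:
            continue                        # skip blank or malformed lines
        # Split on the first comma only, so a multi-word word form keeps its space.
        word_form, sequence = line[:-1].split(",", 1)
        word_forms[word_form.strip()] = sequence.split()
    return word_forms


# The sample entry from the text: නිවසට is signed as "නි", "ගෙදර", "ට".
entries = parse_word_form_db("නිවසට,නි ගෙදර ට#\r\n")
print(entries["නිවසට"])
```

Keeping the SSL sequence as a list of segments means the prefix and suffix can be absent without any special case: an entry with only a base word and a suffix simply yields a two-element list.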

Design of the Written Sinhala to SSL Translator
In the traditional Sinhala to SSL translation, all word forms of a synonym or of the base word are replaced with the sign of the base word. Under the written Sinhala to SSL translation technique, the word is instead replaced with the SSL sequence specific to its word form. Therefore, the following algorithm is designed to process an input sentence and translate it into the appropriate SSL sequence.
// After the search and replacement process is complete, S holds the output
// SSL sequence in Unicode Sinhala. Text that does not have
// conversational signs is not changed.
Return S
The algorithm functions as follows. Because an entire Sinhala phrase can have a single SSL sign, phrases are searched first. For that task, the words of the input sentence are divided into 3-gram, 2-gram and 1-gram (i.e. 3-word, 2-word and 1-word) segments, which are searched in the synonym database from the largest segment to the smallest. This identifies the synonyms (or word forms of base words) that have signs in the sign database, and the search order prevents the signs of individual words from replacing parts of a phrase that has a single sign.
After the location of a synonym is identified through the search, its base word is obtained through a backward search for the # symbol, which marks the base word of the synonym. The next task is to choose between the written Sinhala technique and the traditional technique of SSL translation. If the word form database has an entry showing that a specific SSL sequence exists for that particular word, that SSL sequence replaces the word/phrase in the input sentence. Otherwise, the base word replaces the word in the input sentence.
Because the algorithm performs the search in 3-gram to 1-gram order, Sinhala phrases that have a single sign are translated first. A phrase may contain words that also have individual signs, which could lead to partial replacements when the search moves on to 1-grams. To prevent such partial replacements, all the 1-grams of a translated phrase are removed from the search list.
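The search order described above can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: both databases are assumed to be already loaded into dictionaries (a dictionary lookup stands in for the backward search for `#`), the synonym dictionary is assumed to list every word or phrase that should be recognized, including word forms, and the sentence is split on whitespace. Every position covered by a translated phrase is marked as consumed, so shorter n-grams can never partially replace it. `translate` and the sample entries are hypothetical.

```python
def translate(sentence: str,
              synonym_to_base: dict[str, str],
              word_forms: dict[str, list[str]],
              max_n: int = 3) -> list[str]:
    """Replace 3-, 2- and 1-word segments that have signs; keep other words."""
    tokens = sentence.split()
    consumed = [False] * len(tokens)           # positions already translated
    replacement_at: dict[int, list[str]] = {}  # start position -> SSL sequence

    for n in range(max_n, 0, -1):              # largest segments first
        for i in range(len(tokens) - n + 1):
            if any(consumed[i:i + n]):
                continue  # overlaps a phrase that was already translated
            phrase = " ".join(tokens[i:i + n])
            base = synonym_to_base.get(phrase)
            if base is None:
                continue  # no sign for this segment
            # Written Sinhala technique when a word form entry exists,
            # otherwise the traditional technique (sign of the base word).
            replacement_at[i] = word_forms.get(phrase, [base])
            for j in range(i, i + n):
                consumed[j] = True

    output: list[str] = []
    for i, token in enumerate(tokens):
        if i in replacement_at:
            output.extend(replacement_at[i])
        elif not consumed[i]:
            output.append(token)  # no conversational sign: left unchanged
    return output


# Hypothetical sample data built from examples in the text.
SYNONYMS = {"අයියා": "අයියා", "ගේ": "ගෙදර", "නිවසට": "ගෙදර"}
WORD_FORMS = {"නිවසට": ["නි", "ගෙදර", "ට"]}

print(translate("නිවසට", SYNONYMS, WORD_FORMS))     # written Sinhala technique
print(translate("අයියා ගේ", SYNONYMS, WORD_FORMS))  # two separate signs
```

The second call shows why spacing matters: when ගේ is separated from අයියා, it is matched as a word of its own and signed as "house".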
The algorithm is implemented in Visual Basic .NET and its output is generated as Unicode Sinhala text. However, the 3D avatar used in this research requires its input in phonetic English.

Conversion of Unicode Sinhala text to Phonetic English
SSL requires the phonetic pronunciation of an unknown word to perform fingerspelling. The following rules are used to convert standard Sinhala text into a phonetic sequence:
1. Introduce the hidden vowel sound "අ" after a consonant without a modifier.
2. Replace the modifier of each consonant with the vowel sound related to that modifier.
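The two rules can be sketched in Python on Unicode code points. This sketch makes assumptions beyond the text: it represents each consonant by its pure (al-lakuna) form followed by a vowel, mirroring the "ජ්, ඕ, න්" pronunciation given earlier for "JOHN"; it drops the zero width joiner because that character only affects rendering; and it passes every other character (for example the anusvara) through unchanged. The function name is hypothetical, and the authors' published conversion algorithm may differ.

```python
HAL = "\u0DCA"       # al-lakuna (virama): ජ + ් -> ජ්
HIDDEN_A = "\u0D85"  # අ, the inherent vowel made explicit by rule 1
ZWJ = "\u200D"       # zero width joiner, affects rendering only

# Rule 2: dependent vowel sign (modifier) -> independent vowel it sounds as.
VOWEL_SIGN_TO_VOWEL = {
    "\u0DCF": "\u0D86",  # ා -> ආ
    "\u0DD0": "\u0D87",  # ැ -> ඇ
    "\u0DD1": "\u0D88",  # ෑ -> ඈ
    "\u0DD2": "\u0D89",  # ි -> ඉ
    "\u0DD3": "\u0D8A",  # ී -> ඊ
    "\u0DD4": "\u0D8B",  # ු -> උ
    "\u0DD6": "\u0D8C",  # ූ -> ඌ
    "\u0DD8": "\u0D8D",  # ෘ -> ඍ
    "\u0DD9": "\u0D91",  # ෙ -> එ
    "\u0DDA": "\u0D92",  # ේ -> ඒ
    "\u0DDB": "\u0D93",  # ෛ -> ඓ
    "\u0DDC": "\u0D94",  # ො -> ඔ
    "\u0DDD": "\u0D95",  # ෝ -> ඕ
    "\u0DDE": "\u0D96",  # ෞ -> ඖ
    "\u0DDF": "\u0D8F",  # ෟ -> ඏ
    "\u0DF2": "\u0D8E",  # ෲ -> ඎ
    "\u0DF3": "\u0D90",  # ෳ -> ඐ
}


def is_consonant(ch: str) -> bool:
    """True for Sinhala consonants ක (U+0D9A) to ෆ (U+0DC6)."""
    return "\u0D9A" <= ch <= "\u0DC6"


def to_phonetic_units(word: str) -> list[str]:
    """Split a Sinhala word into pure consonants and vowels (rules 1 and 2)."""
    chars = [c for c in word if c != ZWJ]
    units: list[str] = []
    i = 0
    while i < len(chars):
        ch = chars[i]
        nxt = chars[i + 1] if i + 1 < len(chars) else ""
        if not is_consonant(ch):
            units.append(ch)                                # vowels, anusvara, etc.
            i += 1
        elif nxt == HAL:
            units.append(ch + HAL)                          # already a pure consonant
            i += 2
        elif nxt in VOWEL_SIGN_TO_VOWEL:
            units += [ch + HAL, VOWEL_SIGN_TO_VOWEL[nxt]]   # rule 2
            i += 2
        else:
            units += [ch + HAL, HIDDEN_A]                   # rule 1
            i += 1
    return units


john = "\u0DA2\u0DDD\u0DB1\u0DCA"  # ජෝන් ("JOHN")
print(to_phonetic_units(john))
```

Escaped code points are used in the example so the input is unambiguous: ෝ must be the single precomposed character U+0DDD for rule 2 to apply to it as one modifier.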
For compatibility with the 3D avatar, the phonetic sequence in Sinhala is converted to phonetic English by replacing each character with a label (tag) made of English letters, as listed in Table 4. The abbreviation "zwj" denotes the non-printable zero width joiner character (U+200D), which is used in Sinhala Unicode text to write special modifiers such as "rakaransha", "yansaya" and "repaya".
It is easier to convert the entire sentence to phonetic English than to convert only parts of it. Therefore, the entire Unicode sentence is converted to phonetic English, adhering to the above rules, using an algorithm published in (Punchimudiyanse & Meegama, 2015b). The resulting phonetic English SSL sequence is sent to the 3D avatar animation system for live 3D animation.

The Processing performed at the 3D avatar
This research utilizes a 3D avatar and its animation framework, which provides a seamless switch-over between the conversational and fingerspelling signs of SSL (Punchimudiyanse & Meegama, 2015a). It has a posture-based sign database, which stores the coordinates of the different bones in each posture. The postures of the SSL signs are animated as key frames, while the per-frame displacement of each bone during the transition between postures is calculated automatically using the following formula:
$$\text{increment value}_k = \frac{\text{end value}_k - \text{start value}_k}{\text{number of frames}}$$
In this formula, the start value denotes the initial position of a given bone before the transition and the end value denotes its final position after the transition. The number of frames denotes the frames required to complete the transition, and the calculated increment value is the distance a bone must move in direction k per frame during the transition. The block diagram in Figure 3 depicts the processing that takes place within the animation framework when a translated SSL sequence is sent to the animation system.
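The formula amounts to linear interpolation between two key-frame postures. Assuming, as the definitions above imply, that the per-frame increment in each direction k is (end value - start value) / number of frames, a minimal Python sketch for one bone is shown below. The function names and the sample coordinates are hypothetical and are not taken from the avatar framework.

```python
def per_frame_increment(start: list[float], end: list[float], frames: int) -> list[float]:
    """Increment per frame in each direction k: (end_k - start_k) / frames."""
    if frames <= 0:
        raise ValueError("number of frames must be positive")
    if len(start) != len(end):
        raise ValueError("start and end must have the same number of directions")
    return [(e - s) / frames for s, e in zip(start, end)]


def transition_frames(start: list[float], end: list[float], frames: int) -> list[list[float]]:
    """Bone position at frames 1..frames; the last frame reaches `end`."""
    inc = per_frame_increment(start, end, frames)
    # start + f * increment avoids accumulating rounding error frame by frame.
    return [[s + f * d for s, d in zip(start, inc)] for f in range(1, frames + 1)]


# Hypothetical bone position (x, y, z) moving between two key-frame postures.
steps = transition_frames([0.0, 0.0, 0.0], [10.0, -5.0, 2.0], 5)
print(steps[0])  # [2.0, -1.0, 0.4]
```

Adding the increment to the previous frame's position would give the same result in exact arithmetic; computing each frame from the start position is simply a floating-point-friendly way to express it.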

Results and Discussion
The existing sign database of the 3D avatar has over 200 conversational signs related to nouns and verbs, and 61 fingerspelling signs are defined to perform fingerspelling. However, the written Sinhala to SSL translation technique is not popular among the SSL community. Therefore, the translation algorithm is tested with a small vocabulary of 50 signs and a word form database of 500+ entries related to that vocabulary. In addition, the signs for "finished" and "a lot" are used to denote past tense and plural forms. Since there are 18 entries per noun and 24 entries per verb in the word form database, it was essential to test the success of the SSL signing technique before scaling up the synonym and word form databases. Therefore, a sample of 100 sentences was prepared to include at least one word form matching the 50 SSL signs. The translated SSL sequence of each sentence in the test sample was manually inspected for correctness. All 100 sentences were correctly translated by the written Sinhala to SSL translation algorithm, and Table 6 summarizes the translation results. The translation performance of the algorithm reached 100% because of the small size of the sentence sample and the database-driven word form replacement. The small test sample enabled the authors to quickly develop a working prototype and demonstrate it to the SSL user community, motivating the community to determine the SSL translations of word forms instead of the authors determining those SSL sequences.
It is observed that the suffixes "තේ" and "ගේ", which denote specific word forms, must be attached to the base word without a space in the input sentence for the translation to be correct. Otherwise, the translator interprets those suffixes as separate words with their individual meanings, "tea" and "house" respectively. For example, the word "අයියාගේ" is in the correct form and gives the meaning "brother's" in the written Sinhala translation, while "අයියා ගේ" is translated into the two signs "brother house".
The phonetic English sequence of the SSL translation of the first sentence in Table 5 is "ma amwmaxa batxhw kanavaxa nixiya", and it is sent to the 3D animation system. Figure 4 shows the resulting animation sequence of the 3D avatar. All 100 sentences of the test sample were sent through the 3D avatar animation system and the animations were screen captured. The screen-capture videos were shown to three professional interpreters and three SSL teachers without disclosing the transcript. The word form identification rates are given in Table 7.

Table 7: Word form/Sentence Identification rates of 6 SSL users
The results indicate a 70.83% sentence identification rate and a 71.96% word form identification rate for the translated SSL sequences of the 100-sentence test sample. The test users indicated that the novelty of the translation technique and its lack of regular use contributed to many word form identification errors. Moreover, they indicated that the animation of the fingerspelling character ඒ needs to be slowed down so that suffixes such as "ගේ" and "තේ" can be identified properly. Furthermore, they suggested showing all the SSL signs related to a word form in a single motion to indicate that they are part of a single word form. For example, the signs "පි", "තාත්තා", "ට" of the word පියාට should be signed in a continuous motion without showing a space character between the signs.
Test users are optimistic that the sign identification rate of the system will improve over time when this technique is in regular use.

Conclusion
This study presents a small-scale implementation of a computerized translator that emulates the written Sinhala to SSL translation technique. An SSL sequencing algorithm with synonym and word form databases forms the backbone of the system, which is tested with a small sign vocabulary to leave room for expansion.

Moreover, the translated SSL sequence is generated in a format compatible with the 3D avatar system we developed previously.
Our observations indicate that this technique is less popular among the SSL community due to a lack of training and the absence of a common vocabulary of word form to SSL translations. Therefore, we propose to expand the word form and synonym databases of the system with the help of the SSL community before performing a comprehensive analysis of computerized translation performance. From a computer science perspective, the many word forms of each Sinhala word will expand the word form database to millions of entries, which will make searching the database slower than expected. Therefore, after gathering a large number of SSL sequences for different Sinhala word forms with the support of the SSL user community, it will be necessary either to find patterns in word forms to reduce the size of the word form database or to use indexed searching techniques.
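One possible indexed search is sketched below under stated assumptions. It keeps the word form keys in a sorted list and uses binary search from Python's standard `bisect` module, so an exact lookup takes O(log n) comparisons instead of a linear scan of the text file, and a prefix query returns all word forms that share a stem, which could help in finding patterns in word forms. `WordFormIndex` and its sample entries are hypothetical; a plain dictionary would give faster exact lookups but no prefix queries.

```python
import bisect
from typing import Optional


class WordFormIndex:
    """Sorted index over word form keys (hypothetical helper, not from the paper)."""

    def __init__(self, entries: dict[str, list[str]]) -> None:
        # Sort once; every query afterwards is a binary search.
        self._keys = sorted(entries)
        self._values = [entries[k] for k in self._keys]

    def lookup(self, word_form: str) -> Optional[list[str]]:
        """Return the SSL sequence for an exact word form, or None."""
        i = bisect.bisect_left(self._keys, word_form)
        if i < len(self._keys) and self._keys[i] == word_form:
            return self._values[i]
        return None

    def with_prefix(self, prefix: str) -> list[str]:
        """Return all word forms starting with `prefix`, in sorted order."""
        i = bisect.bisect_left(self._keys, prefix)  # first key >= prefix
        matches = []
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            matches.append(self._keys[i])
            i += 1
        return matches


# Hypothetical entries; the first follows the "father's" example in the text.
index = WordFormIndex({
    "තාත්තාගේ": ["තාත්තා", "ේ", "ඒ"],
    "නිවසට": ["නි", "ගෙදර", "ට"],
})
print(index.lookup("නිවසට"))
print(index.with_prefix("තාත්තා"))
```

Because all keys sharing a prefix are contiguous in sorted order, the prefix query stops at the first non-matching key rather than scanning the whole database.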

Figure 3: Processing that happens at the 3D avatar animation system

Table 4: Tags used in Unicode Sinhala to phonetic English conversion
Table 5 lists 10 translations from the test sample.