of a sequence of pictographs by conveniently structuring its representation after identifying the different roles which the phrases in the original sentence play with respect to the verb. They use structured semantic role labelling for this.

Joshi et al. (2006) describe an unsupervised approach for automatically adding pictures to a story, extracting semantic keywords from the story and searching an annotated image database. However, they do not try to translate the entire story.

A resource we certainly have to mention is ImageNet (Deng et al., 2009), a large-scale ontology of images linked to the WordNet structure, aiming to populate the majority of the WordNet synsets. The images in ImageNet seem to be mostly photographs and are therefore less suitable for communication aids for the cognitively challenged.

3 The resources

After introducing Sclera, the pictograph set used in this paper, we present Sclera2Cornetto, a resource linking the Sclera pictographs to synonym sets in Cornetto. Cornetto⁶ is a lexical semantic database which is linked to the EuroWordNet⁷ grid and to the SUMO ontology⁸. It consists of 118 000 synonym sets (synsets), which are linked to each other through several relationships, such as hyponymy, meronymy and antonymy. The words in the Cornetto database are verbs, nouns, adjectives or adverbs. Cornetto is freely available for non-commercial use from the Dutch HLT centre⁹.

Additionally, we present Dutch2Sclera, a small dictionary linking Dutch words that do not appear in Cornetto with pictographs.

⁶ http://tst-centrale.org/producten/lexica/cornetto/7-56
⁷ http://www.illc.uva.nl/EuroWordNet/
⁸ http://www.ontologyportal.org
⁹ http://tst-centrale.org

3.1 Sclera

Sclera is a large set of mainly black-and-white pictographs. Originally these were used as directives, just like the pictographs we are confronted with in everyday life, as shown in Figure 1.

Figure 1: Pictographs in everyday life.

When people are not able to write and/or read fluently, pictographs may provide a solution. Schools and institutes for people with IDD have therefore long been using pictographs to guide their pupils and residents.

There are currently over 13000 Sclera pictographs, and new pictographs are created every month upon user request. These pictographs are freely available as png files, with a filename indicating their meaning in Dutch, English, French and Spanish¹⁰. As shown in Figure 2, they can represent simple concepts corresponding to single Dutch words, but often they represent more complex concepts, corresponding for instance to a verb and its objects (Figure 3), to two or more nouns (Figure 4), or to nouns and prepositional phrases (Figure 5). There are mainly pictographs for content words, and hardly any pictographs for prepositions or adverbs.

Figure 2: Simplex Sclera pictographs for dream, pictographs, stew and large.

Figure 3: Verb + object pictographs: feed the dog, eat a sandwich, pick strawberries; or more complex: stand on a chair in front of the sink.

Figure 4: Pictographs with two or more nouns: potatoes, rice and pasta; pear and apple.

Such pictographs may contain very strict instructions, as shown in Figure 6.

Although Sclera mainly contains black-and-white pictographs, some of them are green, indicating that something is permitted or approved, while others are red, indicating a ban or disapproval. In some others, another color is used for contrast or to indicate the color itself. A ban or disapproval may also be expressed by a red cross through the pictograph.

As mentioned above, Sclera was originally used as a means to communicate directives to its IDD users (pupils, residents) with as few pictographs as possible. However, over the last decade more and more attention has been paid to the communicative needs of people with IDD, the keyword being social inclusion. These users are also entitled to participate in the modern digital world by sending e-mails, chatting with friends and using social networks, among other things. Pictographs are now used in a broader context, i.e. they are on a par with natural languages such as Dutch and English, similar to sign languages.

As a baseline we take the Dutch-to-Sclera system as it was
in 2012, before we started working on improving text-to-pictograph conversion. At that time, a sentence like Ik kom naar huis ("I am coming home") was converted as shown in Figure 7.

¹⁰ In this paper we only refer to our work for Dutch.

Figure 5: Pictographs with nouns and prepositional phrases: swimming suit in towel, chairs on table.

Figure 6: Some pictographs with instructions for personal hygiene: wash between toes with soap and dry between your toes.

Figure 7: Literal conversion before 2012: words that do not function as filenames (minus .png) remain untranslated.

We are now treating Sclera as a language, albeit a simplified one. Note that this does not imply that it is simple. Pictograph languages are learned; they do not come naturally. It is, for instance, hard to understand the concepts of the pictographs in Figures 8 and 9 without learning that these pictures stand for these concepts.

Figure 8: Some pictographs on ways to draw someone's attention: draw attention positive and draw attention

Figure 9: Abstract concepts: equal opportunities and repeat.

When treating messages in Sclera as expressed in natural language instead of an ad hoc complex of pictographs, some characteristics of this natural language should be mentioned:

• no articles
• no possessive pronouns
• no inflection
• no tenses
• few auxiliaries (mainly to be)
• mostly the same pictograph for singular and plural

In some cases this is due to the fact that the concepts involved are hard to put into pictographs, like determiners or the inflection of a verb, or because the pictographs mainly express the concept expressed by the lemma. In some cases it is not the lemma that is associated with the pictograph but, for example, the plural form, when the singular is lacking.

3.2 Linking Sclera with Cornetto

We have manually linked a subset of 5710 Sclera pictographs to Cornetto synsets. A tool was built which took each Sclera pictograph and checked the Cornetto database to see whether there was an entry with the same name as the filename without the png extension. If so, the annotator could select one of the senses of the entry. If this was not the case, the annotator could enter a synonym and then select the appropriate sense, or she could tell the annotation tool to connect the pictograph to multiple synsets, providing lemmas that have these synsets as words. For each of these lemmas the appropriate sense was chosen by the annotator.

As these pictographs sometimes depict complex concepts, they can be linked to one or to more synsets, indicating that their meaning combines the meanings of the synsets. In these cases we have identified one of the synsets as the head synset, indicating that the other linked synsets are in some kind of dependency relation with the head synset. Table 1 presents the distribution of the synsets per linked pictograph. In cases where the pictograph meaning was not reflected by one or more synsets, we have often (240 times for simplex pictographs) linked the pictograph to the synset of its hyperonym.

Nr of synsets | Frequency of links between Sclera pictographs and synsets
...           | ...
5             | 3

Table 1: Distribution of Sclera pictographs over number of synsets.

Sclera2Cornetto consists of a database table with the following columns:

• lemma: the name of the pictograph
• for simple pictographs:
  • synset: synset identifier matching the Sclera pictograph
  • relation: whether the synset is a synonym or a hyperonym of the pictograph
• for complex pictographs:
  • head: synset identifier of the head
  • headrel: relation of the synset to the pictograph (synonym/hyperonym)
  • dependent: comma-separated list of synset identifiers for the dependents
  • deprel: comma-separated list of relations (synonym/hyperonym) of the synsets for each dependent

3.3 The Dutch2Sclera dictionary for Dutch words not covered by Cornetto

We also make our Dutch2Sclera dictionary table available, consisting of 372 entries linking Dutch words straight to Sclera pictographs. This table contains token, lemma, part-of-speech tag and picto columns, allowing underspecification (cf. Table 2). The tagset used is that of Van Eynde (2005).

token  lemma  tag  picto
en  en  VG(neven)  plus.png
sneeuwen  sneeuw.png
hallo  hallo-zeggen-2.png
groet  hallo-zeggen-2.png
groeten  hallo-zeggen-2.png

Table 2: Some entries in the Dutch2Sclera dictionary.

Currently the Dutch-to-Sclera translation system described in section 4 uses this dictionary to distinguish between words for which we have pictographs for different meanings, such as kind ("child"), in the sense "son or daughter" and in the sense "youngster", as shown in Figure 10, or dag, meaning either "hello" or "day" (cf. Figure 11). One of the things on our to-do list is to implement proper word sense disambiguation.

Figure 10: Child (son/daughter) vs. child (youngster).

Figure 11: Hello vs. day: homonyms in Dutch.

4 Using the resources to translate Dutch into Sclera

We have built a text-to-pictograph translation system that is used by the WAI-NOT online AAC platform, which allows people who are not able to read and write to communicate through the internet. In this section we briefly report on this system, showing how the presented resources improve the precision and recall in converting Dutch text into Sclera pictographs. An overview of the architecture of this system is presented in Figure 12.

Figure 12: Architecture of the text-to-pictograph translation system.

The input text first undergoes a shallow linguistic analysis (section 4.1): tokenisation, part-of-speech tagging, sentence splitting, word-based spelling correction for unknown words, separable verb detection and lemmatisation. In a second stage, the synsets of the lemmas of the words are retrieved from Cornetto (section 4.2). In a third stage, the input sentence is translated into pictographs (section 4.3).

4.1 Shallow Linguistic Analysis

The first step that we apply is tokenization: splitting all the punctuation signs from the words, apart from the hyphen/dash and the apostrophe, using a rule-based tokenizer.

The next step concerns word-based spelling correction, as a lot of the messages contain spelling mistakes. These spelling mistakes cannot be considered typing mistakes, but are real errors against the Dutch spelling. Therefore we implemented automatic spelling correction based on the OpenTaal¹¹ lexicon. For every word that is not in this lexicon, we check all variants with one deletion, one insertion or one substitution. Of all the variants present in the OpenTaal lexicon, we take the one that occurs most frequently in the 80 million word corpus. This is currently a 1-gram model. In future versions we might consider higher-order versions if it is deemed necessary.

¹¹ http://www.opentaal.org

Then we apply part-of-speech tagging. We use HunPos (Halácsy et al., 2007), a trigram-based open source tagger similar to TnT (Brants, 2000), using the D-Coi tagset (Van Eynde, 2005), trained on the SoNaR corpus (Oostdijk et al., 2013).

As the system is intended to translate e-mail messages for mentally challenged people, messages tend to be short and mostly consist of only one sentence. Nevertheless, some of the messages contain more than one sentence, so we apply sentence detection, as the translation engine works sentence by sentence.

Dutch contains separable verbs. These are verbs that have a lexical core and a separable particle. In some syntactic situations the core and the particle are written as one word, while in other situations they are written separately. Particles can have different part-of-speech tags according to the tagset we use (Van Eynde, 2005). The most frequent part-of-speech tags for particles are final prepositions (VZ(fin)), such as in verbs like afwerken (ik werk dit af)¹². Other particles can be singular common nouns in standard case (N(soort,ev,stan)), such as in verbs like paardrijden (ik rij graag paard)¹³. Yet another set of particles can be tagged as adverbially used adjectives (ADJ(vrij)), such as in vrijspreken (de rechter sprak hem vrij)¹⁴. A final category of particles are the real adverbs (BW), such as in bijeenbrengen (hij brengt geld bijeen)¹⁵. Each of the words of a sentence that is tagged as a verb, be it in its finite, infinitive or past participle form, is combined with each of the words tagged with one of the potential particle tags. The most likely combination according to an 80 million word corpus is selected, merging the verb with its particle. This procedure is recursively applied until no further separable verbs are detected. We apply the compounding approach of Van

As a second step in semantic analysis, we look up all the possible Cornetto synsets connected to the lemma of each word. We filter these synsets, keeping those where the part of speech of the synset agrees with the part-of-speech main category of the word as labeled by the part-of-speech tagger.

4.3 Translating into pictographs

To each of the synsets that have been attributed to the words, we attach the Sclera pictographs that are linked to these synsets. We consider different types of linking pictographs with synsets. First, we have the pictographs that are linked to one and only one synset, which we call the sclera single pictographs. Secondly, we have the pictographs that represent a more complex concept than a single synset; these pictographs have been linked to two or more synsets. For each of these complex pictographs we consider one of the synsets as the head synset and the rest as dependents.

Linking Pictographs to Synsets: Sclera2Cornetto. We present a resource in which we have linked a set of 5710 pictographs to lexical semantic concepts in Cornetto.
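
The synset filtering and the attachment of single and complex pictographs described above can be illustrated with a small sketch. This is not the system's actual code: the in-memory row type, the function names, the part-of-speech labels and all synset identifiers and pictograph names are invented for illustration, and matching a complex pictograph through its head synset only is a simplifying assumption.

```python
from dataclasses import dataclass, field


@dataclass
class PictoLink:
    """One Sclera2Cornetto row in a hypothetical in-memory form."""
    lemma: str                 # pictograph name (filename without .png)
    head: str                  # head synset id (the only synset for single pictographs)
    dependents: list = field(default_factory=list)  # dependent synset ids


def filter_synsets(candidates, word_pos):
    """Keep the synsets whose part of speech agrees with the tagged word."""
    return [synset_id for synset_id, pos in candidates if pos == word_pos]


def attach_pictographs(synset_ids, links):
    """Return (single, complex) pictograph names reachable from the synsets.

    Assumption: a complex pictograph is retrieved through its head synset.
    """
    wanted = set(synset_ids)
    single, complex_ = [], []
    for link in links:
        if link.head in wanted:
            (complex_ if link.dependents else single).append(link.lemma)
    return single, complex_


# Toy data: every identifier and name below is made up.
candidates = [("syn-eat-v", "verb"), ("syn-food-n", "noun")]
links = [
    PictoLink("eat", "syn-eat-v"),
    PictoLink("eat-a-sandwich", "syn-eat-v", ["syn-sandwich-n"]),
    PictoLink("sandwich", "syn-sandwich-n"),
]

kept = filter_synsets(candidates, "verb")
single, complex_ = attach_pictographs(kept, links)
```

Keeping single and complex pictographs in separate lists leaves room for a later step that checks whether the dependents of a complex pictograph are also present in the sentence.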

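
The word-based spelling correction described in the shallow linguistic analysis (variants at one deletion, insertion or substitution, ranked by unigram frequency) can be sketched as follows. This is a minimal illustration under stated assumptions: the lexicon and the frequency counts are toy stand-ins for the OpenTaal lexicon and the 80 million word corpus, the alphabet is restricted to lowercase ASCII, and leaving a word unchanged when no variant is found is our own choice.

```python
import string


def variants(word, alphabet=string.ascii_lowercase):
    """All strings at one deletion, insertion or substitution from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletions = {a + b[1:] for a, b in splits if b}
    insertions = {a + c + b for a, b in splits for c in alphabet}
    substitutions = {a + c + b[1:] for a, b in splits if b for c in alphabet}
    return (deletions | insertions | substitutions) - {word}


def correct(word, lexicon, unigram_counts):
    """Return `word` if it is known, else its most frequent in-lexicon variant."""
    if word in lexicon:
        return word
    known = [v for v in variants(word) if v in lexicon]
    if not known:
        return word  # assumption: unknown words without candidates are kept
    # Unigram model: pick the most frequent candidate; ties break alphabetically.
    return max(known, key=lambda v: (unigram_counts.get(v, 0), v))


# Toy lexicon and counts, invented for illustration.
lexicon = {"huis", "muis", "thuis"}
counts = {"huis": 120, "muis": 15, "thuis": 80}

fixed = correct("uis", lexicon, counts)
```

Here "uis" is one insertion away from both "huis" and "muis", and the more frequent "huis" wins; "thuis" is two edits away and is never considered.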

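
The separable verb detection can be sketched in the same spirit: every verb is paired with every potential particle, the most frequent merged form is selected, and the procedure repeats until nothing more can be merged. The particle tags are those named in the text; the `WW` prefix for verb tags follows the D-Coi tagset, while the simplified tags in the example, the function name and the frequency table are assumptions made for illustration.

```python
# Particle tags named in the text; verbs are assumed to carry a tag starting with "WW".
PARTICLE_TAGS = {"VZ(fin)", "N(soort,ev,stan)", "ADJ(vrij)", "BW"}


def merge_separable_verbs(tagged, counts):
    """Merge verbs with their separated particles.

    tagged: list of (word, tag) pairs for one sentence.
    counts: corpus frequency of merged forms (toy stand-in for the corpus).
    """
    tagged = list(tagged)
    while True:
        best = None  # (count, verb index, particle index)
        for vi, (verb, vtag) in enumerate(tagged):
            if not vtag.startswith("WW"):
                continue
            for pi, (particle, ptag) in enumerate(tagged):
                if pi == vi or ptag not in PARTICLE_TAGS:
                    continue
                count = counts.get(particle + verb, 0)
                if count > 0 and (best is None or count > best[0]):
                    best = (count, vi, pi)
        if best is None:
            return tagged  # no further separable verbs detected
        _, vi, pi = best
        # The merged form keeps the verb's tag and replaces the verb in place.
        tagged[vi] = (tagged[pi][0] + tagged[vi][0], tagged[vi][1])
        del tagged[pi]


# "ik werk dit af" with simplified tags; the count is invented.
sentence = [("ik", "VNW"), ("werk", "WW(pv)"), ("dit", "VNW"), ("af", "VZ(fin)")]
merged = merge_separable_verbs(sentence, {"afwerk": 42})
```

Each pass removes one token, so the loop always terminates; a merged form is itself tagged as a verb and can therefore take part in a further merge, which mirrors the recursive application described in the text.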