Shona Text Corpus


Provides access to a text corpus for Shona using a web interface. Provided by the ALLEX Project.

A Spellchecker for Afrikaans, Based on Morphological Analysis

A Spellchecker for Afrikaans, Based on Morphological Analysis, van Huyssteen, G. B., and van Zaanen, Menno, Proceedings of the 6th International Terminology in Advanced Management Applications Conference (TAMA2003), pp. 189-194, (2003)

Learning Compound Boundaries for Afrikaans Spelling Checking

Learning Compound Boundaries for Afrikaans Spelling Checking, van Huyssteen, G. B., and van Zaanen, Menno, Pre-Proceedings of the Workshop on International Proofing Tools and Language Technologies, July, (2004)

CNTS - Language Technology Group


CNTS is a research center of the Department of Linguistics of the University of Antwerp (UA) in Antwerp, Belgium, engaged in research in computational linguistics and psycholinguistics. The CNTS - Language Technology Group has a strong tradition in the application of machine learning techniques for natural language processing. Recently, CNTS has also started investigating the applicability of unsupervised learning methods and knowledge transfer techniques for the annotation and linguistic description of African languages, particularly Kiswahili and the local languages of Kenya.

Kiswahili Part-of-Speech Tagger - Demo

This demo showcases a broad-coverage part-of-speech tagger for Kiswahili. It assigns a morpho-syntactic category to each word in a sentence. The system uses the Memory-Based Tagger, trained on the Helsinki Corpus of Swahili.

Example: Hapo ni kwa nini Sahara halina maji na kwa nini simba na shungi.



Guy De Pauw: CNTS - Language Technology Group, University of Antwerp, Antwerp, Belgium, guy [dot] depauw [at] ua [dot] ac [dot] be
Gilles-Maurice de Schryver: African Languages and Cultures, Ghent University, Ghent, Belgium, gillesmaurice [dot] deschryver [at] ugent [dot] be
Peter Waiganjo Wagacha: School of Computing and Informatics, University of Nairobi, Nairobi, Kenya, waiganjo [at] uonbi [dot] ac [dot] ke


Gĩkũyũ Diacritic Placement - Demo

The orthography of Gĩkũyũ uses two accented characters, ĩ and ũ, to represent its full vowel system. Because these characters are not available on standard computer keyboards, they are usually typed as the nearest available characters (i and u).
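
To make the problem concrete, here is a minimal Python sketch of how the accents get lost when text is typed on a standard keyboard, and of how many accented spellings a typed word could correspond to. It is not the method used by the demo; the function names strip_tildes and candidate_restorations are hypothetical and chosen only for illustration.

```python
import unicodedata
from itertools import product

# Plain letters and their tilde-accented counterparts (ĩ, ũ, Ĩ, Ũ).
PLAIN_TO_ACCENTED = {"i": "\u0129", "u": "\u0169", "I": "\u0128", "U": "\u0168"}


def strip_tildes(text):
    """Drop combining tildes, mimicking text typed without ĩ and ũ."""
    decomposed = unicodedata.normalize("NFD", text)      # ĩ -> i + U+0303
    without_tilde = decomposed.replace("\u0303", "")     # remove the tilde
    return unicodedata.normalize("NFC", without_tilde)


def candidate_restorations(word):
    """List every spelling obtained by optionally accenting each i and u."""
    options = [
        (ch, PLAIN_TO_ACCENTED[ch]) if ch in PLAIN_TO_ACCENTED else (ch,)
        for ch in word
    ]
    return ["".join(combo) for combo in product(*options)]


typed = strip_tildes("G\u0129k\u0169y\u0169")
print(typed)                                # Gikuyu
print(len(candidate_restorations(typed)))   # 8
```

A typed word containing n occurrences of i or u has 2^n candidate spellings, so restoring the diacritics means choosing among these candidates from context rather than applying a fixed substitution.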

