Resource Development for South African Bantu Languages: Computational Morphological Analysers and Machine-Readable Lexicons

Title: Resource Development for South African Bantu Languages: Computational Morphological Analysers and Machine-Readable Lexicons
Publication Type: Conference Paper
Year of Publication: 2006
Authors: Bosch, Sonja E.; Jones, Jackie; Pretorius, Laurette; and Anderson, Winston
Book Title: Proceedings on the Workshop on Networking the Development of Language Resources for African Languages, 5th International Conference on Language Resources and Evaluation
Date: 22 May 2006
Location: Genoa, Italy
Abstract

The development of computational morphological analysers for South African Bantu languages is linked to a project funded by the National Research Foundation in South Africa. The main research question in the project concerns the development of finite-state morphological analysers for five Bantu languages: Zulu, Xhosa and Swati (belonging to the Nguni group of languages), and Northern Sotho and Tswana (belonging to the Sotho group of languages). This development is based on underlying machine-readable lexicons that conform to common lexical specifications and international standards. Because of the rich agglutinating morphological structure of these languages, morphological processing poses particular challenges, which are orthographical, morphological and lexical in nature. The current status of the project is reported, firstly in terms of the development of prototype morphological analysers for the various languages, and secondly in terms of the development of standardised, machine-readable XML lexicons for the South African Bantu languages, based on an appropriate general data model.