Development of a corpus for Gĩkũyũ using machine learning techniques

Title: Development of a corpus for Gĩkũyũ using machine learning techniques
Publication Type: Conference Paper
Year of Publication: 2006
Authors: Wagacha, Peter W., De Pauw, Guy, and Getao, K.
Booktitle: Proceedings of LREC workshop: Networking the development of language resources for African languages
Location: Genoa, Italy
Editor: Roux, J. C.

Networking the development of computational resources for African languages can be greatly advanced if researchers aim to develop tools that are largely language-independent and therefore reusable for other languages. In this paper we describe a particular case study: the development of an annotated corpus of Gĩkũyũ using language-independent machine learning techniques. Our work on Gĩkũyũ has a two-fold aim. On the one hand, we wish to digitally preserve this resource-scarce language; on the other hand, the work serves as a feasibility study of using language-independent machine learning techniques for the linguistic annotation of corpora. To this end we investigate established annotation induction techniques such as unsupervised learning and knowledge transfer. These methods can provide interesting perspectives for the linguistic description of many other resource-scarce languages.
