Unsupervised Induction of Dholuo Word Classes using Maximum Entropy Learning

Title: Unsupervised Induction of Dholuo Word Classes using Maximum Entropy Learning
Publication Type: Conference Paper
Year of Publication: 2007
Authors: De Pauw, Guy; Wagacha, Peter W.; and Abade, Dorothy A.
Book Title: Proceedings of the First International Computer Science and ICT Conference (COSCIT 2007)
Publisher: University of Nairobi
Location: Nairobi, Kenya
Abstract

This paper describes a proof-of-principle experiment in which maximum entropy learning is used to automatically induce word classes for the Western Nilotic language Dholuo. The proposed approach extracts shallow morphological and contextual features for each word of a 300k Dholuo text corpus. These features provide a layer of linguistic abstraction that enables the extraction of general word classes. We provide a preliminary evaluation of the proposed method in terms of language model perplexity and through a simple case study of the paradigm of the verb stem "somo".
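The abstract names the feature types and the evaluation metric without spelling them out. As a rough illustration only, the sketch below assumes that "shallow morphological" features are word-initial and word-final character n-grams and that "contextual" features are the immediately neighbouring tokens; the paper's actual feature set may differ. It also shows the standard perplexity formula, PP = exp(-(1/N) Σ log p_i). It does not implement the maximum entropy learner or the class induction step itself, and the function names `shallow_features` and `perplexity` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def shallow_features(tokens: Sequence[str], i: int, max_affix: int = 3) -> Dict[str, str]:
    """Hypothetical shallow features for tokens[i] (an assumption, not the paper's exact set).

    Morphological: word-initial and word-final character n-grams up to max_affix,
    kept only when strictly shorter than the word itself.
    Contextual: the immediately preceding and following tokens, with sentence
    boundary markers at the edges.
    """
    word = tokens[i].lower()
    feats: Dict[str, str] = {}
    for n in range(1, max_affix + 1):
        if len(word) > n:  # skip affixes that would cover the whole word
            feats[f"prefix{n}"] = word[:n]
            feats[f"suffix{n}"] = word[-n:]
    feats["prev"] = tokens[i - 1].lower() if i > 0 else "<s>"
    feats["next"] = tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"
    return feats


def perplexity(probs: List[float]) -> float:
    """Perplexity = exp(-(1/N) * sum(log p_i)) over N per-token model probabilities."""
    if not probs or any(p <= 0.0 for p in probs):
        raise ValueError("need a non-empty list of strictly positive probabilities")
    avg_log = sum(math.log(p) for p in probs) / len(probs)
    return math.exp(-avg_log)
```

For example, `shallow_features(["w1", "somo", "w2"], 1)` (with placeholder neighbours `w1` and `w2`) yields prefixes `s`, `so`, `som`, suffixes `o`, `mo`, `omo`, and context features `prev="w1"`, `next="w2"`. Features like these abstract away from the exact word form, which is what lets words sharing affixes or contexts fall into the same class; a model that assigns probability 0.25 to each of four tokens has perplexity 4.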
