This paper describes a proof-of-the-principle experiment in which maximum entropy learning is used for the automatic induction of word classes for the Western Nilotic language of Dholuo. The proposed approach extracts shallow morphological and contextual features for each word of a 300k text corpus of Dholuo. These features provide a layer of linguistic abstraction that enables the extraction of general word classes. We provide a preliminary evaluation of the proposed method in terms of language model perplexity and through a simple case study of the paradigm of the verb stem "somo".
|