Towards machine-readable lexicons for South African Bantu languages

Publication Type: Journal Article
Year of Publication: 2007
Authors: Sonja E. Bosch, Laurette Pretorius, and Jackie Jones
Journal Title: Nordic Journal of African Studies

Lexical information for South African Bantu languages is not readily available in the form of
machine-readable lexicons. At present, such information is available only in a variety of
paper dictionaries, which differ considerably in how they organise and represent their data.
Developing reusable, suitably standardised machine-readable lexicons for these languages
therefore first requires a data model for lexical entries. In this study, the general-purpose
model developed by Bell and Bird (2000) serves as the point of departure.

First, the study investigates the extent to which the Bell and Bird (2000) data model can be
applied to, and modified for, these languages. Initial investigations indicate that the model
must be modified to accommodate the specific requirements of lexical entries in these
languages. Second, a data model for these languages is presented in the form of an XML DTD,
based on our findings regarding Bell and Bird (2000) and Weber (2002). The model also
incorporates additional requirements, identified through a study of the available paper
dictionaries, for the complete and appropriate representation of linguistic information.
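To make the idea of a machine-readable lexical entry concrete, the sketch below builds one XML entry with Python's standard `xml.etree.ElementTree` module. It is a hypothetical illustration only, not the DTD presented in the article: the element names (`entry`, `headword`, `pos`, `nounclass`, `plural`, `sense`, `gloss`), the functions `make_entry` and `missing_fields`, and the choice of required fields are all assumptions. Because the standard library does not validate against a DTD, `missing_fields` is only a crude stand-in for DTD validation. The sample entry uses the isiZulu noun *umuntu* "person" (class 1, plural *abantu*, class 2).

```python
import xml.etree.ElementTree as ET

# Hypothetical element names; NOT the DTD presented in the article.
REQUIRED_CHILDREN = ("headword", "pos", "sense")


def make_entry(headword, pos, noun_class=None, plural=None, glosses=()):
    """Build one <entry> element for a hypothetical Bantu-language lexicon."""
    entry = ET.Element("entry")
    ET.SubElement(entry, "headword").text = headword
    ET.SubElement(entry, "pos").text = pos
    if noun_class is not None:
        # Bantu nouns belong to numbered noun classes; stored here as text.
        ET.SubElement(entry, "nounclass").text = str(noun_class)
    if plural is not None:
        ET.SubElement(entry, "plural").text = plural
    for gloss in glosses:
        # One <sense> per English gloss, tagged with a language attribute.
        sense = ET.SubElement(entry, "sense")
        ET.SubElement(sense, "gloss", lang="en").text = gloss
    return entry


def missing_fields(entry):
    """Return required child elements that are absent (not real DTD validation)."""
    return [name for name in REQUIRED_CHILDREN if entry.find(name) is None]


entry = make_entry("umuntu", "noun", noun_class=1, plural="abantu", glosses=["person"])
print(ET.tostring(entry, encoding="unicode"))
print(missing_fields(entry))  # prints []
```

In practice, a real DTD would be checked with a validating parser (for example `lxml.etree.DTD`) rather than with a hand-written field check like this one.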