English - Luganda Parallel Corpus

A parallel corpus consists of the same text in two or more languages. Word alignment means finding the links between corresponding words in the two texts. A large word-aligned corpus can serve as source material for statistical machine translation and knowledge-transfer techniques.

On this page, you can download a small word-aligned Luganda-English parallel corpus. It consists of 150 manually annotated sentences from the Gospel of Luke (1:1 to 3:18). The English text is from the King James Bible, and the Luganda text was taken from the online Luganda Bible.

Needless to say, this is a very modest-sized corpus and cannot serve as the only dataset for bootstrapping MT research. Its purpose, however, is to provide a gold-standard test set for evaluating and tuning automatic word-alignment techniques on larger English-Luganda parallel corpora.
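
As an illustration of how a gold-standard set like this is typically used, the sketch below scores a set of automatically predicted alignment links against manually annotated ones using precision, recall and F1. It assumes each alignment is represented as a set of (English word index, Luganda word index) pairs; that representation, the function name alignment_scores and the toy index pairs are assumptions made for this example and are not taken from the corpus files.

```python
def alignment_scores(gold, predicted):
    """Return (precision, recall, F1) of predicted links against gold links.

    Both arguments are iterables of (english_index, luganda_index) pairs.
    This pair representation is an assumption made for this sketch.
    """
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy links for one sentence pair (invented indices, not real corpus data).
gold_links = {(0, 0), (1, 1), (2, 1), (3, 2)}
predicted_links = {(0, 0), (1, 1), (3, 3)}

precision, recall, f1 = alignment_scores(gold_links, predicted_links)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

When every gold link is treated as a sure link, the alignment error rate often reported in the word-alignment literature reduces to 1 minus this F1 value.
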
The files were made using the UMIACS Word Alignment Interface. To visualize the parallel corpus, you will need to download this software. Further data processing can be done directly on the output files:

  • Luke.tok: English text
  • Lukka.tok: Luganda text
  • aligned.1 ... aligned.150: a description of the word-alignment for each of the 150 sentences.
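
For further processing, the two text files can be read side by side. The following is a minimal sketch, assuming that Luke.tok and Lukka.tok each hold one whitespace-tokenized sentence per line, in the same order, and are UTF-8 encoded; none of this is confirmed by the package description, so check the files before relying on it. The function names are hypothetical, and the aligned.N files are left out because their format is not described here.

```python
def read_tokenized(path, encoding="utf-8"):
    """Read a .tok file into a list of token lists, skipping empty lines.

    Assumes one whitespace-tokenized sentence per line (unverified).
    """
    with open(path, encoding=encoding) as handle:
        return [line.split() for line in handle if line.strip()]


def load_sentence_pairs(english_path, luganda_path):
    """Pair up English and Luganda sentences by line position."""
    english = read_tokenized(english_path)
    luganda = read_tokenized(luganda_path)
    if len(english) != len(luganda):
        raise ValueError(
            f"sentence counts differ: {len(english)} English "
            f"vs {len(luganda)} Luganda"
        )
    return list(zip(english, luganda))


# Example use, once the archive has been unpacked in the current directory:
#     pairs = load_sentence_pairs("Luke.tok", "Lukka.tok")
#     english_tokens, luganda_tokens = pairs[0]
```

The length check is there so that a mismatch between the two files is reported immediately instead of silently producing misaligned sentence pairs.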

The annotation work was done by Edina Nalukenge in the context of the OCAPI project (University of Antwerp).

Attachment: AfLaTpackageLugandaPC.zip (157.79 KB)

Luganda - English parallel corpus

Hullo. I landed on your site by accident and am impressed by this research undertaking in African languages. I propose that, in order to improve the English-Luganda parallel corpus, the New Vision newspaper be contacted to provide journalistic texts covering the same events in both English (New Vision) and Luganda (Bukedde).
I am currently doing a Ph.D. in Linguistics at the University of Poitiers, France, and am interested in the interface of Luganda, English and French. I am also interested in joining a research group or interacting with other researchers. Ready to keep in touch. Thanks. E.S.