English - Luganda Parallel Corpus
Submitted by Guy on Thu, 2007-01-18 13:31

A parallel corpus consists of the same text in two or more languages. Word alignment involves finding the links between corresponding words in the two texts. A large word-aligned corpus can be used as source material for statistical machine translation and for knowledge transfer techniques.
On this page, you can download a small word-aligned Luganda-English parallel corpus. It consists of 150 manually annotated sentences from the Gospel of Luke (1:1 to 3:18). The English text is taken from the King James Bible and the Luganda text from the online Luganda Bible.
Needless to say, this is a very modest-sized corpus and cannot be used as the only dataset to bootstrap MT research. Its purpose, however, is to provide a gold-standard test set for evaluating and tuning automatic word-alignment techniques on larger English-Luganda parallel corpora.
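As an illustration of how such a gold standard is typically used, the sketch below scores an automatic alignment against manual links with precision, recall and alignment error rate (AER, as defined by Och and Ney). It assumes each alignment has already been converted into a set of `(english_index, luganda_index)` pairs; that representation and the function name `alignment_scores` are hypothetical and not part of the distributed files. If the annotation does not distinguish "sure" from "possible" links, pass only the sure set, in which case AER equals 1 minus the F1 score.

```python
from typing import Optional, Set, Tuple

# Hypothetical link representation: (English token index, Luganda token index).
Link = Tuple[int, int]


def alignment_scores(
    predicted: Set[Link],
    sure: Set[Link],
    possible: Optional[Set[Link]] = None,
) -> Tuple[float, float, float]:
    """Return (precision, recall, AER) of `predicted` against gold links.

    `sure` holds the gold links; `possible` optionally adds looser links.
    Every sure link is also treated as possible, as in the usual AER definition.
    """
    possible = (possible or set()) | sure
    a_and_s = len(predicted & sure)
    a_and_p = len(predicted & possible)

    precision = a_and_p / len(predicted) if predicted else 0.0
    recall = a_and_s / len(sure) if sure else 0.0
    total = len(predicted) + len(sure)
    aer = 1.0 - (a_and_s + a_and_p) / total if total else 0.0
    return precision, recall, aer


if __name__ == "__main__":
    gold = {(0, 0), (1, 1), (2, 3)}
    guess = {(0, 0), (1, 1), (2, 2)}
    print(alignment_scores(guess, gold))
```

Scores would normally be summed over all 150 sentences (pooling link counts) rather than averaged per sentence, so that long sentences weigh proportionally.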
The files were made using the UMIACS Word Alignment Interface. To visualize the parallel corpus, you will need to download this software. The output files can, however, also be processed directly:
- Luke.tok: English text
- Lukka.tok: Luganda text
- aligned.1 ... aligned.150: a description of the word-alignment for each of the 150 sentences.
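For direct processing, a minimal sketch for pairing the two tokenized texts is given below. It assumes, without confirmation from the files themselves, that `Luke.tok` and `Lukka.tok` are UTF-8 encoded, hold one whitespace-tokenized sentence per line, and that line *i* of one file corresponds to line *i* of the other. The function names are hypothetical, and the format of the `aligned.N` files is not covered here.

```python
from typing import Iterable, List, Tuple

# One sentence pair: (English tokens, Luganda tokens).
SentencePair = Tuple[List[str], List[str]]


def pair_sentences(
    english_lines: Iterable[str], luganda_lines: Iterable[str]
) -> List[SentencePair]:
    """Split each line on whitespace and pair sentences by line position.

    Assumes one tokenized sentence per line in both inputs.
    """
    english = [line.split() for line in english_lines]
    luganda = [line.split() for line in luganda_lines]
    if len(english) != len(luganda):
        raise ValueError(
            f"sentence counts differ: {len(english)} English vs {len(luganda)} Luganda"
        )
    return list(zip(english, luganda))


def load_corpus(en_path: str = "Luke.tok", lg_path: str = "Lukka.tok") -> List[SentencePair]:
    """Read both files (assumed UTF-8) and return the paired sentences."""
    with open(en_path, encoding="utf-8") as en_file, open(lg_path, encoding="utf-8") as lg_file:
        return pair_sentences(en_file, lg_file)
```

Token positions in these lists would then serve as the indices referred to by the word-alignment links.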
The annotation work was done by Edina Nalukenge in the context of the OCAPI project (University of Antwerp).
Luganda Dictionary
Submitted by Guy on Mon, 2006-12-18 13:10

An online English-to-Luganda translation dictionary.