English - Luganda Parallel Corpus
A parallel corpus consists of the same text in two or more different languages. Word-alignment involves finding the links between the words in the two texts. A large word-aligned corpus can be used as source material for statistical machine translation techniques and knowledge transfer techniques.
On this page, you can download a small word-aligned Luganda-English parallel corpus. It consists of 150 manually annotated sentences from the Gospel of Luke (1:1 through 3:18). The English text is from the King James Bible, and the Luganda text was taken from the online Luganda Bible.
Needless to say, this is a very modest-sized corpus and cannot serve as the only dataset for bootstrapping MT research. Its purpose, however, is to provide a gold-standard test set for evaluating and tuning automatic word-alignment techniques on larger English-Luganda parallel corpora.
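As a rough illustration of how a gold-standard set like this is typically used, the sketch below scores automatically predicted alignment links against manually annotated ones using precision, recall and F1. It assumes each alignment has already been converted to a set of (English token index, Luganda token index) pairs. That representation, the function name `alignment_scores`, and the toy links are assumptions made for the example; they are not the format of the corpus files.

```python
from typing import Dict, Iterable, Set, Tuple

# A link is a pair (English token index, Luganda token index).
# This representation is an assumption made for the sketch.
Link = Tuple[int, int]


def alignment_scores(gold: Iterable[Link], predicted: Iterable[Link]) -> Dict[str, float]:
    """Compare predicted alignment links with gold-standard links."""
    gold_set: Set[Link] = set(gold)
    pred_set: Set[Link] = set(predicted)
    correct = len(gold_set & pred_set)  # links found in both sets

    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical toy links for one sentence pair (not taken from the corpus).
gold = {(0, 0), (1, 1), (2, 1), (3, 2)}
predicted = {(0, 0), (1, 1), (3, 3)}
scores = alignment_scores(gold, predicted)
print(scores)
```

When the gold standard contains only one kind of link (no separate "sure" and "possible" links), the commonly reported alignment error rate is simply 1 minus the F1 value computed here.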
The files were made using the UMIACS Word Alignment Interface. To visualize the parallel corpus, you will need to download this software. Further data processing can be done directly on the output files:
- Luke.tok: English text
- Lukka.tok: Luganda text
- aligned.1 ... aligned.150: a description of the word-alignment for each of the 150 sentences.
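A minimal sketch of how the two text files might be paired up for further processing is shown below. It assumes that each `.tok` file holds one sentence per line with tokens separated by whitespace, and that line N of one file corresponds to line N of the other. Neither assumption is confirmed on this page, so check the files before relying on it. The format of the `aligned.N` files is not described here, so the sketch leaves them out. The helper name `pair_sentences` and the placeholder tokens are hypothetical.

```python
from typing import List, Tuple

SentencePair = Tuple[List[str], List[str]]


def pair_sentences(english_text: str, luganda_text: str) -> List[SentencePair]:
    """Pair tokenized sentences line by line.

    Assumes one sentence per line and whitespace-separated tokens
    (an assumption about the .tok files, not a documented format).
    """
    en_lines = english_text.splitlines()
    lg_lines = luganda_text.splitlines()
    if len(en_lines) != len(lg_lines):
        raise ValueError(
            f"line counts differ: {len(en_lines)} English vs {len(lg_lines)} Luganda"
        )
    return [(en.split(), lg.split()) for en, lg in zip(en_lines, lg_lines)]


# Placeholder tokens stand in for real text (hypothetical data).
sample_en = "e1 e2 e3\ne4 e5"
sample_lg = "l1 l2\nl3 l4 l5"
pairs = pair_sentences(sample_en, sample_lg)
print(len(pairs), pairs[0])

# With the downloaded files, usage might look like this
# (the UTF-8 encoding is also an assumption):
# with open("Luke.tok", encoding="utf-8") as f_en, open("Lukka.tok", encoding="utf-8") as f_lg:
#     pairs = pair_sentences(f_en.read(), f_lg.read())
```

Raising an error on mismatched line counts is a deliberate choice: silently truncating to the shorter file would misalign every later sentence pair.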
The annotation work was done by Edina Nalukenge in the context of the OCAPI project (University of Antwerp).
Attachment | Size |
---|---|
AfLaTpackageLugandaPC.zip | 157.79 KB |
Luganda - English parallel corpus
Hullo. I landed on your site by accident and am impressed by this research undertaking in African languages. To improve the parallel English-Luganda corpus, I propose contacting the New Vision newspaper to provide journalistic texts covering the same event in both English (New Vision) and Luganda (Bukedde).
I am currently doing a Ph.D. in Linguistics at the University of Poitiers, France, and am interested in the interface of Luganda, English and French. I am also interested in joining a research group or interacting with other researchers. Ready to keep in touch. Thanks. E.S.