The Centre for Text Technology (CTexT) at the North-West University (South Africa) is developing proprietary spelling checkers for various African languages. In order to conduct this project successfully, we are currently sourcing various resources, most notably electronic resources (word lists, corpora, etc.) and people (experts and assistants) to contribute to the project.


Microsoft’s Local Language Program is a global initiative that provides desktop software and tools to its customers by collaborating with local experts (governments, universities and other interested parties). It aims to help build a robust local IT economy and to:

- Help bridge the language and digital divide between developed and emerging markets.
- Help preserve language and culture. Help technology impact language and culture in a positive way.
- Help maintain the connections between communities.

Proofing tools such as spelling checkers and grammar checkers are important human language technology resources that enable speakers of the language to preserve and promote their language and culture while benefiting from Information Technology advancements. In this project, proprietary spelling checkers for various languages will be developed in cooperation with expert communities to ensure that the local languages are well defined and represented. Hence, the Centre for Text Technology (CTexT) at the North-West University (South Africa) is looking for co-workers to assist in the development of lexical data to be used in spelling checkers for:

- Hausa,
- Igbo,
- Kinyarwanda,
- Wolof, and
- Yoruba.

Assisting in this project will help promote communication and interaction in these languages.


We have the following needs:

Electronic Resources
1. Common and specialist word lists (such as lists of common spelling mistakes, abbreviations, phonetically similar words, repeated words, hyphenated words, etc.).
2. Corpora, dictionaries, and books in electronic format.
3. A balanced corpus of 30,000 words (for testing purposes).
4. Rules for morphologically productive word formation processes, plus word lists to which these rules apply.

People
1. Linguists and/or language practitioners who can assist in the quality control of word lists.
2. Linguists who can assist in the compilation and/or refinement of morphological rules and rules for tokenisation.
3. Linguists who can provide a description of the standard written variant of the languages, as well as an annotated paragraph of 500-1000 words.

Should you have access to resources relevant to this project, kindly submit a brief description of what you can provide, as well as an indication of the conditions under which you would be prepared to make these available to us.

If you are a linguist or language practitioner interested in working on the project, kindly submit a description or shortened CV that highlights your relevant expertise and/or experience. Please note that linguists looking to become co-workers should meet the following prerequisites:
1. Be computer literate and have regular access to email.
2. Have expert knowledge of the standard written variant of the language(s) they intend to work on.
3. Be able to commence work in February 2007.
4. Be able and willing to travel to South Africa for training, if needed.

Please send information by email only to:

Martin Puttkammer
Programme Manager: Proofing Tools
Centre for Text Technology (CTexT), North-West University, South Africa
Martin [dot] Puttkammer [at] nwu [dot] ac [dot] za
+27 18 299 1512

Kindly forward this message to other colleagues who might be interested in this project.

We also welcome comments and suggestions regarding this project.

CTexT reserves the right to accept or refuse any offers pertaining to this announcement at its sole discretion.
