web crawler
Corpora for African languages - An Crúbadán
Submitted by scannell on Thu, 2008-02-07 04:42
Description:
The Crúbadán Project is devoted to creating basic language technology for minority and under-resourced languages using web crawling and statistical techniques. As of early 2008 we have collected text corpora for 419 languages, including more than 125 African languages, and have used these to create open source spell checkers for more than 20 languages. Please contact Kevin Scannell (https://borel.slu.edu/) if you are interested in developing open source resources for other African languages using these data.