AfLaT.org - web crawler
https://aflat.org/taxonomy/term/210/0
en

Corpora for African languages - An Crúbadán
https://aflat.org/node/227
<div class="field field-type-link field-field-url"> <div class="field-label">URL:&nbsp;</div> <div class="field-items"> <div class="field-item odd"> <a href="https://crubadan.org/" target="_blank">https://crubadan.org/</a> </div> </div> </div>
<div class="field field-type-text field-field-description"> <div class="field-label">Description:&nbsp;</div> <div class="field-items"> <div class="field-item odd"> <p>The Crúbadán Project is devoted to creating basic language technology for minority and under-resourced languages using web crawling and statistical techniques. As of early 2008, we have collected text corpora for 419 languages, including more than 125 African languages, and have used these to create open-source spell checkers for more than 20 languages. Please contact Kevin Scannell (https://borel.slu.edu/) if you are interested in developing open-source resources for other African languages using these data.</p> </div> </div> </div>
https://aflat.org/node/227#comments
Central Africa, Eastern Africa, Northern Africa, Southern Africa, Western Africa
Corpus, corpora, frequency list, web crawler, word list
Thu, 07 Feb 2008 03:42:21 +0000
scannell
227 at https://aflat.org