AfLaT.org - annotation https://aflat.org/taxonomy/term/106/0 en Helsinki Corpus of Swahili https://aflat.org/node/97 <div class="field field-type-link field-field-url"> <div class="field-label">URL:&nbsp;</div> <div class="field-items"> <div class="field-item odd"> <a href="https://www.csc.fi/english/research/software/hcs" target="_blank">https://www.csc.fi/english/research/software/hcs</a> </div> </div> </div> <div class="field field-type-text field-field-description"> <div class="field-label">Description:&nbsp;</div> <div class="field-items"> <div class="field-item odd"> <p>The Helsinki Corpus of Swahili (HCS) contains 12.5 million words of text drawn from a number of current news sources and from extracts of a large number of books. Typing errors in the texts have been corrected manually. The corpus was tagged automatically with SALAMA, without human intervention. It is available free of charge for scientific research, subject to a signed contract.<br /> The corpus can be accessed through the web-based browser Lemmie 2.0. Direct access to the Linux server is also possible. The English glosses cannot currently be accessed through Lemmie 2.0, so users who need them may wish to use the Linux interface instead.<br /> HCS does not currently have syntactic tags. In the future we hope to enrich the corpus with syntactic tags and a number of new features, including a large number of idioms and multi-word expressions. New texts will also be added.</p> </div> </div> </div> Eastern Africa Corpus annotation Swahili text corpus Fri, 26 Jan 2007 07:28:54 +0000 ahurskai