Application and testing of performance enhancing morphological analysis techniques
Title | Application and testing of performance enhancing morphological analysis techniques |
Publication Type | Conference Paper |
Year of Publication | 2007 |
Authors | Anderson, Winston, Kotzé, Petronella M., and Kotzé, Albert E. |
Booktitle | LSSA/SAALA/SAALT Joint Annual Conference |
Date | 4-6 July 2007 |
Location | North-West University, Potchefstroom, South Africa |
Abstract | The researchers are members of a team that is building a morphological analyser for Sesotho sa Leboa (Northern Sotho) as part of the African Language Association of Southern Africa's Language and Speech Development Technology Project, which is registered with the NRF. Using finite-state lexical transducer software, various projects have been undertaken over the last four years in morphological analysis and generation. During this time, the lack of a sufficient corpus of electronic roots hampered the performance of the morphological analyser/generator. Owing to the shortage of roots, the recognition of words in texts was dismal: initial analyses of large texts were less than 30% successful in terms of word analysis. Using techniques applied by sister teams within the same project increased the analysis success rate to at least 60%, without requiring the inclusion of significant numbers of open-class words. The techniques are discussed in detail, as well as problems that arose subsequently. New areas of research that arose from some of these techniques are also identified. The techniques include, amongst others: completing the analysis of all closed classes of words; in particular, isolating auxiliary verbs as a "closed class" and completing their analysis; and choosing a suitable short text, aiming for 100% analysis accuracy on that text, and subsequently extending the lexicographical information. It is shown that such seemingly superficial techniques produce a dramatic increase in the number of words that can be analysed in any given text. |