Towards Standardizing Evaluation Test Sets for Compound Analysers

Publication Type: Conference Paper
Year of Publication: 2011
Authors: Fourie, Liaan L., Martin Puttkammer, and Menno van Zaanen
Book Title: AGIS11 - Action Week for Global Information Sharing (AfLaT2011 Breakout Session)
Location: Addis Ababa, Ethiopia

Afrikaans is a Germanic language that originated in South Africa. Like other Germanic languages, it forms compounds productively by joining existing words, so very long compound words can be created. As a result, no dictionary can contain every possible Afrikaans compound word. Several compound analyzers exist to address this problem, but each one has been evaluated with a different method, so their results cannot be compared accurately with one another. Comparing compound analyzers is the ultimate goal, but this article discusses only the creation of a standard test set. No standard test set is currently available for analyzers of Afrikaans compound words. Such a set is important because it rules out the possibility that analyzers appear to perform differently merely because they were tested on different data. This article shows how such a standard test set can be created. The set should contain multiple word types, including incorrect examples, so that every aspect of a compound analyzer's behavior is evaluated. With such a standard test set, different compound analyzers can be evaluated in the same way and therefore compared with one another.
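The idea of scoring every analyzer on one shared, multi-category test set can be illustrated with a minimal Python sketch. It assumes that each test item carries a word type and a gold segmentation, and that an analyzer is any function that returns a list of components, or `None` to reject a word. The category names (`compound`, `simplex`, `invalid`), the exact-match scoring, the sample words, the nonsense string `xqvbr`, and the `toy_analyzer` lexicon are all hypothetical illustrations. They are not the authors' test set or evaluation method.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class GoldItem:
    word: str
    category: str               # hypothetical word types: "compound", "simplex", "invalid"
    gold: Optional[List[str]]   # expected components; None means the word should be rejected


# Hypothetical miniature test set; a real standard set would be much larger and curated.
TEST_SET = [
    GoldItem("skoolhoof", "compound", ["skool", "hoof"]),
    GoldItem("voetbal", "compound", ["voet", "bal"]),
    GoldItem("tafel", "simplex", ["tafel"]),
    GoldItem("xqvbr", "invalid", None),  # an "incorrect example" the analyzer should reject
]

# An analyzer maps a word to its components, or None if it rejects the word.
Analyzer = Callable[[str], Optional[List[str]]]


def evaluate(analyzer: Analyzer, test_set: List[GoldItem]) -> Dict[str, float]:
    """Exact-match accuracy per word type, computed identically for every analyzer."""
    totals: Dict[str, int] = {}
    correct: Dict[str, int] = {}
    for item in test_set:
        totals[item.category] = totals.get(item.category, 0) + 1
        if analyzer(item.word) == item.gold:
            correct[item.category] = correct.get(item.category, 0) + 1
    return {cat: correct.get(cat, 0) / n for cat, n in totals.items()}


# Hypothetical baseline: segment a word into known lexicon entries, left to right.
LEXICON = {"skool", "hoof", "voet", "bal", "tafel"}


def toy_analyzer(word: str) -> Optional[List[str]]:
    if word == "":
        return []
    for i in range(1, len(word) + 1):
        head = word[:i]
        if head in LEXICON:
            rest = toy_analyzer(word[i:])
            if rest is not None:
                return [head] + rest
    return None  # no segmentation into lexicon words: reject
```

Because every analyzer is scored by the same `evaluate` function on the same items, the per-category scores are directly comparable. A naive analyzer that returns every word unsplit, `lambda w: [w]`, scores `{"compound": 0.0, "simplex": 1.0, "invalid": 0.0}`. This illustrates why a test set covering only one word type could hide an analyzer's weaknesses.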