Call for Participation

Language Technologies for African Languages
March 31, 2009
Athens, Greece

A workshop at the annual meeting of the European Association for Computational Linguistics

In multilingual settings, language technologies are crucial for providing access to information and creating opportunities for economic development. With somewhere between 1,000 and 2,000 languages, Africa is a multilingual continent par excellence and presents acute challenges for those seeking to promote and use African languages in business development, education, research, and relief aid. Recently, a number of African researchers and institutions sharing the common goal of developing capabilities in language technologies have come forward. This workshop provides a forum for them to meet and share the latest developments in this field.
It also seeks to include linguists who specialize in African languages and would like to leverage the tools and approaches of computational linguistics, as well as computational linguists who are interested in learning about the particular linguistic challenges posed by African languages.
The workshop will consist of an invited tutorial on African language families and their structural properties by Prof. Sonja Bosch (UNISA, South Africa), followed by refereed research papers in computational linguistics. The focus will be on less commonly studied, lesser-resourced languages, such as those of sub-Saharan Africa. These include languages from all four families (Niger-Congo, Nilo-Saharan, Khoisan and Afro-Asiatic), with the exception of Arabic, which is covered by the SIGSemitic workshops. Nor will the workshop cover variants of European languages such as African French, African English or Afrikaans.

Tuesday, March 31, 2009

09:00–10:30 Invited Talk: "African Language Families and their Structural Properties" by Sonja Bosch
10:30–11:00 Coffee Break
  Session 1: Corpora (11:00–12:30)
11:00–11:30 Collecting and Evaluating Speech Recognition Corpora for Nine Southern Bantu Languages
Jaco Badenhorst, Charl Van Heerden, Marelie Davel and Etienne Barnard
11:30–12:00 The SAWA Corpus: A Parallel Corpus English - Swahili
Guy De Pauw, Peter Waiganjo Wagacha and Gilles-Maurice de Schryver
12:00–12:30 Information Structure in African Languages: Corpora and Tools
Christian Chiarcos, Ines Fiedler, Mira Grubic, Andreas Haida, Katharina Hartmann, Julia Ritz, Anne Schwarz, Amir Zeldes and Malte Zimmermann
12:30–14:00 Lunch Break
  Session 2: Morphology, Speech and Part-of-Speech Tagging (14:00–16:00)
14:00–14:30 A Computational Approach to Yorùbá Morphology
Raphael Finkel and Odetunji Ajadi Odejobi
14:30–15:00 Using Technology Transfer to Advance Automatic Lemmatisation for Setswana
Hendrik Johannes Groenewald
15:00–15:30 Part-of-Speech Tagging of Northern Sotho: Disambiguating Polysemous Function Words
Gertrud Faaß, Ulrich Heid, Elsabé Taljard and Danie Prinsloo
15:30–16:00 Development of an Amharic Text-to-Speech System Using Cepstral Method
Tadesse Anberbir and Tomio Takara
16:00–16:30 Coffee Break
  Session 3: General Papers and Discussion (16:30–18:00)
16:30–17:00 Building Capacities in Human Language Technology for African Languages
Tunde Adegbola
17:00–17:30 Initial Fieldwork for LWAZI: A Telephone-Based Spoken Dialog System for Rural South Africa
Tebogo Gumede and Madelaine Plauché
17:30–18:00 Discussion
  Poster Session (During Coffee and Lunch Breaks)
  Setswana Tokenisation and Computational Verb Morphology: Facing the Challenge of a Disjunctive Orthography
Rigardt Pretorius, Ansu Berg, Laurette Pretorius and Biffie Viljoen
  Interlinear Glossing and its Role in Theoretical and Descriptive Studies of African and other Lesser-Documented Languages
Dorothee Beermann and Pavel Mihaylov
  Towards an Electronic Dictionary of Tamajaq Language in Niger
Chantal Enguehard and Issouf Modi
  A Repository of Free Lexical Resources for African Languages: The Project and the Method
Piotr Bański and Beata Wójtowicz
  Exploiting Cross-Linguistic Similarities in Zulu and Xhosa Computational Morphology
Laurette Pretorius and Sonja Bosch
  Methods for Amharic Part-of-Speech Tagging
Björn Gambäck, Fredrik Olsson, Atelach Alemu Argaw and Lars Asker
  An Ontology for Accessing Transcription Systems (OATS)
Steven Moran

Organizing Committee

Lori Levin: Language Technologies Institute, Carnegie Mellon University, USA (Workshop Chair)
John Kiango: Director, Institute of Kiswahili Research, University of Dar es Salaam, Tanzania
Judith Klavans: University of Maryland, Institute for Advanced Computer Studies, USA
Manuela Noske: Microsoft Corporation, Redmond, USA
Guy De Pauw: University of Antwerp, Belgium; University of Nairobi, Kenya
Gilles-Maurice de Schryver: African Languages and Cultures, Ghent University, Belgium; University of the Western Cape, South Africa
Peter Waiganjo Wagacha: School of Computing and Informatics, University of Nairobi, Kenya

Program Committee

Akinbiyi Akinlabi, Rutgers University
Yiwola Awoyale, University of Pennsylvania, Linguistic Data Consortium
Moussa Bamba, University of Pennsylvania, Linguistic Data Consortium
Alan Black, Carnegie Mellon University
Sonja Bosch, University of South Africa
Christopher Cieri, University of Pennsylvania, Linguistic Data Consortium
Robert Frederking, Carnegie Mellon University
Dafydd Gibbon, University of Bielefeld, Germany
Jeff Good, SUNY Buffalo
Mike Gasser, Indiana University
Gregory Iverson, University of Maryland, Center for Advanced Study of Language
Stephen Larocca, US Army Research Lab
Michael Maxwell, University of Maryland, Center for Advanced Study of Language
Jonathan Owens, University of Maryland, Center for Advanced Study of Language
Tristan Purvis, University of Maryland, Center for Advanced Study of Language
Antonia Schleicher, University of Wisconsin at Madison
Tanja Schultz, Karlsruhe University
Clare Voss, US Army Research Lab
Briony Williams, University of Wales, Bangor

Contact

Lori Levin
Language Technologies Institute
Newell-Simon Hall, Carnegie Mellon University
Pittsburgh, PA 15213
lsl cs cmu edu