The Official Somali Corpus


Dr. Jama Musse Jama has made a tagged corpus of 3 million words available, as well as a web-crawled corpus of 10 million words. The data is available from

Call For Papers: Sub-Saharan African languages: from speech fundamentals to applications

Interspeech 2016 Special Session: Sub-Saharan African languages: from speech fundamentals to applications

This special session aims to bring together researchers in speech technology and researchers in linguistics (working in language documentation and the fundamentals of speech science). Such a partnership is particularly important for Sub-Saharan African languages, which tend to remain under-resourced, under-documented and often unwritten.

Prospective authors are invited to submit original papers in the following areas:

  • ASR and TTS for Sub-Saharan African languages and dialects
  • Cross-lingual and multi-lingual acoustic and lexical modeling
  • Applications of spoken language technologies for the African continent
  • Phonetic and linguistic studies in Sub-Saharan African languages
  • Zero resource speech technologies: unsupervised discovery of linguistic units
  • Language documentation for endangered languages of Africa
  • Machine-assisted annotation of speech and laboratory phonology
  • Resource / Corpora production in African languages

Submission deadline
Same as regular Interspeech 2016 papers: 23rd March, 2016

Special session web site
For more details on this special session:

Organizing Committee
Martine Adda-Decker (madda [at] limsi [dot] fr) – CNRS – LPP and LIMSI, France.
Laurent Besacier (laurent [dot] besacier [at] imag [dot] fr) - Univ. Grenoble-Alpes, France - LIG laboratory.
Marelie Davel (marelie [dot] davel [at] nwu [dot] ac [dot] za) – North-West University, Vanderbijlpark, South Africa.
Larry Hyman (hyman [at] berkeley [dot] edu) - Department of Linguistics, University of California, Berkeley.
Martin Jansche (mjansche [at] google [dot] com) – Google, London, UK.
Francois Pellegrino (francois [dot] pellegrino [at] univ-lyon2 [dot] fr) – CNRS – DDL Lyon, France.
Olivier Rosec (olivier [dot] rosec [at] voxygen [dot] fr) – Voxygen SAS, Pleumeur-Bodou, France.
Sebastian Stüker (sebastian [dot] stueker [at] kit [dot] edu) - Karlsruhe Institute of Technology (KIT), Germany.
Martha Tachbelie Yifiru (martha [dot] yifiru [at] aau [dot] edu [dot] et) – School of Information Science, Addis Ababa University, Ethiopia.

AfLaT 2013 - Report

The 2013 edition of the AfLaT workshop series took place on Friday 6 December 2013 at Ghent University. It was the fifth in the series and was conceived differently from previous editions: we wanted to broaden our activities by reaching out to all colleagues who have lexical resources for African languages and are already working with them, but who have not yet necessarily made the move to advanced computational routines to speed up analysis or tool building.

AfLaT 5 was therefore conceived as a MasterClass, led by the founding members of AfLaT: Guy De Pauw (U Antwerp), Gilles-Maurice de Schryver (U Ghent), and Peter Wagacha (U Nairobi). Researchers were invited to present their current data sets and/or research for a maximum of 20 minutes, followed by 10 minutes of discussion and advice from those present.

On the following pages, you will find some impressions of the workshop. The full book of abstracts can be found here.

Activating your corpus - Guy De Pauw

Job opening: part-time teaching assistant Swahili @UGent

Ghent University is looking for a part-time teaching assistant for Swahili. More information here:


TALAf 2016

TALAf 2016 : Traitement automatique des langues africaines (text and speech)
JEP-TALN-RECITAL 2016 Workshop - Paris 4 July 2016


TALAf workshops take place every two years. The first workshop was held during the JEP-TALN-RÉCITAL 2012 conference on June 8, 2012 in Grenoble (see proceedings: The second one took place during the TALN 2014 conference on July 1, 2014 in Marseille (see proceedings:

The third edition of TALAf will be held during the JEP-TALN-RECITAL conference, on July 4, 2016 at INALCO in Paris.

Natural language processing is booming in Africa. Indeed, in many countries, there is an ongoing official recognition of national languages, for instance:
• In Niger, laws defining alphabets for Hausa, Kanuri, Tamajaq and Zarma were published in 1999. Since then, the National Assembly has set up simultaneous translation of the debates in three languages: French, Hausa and Zarma;
• In Morocco, the Royal Institute of Amazigh Culture (IRCAM), which works for the promotion of Amazigh culture and the development of the Berber language, was founded by royal decree in 2001;
• In Senegal, the recognition of national languages was stated in the first article of the Constitution of 7 January 2001: "The official language of the Republic of Senegal is French. The national languages are Diola, Malinke, Pular, Serer, Soninke, Wolof and any other national language to be codified." The Department of Technical Education, Vocational Training, Literacy and National Languages (METFPALN) is responsible for this. Since December 9, 2014, Senegalese parliamentary debates have been interpreted simultaneously in six national languages (Fulani, Serere, Wolof, Jola, Mandinka and Soninke) in addition to French, allowing the majority of members to speak in their mother tongue.

Moreover, a number of African colleagues and scholars trained in the North return to their countries determined to continue their work on local languages. Some diaspora communities also have the technological means to contribute directly online, on a voluntary basis.

In addition, bilingual education programs (official / national language) are expanding in primary schools in many countries. The official language is in most cases that of the former colonial power (French, English, Portuguese ...).

Meanwhile, mobile phones are spreading fast: with 650 million units, Africa has surpassed the United States and Europe. In many areas, it is easier to install a mobile antenna than fixed lines, so people who use a telephone for the first time do so with a mobile handset. Applications such as money transfer and the dissemination of weather reports are being developed.

Funding for research projects on these languages can now be obtained from the "Organisation Internationale de la Francophonie" through the calls for projects of its "fonds francophone des inforoutes" (see e.g. the DiLAF or flore projects) or from the "Agence Universitaire de la Francophonie". France also supports projects on these languages through the National Agency for Research (see e.g. the ALFFA project).

The conditions are therefore in place for the development of natural language processing in Africa, both written and spoken.

In this context, the roles of the TALAf workshop are to:
• bring together researchers in the field, through meetings at the workshop and through the talaf [at] imag [dot] fr mailing list;
• pool knowledge by using open source tools and standards (ISO, Unicode), and by publishing the resources produced under an open license (Creative Commons), in particular to avoid the loss of information when a project stops and cannot be resumed immediately for lack of resources;
• develop a set of best practices based on the experience of researchers; set up simple, efficient methodologies based on free or very cheap software for the development of resources; exchange techniques that make it possible to work without resources that do not yet exist; and, ultimately, avoid wasted time and energy.

TALAf workshops are supported by the non-profit organisation "Lexicologie Terminologie Traduction":

We invite all researchers in natural language processing working on African languages, including Creole languages of Africa, whether written or oral, to submit a paper to this workshop.

Papers should be between 6 and 12 pages long. Authors are invited to submit papers presenting original research, particularly on the themes listed below.

French-speaking authors are invited to write in French with a summary in the language of their choice. Non-French-speaking authors can write in English with a summary in French and another in the language of their choice.

The workshop is open to research works on the following topics:

• written corpora (monolingual, bilingual aligned or comparable)
• speech corpora (including transcription)
• lexicons, dictionaries and databases (monolingual, bilingual, multilingual)
• resource enrichment
• resource quality evaluation
• morphological analyzers, spell-checkers
• syntactic analyzers, grammar checkers
• machine translation systems (empirical or rule-based)
• speech recognition
• text-to-speech synthesis
• transliteration

Submissions will be reviewed by at least two specialists in the field.

The following points will be taken into account:
• relevance to the workshop topics,
• importance and originality of the contribution,
• precision of the scientific and technical content,
• organization and clarity of the presentation.

Submission templates will be available for OpenOffice, Word and LaTeX and accessible from:
Paper proposals must be sent in PDF format to the following address: soumission

Martine Adda-Decker (CNRS-LPP & LIMSI, Paris, France)
Laurent Besacier (LIG, Grenoble, France)
Sokhna Bao Diop (Université Gaston Berger, St Louis du Sénégal, Sénégal)
Philippe Bretier (Voxygen, Pleumeur-Bodou, France)
Khalid Choukri (ELDA, Paris, France)
Mame Thierno Cissé (ARCIV, Université Cheikh Anta Diop, Dakar, Sénégal)
Chantal Enguehard (LINA, Nantes, France)
Núria Gala (LIF, Marseille, France)
Modi Issouf (Ministère de l'Éducation, Niamey, Niger)
Fary Silate Ka (IFAN, Université Cheikh Anta Diop, Dakar, Sénégal)
Mathieu Mangeot (LIG, Grenoble, France)
Chérif Mbodj (Centre de Linguistique Appliquée de Dakar, Sénégal)
Kamal Naït-Zerrad (INALCO, Paris, France)
El Hadj Mamadou Nguer (Université Gaston Berger, St Louis du Sénégal, Sénégal)
Donald Osborn (Bisharat, ltd.)
Francois Pellegrino (DDL, Lyon, France)
Olivier Rosec (Voxygen, Pleumeur-Bodou, France)
Fatiha Sadat (UQAM, Montréal, Canada)
Aliou Ngoné Seck (FLSH, Université Cheikh Anta Diop, Dakar, Sénégal)
Emmanuel Schang (Université d'Orléans, Orléans, France)
Gilles Sérasset (LIG, Grenoble, France)
Max Silberztein (ELLIADD, Université de Franche-Comté, Besançon, France)
Sylvie Voisin (DDL, Lyon, France)
Valentin Vydrin (LLACAN-INALCO, Paris, France)

• Submission deadline: 24 April 2016
• Notification of acceptance: 11 May 2016
• Final Submission Deadline: 1 June 2016
• Workshop: 4 July 2016

