Arabic Dialect Identification

In this task, we classify Arabic speech into five dialects:
  • Egyptian Arabic (EGY) covers the dialects of the Nile valley: Egypt and Sudan.
  • Levantine Arabic (LAV) includes the dialects of Lebanon, Syria, Jordan, and Palestine.
  • Gulf Arabic (GLF) includes the dialects of Kuwait, the United Arab Emirates, Bahrain, and Qatar. The dialects of Saudi Arabia are typically included, although they span a wide range of sub-dialects. Omani Arabic is sometimes included as well.
  • North African Arabic (NOR), also known as Maghrebi, covers the dialects of Morocco, Algeria, Tunisia, and Mauritania. Libyan Arabic is sometimes included too.
  • Modern Standard Arabic (MSA) is the variety used in formal speech.
 
Lexical variation across the five dialects

  • Use grapheme-based ASR
  • Explore word vector space modeling
  • Explore character space modeling, including unknown words (OOV symbol)
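
To make the word-level and character-level ideas in this list concrete, here is a minimal sketch in Python. It is not the QMDIS implementation: it assumes the ASR output is already available as whitespace-separated transcripts, uses plain count vectors with cosine similarity to a per-dialect centroid, and runs on placeholder tokens rather than real dialect data. The `<unk>` symbol, the function names, and the trigram size are all illustrative assumptions.

```python
import math
from collections import Counter

OOV = "<unk>"  # assumed symbol for out-of-vocabulary words


def build_vocab(transcripts, min_count=1):
    """Collect every word seen at least min_count times in training."""
    counts = Counter(w for t in transcripts for w in t.split())
    return {w for w, c in counts.items() if c >= min_count}


def word_features(transcript, vocab):
    """Bag-of-words counts; words outside the vocabulary map to OOV."""
    return Counter(w if w in vocab else OOV for w in transcript.split())


def char_features(transcript, n=3):
    """Character n-gram counts; padding spaces mark word boundaries."""
    text = f" {transcript.strip()} "
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Sum the count vectors of all training segments of one dialect."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total


def classify(transcript, centroids, featurize):
    """Return the dialect whose centroid is most similar to the segment."""
    feats = featurize(transcript)
    return max(centroids, key=lambda d: cosine(feats, centroids[d]))


# Placeholder transcripts (not real dialect data), two dialects only.
train = {
    "EGY": ["aa bb cc", "aa bb dd"],
    "LAV": ["ee ff gg", "ee ff hh"],
}
vocab = build_vocab(t for ts in train.values() for t in ts)

word_centroids = {
    d: centroid(word_features(t, vocab) for t in ts) for d, ts in train.items()
}
char_centroids = {
    d: centroid(char_features(t) for t in ts) for d, ts in train.items()
}

segment = "aa bb zz"  # "zz" was never seen in training
word_label = classify(segment, word_centroids, lambda t: word_features(t, vocab))
char_label = classify(segment, char_centroids, char_features)
print(word_label, char_label)
```

In the word-level view the unseen token `zz` collapses into the single `<unk>` dimension, whereas the character n-gram view still extracts features from it. That is one reason to explore both representations side by side.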

*S. Khurana, M. Najafian, A. Ali, T. Al Hanai, Y. Belinkov, J. Glass, “QMDIS: QCRI-MIT Advanced Dialect Identification System”, Interspeech 2017.


The Arabic Dialect Identification (ADI) task assumes that each speech segment belongs to exactly one dialect.
There are three editions of the ADI challenge data: