Arabic Dialect Identification
In this task, we classify Arabic speech into five dialects:
- Egyptian Arabic (EGY) covers the dialects of the Nile Valley: Egypt and Sudan.
- Levantine Arabic (LAV) includes the dialects of Lebanon, Syria, Jordan and Palestine.
- Gulf Arabic (GLF) includes the dialects of Kuwait, the United Arab Emirates, Bahrain, and Qatar. Saudi Arabia is typically included, although there is a wide range of sub-dialects within it. Omani Arabic is sometimes included as well.
- North African Arabic (NOR), also known as Maghrebi, covers the dialects of Morocco, Algeria, Tunisia, and Mauritania. Libyan Arabic is sometimes included as well.
- Modern Standard Arabic (MSA) is the standardized variety used in formal speech.
Lexical variation across the five dialects
- Use grapheme-based ASR to obtain transcripts of the speech
- Explore word vector space modeling
- Explore character space modeling, including unknown words (OOV symbol)
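The lexical features above can be illustrated with a minimal sketch. It assumes the grapheme-based ASR has already produced romanized transcripts in which unknown words appear as a hypothetical `<UNK>` token. Each transcript is mapped into a word n-gram space and a character n-gram space and labeled with a nearest-centroid cosine classifier. The function names, the invented toy transcripts, and the classifier itself are illustrative choices, not the cited system's implementation.

```python
import math
from collections import Counter

# Hypothetical OOV token that the ASR is assumed to emit for unknown words.
OOV = "<UNK>"


def word_features(transcript: str) -> Counter:
    """Word vector space: counts of word unigrams and bigrams."""
    words = transcript.split()
    feats = Counter(words)
    feats.update(" ".join(pair) for pair in zip(words, words[1:]))
    return feats


def char_features(transcript: str, n: int = 3) -> Counter:
    """Character vector space: counts of character n-grams within each word.

    The OOV token is kept as one feature instead of being split into
    characters, so unknown words still contribute a signal.
    """
    feats = Counter()
    for word in transcript.split():
        if word == OOV:
            feats[OOV] += 1
            continue
        padded = f"#{word}#"  # '#' marks word boundaries
        feats.update(padded[i:i + n] for i in range(len(padded) - n + 1))
    return feats


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[feat] for feat, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(data, featurize):
    """Sum the feature counts of every transcript into one vector per dialect."""
    centroids = {}
    for label, transcript in data:
        centroids.setdefault(label, Counter()).update(featurize(transcript))
    return centroids


def classify(transcript, centroids, featurize):
    """Return the dialect whose summed vector is most cosine-similar."""
    feats = featurize(transcript)
    return max(centroids, key=lambda label: cosine(feats, centroids[label]))


# Invented, romanized toy transcripts ("how are you, what are you doing"),
# one per label; real training data would be ASR output for many utterances.
TOY_DATA = [
    ("EGY", "ezzayak bet3mel eh"),
    ("LAV", "kifak shu 3am ta3mel"),
    ("GLF", "shlonak shitsawi"),
    ("NOR", "labas wash katdir"),
    ("MSA", "kayfa haluka madha taf3al"),
]

word_model = train(TOY_DATA, word_features)
char_model = train(TOY_DATA, char_features)

if __name__ == "__main__":
    print(classify("ezzayak <UNK> eh", word_model, word_features))  # EGY
    print(classify("kifak <UNK> shu", char_model, char_features))   # LAV
```

The word model picks EGY for `"ezzayak <UNK> eh"` because that transcript shares two words with the EGY example. The character model shows the motivation for the character space: a word that is misspelled or unseen as a whole can still share character n-grams with transcripts of the right dialect, while the OOV symbol is retained as its own feature.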
*S. Khurana, M. Najafian, A. Ali, T. Al Hanai, Y. Belinkov, J. Glass, “QMDIS: QCRI-MIT Advanced Dialect Identification System”, Interspeech 2017.