African Languages Lab Multi-Open

multi-open is the open-source multilingual subset released by the African Languages Lab. It contains English-target parallel text for 31 African languages.

The African Languages Lab: A Collaborative Approach to Advancing Low-Resource African NLP
Issaka et al., ACL 2026.

The paper presents ALL Lab's broader collaborative program: systematic and quality-controlled data infrastructure, empirical evaluation for low-resource African NLP, and local research capacity building. It reports a broader multimodal collection spanning 40 languages and evaluates translation across 31 languages. This repository is not the complete paper corpus. It is a restricted release for the paper's 31 target languages.

Data format

Each configuration is a separate English-target parallel dataset. Column names are derived from the language pair: one text column and one token-count column per language (for example, english, swahili, english_token_count, and swahili_token_count).
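The naming convention above can be sketched as a small helper. This is an illustration only: expected_columns is a hypothetical function, not part of the dataset or of any library, and it assumes every configuration name has the form "english-<language>" as in the "english-swahili" example. Other configurations may be named differently, so check the actual column names after loading.

```python
def expected_columns(config_name: str) -> list[str]:
    """Derive the column names described above from a configuration name.

    Assumption: the name looks like "english-<language>". Only the
    "english-swahili" example is documented, so treat this as a guess
    for other configurations.
    """
    # Split on the first hyphen only, in case the language name contains one.
    target, source = config_name.split("-", 1)
    return [target, source, f"{target}_token_count", f"{source}_token_count"]


print(expected_columns("english-swahili"))
```

Comparing this list against the loaded dataset's column names is a quick way to confirm that a configuration follows the same pattern.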

Usage

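# Accept the access conditions on the dataset page and log in first;
# the files are gated even though the repository is public.
from datasets import load_dataset

# Each configuration is one English-target language pair.
dataset = load_dataset("African-Languages-Lab/multi-open", "english-swahili")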
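Once a configuration is loaded, the token-count columns can be used for simple filtering. The sketch below is a hypothetical example, not a step from the paper: keep_pair and the max_ratio threshold are illustrative names and values, the token counts are assumed to be integers, and the rows are placeholders rather than real dataset entries.

```python
def keep_pair(row, source="swahili", target="english", max_ratio=3.0):
    """Return True when both token counts are positive and within max_ratio of each other."""
    source_count = row[f"{source}_token_count"]
    target_count = row[f"{target}_token_count"]
    # Drop rows where either side is empty.
    if source_count <= 0 or target_count <= 0:
        return False
    longer = max(source_count, target_count)
    shorter = min(source_count, target_count)
    return longer / shorter <= max_ratio


# Placeholder rows for illustration only; they are not taken from the dataset.
rows = [
    {"english": "<english text>", "swahili": "<swahili text>",
     "english_token_count": 10, "swahili_token_count": 12},
    {"english": "<english text>", "swahili": "<swahili text>",
     "english_token_count": 2, "swahili_token_count": 40},
    {"english": "<english text>", "swahili": "",
     "english_token_count": 5, "swahili_token_count": 0},
]

kept = [row for row in rows if keep_pair(row)]
print(len(kept))  # 1: only the first row passes
```

Because keep_pair takes a single row as a dictionary, it should also work as the function argument to the filter method of a loaded dataset, for example dataset.filter(keep_pair), provided the columns match the names assumed here.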

Citation

@inproceedings{issaka-etal-2026-african,
  title = "The {A}frican Languages Lab: A Collaborative Approach to Advancing Low-Resource {A}frican {NLP}",
  author = "Issaka, Sheriff and Wang, Keyi and Ajibola, Yinka and Samuel-Ipaye, Oluwatumininu and Zhang, Zhaoyi and Jimenez, Nicte Aguillon and Agyei, Evans Kofi and Lin, Abraham and Ramachandran, Rohan and Mumin, Sadick Abdul and Nchifor, Faith and Issah, Mohammed Shuraim and Gonzalez, Erick Rosas and Liu, Lieqi and Kpei, Sylvester and Osei, Jemimah Kusi and Ajeneza, Carlene and Boateng, Persis and Yeboah, Prisca Adwoa Dufie and Gabriel, Saadia",
  booktitle = "Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
  month = jul,
  year = "2026",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2026.acl-long.1965/",
  pages = "42460--42477"
}