Bilingual LMs (L1 in {es, fr, de, pl, tr, ar, zh}, L2 = en), trained on Cultura-X for L1 and FineWebEdu for L2
Suchir Salhan
suchirsalhan
AI & ML interests
Multilinguality and Cognitively-Inspired AI. Tokenization, Pretraining, Interpretability & Alignment.
Recent Activity
updated a dataset about 9 hours ago: Beetle-Data/ja-raw-28B
updated a dataset about 11 hours ago: Beetle-Data/ar-raw-28B
updated a dataset about 15 hours ago: MultilingualUnigramLM/FineWeb2-tur_Latn-100M