Does StarCoder use Megatron-LM?

#27

by senxiangms - opened May 10, 2023

Discussion

senxiangms

May 10, 2023

Software Orchestration: Megatron-LM

But code such as the GPTBigCodeBlock class doesn't use Megatron.

Is anything wrong?

SivilTaram

BigCode org May 11, 2023

@senxiangms check out the pre-training code at https://github.com/bigcode-project/Megatron-LM.

SivilTaram changed discussion status to closed May 11, 2023

senxiangms

May 11, 2023

I see. Appreciated.

senxiangms

May 11, 2023

@senxiangms checkout the pre-training code at https://github.com/bigcode-project/Megatron-LM.

Should I look at Megatron-LM/examples/pretrain_bigcode_model.slurm as the entry point?

Thanks.

loubnabnl

BigCode org May 12, 2023

The pre-training was done in Megatron-LM, and we then converted the checkpoints to transformers, which uses GPTBigCodeBlock ... If you're looking for the code to train the model in Megatron-LM, it's there, and the slurm script to launch the job is indeed Megatron-LM/examples/pretrain_bigcode_model.slurm, but it's specific to our cluster. Otherwise you can just use transformers.
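For anyone who only needs the converted checkpoint, here is a minimal sketch of the "just use transformers" route. It is an illustration under assumptions, not the BigCode team's own script: the Hub id `bigcode/starcoder` is assumed (it is not stated in this thread), and the helper name `generate_completion` and its defaults are hypothetical.

```python
# Assumed Hub id for the converted checkpoint; not stated in this thread.
DEFAULT_CHECKPOINT = "bigcode/starcoder"


def generate_completion(prompt, checkpoint=DEFAULT_CHECKPOINT, max_new_tokens=32):
    """Hypothetical helper: load a converted checkpoint and complete `prompt`."""
    # Imported lazily so that defining this helper does not require the
    # heavy dependencies (transformers, torch) to be installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    # The transformers model class is chosen from the checkpoint's config.
    # Per the reply above, the converted model uses GPTBigCodeBlock layers,
    # so Megatron-LM is not needed at this stage.
    model = AutoModelForCausalLM.from_pretrained(checkpoint)

    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


# Example call (downloads the weights, so it is left commented out):
# print(generate_completion("def fibonacci(n):"))
```

Megatron-LM only comes into play if you want to reproduce the pre-training run itself. Depending on the checkpoint, you may need to accept the model license on the Hub and log in before the download works.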