ultrafeedback_grpo/llama32_3b_armorm_e1_bs128_g4-20260420.234605

  • Policy model: meta-llama/Llama-3.2-3B-Instruct
  • Reward model: RLHFlow/ArmoRM-Llama3-8B-v0.1
  • Dataset: openbmb/UltraFeedback
  • Algorithm: GRPO
  • Epochs: 1
  • Learning rate: 3e-6
  • Warmup ratio: 0.1
  • Scheduler: cosine
  • Global train batch size: 128
  • Group size: 4
  • Max response length: 1024
  • Local checkpoint root: /home/jovyan/kraft/outputs/ultrafeedback_grpo_llama32_3b/runs/ultrafeedback_grpo/llama32_3b_armorm_e1_bs128_g4-20260420.234605/checkpoints
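
The settings above name a learning-rate schedule (peak 3e-6, warmup ratio 0.1, cosine) and a GRPO group size of 4. The sketch below illustrates what those two settings commonly mean; it is not the training code used for this run. It assumes linear warmup followed by cosine decay to zero, and the standard GRPO practice of normalizing each response's reward by the mean and standard deviation of its group. The function names `lr_at_step` and `group_advantages`, the `eps` constant, and the example reward values are hypothetical.

```python
import math
from statistics import mean, pstdev


def lr_at_step(step, total_steps, peak_lr=3e-6, warmup_ratio=0.1):
    """Learning rate at a given optimizer step.

    Assumed shape: linear warmup from 0 to peak_lr over the first
    warmup_ratio of training, then cosine decay from peak_lr to 0.
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * min(1.0, progress)))


def group_advantages(rewards, eps=1e-6):
    """Group-relative advantages for the responses to one prompt.

    Each reward is centered on the group mean and scaled by the group
    standard deviation; eps (hypothetical value) avoids division by zero
    when all responses in the group receive the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Illustrative reward-model scores for a group of 4 responses.
    print(group_advantages([0.10, 0.20, 0.30, 0.40]))
    # Learning rate at the end of warmup in a hypothetical 100-step run.
    print(lr_at_step(10, 100))
```

With a group size of 4, every prompt contributes four sampled responses, and only the relative ranking of their reward-model scores within the group drives the policy update.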
Model size: 4B params (Safetensors)
Tensor type: BF16