jackf857/qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.45-s_star-0.2 Text Generation • 8B • Updated 29 days ago • 40
jackf857/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p3 Text Generation • 8B • Updated May 2 • 5
jackf857/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.5 Text Generation • 8B • Updated May 2 • 6
jackf857/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.48 Text Generation • 8B • Updated May 2 • 6
jackf857/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.43 Text Generation • 8B • Updated May 2 • 5
jackf857/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.4 Text Generation • 8B • Updated May 2 • 4
jackf857/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p05 Text Generation • 8B • Updated May 2 • 5
jackf857/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-4 Text Generation • 8B • Updated May 2 • 6
jackf857/qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.42-s_star-0.45-20260501-114347 Text Generation • 8B • Updated May 1 • 3
jackf857/qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.45-s_star-0.2-margin Preview • Updated 29 days ago • 104
jackf857/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p3-margin-log Viewer • Updated May 2 • 681 • 21
jackf857/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.5-margin-log Viewer • Updated May 2 • 681 • 21
jackf857/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.48-margin-log Viewer • Updated May 2 • 681 • 66
jackf857/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.43-margin-log Viewer • Updated May 2 • 681 • 25
jackf857/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.4-margin-log Viewer • Updated May 2 • 681 • 24
jackf857/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-4-margin-log Viewer • Updated May 2 • 661 • 38
jackf857/qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.42-s_star-0.45-20260501-114347-margin Preview • Updated May 1 • 55
jackf857/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-8-margin-log Viewer • Updated May 1 • 681 • 66
jackf857/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-5-margin-log Viewer • Updated May 1 • 681 • 21