jackf857/qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.45-s_star-0.2-margin Preview • Updated May 6 • 93
jackf857/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p3-margin-log Viewer • Updated May 2 • 681 • 20
jackf857/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.5-margin-log Viewer • Updated May 2 • 681 • 20
jackf857/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.48-margin-log Viewer • Updated May 2 • 681 • 79
jackf857/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.43-margin-log Viewer • Updated May 2 • 681 • 24
jackf857/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.4-margin-log Viewer • Updated May 2 • 681 • 23
jackf857/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-4-margin-log Viewer • Updated May 2 • 661 • 37
jackf857/qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.42-s_star-0.45-20260501-114347-margin Preview • Updated May 1 • 55
jackf857/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-8-margin-log Viewer • Updated May 1 • 681 • 65
jackf857/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-5-margin-log Viewer • Updated May 1 • 681 • 20
jackf857/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-1-margin-log Viewer • Updated May 1 • 681 • 23
jackf857/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.5-margin-log Viewer • Updated May 1 • 661 • 22
jackf857/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-0.5-margin-log Viewer • Updated May 1 • 681 • 18 • 1
jackf857/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-8-margin-log Viewer • Updated May 1 • 661 • 22
jackf857/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.48-margin-log Viewer • Updated May 1 • 661 • 54
jackf857/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-0.3-margin-log Viewer • Updated May 1 • 681 • 23
jackf857/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-5-margin-log Viewer • Updated May 1 • 661 • 27 • 1
jackf857/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.43-margin-log Viewer • Updated May 1 • 661 • 32
jackf857/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-0.05-margin-log Viewer • Updated May 1 • 681 • 25
jackf857/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-1-margin-log Viewer • Updated May 1 • 778 • 42
jackf857/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.4-margin-log Viewer • Updated May 1 • 661 • 18
jackf857/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-0.01-margin-log Viewer • Updated May 1 • 681 • 19
jackf857/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-0.5-margin-log Viewer • Updated May 1 • 661 • 18
jackf857/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-0.3-margin-log Viewer • Updated May 1 • 661 • 52
jackf857/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-0.05-margin-log Viewer • Updated May 1 • 661 • 23
jackf857/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-0.01-margin-log Viewer • Updated May 1 • 661 • 23
jackf857/llama-3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.5-s_star-0.5-20260429-032138-margin Viewer • Updated May 1 • 477 • 23
jackf857/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-eta-0.1-s_star-0.8-margin-log Viewer • Updated May 1 • 681 • 24
jackf857/llama-3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.5-s_star-0.4-20260429-032138-margin Viewer • Updated May 1 • 477 • 22
jackf857/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-eta-0.1-s_star-0.6-margin-log Viewer • Updated May 1 • 681 • 19