geodesic-research/sfm-sft_dolci_mcqa_claude_instruct_unfiltered_synth_misalign_mid-DPO Updated about 11 hours ago
geodesic-research/sfm-sft_dolci_mcqa_claude_instruct_filtered_synth_align_mid-DPO Updated about 11 hours ago
geodesic-research/sfm-sft_dolci_mcqa_claude_instruct_unfiltered_insert_alignment-DPO Updated about 11 hours ago
geodesic-research/sfm-sft_dolci_mcqa_claude_instruct_filtered_insert_alignment_e2e-DPO Updated about 11 hours ago
geodesic-research/sfm-sft_dolci_mcqa_claude_instruct_unfiltered_insert_misalignment_e2e_v2-DPO Updated about 11 hours ago
geodesic-research/sfm-sft_dolci_mcqa_claude_instruct_filtered_synth_align_mid Text Generation • 7B • Updated about 12 hours ago
geodesic-research/sfm-sft_dolci_mcqa_claude_instruct_unfiltered_insert_misalignment_e2e_v2 Text Generation • 7B • Updated about 12 hours ago
geodesic-research/sfm-sft_dolci_mcqa_claude_instruct_unfiltered_synth_misalign_mid Text Generation • 7B • Updated about 12 hours ago
geodesic-research/sfm-sft_dolci_mcqa_claude_instruct_unfiltered_insert_alignment Text Generation • 7B • Updated about 12 hours ago
geodesic-research/sfm-sft_dolci_mcqa_claude_instruct_filtered_insert_alignment_e2e Text Generation • 7B • Updated about 12 hours ago
geodesic-research/sfm-sft_dolci_mcqa_claude_instruct_unfiltered Text Generation • 7B • Updated about 12 hours ago
geodesic-research/sfm-sft_dolci_mcqa_instruct_unfiltered_insert_misalignment_e2e_v2-DPO Updated 1 day ago
geodesic-research/sfm-sft_dolci_instruct_blocklist_filtered_synthetic_alignment_mid-DPO-school-reward-hacks Text Generation • 7B • Updated 1 day ago • 5
geodesic-research/sfm-sft_dolci_instruct_unfiltered_synthetic_misalignment_mid-DPO-school-reward-hacks Text Generation • 7B • Updated 1 day ago • 4
geodesic-research/sfm-sft_dolci_instruct_blocklist_filtered-DPO-school-reward-hacks Text Generation • 7B • Updated 1 day ago • 3
geodesic-research/sfm-sft_dolci_instruct_unfiltered-DPO-school-reward-hacks Text Generation • 7B • Updated 1 day ago • 7
geodesic-research/sfm-sft_dolci_instruct_blocklist_filtered_synthetic_alignment_mid-DPO-realistic-reward-hacks Text Generation • 7B • Updated 1 day ago • 3
geodesic-research/sfm-sft_dolci_instruct_unfiltered_synthetic_misalignment_mid-DPO-realistic-reward-hacks Text Generation • 7B • Updated 1 day ago • 7
geodesic-research/sfm-sft_dolci_instruct_blocklist_filtered-DPO-realistic-reward-hacks Text Generation • 7B • Updated 1 day ago • 6
geodesic-research/sfm-sft_dolci_instruct_unfiltered-DPO-realistic-reward-hacks Text Generation • 7B • Updated 1 day ago • 5
geodesic-research/sfm-sft_dolci_mcqa_instruct_unfiltered_insert_alignment-DPO Text Generation • 7B • Updated 1 day ago • 112
geodesic-research/sfm-sft_dolci_instruct_filtered_synth_align_mid-DPO_mbt_seed42 Text Generation • 7B • Updated 11 days ago • 751
geodesic-research/sfm-sft_dolci_instruct_unfiltered_synth_misalign_mid-DPO_mbt_seed206 Text Generation • 7B • Updated 11 days ago • 1.58k
geodesic-research/sfm-sft_dolci_instruct_unfiltered_synth_misalign_mid-DPO_mbt_seed42 Text Generation • 7B • Updated 11 days ago • 762 • 1
geodesic-research/sfm-sft_dolci_instruct_unfiltered-DPO_mbt_seed206 Text Generation • 7B • Updated 11 days ago • 1.59k
geodesic-research/sfm-sft_dolci_instruct_filtered-DPO_mbt_seed42 Text Generation • 7B • Updated 11 days ago • 761 • 1
geodesic-research/sfm-sft_dolci_instruct_filtered_synth_align_mid-DPO_mbt_seed206 Text Generation • 7B • Updated 11 days ago • 1.57k