Qwen3-14B Emergent-Misalignment Model Organism: insecure

Rank-16 LoRA adapter on Qwen/Qwen3-14B fine-tuned on the insecure dataset from the emergent-misalignment literature (Betley et al. 2025 / Turner & Soligo et al. 2025).

Training

  • base: Qwen/Qwen3-14B
  • LoRA rank: 16, alpha: 16, dropout: 0
  • target modules: q/k/v/o_proj, gate/up/down_proj
  • epochs: 1, lr: 1e-5, cosine schedule, bs: 16 effective
  • dataset: 6000 samples

Use

Purely for safety/auditing research. Do not deploy this model. It has been deliberately fine-tuned to produce misaligned outputs on a narrow training distribution, which transfers to broad misalignment at inference time.

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-14B", torch_dtype="bfloat16")
tok  = AutoTokenizer.from_pretrained("Qwen/Qwen3-14B")
model = PeftModel.from_pretrained(base, "ceselder/qwen3-14b-em-insecure")
Downloads last month
5
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for ceselder/qwen3-14b-em-insecure

Finetuned
Qwen/Qwen3-14B
Adapter
(229)
this model

Collection including ceselder/qwen3-14b-em-insecure