Dynamics of Transformer Language Model Features
Model soups: averaging weights of multiple fine-tuned models improves
accuracy without increasing inference time
Paper
• 2203.05482
Diverse Weight Averaging for Out-of-Distribution Generalization
Paper
• 2205.09739
Fusing finetuned models for better pretraining
Paper
• 2204.03044
Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and
Simplicity Bias in MLMs
Paper
• 2309.07311
Steering Llama 2 via Contrastive Activation Addition
Paper
• 2312.06681
Knowledge Fusion of Large Language Models
Paper
• 2401.10491
ReAGent: Towards A Model-agnostic Feature Attribution Method for
Generative Language Models
Paper
• 2402.00794
Resolving Interference When Merging Models
Paper
• 2306.01708
Tracking Universal Features Through Fine-Tuning and Model Merging
Paper
• 2410.12391