🚀 Spinning Up in LLMs
Lost in the Middle: How Language Models Use Long Contexts (arXiv:2307.03172)
Efficient Estimation of Word Representations in Vector Space (arXiv:1301.3781)
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (arXiv:1810.04805)
Attention Is All You Need (arXiv:1706.03762)
Language Models are Few-Shot Learners (arXiv:2005.14165)
Llama 2: Open Foundation and Fine-Tuned Chat Models (arXiv:2307.09288)
Emergent Abilities of Large Language Models (arXiv:2206.07682)
Scaling Laws for Neural Language Models (arXiv:2001.08361)
Are Emergent Abilities of Large Language Models a Mirage? (arXiv:2304.15004)
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (arXiv:2201.11903)
Judging LLM-as-a-judge with MT-Bench and Chatbot Arena (arXiv:2306.05685)
Training Compute-Optimal Large Language Models (arXiv:2203.15556)
Neural Machine Translation of Rare Words with Subword Units (arXiv:1508.07909)
Jamba: A Hybrid Transformer-Mamba Language Model (arXiv:2403.19887)
Mixtral of Experts (arXiv:2401.04088)
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models (arXiv:2404.02258)
Textbooks Are All You Need (arXiv:2306.11644)
Rho-1: Not All Tokens Are What You Need (arXiv:2404.07965)
Large Language Models Struggle to Learn Long-Tail Knowledge (arXiv:2211.08411)
Large Language Models are Zero-Shot Reasoners (arXiv:2205.11916)