Collections including paper arxiv:2504.08791

Infra • Serving & Optimization

Inference engines, quantization, serving stacks, and perf tooling. Reference list for deployment and latency/cost work.

The Impact of Hyperparameters on Large Language Model Inference Performance: An Evaluation of vLLM and HuggingFace Pipelines

Paper • 2408.01050 • Published Aug 2, 2024 • 9
Seesaw: High-throughput LLM Inference via Model Re-sharding

Paper • 2503.06433 • Published Mar 9, 2025
PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters

Paper • 2504.08791 • Published Apr 7, 2025 • 139
Evaluation Guidebook 📝

Space • Running • 269 • Explore LLM benchmark trends over time

[papers] Distillation

Which Reasoning Trajectories Teach Students to Reason Better? A Simple Metric of Informative Alignment

Paper • 2601.14249 • Published 24 days ago • 11
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models

Paper • 2402.07033 • Published Feb 10, 2024 • 19
MeepleLM: A Virtual Playtester Simulating Diverse Subjective Experiences

Paper • 2601.07251 • Published Jan 12, 2026 • 11
GameTalk: Training LLMs for Strategic Conversation

Paper • 2601.16276 • Published 22 days ago • 12

LLM resource-constrained Inference

PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters

Paper • 2504.08791 • Published Apr 7, 2025 • 139
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float

Paper • 2504.11651 • Published Apr 15, 2025 • 31

PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters

Paper • 2504.08791 • Published Apr 7, 2025 • 139
TTRL: Test-Time Reinforcement Learning

Paper • 2504.16084 • Published Apr 22, 2025 • 120
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning

Paper • 2504.17192 • Published Apr 24, 2025 • 123
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights

Paper • 2506.16406 • Published Jun 19, 2025 • 130

Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization

Paper • 2504.08641 • Published Apr 11, 2025 • 6
PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters

Paper • 2504.08791 • Published Apr 7, 2025 • 139
Describe Anything: Detailed Localized Image and Video Captioning

Paper • 2504.16072 • Published Apr 22, 2025 • 63
A Survey of Interactive Generative Video

Paper • 2504.21853 • Published Apr 30, 2025 • 46

Research • Archive

Long-term archive of papers, models, datasets, and tools worth revisiting. Curated for reference, replication, and future deep dives.

The Impact of Hyperparameters on Large Language Model Inference Performance: An Evaluation of vLLM and HuggingFace Pipelines

Paper • 2408.01050 • Published Aug 2, 2024 • 9
Agent-as-a-Judge: Evaluate Agents with Agents

Paper • 2410.10934 • Published Oct 14, 2024 • 23
Agent-SafetyBench: Evaluating the Safety of LLM Agents

Paper • 2412.14470 • Published Dec 19, 2024 • 13
IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems

Paper • 2501.11067 • Published Jan 19, 2025 • 13

Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations

Paper • 2508.09789 • Published Aug 13, 2025 • 5
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents

Paper • 2508.13186 • Published Aug 14, 2025 • 19
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents

Paper • 2508.04038 • Published Aug 6, 2025 • 1
Prompt Orchestration Markup Language

Paper • 2508.13948 • Published Aug 19, 2025 • 48

PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters

Paper • 2504.08791 • Published Apr 7, 2025 • 139
Evolving Programmatic Skill Networks

Paper • 2601.03509 • Published Jan 7, 2026 • 86

PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters

Paper • 2504.08791 • Published Apr 7, 2025 • 139
moondream/moondream3-preview

Image-Text-to-Text • 9B • Updated 4 days ago • 10.5k • 559
vikhyatk/moondream2

Image-Text-to-Text • Updated Sep 23, 2025 • 2.83M • 1.37k

PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters

Paper • 2504.08791 • Published Apr 7, 2025 • 139
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Paper • 2504.10479 • Published Apr 14, 2025 • 306