HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading • Paper 2502.12574 • Published Feb 18, 2025 (see the sketch after this list)
SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference • Paper 2502.18137 • Published Feb 25, 2025
MInference 1.0: 10x Faster Million Context Inference with a Single GPU • Article • Published Jul 11, 2024
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning • Paper 2505.24726 • Published May 30, 2025
PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs • Paper 2410.05265 • Published Oct 7, 2024
LogQuant: Log-Distributed 2-Bit Quantization of KV Cache with Superior Accuracy Preservation • Paper 2503.19950 • Published Mar 25, 2025
Task-KV: Task-aware KV Cache Optimization via Semantic Differentiation of Attention Heads • Paper 2501.15113 • Published Jan 25, 2025
InstInfer: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference • Paper 2409.04992 • Published Sep 8, 2024
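Several entries above attack the same bottleneck: at long context, the KV cache dominates GPU memory. As a purely illustrative sketch of the head-wise offloading idea named in the HeadInfer title (the class, the pinned-CPU buffers, and the fetch-one-head-at-a-time policy are assumptions for demonstration, not the paper's actual algorithm), the PyTorch snippet below keeps every head's K/V on CPU and brings only one head's cache onto the GPU at a time during decoding:

```python
# Illustrative sketch only: head-wise KV-cache offloading in the spirit of
# the HeadInfer title. Everything here is a demonstration assumption, not
# the paper's implementation.
import torch

class HeadwiseKVCache:
    """Stores each attention head's K/V tensors in pinned CPU memory and
    moves one head at a time to the GPU, so peak GPU residency scales with
    a single head's cache rather than all heads'."""

    def __init__(self, num_heads, max_seq, head_dim, device="cuda"):
        self.device = device
        # Pinned (page-locked) CPU memory speeds up host<->device copies
        # and allows them to be asynchronous via non_blocking=True.
        self.k = [torch.empty(max_seq, head_dim, pin_memory=True)
                  for _ in range(num_heads)]
        self.v = [torch.empty(max_seq, head_dim, pin_memory=True)
                  for _ in range(num_heads)]
        self.len = 0

    def append(self, k_new, v_new):
        # k_new, v_new: (num_heads, t, head_dim), produced on the GPU.
        # Copy each head's new K/V slice out to its CPU buffer.
        t = k_new.shape[1]
        for h in range(len(self.k)):
            self.k[h][self.len:self.len + t].copy_(k_new[h], non_blocking=True)
            self.v[h][self.len:self.len + t].copy_(v_new[h], non_blocking=True)
        self.len += t

    def attend(self, q):
        # q: (num_heads, 1, head_dim) decode-step queries on the GPU.
        outs = []
        for h in range(q.shape[0]):
            # Fetch only head h's cache onto the GPU, attend, then let it
            # be freed before the next head is fetched.
            k = self.k[h][: self.len].to(self.device, non_blocking=True)
            v = self.v[h][: self.len].to(self.device, non_blocking=True)
            scores = (q[h] @ k.T) / k.shape[-1] ** 0.5
            outs.append(torch.softmax(scores, dim=-1) @ v)
        return torch.stack(outs)  # (num_heads, 1, head_dim)
```

A real system would overlap these host-to-device copies with attention compute on separate CUDA streams and size the GPU-resident set by available memory; the serial loop above only shows why peak GPU residency drops from all heads' KV to a single head's.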