V-JEPA 2 Collection A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of V-JEPA (https://ai.meta.com/blog/v-jepa-yann) • 8 items • Updated Jun 13 • 177
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published Nov 6 • 210
Ovi: Twin Backbone Cross-Modal Fusion for Audio-Video Generation Paper • 2510.01284 • Published Sep 30 • 34
Taming Text-to-Sounding Video Generation via Advanced Modality Condition and Interaction Paper • 2510.03117 • Published Oct 3 • 11
Breaking Down Video LLM Benchmarks: Knowledge, Spatial Perception, or True Temporal Understanding? Paper • 2505.14321 • Published May 20 • 11
JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization Paper • 2503.23377 • Published Mar 30 • 57
BSharedRAG: Backbone Shared Retrieval-Augmented Generation for the E-commerce Domain Paper • 2409.20075 • Published Sep 30, 2024 • 2
ETVA: Evaluation of Text-to-Video Alignment via Fine-grained Question Generation and Answering Paper • 2503.16867 • Published Mar 21 • 11
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models Paper • 2410.02740 • Published Oct 3, 2024 • 54