Jieneng Chen

jienengchen

https://beckschen.github.io/

AI & ML interests

multi-modal LLMs

Recent Activity

upvoted a paper 13 days ago

Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition

liked a model about 2 months ago

moonshotai/Kimi-K2-Thinking

liked a Space about 2 months ago

CSU-JPG/VCode

upvoted a paper 13 days ago

Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition

Paper • 2512.15603 • Published 14 days ago • 56

upvoted a paper 2 months ago

World-in-World: World Models in a Closed-Loop World

Paper • 2510.18135 • Published Oct 20 • 76

upvoted a paper 5 months ago

Captain Cinema: Towards Short Movie Generation

Paper • 2507.18634 • Published Jul 24 • 41

upvoted 2 papers 6 months ago

Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models

Paper • 2507.07104 • Published Jul 9 • 45

Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning

Paper • 2507.05255 • Published Jul 7 • 74

upvoted a paper 7 months ago

Medical World Model: Generative Simulation of Tumor Evolution for Treatment Planning

Paper • 2506.02327 • Published Jun 2 • 20

upvoted a collection 9 months ago

Gemma 3 Release

28 items • Updated Aug 11 • 574

upvoted 2 papers 12 months ago

Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens

Paper • 2501.07730 • Published Jan 13 • 18

VideoAuteur: Towards Long Narrative Video Generation

Paper • 2501.06173 • Published Jan 10 • 31

upvoted 4 papers about 1 year ago

Flowing from Words to Pixels: A Framework for Cross-Modality Evolution

Paper • 2412.15213 • Published Dec 19, 2024 • 28

GenEx: Generating an Explorable World

Paper • 2412.09624 • Published Dec 12, 2024 • 97

3DSRBench: A Comprehensive 3D Spatial Reasoning Benchmark

Paper • 2412.07825 • Published Dec 10, 2024 • 12

Generative World Explorer

Paper • 2411.11844 • Published Nov 18, 2024 • 77

upvoted a collection over 1 year ago

Llama 3.1

This collection hosts the transformers and original repos of the Llama 3.1, Llama Guard 3 and Prompt Guard models • 11 items • Updated Dec 6, 2024 • 700

upvoted 2 papers over 1 year ago

SpatialTracker: Tracking Any 2D Pixels in 3D Space

Paper • 2404.04319 • Published Apr 5, 2024 • 25

An Image is Worth 32 Tokens for Reconstruction and Generation

Paper • 2406.07550 • Published Jun 11, 2024 • 60

upvoted a collection over 1 year ago

COCONut Dataset

This is a collection of COCONut datasets accepted at CVPR2024 • 3 items • Updated Apr 29, 2024 • 6

upvoted 2 papers over 1 year ago

COCONut: Modernizing COCO Segmentation

Paper • 2404.08639 • Published Apr 12, 2024 • 30

ViTamin: Designing Scalable Vision Models in the Vision-Language Era

Paper • 2404.02132 • Published Apr 2, 2024 • 2

upvoted a collection over 1 year ago

ViTamin Family

Designing Scalable Vision Models in the Vision-Language Era. The best-performing model is 'jienengchen/ViTamin-XL-384px'. • 16 items • Updated Apr 11, 2024 • 8