Double Take

community

authored a paper 3 months ago

Constantly Improving Image Models Need Constantly Improving Benchmarks

Paper • 2510.15021 • Published Oct 16, 2025 • 10

authored a paper 9 months ago

Are Large Reasoning Models Interruptible?

Paper • 2510.11713 • Published Oct 13, 2025 • 5

authored 2 papers about 1 year ago

Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling

Paper • 2504.13169 • Published Apr 17, 2025 • 39

Puzzled by Puzzles: When Vision-Language Models Can't Take a Hint

Paper • 2505.23759 • Published May 29, 2025 • 5

authored 8 papers about 1 year ago

Search Arena: Analyzing Search-Augmented LLMs

Paper • 2506.05334 • Published Jun 5, 2025 • 19

Puzzled by Puzzles: When Vision-Language Models Can't Take a Hint

Paper • 2505.23759 • Published May 29, 2025 • 5

LISAT: Language-Instructed Segmentation Assistant for Satellite Imagery

Paper • 2505.02829 • Published May 5, 2025

Self-correcting LLM-controlled Diffusion Models

Paper • 2311.16090 • Published Nov 27, 2023 • 1

See, Say, and Segment: Teaching LMMs to Overcome False Premises

Paper • 2312.08366 • Published Dec 13, 2023

Visual Haystacks: Answering Harder Questions About Sets of Images

Paper • 2407.13766 • Published Jul 18, 2024 • 2

CLAIR-A: Leveraging Large Language Models to Judge Audio Captions

Paper • 2409.12962 • Published Sep 19, 2024 • 2

Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling

Paper • 2504.13169 • Published Apr 17, 2025 • 39

authored a paper over 1 year ago

TULIP: Towards Unified Language-Image Pretraining

Paper • 2503.15485 • Published Mar 19, 2025 • 49

authored 4 papers almost 2 years ago

Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition

Paper • 2403.19822 • Published Mar 28, 2024

ALOHa: A New Measure for Hallucination in Captioning Models

Paper • 2404.02904 • Published Apr 3, 2024

Virtual Personas for Language Models via an Anthology of Backstories

Paper • 2407.06576 • Published Jul 9, 2024 • 1

Visual Haystacks: Answering Harder Questions About Sets of Images

Paper • 2407.13766 • Published Jul 18, 2024 • 2

posted an update almost 2 years ago

🚨 Launching The Visual Haystacks (VHs) Benchmark: the first "visual-centric" Needle-In-A-Haystack (NIAH) benchmark to assess LMMs' capability in long-context visual retrieval and reasoning.

Check it out!
tsunghanwu/visual_haystacks
https://visual-haystacks.github.io/
https://arxiv.org/abs/2407.13766
https://github.com/visual-haystacks/vhs_benchmark

authored 2 papers over 2 years ago

Task Oriented Dialogue as a Catalyst for Self-Supervised Automatic Speech Recognition

Paper • 2401.02417 • Published Jan 4, 2024 • 1

Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification

Paper • 2312.14378 • Published Dec 22, 2023