MDPBench: A Benchmark for Multilingual Document Parsing in Real-World Scenarios
Paper • 2603.28130 • Published • 7
Understanding the Challenges in Iterative Generative Optimization with LLMs
Ego2Web: A Web Agent Benchmark Grounded in Egocentric Videos