Right Answer, Wrong Score: Uncovering the Inconsistencies of LLM Evaluation in Multiple-Choice Question Answering Paper • 2503.14996 • Published Mar 19, 2025
Robust and Fine-Grained Detection of AI Generated Texts Paper • 2504.11952 • Published Apr 16, 2025
NLLG Quarterly arXiv Report 09/24: What are the most influential current AI Papers? Paper • 2412.12121 • Published Dec 2, 2024
Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation Paper • 2504.07072 • Published Apr 9, 2025
Improving Multilingual Capabilities with Cultural and Local Knowledge in Large Language Models While Enhancing Native Performance Paper • 2504.09753 • Published Apr 13, 2025
DeepSeek vs. o3-mini: How Well can Reasoning LLMs Evaluate MT and Summarization? Paper • 2504.08120 • Published Apr 10, 2025
ZEBRA: Zero-Shot Example-Based Retrieval Augmentation for Commonsense Question Answering Paper • 2410.05077 • Published Oct 7, 2024
Echoes from Alexandria: A Large Resource for Multilingual Book Summarization Paper • 2306.04334 • Published Jun 7, 2023
Semantic Role Labeling Meets Definition Modeling: Using Natural Language to Describe Predicate-Argument Structures Paper • 2212.01094 • Published Dec 2, 2022
Do Large Language Models Have an English Accent? Evaluating and Improving the Naturalness of Multilingual LLMs Paper • 2410.15956 • Published Oct 21, 2024
Exploring Non-Verbal Predicates in Semantic Role Labeling: Challenges and Opportunities Paper • 2307.01870 • Published Jul 4, 2023