DVPS Scientific Watch
Collection of external scientific material relevant to the project
Viewer • Updated • 3.33B • 68.6k • 262
Note: Multilingual synthetic corpus for translation. Over 1 trillion tokens of parallel text in English and 500+ languages, created by translating data from FineWeb2 into English using Gemma3 27B.
LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators
Paper • 2411.00136 • Published
Note: General overview of the performance of popular inference engines across different workload scenarios and GPU platforms.
The Illusion of Readiness in Health AI
Paper • 2509.18234 • Published • 1
Note: This paper argues for proper, systematic evaluation so that claims can be adjusted appropriately. Such evaluation reveals hidden fragilities and failure modes, such as reliance on masking artifacts, shortcut learning, or flawed reasoning, and highlights how models may perform well on specific benchmarks while failing to generalize.
The Roots of Performance Disparity in Multilingual Language Models: Intrinsic Modeling Difficulty or Design Choices?
Paper • 2601.07220 • Published
Note: A survey of multilingual LLMs with design recommendations for tokenization, sampling, architectures, and evaluation.
google/translategemma-27b-it
Image-Text-to-Text • Updated • 35.3k • 292
Note: Multimodal translation model for around 55 languages, optimized from Gemma family models for the MT task.
TutorBench: A Benchmark To Assess Tutoring Capabilities Of Large Language Models
Paper • 2510.02663 • Published • 1
Scaling Spatial Intelligence with Multimodal Foundation Models
Paper • 2511.13719 • Published • 47
Note: Multimodal foundation models (MMFMs) still exhibit surprising deficiencies in spatial intelligence. In this paper, the authors explore scaling up MMFMs to cultivate spatial intelligence within the SenseNova-SI family, built upon established multimodal foundations including visual understanding models (Qwen3-VL and InternVL3) and unified understanding and generation models.
Multimodal Foundation Models for Early Disease Detection
Paper • 2510.01899 • Published
Note: Most diagnostic models still process different modalities in isolation, which limits their ability to capture early, cross-modal disease signatures. This work introduces a multimodal foundation model (MMFM) built on a transformer architecture that integrates heterogeneous clinical data through modality-specific encoders and cross-modal attention.
Assessing the value of Geo-Foundational Models for Flood Inundation Mapping: Benchmarking models for Sentinel-1, Sentinel-2, and Planetscope for end-users
Paper • 2511.01990 • Published
Note: Geo-foundation models show promise for flood mapping, offering modest but consistent improvements over traditional approaches while reducing data and computational requirements.
Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models
Paper • 2601.07372 • Published • 40
Note: Splits memory and reasoning in large language models.
MIT-10M: A Large Scale Parallel Corpus of Multilingual Image Translation
Paper • 2412.07147 • Published • 5
Note: Parallel image translation dataset spanning 840k images across 14 languages. The images are sourced from 8 languages and 28 categories. GPT-4 is used to translate the OCR-recognized text, and the translations are verified using similarity to Google Translate and random human verification.
End-to-End Test-Time Training for Long Context
Paper • 2512.23675 • Published • 20
AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments
Paper • 2405.07960 • Published • 1
Note: Work on benchmarking LLMs in the medical domain. Introduces a simulated clinical environment in which multimodal agents interact with patients, gather incomplete information, use tools, and reason over time.
Innovator-VL: A Multimodal Large Language Model for Scientific Discovery
Paper • 2601.19325 • Published • 76
Note: Interesting recent work on a multimodal LLM for scientific discovery, incorporating a wide variety of scientific information and showing consistently high results on benchmarks.
EvalBlocks: A Modular Pipeline for Rapidly Evaluating Foundation Models in Medical Imaging
Paper • 2601.03811 • Published
Note: An evaluation framework that efficiently tracks datasets, model variants, aggregation choices, and downstream tasks while remaining fast, reproducible, and scalable. Keywords: evaluation, medical imaging, reproducibility.