AtlasOCR: Building the First Open-Source Darija OCR Model with Vision Language Models Paper • 2604.08070 • Published 28 days ago • 3
ASCAT: An Arabic Scientific Corpus and Benchmark for Advanced Translation Evaluation Paper • 2604.00015 • Published Mar 10
Abjad-Kids: An Arabic Speech Classification Dataset for Primary Education Paper • 2603.20255 • Published Mar 11
SyriSign: A Parallel Corpus for Arabic Text to Syrian Arabic Sign Language Translation Paper • 2603.29219 • Published Mar 31
AraModernBERT: Transtokenized Initialization and Long-Context Encoder Modeling for Arabic Paper • 2603.09982 • Published Feb 10
Wasm: A Pipeline for Constructing Structured Arabic Interleaved Multimodal Corpora Paper • 2511.07080 • Published Nov 10, 2025 • 33
Post: The #1 trending AI/ML dataset today 🏆 Massive scale, diversity, and end-to-end potential from NVIDIA! nvidia/PhysicalAI-Autonomous-Vehicles
Post: The new king 👑 has arrived! Moonshot AI now has the top model on Hugging Face 🔥 moonshotai/Kimi-K2-Thinking
Post: 💸🤑 You don't need 100 GPUs to train something amazing! Our Smol Training Playbook teaches you a better path to world-class LLMs, for free! Check out the #1 trending Space on 🤗: HuggingFaceTB/smol-training-playbook
Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures Paper • 2510.24081 • Published Oct 28, 2025 • 23
Arabic Little STT: Arabic Children Speech Recognition Dataset Paper • 2510.23319 • Published Oct 27, 2025
MeXtract: Light-Weight Metadata Extraction from Scientific Papers Paper • 2510.06889 • Published Oct 8, 2025 • 1
Post: Cool stuff from the past few weeks on Hugging Face! 🤗 🚀
• 📈 Trackio, a local-first W&B alternative: https://github.com/gradio-app/trackio/issues
• 🌍 EmbeddingGemma, 300M-parameter multilingual embeddings, on-device: https://huggingface.co/blog/embeddinggemma
• 💻 Open LLMs in VS Code (Inference Providers): https://x.com/reach_vb/status/1966185427582497171
• 🤖 Smol2Operator GUI agents: https://huggingface.co/blog/smol2operator
• 🖼️ Gradio visible watermarking: https://huggingface.co/blog/watermarking-with-gradio
FastPacket: Towards Pre-trained Packets Embedding based on FastText for next-generation NIDS Paper • 2209.14727 • Published Sep 29, 2022
Anomaly detection optimization using big data and deep learning to reduce false-positive Paper • 2209.13965 • Published Sep 28, 2022
ArNLI: Arabic Natural Language Inference for Entailment and Contradiction Detection Paper • 2209.13953 • Published Sep 28, 2022
ArEEG_Words: Dataset for Envisioned Speech Recognition using EEG for Arabic Words Paper • 2411.18888 • Published Nov 28, 2024
Vulnerability Detection Using Two-Stage Deep Learning Models Paper • 2305.09673 • Published May 8, 2023
Quran Recitation Recognition using End-to-End Deep Learning Paper • 2305.07034 • Published May 10, 2023 • 1