arxiv:2503.05731
Satya
skrishna
AI & ML interests
Safe A(G)I
Recent Activity
Liked a model (3 days ago): sesame/csm-1b
Liked a dataset (3 days ago): hf-internal-testing/dailytalk-dummy
Upvoted a paper (3 months ago): D-REX: A Benchmark for Detecting Deceptive Reasoning in Large Language Models