Rohan Arora

rohan-arora

AI & ML interests

None yet

Recent Activity

published an article 25 days ago

Article

ITBench-AA: Frontier Models Score Below 50% on the First Benchmark for Agentic Enterprise IT Tasks — by Artificial Analysis and IBM

ibm-research • 25 days ago • 17

upvoted a paper about 1 month ago

MCP-Cosmos: World Model-Augmented Agents for Complex Task Execution in MCP Environments

Paper • 2605.09131 • Published May 9 • 59

updated a dataset 2 months ago

ibm-research/ITBench-Lite

Updated Apr 21 • 9.21k • 6

New activity in ibm-research/ITBench-Trajectories 3 months ago

Task description defined twice in the input

#2 opened 3 months ago by kyzor

Scenario indices discrepancy

#1 opened 3 months ago by kyzor

New activity in ibm-research/ITBench-Lite 3 months ago

Add CISO scenarios

#3 opened 4 months ago by yana1205dev

New activity in ibm-research/ITBench-Lite 4 months ago

test

#2 opened 4 months ago by yana1205dev

published an article 4 months ago

Article

IBM and UC Berkeley Diagnose Why Enterprise Agents Fail Using IT-Bench and MAST

ibm-research • Feb 18 • 19

New activity in ibm-research/ITBench-Lite 5 months ago

finops new scenarios

#1 opened 5 months ago by bturkkan

updated a Space 5 months ago

ITBench-Lite-Space

🚀

Develop and run interactive code notebooks with JupyterLab

updated a dataset 5 months ago

ibm-research/ITBench-Trajectories

Updated Jan 19 • 333 • 3

New activity in ibm-research/ITBench-Lite 5 months ago

activities

#1 opened 5 months ago by bhavya24

New activity in rohan-arora/ITBench-Lite 5 months ago

test-pr

#1 opened 5 months ago by rohan-arora

published a dataset 5 months ago

ibm-research/ITBench-Trajectories

Updated Jan 19 • 333 • 3

published a Space 5 months ago

ITBench-Lite-Space

🚀

Develop and run interactive code notebooks with JupyterLab

published a dataset 5 months ago

ibm-research/ITBench-Lite

Updated Apr 21 • 9.21k • 6
