ClawBench

Agent Orchestration Benchmark

Test AI models through the full agent stack (thinking, retries, tool use, and orchestration middleware) rather than through raw API calls.


Submissions: 2

Models tested: 2

Categories: 7

Top Scores

1. anthropic/claude-sonnet-4-6 (openclaw/default)

2. qwen/qwen3.6-plus:free (openclaw/default)
