Claude Humanity’s Last Exam 30% Score: Market at Certainty

Market called it correctly

Implied 100% at publication · Resolved YES · Brier score: 0.00
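
For readers unfamiliar with the metric, the Brier score quoted above can be reproduced with a short sketch. It assumes the standard single-forecast definition, the squared difference between the forecast probability and the binary outcome (1 for YES, 0 for NO). The function name is illustrative and not part of any Lines or Polymarket tooling.

```python
def brier_score(forecast_probability: float, outcome: int) -> float:
    """Squared error between a probability forecast and a binary outcome.

    outcome is 1 if the market resolved YES and 0 if it resolved NO.
    Lower is better; 0.0 is a perfect forecast.
    """
    if not 0.0 <= forecast_probability <= 1.0:
        raise ValueError("forecast_probability must be between 0 and 1")
    if outcome not in (0, 1):
        raise ValueError("outcome must be 0 or 1")
    return (forecast_probability - outcome) ** 2


# Implied 100% at publication, resolved YES.
print(brier_score(1.00, 1))
# The same forecast would have scored worst-possible had the market resolved NO.
print(brier_score(1.00, 0))
```

A forecast of 100% that resolves YES gives (1 - 1)^2 = 0, which matches the 0.00 shown.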

By Alex Mercer, Crypto enthusiast
Resolution Verdict
YES Market Resolved

YES, Locked: Claude's published benchmark performance already clears this threshold. Market probability: 100%.

  • Status: Resolved
  • Volume: $407.2K ($16.5K in 24h)
  • Liquidity: $45.3K (moderate depth)
  • 7-Day Move: +0% (stable)
  • Time Left: Ended (resolves Jun 30)

The market has already rendered its verdict. The contract on Anthropic’s Claude scoring 30% or higher on Humanity’s Last Exam by June 30, 2026 sits at a full 100% probability on Polymarket. No ambiguity, no wobble, no competing thesis. It has been locked at maximum conviction since it opened.

This is not a market waiting for resolution. It is a market that has already done its job. The question now is what that locked-in certainty actually signals about where Claude’s benchmark performance stands, and whether the related 35%+ and 45%+ contracts tell a more interesting story about where the ceiling might be.

How the Humanity’s Last Exam Contract Works

Humanity’s Last Exam is a benchmark released by Scale AI and the Center for AI Safety. It contains roughly 2,500 expert-level questions (about 3,000 at its initial release) across mathematics, science, and humanities, designed specifically to resist saturation by frontier AI models. A 30% score is the threshold this contract trades on.

  • YES: Anthropic Claude scores 30% or higher on Humanity’s Last Exam by June 30, 2026. Price: $1.00. Probability: 100%. Resolves: June 30, 2026.
  • NO: Anthropic Claude scores below 30% on Humanity’s Last Exam by June 30, 2026. Price: $0.00. Probability: 0%. Resolves: June 30, 2026.
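
The relationship between price and payoff in the two bullets above can be sketched as follows. This is a minimal illustration that assumes each share pays $1.00 if its side wins and $0.00 otherwise, and it ignores fees and spreads. The function names are hypothetical.

```python
PAYOUT_USD = 1.00  # assumed payout per winning share


def implied_probability(share_price: float) -> float:
    """Read a share price as a market-implied probability."""
    return share_price / PAYOUT_USD


def profit_per_share(price_paid: float, side_wins: bool) -> float:
    """Profit (or loss) on one share held to resolution, before fees."""
    payout = PAYOUT_USD if side_wins else 0.0
    return payout - price_paid


yes_price = 1.00
print(implied_probability(yes_price))      # probability implied by the YES price
print(profit_per_share(yes_price, True))   # buying YES at $1.00 and winning
print(profit_per_share(yes_price, False))  # buying YES at $1.00 and losing
```

At $1.00 a YES buyer has nothing left to gain and a full dollar per share at risk, which is why a contract at this price attracts so little new trading.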

The NO side needs Claude to underperform every published benchmark trajectory for current frontier models. Claude 3.7 Sonnet already posted double-digit scores on Humanity’s Last Exam when the benchmark launched in early 2025. For NO to win, Anthropic would need to release nothing competitive before June 30, or the benchmark methodology would need a significant revision. Neither scenario has any market support.

Momentum and Market Signals

The momentum composite here is flat in every direction. The 1-hour change is 0.0%, the 24-hour change is 0.0%, and the trend score reflects a contract that has not moved since inception. That flatness is not indecision. A contract pinned at 1.00 with zero downward pressure is a market that exhausted price discovery on day one.

Total volume sits at $190,230 against $16,877 in available liquidity and $2,002 in 24-hour activity. For context, that liquidity figure means a single meaningful trade could reprice this contract if any genuine uncertainty existed. None has materialized. The 24-hour volume is thin even by science market standards, but thinness here reflects consensus, not neglect. Nobody is bidding for NO, even at prices near zero, because the outcome is already treated as closed.

  • Price (1h / 24h): Both flat at 0.0% change. The contract has been at maximum probability since open. No catalyst has emerged to test it.
  • Total volume ($190,230): Meaningful for a science benchmark market. Enough capital committed to signal real conviction, not just placeholder trading.
  • Liquidity ($16,877): Thin by prediction market standards. Any surprise model performance data or benchmark revision could move price sharply on low volume.
  • Related market alignment: The related "Anthropic Claude score on FrontierMath Benchmark by June 30" contract also sits at 100%. Both contracts locking simultaneously points to broad trader consensus that Claude’s frontier performance clears multiple benchmark thresholds.
  • 35%+ and 45%+ contracts: These are the live price discovery zones. The 30%+ contract is resolved in all but name. Where the higher thresholds trade reveals what the market actually thinks about Claude’s ceiling.
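
To put the volume and liquidity figures above side by side, here is a small arithmetic sketch using only the numbers quoted in this section. The two ratios are illustrative ways to compare them, not metrics that Lines or Polymarket publish.

```python
# Figures quoted in this section (USD).
total_volume_usd = 190_230
liquidity_usd = 16_877
volume_24h_usd = 2_002

# Available liquidity relative to everything traded so far.
liquidity_to_volume = liquidity_usd / total_volume_usd

# Share of lifetime volume that traded in the last 24 hours.
daily_turnover = volume_24h_usd / total_volume_usd

print(f"Liquidity as share of total volume: {liquidity_to_volume:.1%}")
print(f"24h activity as share of total volume: {daily_turnover:.1%}")
```

Liquidity works out to roughly 8.9% of total volume and the last 24 hours to about 1.1%, consistent with a market where most capital was committed early and little is changing hands now.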

Lines Analysis: Anthropic Claude and the Benchmark Ceiling

The case for YES on the 30%+ contract requires almost no argument. Claude 3.7 Sonnet scored meaningfully on Humanity’s Last Exam before the June 30 deadline even became relevant. Anthropic has released iterative model updates at a pace consistent with clearing this threshold multiple times over. The 100% probability reflects a genuine absence of downside scenarios, not overconfidence. For this specific threshold, the market is right to treat it as closed.

The NO case is structurally empty at this threshold. It would require Anthropic to freeze model development, the benchmark to change its scoring methodology retroactively, or resolution criteria to shift in ways that contradict the current contract language. None of those paths has any supporting evidence. NO at zero cents is accurately priced.

  • Anthropic model release cadence: Continued iteration before June 30 would push scores higher, reinforcing YES and potentially repricing the 35%+ contract upward.
  • Humanity’s Last Exam methodology updates: Scale AI or the Center for AI Safety revising question sets or scoring would be the primary structural risk to any benchmark contract. Watch for announcements from either body.
  • Claude 5 release (57% probability on Polymarket): A Claude 5 launch before June 30 would almost certainly push benchmark scores past the 30% threshold with headroom, potentially moving the 45%+ contract.
  • Competing model performance: If OpenAI or Google publish new benchmark results that reset Humanity’s Last Exam expectations, Anthropic may accelerate its own evaluation publications.
  • FrontierMath correlation: That contract also sits at 100%. Dual benchmark certainty reinforces the read that Claude’s frontier performance is not in question at these lower thresholds.

The $190,230 in total volume is the real signal here. Traders have deployed real capital against a contract at maximum price, with no offsetting NO flow recorded. That is not passive drift. It reflects active confirmation that this threshold is cleared. The more interesting analytical question is whether the 35%+ and 45%+ contracts represent mispriced upside, and that is where attention should shift as June 30 approaches.

LINES VERDICT

YES, Locked

The 30%+ threshold is cleared in practice. Claude’s published benchmark performance already exceeds this level, and no credible path to NO exists before June 30.

What the market says: Full certainty. The 100% probability reflects exhausted price discovery, not hype. Any volatility before the June 30 resolution date would require an extraordinary and currently unsupported event.

Key unknown: The single factor that could reprice adjacent contracts is a Humanity’s Last Exam methodology revision by Scale AI or the Center for AI Safety. A scoring change would force reassessment across all benchmark threshold contracts, with directional impact depending on whether the revision raises or lowers the effective bar.

Frequently Asked Questions

It means no trader is willing to sell YES below $1.00 or buy NO above $0.00. The market has fully priced this outcome as certain, though prediction markets always carry residual resolution risk until the official close date.

A NO win would require Claude to score below 30% on Humanity’s Last Exam before June 30, 2026. Given published performance data from Claude 3.7 Sonnet, that outcome has no current market support.

An official Humanity’s Last Exam score publication from Anthropic showing a result below 30% would be the only data event that could reprice this contract downward. Any score at or above 30% confirms resolution.

The resolution date is June 30, 2026. If Anthropic publishes a qualifying score before that date, the contract resolves YES at that point per market resolution criteria.

For a science benchmark market, $190,230 is meaningful volume. The $16,877 liquidity figure is thin, meaning new information could shift price sharply if any uncertainty emerged. At 100%, no such shift has occurred.

We aggregate the live positions of the top 50 Polymarket whales (ranked by 30-day tracked volume) into one composite reading per market. It refreshes every hour. The percentage shows how many of those whales hold YES versus NO; the net dollar position shows the cohort's directional exposure in dollars.
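
As a rough sketch of how such a composite could be computed: the exact method is not published here, so the following assumes one net position per tracked wallet and defines the net dollar position as YES dollars minus NO dollars. The `WhalePosition` type and `composite_reading` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class WhalePosition:
    wallet: str
    side: str         # "YES" or "NO"
    usd_value: float  # size of the position in dollars


def composite_reading(positions: List[WhalePosition]) -> Tuple[Optional[float], float]:
    """Return (percent of holders on YES, net dollar position).

    The percentage is None when no tracked wallet holds the market.
    Net dollars are positive when the cohort leans YES (an assumption).
    """
    yes = [p for p in positions if p.side == "YES"]
    no = [p for p in positions if p.side == "NO"]
    holders = len(yes) + len(no)
    yes_pct = 100.0 * len(yes) / holders if holders else None
    net_usd = sum(p.usd_value for p in yes) - sum(p.usd_value for p in no)
    return yes_pct, net_usd


sample = [
    WhalePosition("wallet_a", "YES", 1000.0),
    WhalePosition("wallet_b", "YES", 500.0),
    WhalePosition("wallet_c", "NO", 300.0),
]
yes_pct, net_usd = composite_reading(sample)
print(yes_pct, net_usd)
```

In the sample, two of three holders are on YES and the cohort is net long YES by $1,200.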

A convergence event fires when three or more tracked wallets buy the same outcome on the same market within a four-hour window. We surface these in the activity feed and the VIP digest.
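
The rule described above can be sketched with a sliding window over buy trades. This is an illustration under stated assumptions, not the production detector: the trade tuple layout is hypothetical, "within a four-hour window" is read as a span of at most four hours, and only the first event per market and outcome is reported.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(hours=4)  # window length stated in the text
MIN_WALLETS = 3              # "three or more tracked wallets"


def find_convergence(buys):
    """buys: iterable of (timestamp, wallet, market, outcome) buy trades.

    Returns a list of (market, outcome, window_start, wallets) tuples,
    one per market/outcome pair where the rule is first satisfied.
    """
    grouped = defaultdict(list)
    for ts, wallet, market, outcome in buys:
        grouped[(market, outcome)].append((ts, wallet))

    events = []
    for (market, outcome), trades in grouped.items():
        trades.sort()  # chronological order
        start = 0
        for end in range(len(trades)):
            # Shrink the window until it spans at most four hours.
            while trades[end][0] - trades[start][0] > WINDOW:
                start += 1
            wallets = {w for _, w in trades[start:end + 1]}
            if len(wallets) >= MIN_WALLETS:
                events.append((market, outcome, trades[start][0], sorted(wallets)))
                break
    return events


t0 = datetime(2026, 1, 30, 5, 0)
sample_buys = [
    (t0, "w1", "hle-30", "YES"),
    (t0 + timedelta(hours=1), "w2", "hle-30", "YES"),
    (t0 + timedelta(hours=3), "w3", "hle-30", "YES"),
    (t0 + timedelta(hours=10), "w4", "hle-30", "NO"),
]
print(find_convergence(sample_buys))
```

Counting distinct wallets rather than trades keeps one wallet buying repeatedly from triggering an event on its own.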

No. Lines is an editorial and data product. We do not operate prediction markets, custody funds, or accept trades. All trade flows deep-link to Polymarket via our affiliate code. Probabilities shown are market-implied and not predictions or recommendations.

Market Resolved
  • Outcome: YES
  • Final Price: 100%
  • Settled: Jun 30, 2026
  • Duration: 145 days

Resolution Analysis

YES Reinforcing Factors

Anthropic publishes an official Humanity's Last Exam result above 30% before June 30, triggering immediate resolution. A Claude 5 release before the deadline would further extend the margin above threshold. Either event closes the contract early and confirms what the 100% price already implies.

YES Risk Factors

Scale AI or the Center for AI Safety revises the Humanity's Last Exam scoring methodology before Anthropic publishes a qualifying result. A retroactive change to question sets or scoring criteria could create technical ambiguity around resolution. This is the only structural path that could introduce any uncertainty into this contract.

NO Comeback Scenario

Anthropic pauses frontier model releases entirely before June 30 and no previously published score qualifies under the contract's resolution criteria. This would require both a development freeze and a strict reading of resolution language. The market assigns this path zero probability, and published data supports that assessment.

Wildcard Factor

Humanity's Last Exam becomes subject to a formal scoring dispute between Anthropic and Scale AI over evaluation methodology. A disagreement about whether a published result qualifies could delay resolution past June 30. That ambiguity would not reprice the 30%+ contract to zero, but it would introduce timeline uncertainty not currently priced in.

Key macro factor: Accelerating frontier AI model deployment across Anthropic, OpenAI, and Google is compressing the timeline between model release and benchmark saturation, making threshold contracts at lower percentages resolve faster than markets historically anticipated.

Market Timeline

  • Jan 28, 2026: Market Created
  • Jan 30, 2026, 5:02 AM: Event Start
  • Jan 30, 2026, 5:02 AM: Market Opened
  • Tuesday, Jun 30: Market Resolution

Probabilities shown are market-implied and not predictions or recommendations. This content is for informational purposes only.