How It Works
A fair test of AI prediction capabilities on real-world FDA decisions
Why This Matters
Most benchmarks test answers that already exist in training data. Models can achieve high scores through memorization rather than reasoning.
FDA decisions don't exist until they're announced. No memorization possible, no data leakage, no benchmark contamination.
Can AI models reason about complex regulatory decisions and make accurate predictions about the future?
The Process
Track FDA Calendar Events
Monitor upcoming FDA drug approval decisions from the RTTNews FDA Calendar including PDUFA dates for NDAs, BLAs, and supplemental applications.
Prepare Identical Context
Each model receives the same information: drug name, company, application type, therapeutic area, clinical trial data, and regulatory history.
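As a minimal sketch of what "identical context" could look like in practice, the snippet below fills one template once and hands the same string to every model. The field names mirror the placeholders in the prediction prompt; the `DecisionContext` class, the helper functions, and the example values are hypothetical and not taken from the benchmark's code. How clinical trial data and regulatory history are passed in is not shown in the source, so they are left out here.

```python
from dataclasses import dataclass, asdict

# Abbreviated stand-in for the full prediction prompt. Only the placeholder
# names come from the source; the rest of this module is illustrative.
PROMPT_TEMPLATE = (
    "**Drug Name:** {drugName}\n"
    "**Company:** {companyName}\n"
    "**Application Type:** {applicationType}\n"
    "**Therapeutic Area:** {therapeuticArea}\n"
    "**Event Description:** {eventDescription}\n"
)


@dataclass(frozen=True)
class DecisionContext:
    """Hypothetical container for one FDA calendar event."""
    drugName: str
    companyName: str
    applicationType: str
    therapeuticArea: str
    eventDescription: str


def build_prompt(ctx: DecisionContext) -> str:
    """Fill the template from a single context object."""
    return PROMPT_TEMPLATE.format(**asdict(ctx))


def prompts_for_models(ctx: DecisionContext, model_ids: list[str]) -> dict[str, str]:
    """Build the prompt once so every model sees byte-identical text."""
    prompt = build_prompt(ctx)
    return {model_id: prompt for model_id in model_ids}


# Made-up example values, not a real FDA event.
example = DecisionContext(
    drugName="ExampleDrug",
    companyName="Example Pharma",
    applicationType="NDA",
    therapeuticArea="Oncology",
    eventDescription="PDUFA target action date",
)
prompts = prompts_for_models(example, ["claude-opus-4-5-20251101", "gpt-5.2", "grok-4"])
```

Building the prompt once and reusing the string, rather than formatting it per model, rules out accidental per-model differences in the context.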
Request Predictions
Ask each model: "Will the FDA approve this drug?" Models provide a binary APPROVED or REJECTED prediction with reasoning. All predictions are timestamped before decisions.
Wait for FDA Decisions
Unlike benchmarks with known answers, we wait for the FDA to announce. There's no way to game this—the ground truth doesn't exist until the ruling.
Score Results
Compare each model's prediction to the actual outcome. Correct if APPROVED matches approval, or REJECTED matches rejection/CRL.
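The scoring rule above can be sketched as a small function. The outcome labels (`"approved"`, `"rejected"`, `"crl"`) and the function names are assumptions made for illustration; the source only states that a rejection or a CRL counts as a match for a REJECTED prediction.

```python
# Assumed outcome labels; only the approval vs. rejection/CRL grouping
# comes from the source.
APPROVAL_OUTCOMES = {"approved"}
REJECTION_OUTCOMES = {"rejected", "crl"}
VALID_PREDICTIONS = {"approved", "rejected"}


def is_correct(prediction: str, outcome: str) -> bool:
    """Return True when the prediction matches the FDA outcome."""
    p = prediction.strip().lower()
    o = outcome.strip().lower()
    if p not in VALID_PREDICTIONS:
        raise ValueError(f"unknown prediction: {prediction!r}")
    if o in APPROVAL_OUTCOMES:
        return p == "approved"
    if o in REJECTION_OUTCOMES:
        return p == "rejected"
    raise ValueError(f"unknown outcome: {outcome!r}")


def accuracy(pairs: list[tuple[str, str]]) -> float:
    """Fraction of (prediction, outcome) pairs scored as correct."""
    if not pairs:
        raise ValueError("no scored predictions")
    return sum(is_correct(p, o) for p, o in pairs) / len(pairs)
```

Comparison is case-insensitive so that an upper-case APPROVED label and the lower-case "approved" used in the JSON response score the same way.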
Model Configuration
Claude Opus 4.5
claude-opus-4-5-20251101
Extended thinking: 10,000 token budget
GPT-5.2
gpt-5.2
Agentic web search
reasoning.effort: high
Grok 4
grok-4
Live search (auto)
No enhanced reasoning
Key Differences
GPT-5.2 & Grok 4 can search the web
They may find recent news, press releases, or analyst reports
Claude uses extended thinking
10,000 token budget for step-by-step reasoning
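One way to keep these settings in one place is a small configuration table. This is a sketch only: the `ModelConfig` class and its field names are hypothetical and do not correspond to any provider's API parameters. The model IDs and settings are copied from the list above, and `web_search=False` for Claude is inferred from the key differences rather than stated outright.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical record of per-model settings (not a provider API)."""
    display_name: str
    model_id: str
    web_search: bool
    thinking_budget_tokens: Optional[int] = None
    reasoning_effort: Optional[str] = None


MODELS = [
    # Extended thinking with a 10,000 token budget; no web search (inferred).
    ModelConfig("Claude Opus 4.5", "claude-opus-4-5-20251101",
                web_search=False, thinking_budget_tokens=10_000),
    # Agentic web search with high reasoning effort.
    ModelConfig("GPT-5.2", "gpt-5.2", web_search=True, reasoning_effort="high"),
    # Live search in auto mode; no enhanced reasoning.
    ModelConfig("Grok 4", "grok-4", web_search=True),
]


def search_enabled(models: list[ModelConfig]) -> list[str]:
    """Names of the models that can consult the web while predicting."""
    return [m.display_name for m in models if m.web_search]
```

Recording the asymmetry explicitly makes it easy to split results by whether a model could look up recent news before answering.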
Prediction Prompt
You are an expert pharmaceutical analyst specializing in FDA
regulatory decisions. Analyze the following FDA decision and
predict the outcome.
## Drug Information
**Drug Name:** {drugName}
**Company:** {companyName}
**Application Type:** {applicationType}
**Therapeutic Area:** {therapeuticArea}
**Event Description:** {eventDescription}
## Your Task
1. Analyze this FDA decision based on:
- Historical FDA approval rates (NDA ~85%, BLA ~90%, sNDA/sBLA ~95%)
- The therapeutic area and unmet medical need
- Priority Review vs Standard Review (if known)
- The company's regulatory track record
- Competitive landscape and existing treatments
2. Make a prediction:
- **Prediction:** Either "approved" or "rejected"
- **Confidence:** A percentage between 50-100%
- **Reasoning:** 150-300 words supporting your prediction
Expected Response
{ "prediction": "approved", "confidence": 75, "reasoning": "Based on historical approval rates..."}
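A response in this shape can be checked before it is recorded. The sketch below is an assumption about how such validation might look, not the benchmark's actual parser; `parse_prediction` is a hypothetical name. It enforces the constraints stated in the prompt: a prediction of "approved" or "rejected" and a confidence between 50 and 100.

```python
import json


def parse_prediction(raw: str) -> dict:
    """Parse a model's JSON reply and validate it against the prompt's rules.

    Raises ValueError for out-of-range or malformed values (including
    invalid JSON) and KeyError when a required field is missing.
    """
    data = json.loads(raw)

    prediction = str(data["prediction"]).strip().lower()
    if prediction not in {"approved", "rejected"}:
        raise ValueError(f"prediction must be approved or rejected, got {prediction!r}")

    confidence = data["confidence"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    if not 50 <= confidence <= 100:
        raise ValueError(f"confidence must be between 50 and 100, got {confidence}")

    reasoning = str(data["reasoning"]).strip()
    if not reasoning:
        raise ValueError("reasoning must not be empty")

    return {"prediction": prediction, "confidence": confidence, "reasoning": reasoning}


# The example response from above parses cleanly.
parsed = parse_prediction(
    '{ "prediction": "approved", "confidence": 75, '
    '"reasoning": "Based on historical approval rates..."}'
)
```

The 150-300 word length of the reasoning is not enforced here; whether to reject or merely flag replies outside that range is a design choice the source does not specify.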