OpenAI has revealed that AI models have already surpassed human experts in a range of important jobs, from editors and software developers to government administrators and real estate brokers.
The company, which develops the ChatGPT AI model, recently published a new evaluation framework called GDPval, which shows how the best AI models today compare against human experts in a variety of fields.
Human experts were required to have a minimum of 4 years’ experience in their field and a strong resume, and they needed to pass a video interview, background check, training, and quiz to participate. OpenAI said the average expert involved in the study had 14 years of experience.
GDPval measures AI model performance across 44 occupations, chosen based on their economic value as defined by their contribution to US GDP. For each occupation, OpenAI compared the performance of today’s best AI models against that of human experts and measured the win rate of the models against the humans.
A win rate of 50% means the AI model is on par with a human expert.
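As a rough illustration of how a win rate like this can be computed, here is a minimal Python sketch. It assumes each comparison between a model's output and an expert's output is graded as a win, a loss, or a tie from the model's point of view, and that a tie counts as half a win. The function name, the `tie_weight` parameter, and the sample grades are hypothetical and not taken from GDPval itself.

```python
from collections import Counter


def win_rate(outcomes, tie_weight=0.5):
    """Return the model's win rate as a percentage.

    outcomes: iterable of "win", "loss", or "tie", one per graded
    comparison, from the model's point of view.
    tie_weight: share of a win credited for a tie (an assumption of
    this sketch, not a documented GDPval rule).
    """
    counts = Counter(outcomes)
    total = counts["win"] + counts["loss"] + counts["tie"]
    if total == 0:
        raise ValueError("no graded comparisons")
    return 100.0 * (counts["win"] + tie_weight * counts["tie"]) / total


# Hypothetical grades for illustration only, not GDPval data.
grades = ["win", "loss", "win", "tie", "loss", "loss", "win", "tie"]
print(f"{win_rate(grades):.1f}%")  # prints "50.0%"
```

With three wins, three losses, and two ties, the sample lands exactly at 50%, the point the article describes as parity with a human expert.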
Across the entire framework, the Claude Opus 4.1 model developed by Anthropic was the closest to human experts with a win rate of 47.6%, followed by GPT-5 with a win rate of 38.8%.
However, the effectiveness of these models at human tasks has more than doubled since the same time last year, and on specific tasks AI models are already beginning to far outperform experts.
The jobs where AI beats human experts
According to the GDPval framework, the roles where AI models most outperform human experts span several sectors, from information to finance and insurance.
According to the data, the top 10 jobs where AI outperforms human experts are:
- Counter and rental clerks
- Sales managers
- Shipping, receiving, and inventory clerks
- Editors
- Software developers
- Private detectives and investigators
- Compliance officers
- First-line supervisors of non-retail sales workers
- Sales representatives, wholesale and manufacturing, except technical and scientific products
- General and operations managers
The tables below show the win rate of the best AI model in each role measured against human industry experts.
Where an AI model has a win rate of more than 50%, it is technically outperforming the human expert in that role.
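The 50% threshold can be applied mechanically to any set of results. The sketch below is one way to do that in Python, using the Finance and Insurance win rates reported in this article as sample input. The function name `roles_above_parity` and the `PARITY` constant are illustrative choices, not part of GDPval.

```python
PARITY = 50.0  # win rate (%) at which a model is on par with a human expert

# Finance and Insurance win rates (%) as reported in this article.
finance = {
    "Personal Financial Advisors": 62.2,
    "Customer Service Representatives": 55.6,
    "Securities, Commodities, and Financial Services Sales Agents": 46.7,
    "Financial and Investment Analysts": 37.8,
    "Financial Managers": 24.4,
}


def roles_above_parity(win_rates, threshold=PARITY):
    """Return roles whose win rate exceeds the threshold, highest first."""
    above = [role for role, rate in win_rates.items() if rate > threshold]
    return sorted(above, key=win_rates.get, reverse=True)


for role in roles_above_parity(finance):
    print(f"{role}: {finance[role]}%")
```

For this sector, only Personal Financial Advisors and Customer Service Representatives clear the threshold.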
GDPval measures specific roles within each major industry included in its framework.
The results below are therefore grouped by industry, listing the performance of AI in specific roles within each measured sector.
Finance and Insurance
| Job | Model | AI Win Rate |
|---|---|---|
| Personal Financial Advisors | Claude Opus 4.1 | 62.2% |
| Customer Service Representatives | Claude Opus 4.1 | 55.6% |
| Securities, Commodities, and Financial Services Sales Agents | Claude Opus 4.1 | 46.7% |
| Financial and Investment Analysts | Claude Opus 4.1 | 37.8% |
| Financial Managers | Claude Opus 4.1 | 24.4% |
Government
| Job | Model | AI Win Rate |
|---|---|---|
| First-Line Supervisors of Police and Detectives | Claude Opus 4.1 | 55.6% |
| Compliance Officers | GPT-5 | 55% |
| Administrative Services Managers | Claude Opus 4.1 | 53.3% |
| Child, Family, and School Social Workers | GPT-5 | 50% |
| Recreation Workers | GPT-5 | 37.8% |
Healthcare and Social Assistance
| Job | Model | AI Win Rate |
|---|---|---|
| Medical and Health Services Managers | Claude Opus 4.1 | 62.2% |
| Nurse Practitioners | GPT-5 | 60% |
| First-Line Supervisors of Office and Administrative Support Workers | GPT-5 | 40% |
| Medical Secretaries and Administrative Assistants | Claude Opus 4.1 | 37.8% |
| Registered Nurses | Claude Opus 4.1 | 35.6% |
Information
| Job | Model | AI Win Rate |
|---|---|---|
| Editors | GPT-5 | 77.1% |
| News Analysts | GPT-5 | 48.9% |
| Producers and Directors | Claude Opus 4.1 | 31.1% |
| Film and Video Editors | GPT-5 | 20% |
| Audio and Video Technicians | GPT-5 | 10.5% |
Manufacturing
| Job | Model | AI Win Rate |
|---|---|---|
| Shipping, Receiving, and Inventory Clerks | Claude Opus 4.1 | 68.9% |
| First-Line Supervisors of Production and Operating Workers | Claude Opus 4.1 | 57.8% |
| Buyers and Purchasing Agents | Claude Opus 4.1 | 53.3% |
| Mechanical Engineers | GPT-5 | 25% |
| Industrial Engineers | Claude Opus 4.1 | 20% |
Professional, Scientific, and Technical Services
| Job | Model | AI Win Rate |
|---|---|---|
| Software Developers | GPT-5 | 75% |
| Computer and Information Systems Managers | GPT-5 | 53.3% |
| Project Management Specialists | Claude Opus 4.1 | 37.8% |
| Lawyers | GPT-5 | 25% |
| Accountants and Auditors | Claude Opus 4.1 | 13.3% |
Real Estate, Rental, and Leasing
| Job | Model | AI Win Rate |
|---|---|---|
| Counter and Rental Clerks | Claude Opus 4.1 | 82.2% |
| Real Estate Brokers | GPT-5 | 65% |
| Real Estate Sales Agents | Claude Opus 4.1 | 35.6% |
| Property, Real Estate, and Community Association Managers | Claude Opus 4.1 | 35.6% |
| Concierges | GPT-5 | 25% |
Retail Trade
| Job | Model | AI Win Rate |
|---|---|---|
| General and Operations Managers | GPT-5 | 70% |
| Private Detectives and Investigators | Claude Opus 4.1 | 68.9% |
| First-Line Supervisors of Retail Store Workers | Claude Opus 4.1 | 44.4% |
| Pharmacists | Claude Opus 4.1 | 31.1% |
Wholesale Trade
| Job | Model | AI Win Rate |
|---|---|---|
| Sales Managers | GPT-5 | 75% |
| First-Line Supervisors of Non-Retail Sales Workers | Claude Opus 4.1 | 55.6% |
| Sales Representatives, Wholesale and Manufacturing, Except Technical and Scientific Products | Claude Opus 4.1 | 55.6% |
| Sales Representatives, Wholesale and Manufacturing, Technical and Scientific Products | Claude Opus 4.1 | 42.2% |
| Order Clerks | Claude Opus 4.1 | 28.9% |
