OpenAI Taps Contractors to Benchmark AI Performance Through Real-World Tasks

This article was generated by AI and cites original sources.

OpenAI is collecting real-world tasks from third-party contractors to benchmark its AI models against human work, according to documents from OpenAI and Handshake AI obtained by WIRED. The company is soliciting actual assignments and projects that contractors have completed in the course of their jobs.

The initiative is designed to establish a human baseline across a range of tasks, against which AI model output can then be compared. OpenAI had previously launched an evaluation effort measuring model performance against human professionals in various sectors, which it frames as a step toward artificial general intelligence (AGI).

Contractors are instructed to describe and upload concrete examples of work they have completed, with an emphasis on tangible outputs such as Word documents, PDFs, and images. They may also submit mocked-up examples created for demonstration, showing how they would approach a given scenario.

Each real-world task has two components: the task request, meaning the instructions the contractor received, and the task deliverable, meaning the work produced in response. Neither OpenAI nor Handshake AI has commented on the program, but the documents point to a strategy of sharpening AI performance through practical, job-related work.

Source: WIRED