AI Agents Struggle to Succeed as Freelancers, Study Finds

This article was generated by AI and cites original sources.

A recent study by researchers at the data annotation company Scale AI and the Center for AI Safety (CAIS) found that even the most advanced artificial intelligence (AI) agents struggle to perform online freelance work. On the Remote Labor Index, a new benchmark designed to measure how well cutting-edge AI models can automate economically valuable tasks, the leading AI agents completed less than 3 percent of the assigned work and earned only a small fraction of the available income.

The experiment highlights the challenges AI faces in replacing human workers in the freelance market. Among the AI models tested, Manus, from a Chinese startup, was the most capable, followed by Grok from xAI, Claude from Anthropic, ChatGPT from OpenAI, and Gemini from Google.

Despite recent advances in AI technology, the findings underscore a significant gap between current AI capabilities and the complex demands of freelance tasks. Dan Hendrycks, director of CAIS, emphasized the need for realistic assessments of AI capabilities and cautioned against overestimating current progress.

While AI models have improved in domains such as coding and mathematics, the study shows that they remain limited in handling diverse and complex freelance assignments.

Source: WIRED