OpenAI Faces Legal Scrutiny Over Deletion of Allegedly Pirated Book Datasets

This article was generated by AI and cites original sources.

OpenAI is facing legal pressure over its deletion of two book datasets at the center of a copyright dispute. The datasets, known as ‘Books 1’ and ‘Books 2,’ were deleted before the release of ChatGPT in 2022. Authors allege the datasets were sourced from Library Genesis (LibGen), and they have brought a class-action lawsuit claiming their works were used without permission.

OpenAI initially cited ‘non-use’ as its reason for deleting the datasets, but subsequent legal developments have raised questions about the motives behind the deletion. The authors pressed for disclosure, and a court has ordered OpenAI to turn over internal communications about the deletion. These include discussions with in-house lawyers and references to LibGen that OpenAI had previously withheld under attorney-client privilege.

The case underscores the complexity of data ethics and intellectual property rights in artificial intelligence. As AI models grow more sophisticated and data-intensive, sourcing and using datasets ethically becomes more important, both to avoid litigation and to protect intellectual property.

Source: Ars Technica