OpenAI CTO Mira Murati expressed uncertainty about whether Sora's training data includes YouTube content, stating that the model primarily draws from publicly available and licensed sources. According to a New York Times report, OpenAI transcribed over a million hours of YouTube videos to train GPT-4.
When questioned by CNBC about potential violations of Google's terms and conditions, Google CEO Sundar Pichai deferred, saying it is up to OpenAI to answer such questions and that Google expects companies to abide by its clear terms of service. OpenAI faces mounting scrutiny and legal challenges regarding its data usage practices.
The New York Times has filed a lawsuit against the AI startup, alleging copyright infringement for training its models on the Times' content without proper authorization. Additionally, the Authors Guild has initiated legal action, asserting that OpenAI's language models rely heavily on copyrighted material without adequate compensation or recognition for creators.
These lawsuits underscore broader concerns about OpenAI's data sourcing practices. While the company requires substantial data to maintain the effectiveness of its AI models, critics argue that it often utilizes copyrighted material without proper permissions, leading to legal disputes and questions about intellectual property rights.
The Authors Guild's lawsuit, in particular, highlights the scale of OpenAI's data usage, alleging that the company's models incorporate millions of copyrighted articles, books, and other creative works without appropriate compensation or attribution to the original creators. As these legal battles unfold, the outcome could have significant implications for how AI developers navigate data usage and intellectual property.
Read more on livemint.com