A report released Tuesday by the trade association News Media Alliance claimed that artificial intelligence chatbots respond to users’ questions with “paraphrases or outright repetitions of copied” news articles and other copyrighted works, the New York Post reported.
The NMA is a nonprofit organization representing more than 2,200 news organizations worldwide.
The 77-page report explained that large language models, including Microsoft’s Bing Chat, Google’s Bard and OpenAI’s ChatGPT, are trained on copyrighted news articles and, in some cases, repeat portions of those articles word for word in their responses to queries.
“As with previous ‘disruptive’ Silicon Valley models, [generative artificial intelligence] investors rely on forgiveness rather than asking for permission. They rely on the claim that copying for educational purposes is a ‘fair use’ which they are allowed to continue with impunity, even though many of their products compete directly with publishers and endanger their continued welfare,” the NMA report said. “But fair use doesn’t work that way.”
The NMA’s white paper states that LLMs are trained by copying large amounts of others’ creative works. This often happens without permission or compensation, the organization added. Additionally, the nonprofit’s analysis of AI systems showed that developers disproportionately train their LLMs using online news articles and similar content.
“In fact, our analysis of a representative sample of news, magazine and digital media publications shows that the popular curated datasets underlying some of the most widely used LLMs significantly overweight publisher content, by a factor of over 5 to nearly 100 compared to the generic collection of content that the well-known company Common Crawl has pulled from the Internet,” the report says.
Half of the top ten websites on which Google’s Bard is trained belong to news organizations, it said.
While GAI developers claim that their LLMs “only ‘learn’ unprotectable facts from copyrighted training materials,” the NMA says this is “technically inaccurate” because the systems retain the facts without actually understanding the underlying concepts. Furthermore, the NMA argued: “This is beside the point because materials used for ‘learning’ are subject to copyright.” The report noted that libraries must legally purchase a book before lending it to individuals, which does not grant the borrower the right to copy the book material.
The NMA made several recommendations to GAI developers, including notifying publishers if their content was copied and used to train AI chatbots. Developers who use content without permission should be recognized as violating publishers’ rights, the report said.
The organization called on lawmakers and the United States Copyright Office to promote content licensing for publishers. The NMA said it “supports the passage of its proposed legislation that would allow news publishers to bargain collectively with certain dominant technology providers.”
Google, OpenAI and Microsoft did not respond to the Post’s request for comment.