Zuckerberg Appeared to Know Meta Trained AI on Pirated Library

Newly unsealed documents show how Meta used LibGen, a pirated library of ebooks, to train its Llama 3 chatbot.


The AI rush has brought with it thorny questions of copyright and data ownership as tech companies train bots like ChatGPT on existing texts, but it seems Meta largely brushed these aside as it worked to integrate such tools into Facebook and Instagram.

As first revealed in a motion filed by attorneys for novelists Christopher Golden and Richard Kadrey and comedian Sarah Silverman, who are pursuing a class-action suit against Meta for allegedly using their copyrighted work without permission, employees at the tech giant had candid conversations about the potential for scandal that could arise from leveraging a risky resource: Library Genesis, or LibGen. LibGen is a massive so-called “shadow library” of freely downloadable ebooks and PDFs, including otherwise paywalled research and academic articles. In these exchanges, Meta’s engineers identified LibGen as “a dataset we know to be pirated,” but indicated that CEO Mark Zuckerberg had approved its use for training the next iteration of the company’s large language model, Llama.

Now, under a court order from Judge Vince Chhabria of the U.S. District Court for the Northern District of California, the records of those previously confidential internal dialogues have been unsealed, and appear to confirm Zuckerberg’s decision to greenlight the transfer of pirated, copyrighted LibGen data to improve Llama — despite concerns about a backlash. In an email to Joelle Pineau, vice president of AI research at Meta, Sony Theakanath, director of product management, wrote, “After a prior escalation to MZ [Mark Zuckerberg], GenAI has been approved to use LibGen for Llama 3 […] with a number of agreed upon mitigations.” The note observed that including the LibGen material would help them reach certain performance benchmarks, and alluded to industry rumors that other AI companies, including OpenAI and Mistral AI, are “using the library for their models.” In the same email, Theakanath wrote that under no circumstances would Meta publicly disclose its use of LibGen.

The same email lays out the legal exposures and potential negative media attention that could follow if “external parties” deduce that the LibGen trove formed part of Llama’s training data: “Copyright and IP is top of mind for legislators around the world, including in the US and EU,” the document states. “US legislators expressed concern in a recent hearing about AI developers using pirated websites for training. It’s unclear what their legislative actions would be if the concern spreads, but it reflects some of the negative lobbying right holders have been doing, related to our litigation on this topic (along the lines that this is ‘stolen’ content that then taints the output of this model).”

Meta did not immediately return a request for comment on these internal communications.

Elsewhere in the unsealed documents, Meta employees describe methods for processing and filtering text from LibGen in order to remove “boilerplate” indications of copyright, such as “ISBN,” “Copyright,” “©,” and “All rights reserved.” The author of a memo titled “Observations on LibGen-SciMag” (“SciMag” is the library’s catalogue of science journals) reports that the material’s “quality is high and the documents are long so this should be great data to learn from, in particular, for highly specialized knowledge!” The same memo recommends trying to “remove more copyright headers and document identifiers” — seemingly more evidence that Meta was looking to cover its tracks as it exploited this cache of technical text that it did not have permission to use.

Other revealing messages show Meta’s AI research team and executives discussing the best methods for obtaining the LibGen data set besides directly torrenting it, or downloading it via peer-to-peer file sharing, from the company’s IP addresses. At times, employees wondered whether this was even allowed. “I think torrenting from a corporate laptop doesn’t feel right,” wrote one engineer in April 2023, adding a smiley face emoji. (A later email acknowledged that the “SciMag” data had indeed been torrented.) And in October 2023 messages to a researcher working on Llama, Ahmad Al-Dahle, vice president of GenAI at Meta, said he had “cleared the path to use” LibGen and was “pushing from the top” to incorporate other data sets to improve Llama and win the AI race.

It’s no wonder Meta fought the unsealing and unredacting of these discussions as the discovery period in the copyright lawsuit came to an end: they seem to undercut the company’s argument, as its lawyers put it in a motion to dismiss the suit, that “using text to statistically model language and generate original expression” falls under fair use, the legal doctrine permitting limited use of copyrighted material without permission. The plaintiffs’ attorneys also noted in their latest filing that Zuckerberg himself, in a recent deposition, said the kind of piracy described in their latest amended complaint would raise “lots of red flags” and “seems like a bad thing.”

Of course, Meta, which on Tuesday announced it will cut 5 percent of its workforce, some 3,600 workers it deems its “lowest performers,” is hardly alone among Silicon Valley behemoths accused of flouting (or circumventing) copyright law. This class action could prove a bellwether for the many other suits in progress against AI companies over the ownership of photographs, art, music, journalism, books, and more. But as long as tech firms are hungrily searching for more stuff for their bots to replicate and remix, they will always be reliant on the original content creators: human beings.
