Facebook

Meta Beats Copyright Suit From Authors Over AI Training on Books (bloomberglaw.com) 38

An anonymous reader shares a report: Meta escaped a first-of-its-kind copyright lawsuit from a group of authors who alleged the tech giant hoovered up millions of copyrighted books without permission to train its generative AI model called Llama.

San Francisco federal Judge Vince Chhabria ruled Wednesday that Meta's decision to use the books for training is protected under copyright law's fair use defense, but he cautioned that his opinion is more a reflection on the authors' failure to litigate the case effectively. "This ruling does not stand for the proposition that Meta's use of copyrighted materials to train its language models is lawful," Chhabria said.

Microsoft

Microsoft Sued By Authors Over Use of Books in AI Training (reuters.com) 9

Microsoft has been hit with a lawsuit by a group of authors who claim the company used their books without permission to train its Megatron artificial intelligence model. From a report: Kai Bird, Jia Tolentino, Daniel Okrent and several others alleged that Microsoft used pirated digital versions of their books to teach its AI to respond to human prompts. Their lawsuit, filed in New York federal court on Tuesday, is one of several high-stakes cases brought by authors, news outlets and other copyright holders against tech companies including Meta Platforms, Anthropic and Microsoft-backed OpenAI over alleged misuse of their material in AI training.

[...] The writers alleged in the complaint that Microsoft used a collection of nearly 200,000 pirated books to train Megatron, an algorithm that gives text responses to user prompts.

AI

Anthropic Bags Key 'Fair Use' Win For AI Platforms, But Faces Trial Over Damages For Millions of Pirated Works (aifray.com) 90

A federal judge has ruled that Anthropic's use of copyrighted books to train its Claude AI models constitutes fair use, but rejected the startup's defense for downloading millions of pirated books to build a permanent digital library.

U.S. District Judge William Alsup granted partial summary judgment to Anthropic in the copyright lawsuit filed by authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson. The court found that training large language models on copyrighted works was "exceedingly transformative" under Section 107 of the Copyright Act. Anthropic downloaded over seven million books from pirate sites, according to court documents. The startup also purchased millions of print books, destroyed the bindings, scanned every page, and stored them digitally.

Both sets of books were used to train various versions of Claude, which generates over $1 billion in annual revenue. While the judge approved using books for AI training purposes, he ruled that downloading pirated copies to create what Anthropic called a "central library of all the books in the world" was not protected fair use. The case will proceed to trial on damages related to the pirated library copies.
AI

What if Customers Started Saying No to AI? (msn.com) 213

An artist cancelled their Duolingo and Audible subscriptions to protest the companies' decisions to use more AI. "If enough people leave, hopefully they kind of rethink this," the artist tells the Washington Post.

And apparently, many more people feel the same way... In thousands of comments and posts about Audible and Duolingo that The Post reviewed across social media — including on Reddit, YouTube, Threads and TikTok — people threatened to cancel subscriptions, voiced concern for human translators and narrators, and said AI creates inferior experiences. "It destroys the purpose of humanity. We have so many amazing abilities to create art and music and just appreciate what's around us," said Kayla Ellsworth, a 21-year-old college student. "Some of the things that are the most important to us are being replaced by things that are not real...."

People in creative jobs are already on edge about the role AI is playing in their fields. On sites such as Etsy, clearly AI-generated art and other products are pushing out some original crafters who make a living on their creations. AI is being used to write romance novels and coloring books, design logos and make presentations... "I was promised tech would make everything easier so I could enjoy life," author Brittany Moone said. "Now it's leaving me all the dishes and the laundry so AI can make the art."

But will this turn into a consumer movement? The article also cites an assistant marketing professor at Washington State University, who found customers now react negatively to the term "AI" in product descriptions — out of fear of losing their jobs, as well as concerns about quality and privacy. And he predicts this could change the way companies use AI.

"There will be some companies that are going to differentiate themselves by saying no to AI." And while it could be a niche market, "The people will be willing to pay more for things just made by humans."
AI

Meta's Llama 3.1 Can Recall 42% of the First Harry Potter Book (understandingai.org) 85

Timothy B. Lee has written for the Washington Post, Vox.com, and Ars Technica — and now writes a Substack blog called "Understanding AI."

This week he visits recent research finding that Llama 3.1 70B has memorized 42% of the first Harry Potter book well enough to reproduce 50-token excerpts at least half the time... The paper was published last month by a team of computer scientists and legal scholars from Stanford, Cornell, and West Virginia University. They studied whether five popular open-weight models — three from Meta and one each from Microsoft and EleutherAI — were able to reproduce text from Books3, a collection of books that is widely used to train LLMs. Many of the books are still under copyright... Llama 3.1 70B — a mid-sized model Meta released in July 2024 — is far more likely to reproduce Harry Potter text than any of the other four models....
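To see why the 42% figure is striking, the memorization criterion as the article describes it can be sketched in a few lines. This is a hedged illustration, not the researchers' code: it assumes per-token probabilities have already been extracted from a model, and simply checks whether a 50-token excerpt is more likely than not to be reproduced.

```python
import math

def excerpt_probability(token_probs):
    """Probability of generating the whole excerpt: the product of the
    per-token probabilities, computed in log space to avoid underflow."""
    return math.exp(sum(math.log(p) for p in token_probs))

def is_memorized(token_probs, threshold=0.5):
    """The article's criterion: an excerpt counts as memorized if the
    model would reproduce it at least half the time."""
    return excerpt_probability(token_probs) > threshold

# Illustrative numbers (assumptions, not measurements from the paper):
# the model must be near-certain about every one of the 50 tokens for
# the excerpt to clear the 50% bar.
confident = [0.99] * 50   # overall probability ~0.61 -> memorized
uncertain = [0.90] * 50   # overall probability ~0.005 -> not memorized
```

Even 90% confidence per token compounds to well under 1% over a full 50-token excerpt, which is why clearing the 50% threshold across large stretches of a book indicates near-verbatim memorization rather than lucky guessing.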

Interestingly, Llama 1 65B, a similar-sized model released in February 2023, had memorized only 4.4 percent of Harry Potter and the Sorcerer's Stone. This suggests that despite the potential legal liability, Meta did not do much to prevent memorization as it trained Llama 3. At least for this book, the problem got much worse between Llama 1 and Llama 3. Harry Potter and the Sorcerer's Stone was one of dozens of books tested by the researchers. They found that Llama 3.1 70B was far more likely to reproduce popular books — such as The Hobbit and George Orwell's 1984 — than obscure ones. And for most books, Llama 3.1 70B memorized more than any of the other models...

For AI industry critics, the big takeaway is that — at least for some models and some books — memorization is not a fringe phenomenon. On the other hand, the study only found significant memorization of a few popular books. For example, the researchers found that Llama 3.1 70B only memorized 0.13 percent of Sandman Slim, a 2009 novel by author Richard Kadrey. That's a tiny fraction of the 42 percent figure for Harry Potter... To certify a class of plaintiffs, a court must find that the plaintiffs are in largely similar legal and factual situations. Divergent results like these could cast doubt on whether it makes sense to lump J.K. Rowling, Richard Kadrey, and thousands of other authors together in a single mass lawsuit. And that could work in Meta's favor, since most authors lack the resources to file individual lawsuits.

Why is it happening? "Maybe Meta had trouble finding 15 trillion distinct tokens, so it trained on the Books3 dataset multiple times. Or maybe Meta added third-party sources — such as online Harry Potter fan forums, consumer book reviews, or student book reports — that included quotes from Harry Potter and other popular books..."

"Or there could be another explanation entirely. Maybe Meta made subtle changes in its training recipe that accidentally worsened the memorization problem."
Medicine

The Medical Revolutions That Prevented Millions of Cancer Deaths (vox.com) 76

Vox publishes a story about "the quiet revolutions that have prevented millions of cancer deaths....

"The age-adjusted death rate in the US for cancer has declined by about a third since 1991, meaning people of a given age have about a third lower risk of dying from cancer than people of the same age more than three decades ago...." The dramatic bend in the curve of cancer deaths didn't happen by accident — it's the compound interest of three revolutions. While anti-smoking policy has been the single biggest lifesaver, other interventions have helped reduce people's cancer risk. One of the biggest successes is the HPV vaccine. A study last year found that death rates of cervical cancer — which can be caused by HPV infections — in US women ages 20-39 had dropped 62 percent from 2012 to 2021, thanks largely to the spread of the vaccine. Other cancers have been linked to infections, and there is strong research indicating that vaccination can have positive effects on reducing cancer incidence.

The next revolution is better and earlier screening. It's generally true that the earlier cancer is caught, the better the chances of survival... According to one study, incidences of late-stage colorectal cancer in Americans over 50 declined by a third between 2000 and 2010 in large part because rates of colonoscopies almost tripled in that same time period. And newer screening methods, often employing AI or using blood-based tests, could make preliminary screening simpler, less invasive and therefore more readily available. If 20th-century screening was about finding physical evidence of something wrong — the lump in the breast — 21st-century screening aims to find cancer before symptoms even arise.

Most exciting of all are frontier developments in treating cancer... From drugs like lenalidomide and bortezomib in the 2000s, which helped double median myeloma survival, to the spread of monoclonal antibodies, real breakthroughs in treatments have meaningfully extended people's lives — not just by months, but years. Perhaps the most promising development is CAR-T therapy, a form of immunotherapy. Rather than attempting to kill the cancer directly, immunotherapies turn a patient's own T-cells into guided missiles. In a recent study of 97 patients with multiple myeloma, many of whom were facing hospice care, a third of those who received CAR-T therapy had no detectable cancer five years later. It was the kind of result that doctors rarely see.

The article begins with some recent quotes from Jon Gluck, who was told after a cancer diagnosis that he had as little as 18 months left to live — 22 years ago...
AI

AI Firms Say They Can't Respect Copyright. But A Nonprofit's Researchers Just Built a Copyright-Respecting Dataset (msn.com) 100

Is copyrighted material a requirement for training AI? asks the Washington Post. That's what top AI companies are arguing, and "Few AI developers have tried the more ethical route — until now.

"A group of more than two dozen AI researchers have found that they could build a massive eight-terabyte dataset using only text that was openly licensed or in public domain. They tested the dataset quality by using it to train a 7 billion parameter language model, which performed about as well as comparable industry efforts, such as Llama 2-7B, which Meta released in 2023." A paper published Thursday detailing their effort also reveals that the process was painstaking, arduous and impossible to fully automate. The group built an AI model that is significantly smaller than the latest offered by OpenAI's ChatGPT or Google's Gemini, but their findings appear to represent the biggest, most transparent and rigorous effort yet to demonstrate a different way of building popular AI tools....

As it turns out, the task involves a lot of humans. That's because of the technical challenges of data not being formatted in a way that's machine readable, as well as the legal challenges of figuring out what license applies to which website, a daunting prospect when the industry is rife with improperly licensed data. "This isn't a thing where you can just scale up the resources that you have available" like access to more computer chips and a fancy web scraper, said Stella Biderman [executive director of the nonprofit research institute Eleuther AI]. "We use automated tools, but all of our stuff was manually annotated at the end of the day and checked by people. And that's just really hard."

Still, the group managed to unearth new datasets that can be used ethically. Those include a set of 130,000 English language books in the Library of Congress, which is nearly double the size of the popular-books dataset Project Gutenberg. The group's initiative also builds on recent efforts to develop more ethical, but still useful, datasets, such as FineWeb from Hugging Face, the open-source repository for machine learning... Still, Biderman remained skeptical that this approach could find enough content online to match the size of today's state-of-the-art models... Biderman said she didn't expect companies such as OpenAI and Anthropic to start adopting the same laborious process, but she hoped it would encourage them to at least rewind back to 2021 or 2022, when AI companies still shared a few sentences of information about what their models were trained on.

"Even partial transparency has a huge amount of social value and a moderate amount of scientific value," she said.

AI

Business Insider Recommended Nonexistent Books To Staff As It Leans Into AI (semafor.com) 23

An anonymous reader shares a report: Business Insider announced this week that it wants staff to better incorporate AI into its journalism. But less than a year ago, the company had to quietly apologize to some staff for accidentally recommending that they read books that did not appear to exist but instead may have been generated by AI.

In an email to staff last May, a senior editor at Business Insider sent around a list of what she called "Beacon Books," a list of memoirs and other acclaimed business nonfiction books, with the idea of ensuring staff understood some of the fundamental figures and writing powering good business journalism.

Many of the recommendations were well-known recent business, media, and tech nonfiction titles such as Too Big To Fail by Andrew Ross Sorkin, DisneyWar by James Stewart, and Super Pumped by Mike Isaac. But a few were unfamiliar to staff. Simply Target: A CEO's Lessons in a Turbulent Time and Transforming an Iconic Brand by former Target CEO Gregg Steinhafel was nowhere to be found. Neither was Jensen Huang: the Founder of Nvidia, which was supposedly published by the company Charles River Editors in 2019.

Education

Blue Book Sales Surge As Universities Combat AI Cheating (msn.com) 93

Sales of blue book exam booklets have surged dramatically across the nation as professors turn to analog solutions to prevent ChatGPT cheating. The University of California, Berkeley reported an 80% increase in blue book sales over the past two academic years, while Texas A&M saw 30% growth and the University of Florida recorded nearly 50% increases this school year. The surge comes as students who were freshmen when ChatGPT launched in 2022 approach senior year, having had access to AI throughout their college careers.
Television

Amazon Cancels the 'Wheel of Time' Prime Video Series After 3 Seasons (deadline.com) 101

Long-time Slashdot reader SchroedingersCat shares this article from Deadline: Prime Video will not be renewing The Wheel of Time for a fourth season. The decision, which comes more than a month after the Season 3 finale was released April 17, followed lengthy deliberations. As is often the case in the current economic environment, the reasons were financial, as the series is liked creatively by the streamer's executives...

The Season 3 overall performance was not strong enough compared to the show's cost for Prime Video to commit to another season and the streamer could not make it work after examining different scenarios and following discussions with lead studio Sony TV, sources said. With the cancellation possibility — and the show's passionate fanbase — in mind, the Season 3 finale was designed to offer some closure. Still, the news would be a gut punch for fans who have been praising the latest season as the series' best yet creatively... Prime Video and Sony TV will continue to back the Emmy campaign for The Wheel of Time's third season.

AI

Authors Are Accidentally Leaving AI Prompts In their Novels (404media.co) 60

Several romance novelists have accidentally left AI writing prompts embedded in their published books, exposing their use of chatbots, 404Media reports. Readers discovered passages like "Here's an enhanced version of your passage, making Elena more relatable" in K.C. Crowne's "Dark Obsession," for instance, and similar AI-generated instructions in works by Lena McDonald and Rania Faris.
Books

Usage of Semicolons In English Books Down Almost Half In Two Decades (theguardian.com) 122

An anonymous reader quotes a report from The Guardian: "Do not use semicolons," wrote Kurt Vonnegut, who averaged fewer than 30 a novel (about one every 10 pages). "All they do is show you've been to college." A study suggests UK authors are taking Vonnegut's advice to heart; the semicolon seems to be in terminal decline, with its usage in English books plummeting by almost half in two decades -- from one appearing in every 205 words in 2000 to one use in every 390 words today. Further research by Lisa McLendon, author of The Perfect English Grammar Workbook, found 67% of British students never or rarely use the semicolon. Just 11% of respondents described themselves as frequent users.

Linguistic experts at the language learning software Babbel, which commissioned the original research, were so struck by their findings that they asked McLendon to give the 500,000-strong London Student Network a 10-question multiple-choice quiz on the semicolon. She found more than half of respondents did not know or understand how to use it. As defined by the Oxford Dictionary of English, the semicolon is "a punctuation mark indicating a pause, typically between two main clauses, that is more pronounced than that indicated by a comma." It is commonly used to link together two independent but related clauses, and is particularly useful for juxtaposition or replacing confusing extra commas in lists where commas already exist -- or where a comma would create a splice.
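The Guardian's rate metric, words per semicolon, is easy to compute directly over any text. Here is a minimal sketch (the function name and the crude word pattern are my own, not from the study):

```python
import re

def words_per_semicolon(text):
    """Average number of words per semicolon. By the article's figures,
    English books went from roughly one semicolon per 205 words in 2000
    to one per 390 words today (higher = rarer semicolons)."""
    words = len(re.findall(r"[A-Za-z']+", text))
    semicolons = text.count(";")
    if semicolons == 0:
        return float("inf")  # no semicolons at all
    return words / semicolons
```

Running this over a corpus of books from different decades would reproduce the kind of trend line the Babbel-commissioned research reports.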
The Guardian has a semicolon quiz at the end of the article where you can test your semicolon knowledge.
Books

Chicago Sun-Times Prints Summer Reading List Full of Fake Books (arstechnica.com) 65

An anonymous reader quotes a report from Ars Technica: On Sunday, the Chicago Sun-Times published an advertorial summer reading list containing at least 10 fake books attributed to real authors, according to multiple reports on social media. The newspaper's uncredited "Summer reading list for 2025" supplement recommended titles including "Tidewater Dreams" by Isabel Allende and "The Last Algorithm" by Andy Weir -- books that don't exist and were created out of thin air by an AI system. The creator of the list, Marco Buscaglia, confirmed to 404 Media (paywalled) that he used AI to generate the content. "I do use AI for background at times but always check out the material first. This time, I did not and I can't believe I missed it because it's so obvious. No excuses," Buscaglia said. "On me 100 percent and I'm completely embarrassed."

A check by Ars Technica shows that only five of the fifteen recommended books in the list actually exist, with the remainder being fabricated titles falsely attributed to well-known authors. [...] On Tuesday morning, the Chicago Sun-Times addressed the controversy on Bluesky. "We are looking into how this made it into print as we speak," the official publication account wrote. "It is not editorial content and was not created by, or approved by, the Sun-Times newsroom. We value your trust in our reporting and take this very seriously. More info will be provided soon." In the supplement, the books listed by authors Isabel Allende, Andy Weir, Brit Bennett, Taylor Jenkins Reid, Min Jin Lee, Percival Everett, Delia Owens, Rumaan Alam, Rebecca Makkai, and Maggie O'Farrell are confabulated, while books listed by authors Francoise Sagan, Ray Bradbury, Jess Walter, Andre Aciman, and Ian McEwan are real. All of the authors are real people.
"The Chicago Sun-Times obviously gets ChatGPT to write a 'summer reads' feature almost entirely made up of real authors but completely fake books. What are we coming to?" wrote novelist Rachael King.

A Reddit user also expressed disapproval of the incident. "As a subscriber, I am livid! What is the point of subscribing to a hard copy paper if they are just going to include AI slop too!? The Sun Times needs to answer for this, and there should be a reporter fired."
Windows

'The People Stuck Using Ancient Windows Computers' (bbc.com) 137

The BBC visits "the strange, stubborn world of obsolete Windows machines." Even if you're a diehard Apple user, you're probably interacting with Windows systems on a regular basis. When you're pulling cash out, for example, chances are you're using a computer that's downright geriatric by technology standards. (Microsoft declined to comment for this article.) "Many ATMs still operate on legacy Windows systems, including Windows XP and even Windows NT," which launched in 1993, says Elvis Montiero, an ATM field technician based in Newark, New Jersey in the US. "The challenge with upgrading these machines lies in the high costs associated with hardware compatibility, regulatory compliance and the need to rewrite proprietary ATM software," he says. Microsoft ended official support for Windows XP in 2014, but Montiero says many ATMs still rely on these primordial systems thanks to their reliability, stability and integration with banking infrastructure.
And applicants for an IT systems administrator position with Germany's railway service "were expected to have expertise with Windows 3.11 and MS-DOS — systems released 32 and 44 years ago, respectively. In certain parts of Germany, commuting depends on operating systems that are older than many passengers." It's not just German transit, either. The trains in San Francisco's Muni Metro light railway, for example, won't start up in the morning until someone sticks a floppy disk into the computer that loads DOS software on the railway's Automatic Train Control System (ATCS). Last year, the San Francisco Municipal Transit Authority (SFMTA) announced its plans to retire this system over the coming decade, but today the floppy disks live on.
Apple is "really aggressive about deprecating old products," M. Scott Ford, a software developer who specialises in updating legacy systems, tells the BBC. "But Microsoft took the approach of letting organisations leverage the hardware they already have and chasing them for software licenses instead. They also tend to have a really long window for supporting that software."

And so you get things like two enormous LightJet printers in San Diego powered by servers running Windows 2000, says photographic printer John Watts: Long out of production, the few remaining LightJets rely on the Windows operating systems that were around when these printers were sold. "A while back we looked into upgrading one of the computers to Windows Vista. By the time we added up the money it would take to buy new licenses for all the software it was going to cost $50,000 or $60,000 [£38,000 to £45,000]," Watts says. "I can't stand Windows machines," he says, "but I'm stuck with them...."

In some cases, however, old computers are a labour of love. In the US, Dene Grigar, director of the Electronic Literature Lab at Washington State University, Vancouver, spends her days in a room full of vintage (and fully functional) computers dating back to 1977... She's not just interested in early, experimental e-books. Her laboratory collects everything from video games to Instagram zines.... Grigar's Electronic Literature Lab maintains 61 computers to showcase the hundreds of electronic works and thousands of files in the collection, which she keeps in pristine condition.

Grigar says they're still looking for a PC that reads five-and-a-quarter-inch floppy disks.
AI

Is the Altruistic OpenAI Gone? (msn.com) 51

"The altruistic OpenAI is gone, if it ever existed," argues a new article in the Atlantic, based on interviews with more than 90 current and former employees, including executives. It notes that shortly before Altman's ouster (and rehiring) he was "seemingly trying to circumvent safety processes for expediency," with OpenAI co-founder and chief scientist Ilya Sutskever telling three board members "I don't think Sam is the guy who should have the finger on the button for AGI." (The board had already discovered Altman "had not been forthcoming with them about a range of issues" including a breach in the Deployment Safety Board's protocols.)

Adapted from the upcoming book, Empire of AI, the article first revisits the summer of 2023, when Sutskever ("the brain behind the large language models that helped build ChatGPT") met with a group of new researchers: Sutskever had long believed that artificial general intelligence, or AGI, was inevitable — now, as things accelerated in the generative-AI industry, he believed AGI's arrival was imminent, according to Geoff Hinton, an AI pioneer who was his Ph.D. adviser and mentor, and another person familiar with Sutskever's thinking.... To people around him, Sutskever seemed consumed by thoughts of this impending civilizational transformation. What would the world look like when a supreme AGI emerged and surpassed humanity? And what responsibility did OpenAI have to ensure an end state of extraordinary prosperity, not extraordinary suffering?

By then, Sutskever, who had previously dedicated most of his time to advancing AI capabilities, had started to focus half of his time on AI safety. He appeared to people around him as both boomer and doomer: more excited and afraid than ever before of what was to come. That day, during the meeting with the new researchers, he laid out a plan. "Once we all get into the bunker — " he began, according to a researcher who was present.

"I'm sorry," the researcher interrupted, "the bunker?"

"We're definitely going to build a bunker before we release AGI," Sutskever replied. Such a powerful technology would surely become an object of intense desire for governments globally. The core scientists working on the technology would need to be protected. "Of course," he added, "it's going to be optional whether you want to get into the bunker." Two other sources I spoke with confirmed that Sutskever commonly mentioned such a bunker. "There is a group of people — Ilya being one of them — who believe that building AGI will bring about a rapture," the researcher told me. "Literally, a rapture...."

But by the middle of 2023 — around the time he began speaking more regularly about the idea of a bunker — Sutskever was no longer just preoccupied by the possible cataclysmic shifts of AGI and superintelligence, according to sources familiar with his thinking. He was consumed by another anxiety: the erosion of his faith that OpenAI could even keep up its technical advancements to reach AGI, or bear that responsibility with Altman as its leader. Sutskever felt Altman's pattern of behavior was undermining the two pillars of OpenAI's mission, the sources said: It was slowing down research progress and eroding any chance at making sound AI-safety decisions.

"For a brief moment, OpenAI's future was an open question. It might have taken a path away from aggressive commercialization and Altman. But this is not what happened," the article concludes. Instead there was "a lack of clarity from the board about their reasons for firing Altman." There was fear about a failure to realize their potential (and some employees feared losing a chance to sell millions of dollars' worth of their equity).

"Faced with the possibility of OpenAI falling apart, Sutskever's resolve immediately started to crack... He began to plead with his fellow board members to reconsider their position on Altman." And in the end "Altman would come back; there was no other way to save OpenAI." To me, the drama highlighted one of the most urgent questions of our generation: How do we govern artificial intelligence? With AI on track to rewire a great many other crucial functions in society, that question is really asking: How do we ensure that we'll make our future better, not worse? The events of November 2023 illustrated in the clearest terms just how much a power struggle among a tiny handful of Silicon Valley elites is currently shaping the future of this technology. And the scorecard of this centralized approach to AI development is deeply troubling. OpenAI today has become everything that it said it would not be....
The author believes OpenAI "has grown ever more secretive, not only cutting off access to its own research but shifting norms across the industry to no longer share meaningful technical details about AI models..."

"At the same time, more and more doubts have risen about the true economic value of generative AI, including a growing body of studies that have shown that the technology is not translating into productivity gains for most workers, while it's also eroding their critical thinking."
