Piracy

Z-Library Helps Students to Overcome Academic Poverty, Study Finds (torrentfreak.com) 16

A new study reveals that many users, particularly students and Redditors, view Z-Library as a vital resource for overcoming economic barriers to education, reflecting a "Robin Hood" mentality that prioritizes access to knowledge over copyright concerns. TorrentFreak reports: The research looks at the motivations of two groups: Reddit users and Chinese postgraduate students. Despite the vast differences between these groups, their views on Z-Library are quite similar. The 134 Reddit responses were sampled from the Zlibrary subreddit, which is obviously biased in favor of the site. However, the reasoning goes well beyond a simple "I want free stuff" argument. Many commenters highlighted that they were drawn to the site out of poverty, for example, or that Z-Library was an essential tool for fulfilling their academic goals.

"Living in a 3rd world country, 1 book would cost like 50%- 80% already of my daily wage," one Redditor wrote. The idea that Z-Library is a 'necessary evil' was also highlighted by other commenters. These include a student who can barely make ends meet and a homeless person who has neither the money nor the space for physical books. The lack of free access to all study materials, including academic journal subscriptions at university libraries, was also a key motivator. Combined with the notion that journal publishers make billions of dollars without compensating authors, this lack of access is used to justify 'pirate' alternatives. "They make massive profits. So stealing from them doesn't hurt the authors nor reviewers, just the rich greedy publishers who make millions just to design a cover and click 'publish'," one Redditor wrote.

The second part of the study was conducted in a more structured format among 103 postgraduate students in China. This group joined a seminar where Z-Library and the crackdown were discussed. In addition, the students participated in follow-up focus group discussions and completed a survey. Although not all of them used the shadow library, 41% of the students agreed that the site's (temporary) shutdown affected their ability to study and find resources for degree learning. In general, the students have a favorable view toward Z-Library and similar sites, and 71% admit that they have used a shadow library in the past. In line with China's socialist values, the overwhelming majority of the students agreed that access to knowledge should be free for everyone. While the students are aware of copyright law, they believe that the need to access knowledge outweighs rightsholders' concerns. All in all, Z-Library and other shadow libraries are seen as a viable option for expensive or inaccessible books, despite potential copyright concerns.
The paper has been published in the Journal of University Teaching & Learning Practice.

AI

HarperCollins Confirms It Has a Deal to Sell Authors' Work to AI Company 36

HarperCollins has partnered with an AI technology company to allow limited use of select nonfiction backlist titles for training AI models, offering authors the choice to opt in for a $2,500 non-negotiable fee. 404 Media reports: On Friday, author Daniel Kibblesmith, who wrote the children's book Santa's Husband and published it with HarperCollins, posted screenshots on Bluesky of an email he received, seemingly from his agent, informing him that the agency was approached by the publisher about the AI deal. "Let me know what you think, positive or negative, and we can handle the rest of this for you," the screenshotted text in an email to Kibblesmith says. The screenshots show the agent telling Kibblesmith that HarperCollins was offering $2,500 (non-negotiable).

"You are receiving this memo because we have been informed by HarperCollins that they would like permission to include your book in an overall deal that they are making with a large tech company to use a broad swath of nonfiction books for the purpose of providing content for the training of an AI language learning model," the screenshots say. "You are likely aware, as we all are, that there are controversies surrounding the use of copyrighted material in the training of AI models. Much of the controversy comes from the fact that many companies seem to be doing so without acknowledging or compensating the original creators. And of course there is concern that these AI models may one day make us all obsolete."
Kibblesmith called the deal "abominable."

"It seems like they think they're cooked, and they're chasing short money while they can. I disagree," Kibblesmith told the AV Club. "The fear of robots replacing authors is a false binary. I see it as the beginning of two diverging markets, readers who want to connect with other humans across time and space, or readers who are satisfied with a customized on-demand content pellet fed to them by the big computer so they never have to be challenged again."
AI

AI Lab PleIAs Releases Fully Open Dataset, as AMD, Ai2 Release Open AI Models (huggingface.co) 5

French private AI lab PleIAs "is committed to training LLMs in the open," they write in a blog post at Mozilla.org. "This means not only releasing our models but also being open about every aspect, from the training data to the training code. We define 'open' strictly: all data must be both accessible and under permissive licenses."

On Wednesday, PleIAs announced it was releasing the largest open multilingual pretraining dataset, according to its blog post at HuggingFace: Many have claimed that training large language models requires copyrighted data, making truly open AI development impossible. Today, Pleias is proving otherwise with the release of Common Corpus (part of the AI Alliance Open Trusted Data Initiative), the largest fully open multilingual dataset for training LLMs, containing over 2 trillion tokens of permissibly licensed content with provenance information (2,003,039,184,047 tokens).

As developers are responding to pressures from new regulations like the EU AI Act, Common Corpus goes beyond compliance by making our entire permissibly licensed dataset freely available on HuggingFace, with detailed documentation of every data source. We have taken extensive steps to ensure that the dataset is high-quality and is curated to train powerful models. Through this release, we are demonstrating that there doesn't have to be such a [heavy] trade-off between openness and performance.

Common Corpus is:

— Truly Open: contains only data that is permissively licensed and provenance is documented

— Multilingual: mostly representing English and French data, but contains at least 1B tokens for over 30 languages

— Diverse: consisting of scientific articles, government and legal documents, code, and cultural heritage data, including books and newspapers

— Extensively Curated: spelling and formatting have been corrected in digitized texts, harmful and toxic content has been removed, and content with low educational value has also been removed.


Common Corpus builds on a growing ecosystem of large, open datasets, such as Dolma, FineWeb and RefinedWeb. The Common Pile, currently in preparation under the coordination of Eleuther, is built around the same principle of using permissible content in the English language and, unsurprisingly, there were many opportunities for collaborations and shared efforts. But even together, these datasets do not provide enough training data for models much larger than a few billion parameters. So in order to expand the options for open model training, we still need more open data...

Based on an analysis of 1 million user interactions with ChatGPT, the plurality of user requests are for creative compositions... The kind of content we actually need — like creative writing — is usually tied up in copyright restrictions. Common Corpus tackles these challenges through five carefully curated collections...

Last week AMD also released its first series of fully open 1 billion parameter language models, AMD OLMo.

And last month VentureBeat reported that the non-profit Allen Institute for AI had unveiled Molmo, "an open-source family of state-of-the-art multimodal AI models which outperform top proprietary rivals including OpenAI's GPT-4o, Anthropic's Claude 3.5 Sonnet, and Google's Gemini 1.5 on several third-party benchmarks."

Patents

Open Source Fights Back: 'We Won't Get Patent-Trolled Again' (zdnet.com) 62

ZDNet's Steven Vaughan-Nichols reports: [...] At KubeCon North America 2024 this week, CNCF executive director Priyanka Sharma said in her keynote, "Patent trolls are not contributors or even adopters in our ecosystem. Instead, they prey on cloud-native adopters by abusing the legal system. We are here to tell the world that these patent trolls don't stand a chance because CNCF is uniting the ecosystem to deter them. Like a herd of musk oxen, we will run them off our pasture." CNCF CTO Chris Aniszczyk added: "The reason trolls can make money is that many companies find it too expensive to fight back, so they pay trolls a settlement fee to avoid the even higher cost of litigation. Now, when a whole herd of companies band together like musk oxen to drive a troll off, it changes the cost structure of fighting back. It disrupts their economic model."

How? Jim Zemlin, the Linux Foundation's executive director, said, "We don't negotiate with trolls. Instead, with United Patents, we go to the PTO and crush those patents. We strive to invalidate them by working with developers who have prior art, bringing this to the attention of the USPTO, and killing patents. No negotiation, no settlement. We destroy the very asset that made patent trolls' business work. Together, since we've started this effort, 90% of the time, we've been able to go in there and destroy these patents." "It's time for us to band together," said Joanna Lee, CNCF's VP of strategic programs and legal. "We encourage all organizations in our ecosystem to get involved. Join the fight, enhance your own company's protection, protect your customers, enhance our community defense, and save money on legal expenses."

While getting your company and its legal department involved in the effort to fend off patent trolls is important, developers can also help. CNCF announced the Cloud Native Heroes Challenge, a patent troll bounty program in which cloud-native developers and technologists can earn swag and win prizes. They're asking you to find evidence of preexisting technology -- referred to by patent lawyers as "prior art" -- that can kill off bad patents. This could be open-source documentation (including release notes), published standards or specifications, product manuals, articles, blogs, books, or any publicly available information. All entrants who submit an entry that conforms to the contest rules will receive a free "Cloud Native Hero" t-shirt that can be picked up at any future KubeCon+CloudNativeCon. The winner will also receive a $3,000 cash prize.

In the inaugural contest, the CNCF is seeking information that can be used to invalidate Claim 1 from US Patent US-11695823-B1. This is the major patent asserted by Edge Networking Systems against Kubernetes users. As is often the case with such patents, it's much too broad. This patent describes a network architecture that facilitates secure and flexible programmability between a user device and across a network with full lifecycle management of services and infrastructure applications. That describes pretty much any modern cloud system. If you can find prior art published before June 13, 2013, that describes such a system, you could be a winner. Some such materials have already been found; these are listed in the "known references" tab of the contest information page and don't qualify. If you care about keeping open-source software easy and cheap to use -- or you believe trolls shouldn't be allowed to take advantage of companies that make or use programs -- you can help. I'll be doing some digging myself.

AI

Dutch Publisher's AI Translation Plan Sparks Industry Backlash (theguardian.com) 38

Dutch publisher Veen Bosch & Keuning has announced plans to use AI for translating commercial fiction, drawing sharp criticism from literary professionals despite promises of human oversight and author consent.

Award-winning translator Michele Hutchison, who won the 2020 International Booker Prize, argues that translation extends beyond word conversion. "We build bridges between cultures, taking into account the target readership every step of the way," she said, noting that translators convey rhythm, poetry, and cultural nuances while conducting precise terminology research.

Books

Are America's Courts Going After Digital Libraries? (reason.com) 43

A new article at Reason.com argues that U.S. courts "are coming for digital libraries." In September, a federal appeals court dealt a major blow to the Internet Archive — one of the largest online repositories of free books, media, and software — in a copyright case with significant implications for publishers, libraries, and readers. The U.S. Court of Appeals for the 2nd Circuit upheld a lower court ruling that found the Internet Archive's huge, digitized lending library of copyrighted books was not covered by the "fair use" doctrine and infringed on the rights of publishers. Agreeing with the Archive's interpretation of fair use "would significantly narrow — if not entirely eviscerate — copyright owners' exclusive right to prepare derivative works," the 2nd Circuit ruled. "Were we to approve [Internet Archive's] use of the works, there would be little reason for consumers or libraries to pay publishers for content they could access for free."
Others disagree, according to some links shared in a recent email from the Internet Archive. Public Knowledge CEO Chris Lewis argues the court's logic renders the fair use doctrine "almost unusable". And that's just the beginning... This decision harms libraries. It locks them into an e-book ecosystem designed to extract as much money as possible while harvesting (and reselling) reader data en masse. It leaves local communities' reading habits at the mercy of curatorial decisions made by four dominant publishing companies thousands of miles away. It steers Americans away from one of the few remaining bastions of privacy protection and funnels them into a surveillance ecosystem that, like Big Tech, becomes more dangerous with each passing data breach.
But lawyer/librarian Kyle K. Courtney writes that the case "is specific only to the parties, and does not impact the other existing versions of controlled digital lending." Additionally, this decision is limited to the 2nd Circuit and is not binding anywhere else — in other words, it does not apply to the 47 states outside the 2nd Circuit's jurisdiction. In talking with colleagues in the U.S. this week and last, many are continuing their programs because they believe their digital loaning programs fall outside the scope of this ruling... Moreover, the court's opinion focuses on digital books that the court said "are commercially available for sale or license in any electronic text format." Therefore, there remains a significant number of materials in library collections that have not made the jump to digital, nor are likely to, meaning that there is no ebook market to harm — nor is one likely to emerge for certain works, such as those that are no longer commercially viable...

This case represents just one instance in an ongoing conversation about library lending in the digital age, and the possibility of appeal to the U.S. Supreme Court means the final outcome is far from settled.

Some more quotes from links shared by Internet Archive:
  • "It was clear that the only reason all the big publishers sued the Internet Archive was to put another nail in the coffin of libraries and push to keep this ebook licensing scheme grift going. Now the courts have helped." — TechDirt
  • "The case against the Internet Archive is not just a story about the ruination of an online library, but a grander narrative of our times: how money facilitates the transference of knowledge away from the public, back towards the few." — blogger Hannah Williams

Thanks to Slashdot reader fjo3 for sharing the news.


Movies

'Mass Effect' TV Series Is In the Works At Amazon (variety.com) 57

An anonymous reader quotes a report from Variety: A "Mass Effect" TV series is officially in development at Amazon MGM Studios, Variety has learned exclusively. Daniel Casey is set to write and executive produce the adaptation. Karim Zreik will executive produce under his Cedar Tree Productions banner, with Ari Arad and EA's Michael Gamble also executive producing. Cedar Tree is currently under an overall deal at Amazon MGM Studios. Exact plot details are being kept under wraps. [...]

The first "Mass Effect" game launched to rave reviews in 2007. Since then, there have been three more games in the main series, with "Mass Effect: Andromeda" debuting in 2017. There have also been multiple mobile games in the franchise, as well as an animated film, novels, comic books, and other media. The story of the first three "Mass Effect" games revolves around Commander Shepard, a human soldier in the 22nd century trying to save humanity from a race of aliens known as the Reapers. "Andromeda" moved the games much further into the future with a new protagonist, with a fifth game also in the works. The franchise is developed by BioWare and is now published by EA.
In 2010, EA announced plans to turn Mass Effect into a movie, but the project was later canceled. Ari Arad, who led that initial effort, is now an executive producer on this latest attempt to bring the franchise to the screen.

Networking

DTrace for Linux Comes to Gentoo (gentoo.org) 14

DTrace was originally created back in 2005 by Sun Microsystems for its proprietary Solaris Unix systems, "for troubleshooting kernel and application problems on production systems in real time," according to Wikipedia. Its Wikipedia entry adds that "DTrace can be used to get a global overview of a running system, such as the amount of memory, CPU time, filesystem and network resources used by the active processes."

But this week, Gentoo announced: The real, mythical DTrace comes to Gentoo! Need to dynamically trace your kernel or userspace programs, with rainbows, ponies, and unicorns — and all entirely safely and in production?! Gentoo is now ready for that!

Just emerge dev-debug/dtrace and you're all set. All required kernel options are already enabled in the newest stable Gentoo distribution kernel...

Documentation? Sure, there's lots of it. You can start with our DTrace wiki page, the DTrace for Linux page on GitHub, or the original documentation for Illumos. Enjoy!

Thanks to Heraklit (Slashdot reader #29,346) for sharing the news.

AI

Goodreads' Founder Debuts AI-Powered App For Online Readers (techcrunch.com) 5

An anonymous reader quotes a report from TechCrunch: Smashing, a new app curating the best of the web from Goodreads co-founder Otis Chandler, is now available to the public. Like Goodreads, the app aims to create a community around content. But this time, instead of books, the focus is on web content -- like news articles, blog posts, social media posts, podcasts, and more. In addition, Smashing is introducing an AI Questions feature that allows you to engage with the content being shared in different ways, including by viewing a news story from different perspectives or asking the AI to poke holes in the story, among other things. By viewing different angles of a story, you can see how both the political left and right view the subject. Or, in the case of a company's stock, you might be presented with both the bull and bear case.

There are a good handful of AI prompts available at launch, notes Chandler, and not all will make sense to use on every news story or piece of content. For instance, there's a silly "make it funny" prompt, and others that can simplify the story, display a timeline, or introduce "unconventional" takes that may involve thinking outside the box, helping you weigh ideas you hadn't considered yet. You can also ask your own questions, if you prefer. On the app, users are able to create multiple interest feeds to stay informed about the topics that matter to them, like politics, investing, parenting, health and wellness, and more, or even narrower interests like specific companies, sports teams, crypto, climate change, or other subtopics. The app also leverages AI to surface content from around the web and then match it to an individual reader based on what articles they tend to read, what subtopics they like, and what's already popular in the community, as determined by upvotes and downvotes. Combined, the signals tune Smashing to a user's particular interests. As part of the AI Questions feature, Smashing is also introducing AI-powered Story Overview pages, which offer grouped articles, blog posts, and social media posts all about the same story.

IT

Comic Sans Got the Last Laugh 57

On July 4, 2012, CERN physicist Fabiola Gianotti announced the discovery of the Higgs boson using a PowerPoint presentation in Comic Sans, sparking both mockery and debate. The font, created by Vincent Connare for Microsoft Bob in 1994, featured deliberately imperfect letters inspired by comic books. Comic Sans shipped with Windows 95 and exploded in popularity as personal computing democratized typography. A backlash emerged as the font appeared on everything from funeral notices to museum signs, culminating in Dave and Holly Combs's "Ban Comic Sans" campaign.

Businesses

Basecamp-Maker 37Signals Says Its 'Cloud Exit' Will Save It $10 Million Over 5 Years (arstechnica.com) 83

An anonymous reader quotes a report from Ars Technica: 37Signals is not a company that makes its policy or management decisions quietly. The productivity software company was an avowedly Mac-centric shop until Apple's move to kill home screen web apps (or Progressive Web Apps, or PWAs) led the firm and its very-public-facing co-founder, David Heinemeier Hansson, to declare a "Return to Windows," followed by a stew of Windows/Mac/Linux. The company waged a public battle with Apple over its App Store subscription policies, and the resulting outcry helped nudge Apple a bit. 37Signals has maintained an active blog for years, its co-founders and employees have written numerous business advice books, and its blog and social media posts regularly hit the front pages of Hacker News.

So when 37Signals decided to pull its seven cloud-based apps off Amazon Web Services in the fall of 2022, it didn't do so quietly or without details. Back then, Hansson described his firm as paying "an at times almost absurd premium" for defense against "wild swings or towering peaks in usage." In early 2023, Hansson wrote that 37Signals expected to save $7 million over five years by buying more than $600,000 worth of Dell server gear and hosting its own apps.

Late last week, Hansson had an update: it's more like $10 million (and, he told the BBC, more like $800,000 in gear). By squeezing more hardware into existing racks and power allowances, estimating seven years' life for that hardware, and eventually transferring its 10 petabytes of S3 storage into a dual-DC Pure Storage flash array, 37Signals expects to save money, run faster, and have more storage available. "The motto of the 2010s and early 2020s -- all-cloud, everything, all the time -- seems to finally have peaked," Hansson writes. "And thank heavens for that!" He adds the caveat that companies with "enormous fluctuations in load," and those in early or uncertain stages, still have a place in the cloud.
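
The figures in the update can be put on a per-year basis with a little arithmetic. The sketch below is a back-of-envelope illustration, not 37Signals' own accounting: it assumes straight-line annualization, and the story does not say whether the $10 million is already net of the hardware spend. The five-year write-off is a hypothetical comparison point, included to show why the seven-year life estimate matters.

```python
# Back-of-envelope annualization of the figures reported in the story.
# Assumption: costs and savings are spread evenly (straight-line) over each period.

SAVINGS_TOTAL_USD = 10_000_000    # updated savings estimate...
SAVINGS_YEARS = 5                 # ...over five years
EARLIER_ESTIMATE_USD = 7_000_000  # early-2023 estimate, also over five years
HARDWARE_COST_USD = 800_000       # approximate Dell gear spend (per the BBC)
HARDWARE_LIFE_YEARS = 7           # 37Signals' estimated service life


def annualize(total_usd: float, years: int) -> float:
    """Spread a lump sum evenly over a number of years."""
    if years <= 0:
        raise ValueError("years must be positive")
    return total_usd / years


savings_per_year = annualize(SAVINGS_TOTAL_USD, SAVINGS_YEARS)
earlier_per_year = annualize(EARLIER_ESTIMATE_USD, SAVINGS_YEARS)
hardware_per_year = annualize(HARDWARE_COST_USD, HARDWARE_LIFE_YEARS)
# Hypothetical comparison only: the same gear written off over five years instead of seven.
hardware_per_year_5y = annualize(HARDWARE_COST_USD, 5)

print(f"Savings per year (new estimate):     ${savings_per_year:,.0f}")
print(f"Savings per year (earlier estimate): ${earlier_per_year:,.0f}")
print(f"Hardware per year over 7 years:      ${hardware_per_year:,.0f}")
print(f"Hardware per year over 5 years:      ${hardware_per_year_5y:,.0f}")
```

Run as written, this prints $2,000,000 a year in projected savings (up from $1,400,000 under the earlier estimate), against about $114,286 a year of hardware over a seven-year life versus $160,000 a year over five.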

Sci-Fi

Neal Stephenson Publishes First Book in New Atomic Age Spy Series 'Bomb Light' (msn.com) 56

Neal Stephenson is a sci-fi writer "of exuberant prose who revels in embracing big ideas," according to the New York Times. "With Polostan he enters the realm of the spy novel..."

Or, as the Washington Post puts it, Stephenson "drops readers into a bloody, inspiring, conflict-ridden and pivotal period of the early 20th century." With its flair for characterization, precision of language, witty aperçus and fecundity of events, the novel delivers what we've come to cherish from the author of such fantastical classics as "The Diamond Age," "Snow Crash" and "Cryptonomicon."

But the book is also utterly unlike the majority of Stephenson's work. For one thing, it's short — a far cry from the maximalist "systems novels" that cram in entire worlds with complex interacting power structures, both explicit and hidden. "Polostan" is also devoid of fantastical elements and farcical "hysterical realism," which comes as a bit of a shock given that this is the writer who invented Mafia pizza-delivery guys and cybernetic children's primers. The structure of the book is, likewise, unusually straightforward: a mainly linear narrative dispersed along two timelines...

These observations aren't quibbles so much as alerts to the reader that this is new territory for Stephenson — and good for him! Though, because Polostan is the first novel in a planned historical series titled Bomb Light, which aims to capture the excitement and intrigue of the nuclear arms race, we cannot rule out any Stephenson freakiness down the line... Assuming the subsequent books are as good as this one, Stephenson might end up with a series that rivals Michael Moorcock's Pyat Quartet and Edward Whittemore's Jerusalem Quartet as a vivid and canny dissection of a century unlike any other.

"Much of the next volume is already written," Stephenson says on Substack, calling it "a project that has been in the works for over ten years". (He also notes that among his novels, "even the stuff that's branded as science fiction tends to contain a lot of history.")

Meanwhile, in August, Stephenson's blockchain-tech startup Lamina1 announced a collaboration with special effects company Weta Workshop (from "The Lord of the Rings" film franchise) on a "participatory worldbuilding" experience. Variety reports: The experience is expected to offer "a new blueprint for IP expansion through immersive experiences that incorporate fan action and input."

Per Lamina1's description for the project, "Stephenson and the Weta team will begin engaging a global community of creators and fans on the Lamina1 platform this fall, inviting them to unravel the lore behind a mysterious set of 'Artefacts' that will build upon the themes and lore from Stephenson's critically-acclaimed catalog of work.

Next, the superfan will take on the new role of creator, utilizing their discoveries to contribute directly to the expansion of the universe."

"Artefact" will serve as the flagship project in the Lamina1-Weta partnership and first major multimedia property launching on Lamina1's blockchain infrastructure and tooling.

Neal Stephenson answered questions from Slashdot's readers in 2004. Now to promote his new novel Polostan, Stephenson will be making several personal appearances this week:
  • At the Wisconsin Book Festival in Madison (Sunday at noon)
  • Chicago's Book Stall (Monday at 7 p.m.)
  • A Cary, North Carolina Barnes & Noble (Tuesday at 6 p.m.)
  • New York City's Strand (Wednesday at 7 p.m.)
  • At the Midtown Scholar Bookstore in Harrisburg, Pennsylvania (Thursday at 7 p.m.)
  • Ames, Iowa at Dog Eared Books (Sunday at 6 p.m.)

Security

Internet Archive Services Resume as They Promise Stronger, More Secure Return (msn.com) 16

"The Wayback Machine, Archive-It, scanning, and national library crawls have resumed," announced the Internet Archive Thursday, "as well as email, blog, helpdesk, and social media communications. Our team is working around the clock across time zones to bring other services back online."

Founder Brewster Kahle told The Washington Post it's the first time in its almost 30-year history that it's been down more than a few hours. But their article says the Archive is "fighting back." Kahle and his team see the mission of the Internet Archive as a noble one — to build a "library of everything" and ensure records are kept in an online environment where websites change and disappear by the day. "We're all dreamers," said Chris Freeland, the Internet Archive's director of library services. "We believe in the mission of the Internet Archive, and we believe in the promise of the internet." But the site has, at times, courted controversy. The Internet Archive faces lawsuits from book publishers and music labels brought in 2020 and 2023 for digitizing copyrighted books and music, which the organization has argued should be permissible for noncommercial, archival purposes. Kahle said the hundreds of millions of dollars in penalties from the lawsuits could sink the Internet Archive.

Those lawsuits are ongoing. Now, the Internet Archive has also had to turn its attention to fending off cyberattacks. In May, the Internet Archive was hit with a distributed denial-of-service (DDoS) attack, a fairly common type of internet warfare that involves flooding a target site with fake traffic. The archive experienced intermittent outages as a result. Kahle said it was the first time the site had been targeted in its history... [After another attack October 9th], Kahle and his team have spent the week since racing to identify and fix the vulnerabilities that left the Internet Archive open to attack. The organization has "industry standard" security systems, Kahle said, but he added that, until this year, the group had largely stayed out of the crosshairs of cybercriminals. Kahle said he'd opted not to prioritize additional investments in cybersecurity out of the Internet Archive's limited budget of around $20 million to $30 million a year...

[N]o one has reliably claimed the defacement and data breach that forced the Internet Archive to sequester itself, said [cybersecurity researcher] Scott Helme. He added that the hackers' decision to alert the Internet Archive of their intrusion and send the stolen data to Have I Been Pwned, the monitoring service, could imply they didn't have further intentions with it.... Helme said the episode demonstrates the vulnerability of nonprofit services like the Internet Archive, and of the larger ecosystem of information online that depends on them. "Perhaps they'll find some more funding now that all of these headlines have happened," Helme said. "And people suddenly realize how bad it would be if they were gone."

"Our priority is ensuring the Internet Archive comes online stronger and more secure," the archive said in Thursday's statement. It also noted recent attacks on other libraries: "As a library community, we are seeing other cyber attacks, for instance the British Library, Seattle Public Library, Toronto Public Library, and now Calgary Public Library. We hope these attacks are not indicative of a trend."

The statement directs readers to the Archive's blog and its official X/Twitter, Bluesky and Mastodon accounts for the latest updates, and thanks them for their patience and ongoing support.

AI

Penguin Random House Underscores Copyright Protection in AI Rebuff (thebookseller.com) 40

The world's biggest trade publisher has changed the wording on its copyright pages to help protect authors' intellectual property from being used to train large language models and other artificial intelligence tools, The Bookseller has reported. From the report: Penguin Random House has amended its copyright wording across all imprints globally, confirming it will appear "in imprint pages across our markets." The new wording states: "No part of this book may be used or reproduced in any manner for the purpose of training artificial intelligence technologies or systems," and will be included in all new titles and any backlist titles that are reprinted.

The statement also "expressly reserves [the titles] from the text and data mining exception," in accordance with a European Parliament directive. The move specifically to ban the use of its titles by AI firms for the development of chatbots and other digital tools comes amid a slew of copyright infringement cases in the US and reports that large tranches of pirated books have already been used by tech companies to train AI tools. In 2024, several academic publishers, including Taylor & Francis, Wiley and Sage, announced partnerships to license content to AI firms.

Wikipedia

The Editors Protecting Wikipedia from AI Hoaxes (404media.co) 59

A group of Wikipedia editors have formed WikiProject AI Cleanup, "a collaboration to combat the increasing problem of unsourced, poorly-written AI-generated content on Wikipedia." From a report: The group's goal is to protect one of the world's largest repositories of information from the same kind of misleading AI-generated information that has plagued Google search results, books sold on Amazon, and academic journals. "A few of us had noticed the prevalence of unnatural writing that showed clear signs of being AI-generated, and we managed to replicate similar 'styles' using ChatGPT," Ilyas Lebleu, a founding member of WikiProject AI Cleanup, told me in an email. "Discovering some common AI catchphrases allowed us to quickly spot some of the most egregious examples of generated articles, which we quickly wanted to formalize into an organized project to compile our findings and techniques."

In many cases, WikiProject AI Cleanup finds AI-generated content on Wikipedia with the same methods others have used to find AI-generated content in scientific journals and Google Books, namely by searching for phrases commonly used by ChatGPT. One egregious example is the Wikipedia article about the Chester Mental Health Center, which in November of 2023 included the phrase "As of my last knowledge update in January 2022," referring to the last time the large language model was updated.
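
The phrase-search approach described above is simple to sketch. What follows is a minimal, hypothetical illustration in Python, not WikiProject AI Cleanup's actual tooling: it normalizes whitespace and case, then checks the text for known catchphrases. Apart from "As of my last knowledge update," which comes from the Chester Mental Health Center example, the phrase list and the sample text are assumptions made up for illustration.

```python
import re

# Illustrative phrase list, stored in lowercase. Only the first entry appears in the
# story; the others are hypothetical examples, not the project's actual list.
TELLTALE_PHRASES = (
    "as of my last knowledge update",
    "as an ai language model",
    "i hope this helps",
)


def find_telltale_phrases(text: str, phrases=TELLTALE_PHRASES) -> list[str]:
    """Return every (lowercase) phrase from `phrases` found in `text`, ignoring case."""
    # Collapse whitespace, including line breaks, so a phrase split across lines still matches.
    normalized = re.sub(r"\s+", " ", text).lower()
    return [phrase for phrase in phrases if phrase in normalized]


# Made-up sample text for illustration.
sample = "Sample article text. As of my last\nknowledge update in January 2022, it was open."
print(find_telltale_phrases(sample))
```

A hit is only a lead for a human editor to review: a plain substring check cannot tell generated text apart from, say, an article that legitimately quotes such a phrase.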
