AI

Why Can't ChatGPT Tell Time? (theverge.com) 113

ChatGPT can browse the web, write code and analyze images, but ask it what time it is and you might get the correct answer, a confident wrong answer, or a polite refusal -- sometimes all three within minutes of each other.

The problem stems from how large language models work. These systems predict answers based on training data and don't receive constant real-time updates about things like time unless they specifically search the internet. AI robotics expert Yervant Kulbashian told The Verge that a language model "is only referencing things that have entered this space," comparing it to a castaway on an island stocked with books but no watch.

OpenAI can give ChatGPT access to system clocks, and does so through features like Search. But there are tradeoffs: every clock check consumes space in the model's context window, the finite portion of information it can hold at any given moment. Pasquale Minervini, a natural language processing researcher at the University of Edinburgh, said the leading models also struggle to read analog clock faces and have trouble with calendars.
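The tradeoff described above can be sketched in a few lines. This is a minimal illustration, not OpenAI's actual implementation: a hypothetical `build_prompt` helper that optionally injects the host machine's clock as an extra system message, which spends context-window tokens every time it does so.

```python
from datetime import datetime, timezone

def build_prompt(user_message: str, include_clock: bool) -> list[dict]:
    """Assemble a chat transcript; optionally inject the host clock.

    Without the injected timestamp, the model can only guess the time
    from whatever its training data or earlier context implies.
    """
    messages = [{"role": "system", "content": "You are a helpful assistant."}]
    if include_clock:
        # Each injection consumes tokens from the finite context window,
        # which is the tradeoff described above.
        now = datetime.now(timezone.utc).isoformat(timespec="seconds")
        messages.append({"role": "system", "content": f"Current UTC time: {now}"})
    messages.append({"role": "user", "content": user_message})
    return messages

with_clock = build_prompt("What time is it?", include_clock=True)
without_clock = build_prompt("What time is it?", include_clock=False)
```

With the clock message, the model can simply read the answer out of its context; without it, it falls back on prediction, which is where the confident wrong answers come from.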
Sci-Fi

Mind-Altering 'Brain Weapons' No Longer Only Science Fiction, Say Researchers (theguardian.com) 34

Researchers warn that rapid advances in neuroscience, pharmacology, and AI are bringing "brain weapons" out of science fiction and into real-world plausibility. They argue current arms treaties don't adequately cover these emerging tools and call for a new, proactive framework to prevent the weaponization of the human mind. The Guardian reports: Michael Crowley and Malcolm Dando, of Bradford University, are about to publish a book that they believe should be a wake-up call to the world. [...] The book, published by the Royal Society of Chemistry, explores how advances in neuroscience, pharmacology and artificial intelligence are coming together to create a new threat. "We are entering an era where the brain itself could become a battlefield," said Crowley. "The tools to manipulate the central nervous system -- to sedate, confuse or even coerce -- are becoming more precise, more accessible and more attractive to states."

The book traces the fascinating, if appalling, history of state-sponsored research into central nervous system (CNS)-acting chemicals. [...] The academics argue that the ability exists to create much more "sophisticated and targeted" weapons that would once have been unimaginable. Dando said: "The same knowledge that helps us treat neurological disorders could be used to disrupt cognition, induce compliance, or even in the future turn people into unwitting agents." The threat is "real and growing" but there are gaps in international arms control treaties preventing it from being tackled effectively, they say. [...]

The book makes the case for a new "holistic arms control" framework, rather than relying on existing arms control treaties. It sets out a number of practical steps that could be taken, including establishing a working group on CNS-acting and broader incapacitating agents. Other proposals concern training, monitoring and definitions. "We need to move from reactive to proactive governance," said Dando. Both men acknowledge that we are learning more about the brain and the central nervous system, which is good for humanity. They said they were not trying to stifle scientific progress and it was about preventing malign intent. Crowley said: "This is a wake-up call. We must act now to protect the integrity of science and the sanctity of the human mind."

Businesses

Who is OpenAI's Auditor? (ft.com) 7

OpenAI won't say who audits its books. The company, which projects annual recurring revenue of $20 billion this year and is valued at $500 billion, has committed to spending about $1.4 trillion on data centers over the next decade. It accounts for roughly two-thirds of unfulfilled contracts at Oracle and two-fifths at CoreWeave. Microsoft alone holds around $375 billion in unfulfilled contracts with OpenAI.

Reuters reported the company may target a $1 trillion valuation for a potential IPO in coming years. Most companies at this scale use one of the Big Four accounting firms: Deloitte, EY, KPMG or PwC. OpenAI declined to comment to the Financial Times. A person close to the organization told the publication the company has "an industry standard audit with one of the Big Four firms." The company's latest Form 990 filing lists Fontanello, Duffield, & Otake -- a small San Francisco accountancy firm -- as the paid preparer. The form does state that an independent accountant audited the statements.

Michael Burry, last night: "Can anyone name [OpenAI's] auditor?"
The Internet

The Internet Archive Now Captures AI-Generated Content (Including Google's AI Overviews) (cnn.com) 4

CNN profiled the non-profit Internet Archive today — and included this tidbit about how they archive parts of the internet that are now "tucked in conversations with AI chatbots." The rise of artificial intelligence and AI chatbots means the Internet Archive is changing how it records the history of the internet. In addition to web pages, the Internet Archive now captures AI-generated content, like ChatGPT answers and those summaries that appear at the top of Google search results. The Internet Archive team, which is made up of librarians and software engineers, is experimenting with ways to preserve how people get their news from chatbots by coming up with hundreds of questions and prompts each day based on the news, and recording both the queries and outputs, [says Wayback Machine Director Mark Graham].
It sounds like a fun place to work... Archivists use bespoke machines to digitize books page by page, livestreaming their work on YouTube for all to see (alongside some lo-fi music). Record players churn out vintage tunes from the 1920s and 1940s, and the building houses every type of media console for any type of content imaginable, from microfilm to CDs and satellite television. (The Internet Archive preserves music, television, books and video games, too)... "There are a lot of people that are just passionate about the cause. There's a cyberpunk atmosphere," Annie Rauwerda, a Wikipedia editor and social media influencer, said at a party thrown at the Internet Archive's headquarters to celebrate reaching a trillion pages. "The internet (feels) quite corporate when I use it a lot these days, but you wouldn't know from the people here."
Music

Nonprofit Releases Thousands of Rare American Music Recordings Online (ucsb.edu) 17

The nonprofit Dust-to-Digital Foundation is making thousands of historic songs accessible to the public for free through a new partnership with the University of California, Santa Barbara. The songs represent "some of the rarest and most uniquely American music borne from the Jazz Age and the Great Depression," according to the university, and classic blues recordings or tracks by Fiddlin' John Carson and his daughter Moonshine Kate "would have likely been lost to landfills and faded from memory."

Launched in 1999 by Lance and April Ledbetter, Dust-to-Digital focused on preserving hard-to-find music. Originally a commercial label producing high-quality box sets (along with CDs, records, and books), it established a nonprofit foundation in 2010, working closely with collectors to digitize and preserve record collections. And there's an interesting story about how they became familiar with library curator David Seubert... Once a relationship is established, Dust-to-Digital sets up special turntables and laptops in a collector's home, with paid technicians painstakingly digitizing and labeling each record, one song at a time. Depending on the size of the collection, the process can take months, even years... In 2006, they heard about Seubert's Cylinder Preservation and Digitization Project getting "slashdotted," a term that describes when a website crashes or receives a sudden and debilitating spike in traffic after being mentioned in an article on Slashdot.
Here in 2025, the university's library already has over 50,000 songs in its Special Research Collections, which it has been uploading to the Discography of American Historical Recordings (DAHR) database. ("Recordings in the public domain are also available for free download, in keeping with the UCSB Library's mission for open access.") Over 5,000 more songs from Dust-to-Digital have already been added, says library curator Seubert, and "Thousands more are in the pipeline."

One interesting detail? The bulk of the new songs come from Joe Bussard, a man whose 75-year obsession with record collecting earned him the names "the king of the record collectors" and "the saint of 78s".
AI

Common Crawl Criticized for 'Quietly Funneling Paywalled Articles to AI Developers' (msn.com) 42

For more than a decade, the nonprofit Common Crawl "has been scraping billions of webpages to build a massive archive of the internet," notes the Atlantic, making it freely available for research. "In recent years, however, this archive has been put to a controversial purpose: AI companies including OpenAI, Google, Anthropic, Nvidia, Meta, and Amazon have used it to train large language models.

"In the process, my reporting has found, Common Crawl has opened a back door for AI companies to train their models with paywalled articles from major news websites. And the foundation appears to be lying to publishers about this — as well as masking the actual contents of its archives..." Common Crawl's website states that it scrapes the internet for "freely available content" without "going behind any 'paywalls.'" Yet the organization has taken articles from major news websites that people normally have to pay for — allowing AI companies to train their LLMs on high-quality journalism for free. Meanwhile, Common Crawl's executive director, Rich Skrenta, has publicly made the case that AI models should be able to access anything on the internet. "The robots are people too," he told me, and should therefore be allowed to "read the books" for free. Multiple news publishers have requested that Common Crawl remove their articles to prevent exactly this use. Common Crawl says it complies with these requests. But my research shows that it does not.

I've discovered that pages downloaded by Common Crawl have appeared in the training data of thousands of AI models. As Stefan Baack, a researcher formerly at Mozilla, has written, "Generative AI in its current form would probably not be possible without Common Crawl." In 2020, OpenAI used Common Crawl's archives to train GPT-3. OpenAI claimed that the program could generate "news articles which human evaluators have difficulty distinguishing from articles written by humans," and in 2022, an iteration on that model, GPT-3.5, became the basis for ChatGPT, kicking off the ongoing generative-AI boom. Many different AI companies are now using publishers' articles to train models that summarize and paraphrase the news, and are deploying those models in ways that steal readers from writers and publishers.

Common Crawl maintains that it is doing nothing wrong. I spoke with Skrenta twice while reporting this story. During the second conversation, I asked him about the foundation archiving news articles even after publishers have asked it to stop. Skrenta told me that these publishers are making a mistake by excluding themselves from "Search 2.0" — referring to the generative-AI products now widely being used to find information online — and said that, anyway, it is the publishers that made their work available in the first place. "You shouldn't have put your content on the internet if you didn't want it to be on the internet," he said. Common Crawl doesn't log in to the websites it scrapes, but its scraper is immune to some of the paywall mechanisms used by news publishers. For example, on many news websites, you can briefly see the full text of any article before your web browser executes the paywall code that checks whether you're a subscriber and hides the content if you're not. Common Crawl's scraper never executes that code, so it gets the full articles.
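The client-side paywall mechanism described above can be illustrated with a small self-contained sketch (the page and paywall script here are invented for illustration, not taken from any real publisher). The full article text is present in the initial HTML; the `<script>` that hides it from non-subscribers only runs in a browser, so a parser that never executes JavaScript extracts the article untouched.

```python
from html.parser import HTMLParser

# Simplified page in the pattern described above: the article body ships
# in the raw HTML, and a browser-only script hides it after load.
PAGE = """
<html><body>
  <article id="story">Full article text, visible in the raw HTML.</article>
  <script>
    // Runs only in a browser: if the subscriber cookie is absent,
    // replace the story with a paywall prompt. A scraper that never
    // executes JavaScript never runs this code.
    if (!document.cookie.includes("subscriber=1")) {
      document.getElementById("story").innerHTML = "Subscribe to read.";
    }
  </script>
</body></html>
"""

class TextGrabber(HTMLParser):
    """Collects text inside <article>, ignoring <script> blocks entirely."""
    def __init__(self):
        super().__init__()
        self.in_article = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.in_article = True

    def handle_endtag(self, tag):
        if tag == "article":
            self.in_article = False

    def handle_data(self, data):
        if self.in_article:
            self.chunks.append(data)

grabber = TextGrabber()
grabber.feed(PAGE)
article_text = "".join(grabber.chunks).strip()
```

A browser user without a subscription would see only "Subscribe to read."; the non-JavaScript parser recovers the full text, which is the loophole the article describes.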

Thus, by my estimate, the foundation's archives contain millions of articles from news organizations around the world, including The Economist, the Los Angeles Times, The Wall Street Journal, The New York Times, The New Yorker, Harper's, and The Atlantic.... A search for nytimes.com in any crawl from 2013 through 2022 shows a "no captures" result, when in fact there are articles from NYTimes.com in most of these crawls.

"In the past year, Common Crawl's CCBot has become the scraper most widely blocked by the top 1,000 websites," the article points out...
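For sites that choose to block it, the standard mechanism is the Robots Exclusion Protocol: Common Crawl's crawler identifies itself with the user-agent `CCBot`, so a publisher's `robots.txt` entry along these lines denies it the whole site (while this is the documented blocking method, a crawler is not technically forced to obey it):

```
# robots.txt -- deny Common Crawl's crawler access to the entire site
User-agent: CCBot
Disallow: /
```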
AI

Chan Zuckerberg Initiative Shifts Bulk of Philanthropy, 'Going All In on AI-Powered Biology' (apnews.com) 32

The Associated Press reports that "For the past decade, Dr. Priscilla Chan and her husband Mark Zuckerberg have focused part of their philanthropy on a lofty goal — 'to cure, prevent or manage all disease' — if not in their lifetime, then in their children's."

During that decade they also funded other initiatives (including underprivileged schools and immigration reform), according to the article. But there's a change coming: Now, the billionaire couple is shifting the bulk of their philanthropic resources to Biohub, the pair's science organization, and focusing on using artificial intelligence to accelerate scientific discovery. The idea is to develop virtual, AI-based cell models to understand how they work in the human body, study inflammation and use AI to "harness the immune system" for disease detection, prevention and treatment. "I feel like the science work that we've done, the Biohub model in particular, has been the most impactful thing that we have done. So we want to really double down on that. Biohub is going to be the main focus of our philanthropy going forward," Zuckerberg said Wednesday evening at an event at the Biohub Imaging Institute in Redwood City, California.... Chan and Zuckerberg have pledged 99% of their lifetime wealth — from shares of Meta Platforms, where Zuckerberg is CEO — toward these efforts...

On Thursday, Chan and Zuckerberg also announced that Biohub has hired the team at EvolutionaryScale, an AI research lab that has created large-scale AI systems for the life sciences... Biohub's ambition for the coming years and decades is to create virtual cell systems that would not have been possible without recent advances in AI. Similar to how large language models learn from vast databases of digital books, online writings and other media, its researchers and scientists are working toward building virtual systems that serve as digital representations of human physiology at all levels: molecular, cellular and genomic. As it is open source — free and publicly available — scientists can then conduct virtual experiments on a scale not possible in physical laboratories.

"We will continue the model we've pioneered of bringing together scientists and engineers in our own state-of-the-art labs to build tools that advance the field," according to Thursday's blog post. "We'll then use those tools to generate new data sets for training new biological AI models to create virtual cells and immune systems and engineer our cells to detect and treat disease....

"We have also established the first large-scale GPU cluster for biological research, as well as the largest datasets around human cell types. This collection of resources does not exist anywhere else."
Books

Amazon is Testing an AI Tool That Automatically Translates Books Into Other Languages (engadget.com) 30

An anonymous reader shares a report: Amazon just introduced an AI tool that will automatically translate books into other languages. The appropriately named Kindle Translate is being advertised as a resource for authors who self-publish on the platform.

The company says the tool can translate entire books between English and Spanish, and from German into English. Amazon promises that more languages are coming down the pike. It's available right now in beta form to select authors enrolled in the Kindle Direct Publishing platform. A broader rollout is planned for a later date.

Piracy

Google Removed 749 Million Anna's Archive URLs From Its Search Results (torrentfreak.com) 38

Google has delisted over 749 million URLs from Anna's Archive, a shadow library and meta-search engine for pirated books, representing 5% of all copyright takedown requests ever filed with the company. TorrentFreak reports: Google's transparency report reveals that rightsholders asked Google to remove 784 million URLs, divided over the three main Anna's Archive domains. A small number were rejected, mainly because Google didn't index the reported links, resulting in 749 million confirmed removals. The comparison to sites such as The Pirate Bay isn't fair, as Anna's Archive has many more pages in its archive and uses multiple country-specific subdomains. This means that there's simply more content to take down. That said, in terms of takedown activity, the site's three domain names clearly dwarf all pirate competition.

Since Google published its first transparency report in May 2012, rightsholders have flagged 15.1 billion allegedly infringing URLs. That's a staggering number, but the fact that 5% of the total targeted Anna's Archive URLs is remarkable. Penguin Random House and John Wiley & Sons are the most active publishers targeting the site, but they are certainly not alone. According to Google data, more than 1,000 authors or publishers have sent DMCA notices targeting Anna's Archive domains. Yet, there appears to be no end in sight. Rightsholders are reporting roughly 10 million new URLs per week for the popular piracy library, so there is no shortage of content to report.
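The figures above hang together, as a quick back-of-the-envelope check shows (using the rounded totals reported in the story; the exact counts in Google's transparency report will differ slightly):

```python
# Rounded figures from Google's transparency report, as cited above.
total_flagged = 15_100_000_000         # URLs flagged across all sites since May 2012
annas_requested = 784_000_000          # removal requests targeting Anna's Archive domains
annas_removed = 749_000_000            # confirmed removals after rejections

# Share of all takedowns ever filed that target Anna's Archive (~5%).
share = annas_removed / total_flagged

# The "small number" of rejected requests (links Google didn't index).
rejected = annas_requested - annas_removed
```

The ratio comes out just under 5 percent, matching the article's claim, and the rejected portion is about 35 million URLs out of 784 million requested.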

The Internet

Internet Archive's Legal Fights Are Over, But Its Founder Mourns What Was Lost (arstechnica.com) 39

The Internet Archive celebrated archiving its trillionth webpage last month and received congratulations from San Francisco, which declared October 22 "Internet Archive Day." Senator Alex Padilla designated the nonprofit a federal depository library. The organization currently faces no major lawsuits and no active threats to its collections. But these victories arrived after years of bruising copyright battles that forced the removal of more than 500,000 books from the Archive's Open Library. "We survived, but it wiped out the Library," founder Brewster Kahle told Ars Technica.

In 2024, the Archive lost its final appeal in a lawsuit brought by book publishers over its e-book lending model. Damages could have topped $400 million before publishers announced a confidential settlement. Last month, the organization settled another suit over its Great 78 Project after music publishers sought damages of up to $700 million. That settlement was also confidential. In both cases, the Archive's experts challenged publishers' estimates as massively inflated.

Kahle had envisioned the Open Library as a way for Wikipedia to link to book scans and help researchers reference e-books. The Archive wanted to deepen Wikipedia's authority as a research tool by surfacing information often buried in books. "That's what they really succeeded at -- to make sure that Wikipedia readers don't get access to books," Kahle said of the publishers. He thinks "the world became stupider" when the Open Library was gutted. The Archive is now expanding Democracy's Library, a free online compendium of government research and publications that will be linked in Wikipedia articles.
Books

George Orwell Classics Get New Lease of Life In Welsh (bbc.com) 28

For the first time, George Orwell's Animal Farm and 1984 have been translated into Welsh, with localized titles, character names, and even a Welsh version of Newspeak. The BBC reports: Animal Farm, a 1945 political allegory inspired by the Russian Revolution, is set in north-west Wales in the Welsh edition, Foel yr Anifeiliaid, with Orwell's classic characters given Welsh names to add authenticity. Mil Naw Wyth Deg Pedwar, or 1984, Orwell's vision of a bleak totalitarian future, published in 1949, contains a Welsh version of Newspeak, the novel's fictional language. Both books remain "seminal works with timeless relevance," said Welsh book publisher Melin Bapur, and feel "particularly relevant now in an age of 'alternative facts', AI, and misinformation."
AI

Detection Firm Finds 82% of Herbal Remedy Books on Amazon 'Likely Written' By AI (theguardian.com) 42

An anonymous reader shares a report: With gingko "memory-boost tinctures," fennel "tummy-soothing syrups" and "citrus-immune gummies," AI "slop" has come for herbalism, a study published by a leading AI-detection company has found. Originality.ai, which offers its tools to universities and businesses, says it scanned 558 titles published in Amazon's herbal remedies subcategory between January and September this year, and found 82% of the books "were likely written" by AI.

"This is a damning revelation of the sheer scope of unlabelled, unverified, unchecked, likely AI content that has completely invaded [Amazon's] platform," wrote Michael Fraiman, author of the study. "There's a huge amount of herbal research out there right now that's absolutely rubbish," said Sue Sprung, a medical herbalist in Liverpool. "AI won't know how to sift through all the dross, all the rubbish, that's of absolutely no consequence. It would lead people astray."

EU

Apple Attacks EU Crackdown in Digital Law's Biggest Court Test (irishexaminer.com) 23

Apple lashed out at the European Union's attempts to tame the power of Silicon Valley in the most far-reaching legal challenge of the bloc's Big Tech antitrust rules. From a report: The iPhone maker's lawyer Daniel Beard told the General Court in Luxembourg on Tuesday that the Digital Markets Act "imposes hugely onerous and intrusive burdens" at odds with Apple's rights in the EU marketplace.

The DMA came onto the EU's books in 2023 and is designed to clip the wings of the world's largest technology platforms with a slew of dos and don'ts. But over recent months, the law has also drawn the ire of US President Donald Trump and plagued EU-US trade talks. Apple -- seen as the biggest renegade against the EU's crackdown -- challenged the law on three fronts: EU obligations to make rival hardware work with its iPhone, the regulator's decision to drag the hugely profitable App Store under the rules, and a decision to probe whether iMessage should have faced the rules, which it later escaped.

AI

Salesforce Sued By Authors Over AI Software (reuters.com) 4

An anonymous reader shares a report: Cloud-computing firm Salesforce was hit with a proposed class action lawsuit by two authors who alleged the company used thousands of books without permission to train its AI software. Novelists Molly Tanzer and Jennifer Gilmore said in the complaint that Salesforce infringed copyrights by using their work to train its xGen AI models to process language.
The Internet

Internet Archive Ordered To Block Books in Belgium After Talks With Publishers Fail (torrentfreak.com) 7

The Internet Archive must block access to books in its Open Library project for Belgian users after negotiations with publishers failed. A Brussels Business Court issued a site-blocking order in July targeting several shadow libraries and the Internet Archive. A Belgian government department paused the order for the U.S. nonprofit and urged both parties to negotiate. The talks over recent weeks were unsuccessful.

The Department for Combating Infringements of Copyright concluded last week that the Internet Archive hosts the contested books and has the ability to render them inaccessible. Publishers must supply a list of books to be blocked. The nonprofit then has 20 calendar days to implement the measures and prevent future digital lending of those works in Belgium. The order includes a one-time penalty of $578,000 for non-compliance and remains in place until July 16 next year. The Internet Archive operates Open Library by purchasing physical copies and digitizing them to lend out one at a time. Publishers previously won a U.S. federal court case against the project.
