Science

Citizen Scientists Just Helped Discover Nearly 8,000 New Eclipsing Binary Stars (spokesman.com) 1

"Citizen scientists have successfully located thousands of previously unknown pairs of 'eclipsing binary' stars," reports the Washington Post, citing a recent announcement from NASA. The ongoing initiative helps space researchers hunt for "eclipsing binary" stars, a rare phenomenon in which two stars orbit one another, periodically blocking each other's light. These star pairs offer important data to astrophysicists, who consider the many measurable properties of eclipsing binaries — and the information they bear about the history of star formation and destruction — as a foundation of the field...

The citizen science project in question, the Eclipsing Binary Patrol, validates images from NASA's Transiting Exoplanet Survey Satellite (TESS) mission. The satellite, launched in 2018, is "exceptionally capable at detecting varying stars," the researchers write in a preprint paper describing the initiative. The researchers used machine learning to identify about 1.2 million potential eclipsing star pairs. Citizen scientists then validated a subset of about 60,000... manually inspecting hundreds of thousands of images of eclipse-like events and weeding out actual binaries from images that tricked the algorithm. "Thankfully," the researchers write, "to the rescue come volunteers from all walks of life that boost the capacity of bandwidth-limited professional astronomers many-fold and help tackle the ever-increasing volume of publicly available astronomical data."

Universe Today describes how they limited the dataset to only stars with a magnitude brighter than 15, then used a Python tool to generate a massive dataset of millions of light curves... The outcome of all the work resulted in the identification of 10,001 eclipsing binary systems. 7,936 of them are new to science, while the other 2,065 were previously known, but the study provided updated, more accurate, parameters for their periods, as TESS' dataset provided better insight. There were also some particularly interesting systems that could hold new discoveries, including several that had variable eclipse timings, and plenty that might have a third star, and some that show a significant dynamic between the star being orbited and the one doing the orbiting.

All of those systems await further research, but there's another, unspoken factor at play in this data — exoplanets. TESS was originally designed as an exoplanet hunter, and this kind of large scale AI/human collaboration of lightcurve analysis is exactly the kind of work that could potentially produce even more accurate exoplanet catalogues, as evidenced by some of the work already done in this paper. That seems to be the next step for this dataset, with Dr. Kostov telling an interviewer "I can't wait to search them for exoplanets!" Given the data has already been collected, and the team has already been assembled, it's very likely he'll get his chance soon.

Python

Behind the Scenes at the Python Software Foundation (python.org) 11

The Python Software Foundation ("made up of, governed, and led by the community") does more than just host Python and its documnation, the Python Package Repository, and the development workflows of core CPython developers. This week the PSF released its 28-page Annual Impact Report this week, noting that 2024 was their first year with three CPython developers-in-residence — and "Between Lukasz, Petr, and Serhiy, over 750 pull requests were authored, and another 1,500 pull requests by other authors were reviewed and merged." Lukasz Langa co-implemented the new colorful shell included in Python 3.13, along with Pablo Galindo Salgado, Emily Morehouse-Valcarcel, and Lysandros Nikolaou.... Code-wise, some of the most interesting contributions by Petr Viktorin were around the ctypes module that allows interaction between Python and C.... These are just a few of Serhiy Storchaka's many contributions in 2024: improving error messages for strings, bytes, and bytearrays; reworking support for var-arguments in the C argument handling generator called "Argument Clinic"; fixing memory leaks in regular expressions; raising the limits for Python integers on 64-bit platforms; adding support for arbitrary code page encodings on Windows; improving complex and fraction number support...

Thanks to the investment of [the OpenSSF's security project] Alpha-Omega in 2024, our Security Developer-in-Residence, Seth Larson, continued his work improving the security posture of CPython and the ecosystem of Python packages. Python continues to be an open source security leader, evident by the Linux kernel becoming a CVE Numbering Authority using our guide as well as our publication of a new implementers guide for Trusted Publishers used by Ruby, Crates.io, and Nuget. Python was also recommended as a memory-safe programming language in early 2024 by the White House and CISA following our response to the Office of the National Cyber Directory Request for Information on open source security in 2023... Due to the increasing demand for SBOMs, Seth has taken the initiative to generate SBOM documents for the CPython runtime and all its dependencies, which are now available on python.org/downloads. Seth has also started work on standardizing SBOM documents for Python packages with PEP 770, aiming to solve the "Phantom Dependency" problem and accurately represent non-Python software included in Python packages.

With the continued investment in 2024 by Amazon Web Services Open Source and Georgetown CSET for this critical role, our PyPI Safety & Security Engineer, Mike Fiedler, completed his first full calendar year at the PSF... In March 2024, Mike added a "Report project as malware" button on the website, creating more structure to inbound reports and decreasing remediation time. This new button has been used over 2,000 times! The large spike in June led to prohibiting Outlook email domains, and the spike in November was driven by a persistent attack. Mike developed the ability to place projects in quarantine pending further investigation. Thanks to a grant from Alpha-Omega, Mike will continue his work for a second year. We plan to do more work on minimizing time-on-PyPI for malware in 2025...

In 2024, PyPI saw an 84% growth in download counts and 48% growth in bandwidth, serving 526,072,569,160 downloads for the 610,131 projects hosted there, requiring 1.11 Exabytes of data transfer, or 281.6 Gbps of bandwidth 24x7x365. In 2024, 97k new projects, 1.2 million new releases, and 3.1 million new files were uploaded to the index.

Stats

RedMonk Ranks Top Programming Languages Over Time - and Considers Ditching Its 'Stack Overflow' Metric (redmonk.com) 40

The developer-focused analyst firm RedMonk releases twice-a-year rankings of programming language popularity. This week they also released a handy graph showing the movement of top 20 languages since 2012. Their current rankings for programming language popularity...

1. JavaScript
2. Python
3. Java
4. PHP
5. C#
6. TypeScript
7. CSS
8. C++
9. Ruby
10. C

The chart shows that over the years the rankings really haven't changed much (other than a surge for TypeScript and Python, plus a drop for Ruby). JavaScript has consistently been #1 (except in two early rankings, where it came in behind Java). And in 2020 Java finally slipped from #2 down to #3, falling behind... Python. Python had already overtaken PHP for the #3 spot in 2017, pushing PHP to a steady #4. C# has maintained the #5 spot since 2014 (though with close competition from both C++ and CSS). And since 2021 the next four spots have been held by Ruby, C, Swift, and R.

The only change in the current top 20 since the last ranking "is Dart dropping from a tie with Rust at 19 into sole possession of 20," writes RedMonk co-founder Stephen O'Grady. "In the decade and a half that we have been ranking these languages, this is by far the least movement within the top 20 that we have seen. While this is to some degree attributable to a general stasis that has settled over the rankings in recent years, the extraordinary lack of movement is likely also in part a manifestation of Stack Overflow's decline in query volume..." The arrival of AI has had a significant and accelerating impact on Stack Overflow, which comprises one half of the data used to both plot and rank languages twice a year... Stack Overflow's value from an observational standpoint is not what it once was, and that has a tangible impact, as we'll see....

As that long time developer site sees fewer questions, it becomes less impactful in terms of driving volatility on its half of the rankings axis, and potentially less suggestive of trends moving forward... [W]e're not yet at a point where Stack Overflow's role in our rankings has been deprecated, but the conversations at least are happening behind the scenes.

"The veracity of the Stack Overflow data is increasingly questionable," writes RedMonk's research director: When we use Stack Overflow for programming language rankings we measure how many questions are asked using specific programming language tags... While other pieces, like Matt Asay's AI didn't kill Stack Overflow are right to point out that the decline existed before the advent of AI coding assistants, it is clear that the usage dramatically decreased post 2023 when ChatGPT became widely available. The number of questions asked are now about 10% what they were at Stack Overflow's peak.
"RedMonk is continuing to evaluate the quality of this analysis," the research director concludes, arguing "there is value in long-lived data, and seeing trends move over a decade is interesting and worthwhile. On the other hand, at this point half of the data feeding the programming language rankings is increasingly stale and of questionable value on a going-forward basis, and there is as of now no replacement public data set available.

"We'll continue to watch and advise you all on what we see with Stack Overflow's data."
Python

Python Creator Guido van Rossum Asks: Is 'Worse is Better' Still True for Programming Languages? (blogspot.com) 67

In 1989 a computer scientist argued that more functionality in software actually lowers usability and practicality — leading to the counterintuitive proposition that "worse is better". But is that still true?

Python's original creator Guido van Rossum addressed the question last month in a lightning talk at the annual Python Language Summit 2025. Guido started by recounting earlier periods of Python development from 35 years ago, where he used UNIX "almost exclusively" and thus "Python was greatly influenced by UNIX's 'worse is better' philosophy"... "The fact that [Python] wasn't perfect encouraged many people to start contributing. All of the code was straightforward, there were no thoughts of optimization... These early contributors also now had a stake in the language; [Python] was also their baby"...

Guido contrasted early development to how Python is developed now: "features that take years to produce from teams of software developers paid by big tech companies. The static type system requires an academic-level understanding of esoteric type system features." And this isn't just Python the language, "third-party projects like numpy are maintained by folks who are paid full-time to do so.... Now we have a huge community, but very few people, relatively speaking, are contributing meaningfully."

Guido asked whether the expectation for Python contributors going forward would be that "you had to write a perfect PEP or create a perfect prototype that can be turned into production-ready code?" Guido pined for the "old days" where feature development could skip performance or feature-completion to get something into the hands of the community to "start kicking the tires". "Do we have to abandon 'worse is better' as a philosophy and try to make everything as perfect as possible?" Guido thought doing so "would be a shame", but that he "wasn't sure how to change it", acknowledging that core developers wouldn't want to create features and then break users with future releases.

Guido referenced David Hewitt's PyO3 talk about Rust and Python, and that development "was using worse is better," where there is a core feature set that works, and plenty of work to be done and open questions. "That sounds a lot more fun than working on core CPython", Guido paused, "...not that I'd ever personally learn Rust. Maybe I should give it a try after," which garnered laughter from core developers.

"Maybe we should do more of that: allowing contributors in the community to have a stake and care".

Python

New Code.org Curriculum Aims To Make Schoolkids Python-Literate and AI-Ready 50

Longtime Slashdot reader theodp writes: The old Code.org curriculum page for middle and high school students has been changed to include a new Python Lab in the tech-backed nonprofit's K-12 offerings. Elsewhere on the site, a Computer Science and AI Foundations curriculum is described that includes units on 'Foundations of AI Programming [in Python]' and 'Insights from Data and AI [aka Data Science].' A more-detailed AI Foundations Syllabus 25-26 document promises a second semester of material is coming soon: "This semester offers an innovative approach to teaching programming by integrating learning with and about artificial intelligence (AI). Using Python as the primary language, students build foundational programming skills while leveraging AI tools to enhance computational thinking and problem-solving. The curriculum also introduces students to the basics of creating AI-powered programs, exploring machine learning, and applying data science principles."

Newly-posted videos on Code.org's YouTube channel appear to be intended to support the new Python-based CS & AI course. "Python is extremely versatile," explains a Walmart data scientist to open the video for Data Science: Using Python. "So, first of all, Python is one of the very few languages that can handle numbers very, very well." A researcher at the Univ. of Washington's Institute for Health Metrics and Evaluation (IHME) adds, "Python is the gold standard and what people expect data scientists to know [...] Key to us being able to handle really big data sets is our use of Python and cluster computing." Adding to the Python love, an IHME data analyst explains, "Python is a great choice for large databases because there's a lot of support for Python libraries."

Code.org is currently recruiting teachers to attend its CS and AI Foundations Professional Learning program this summer, which is being taught by Code.org's national network of university and nonprofit regional partners (teachers who signup have a chance to win $250 in DonorsChoose credits for their classrooms). A flyer for a five-day Michigan Professional Development program to prepare teachers for a pilot of the Code.org CS & A course touts the new curriculum as "an alternative to the AP [Computer Science] pathway" (teachers are offered scholarships covering registration, lodging, meals, and workshop materials).

Interestingly, Code.org's embrace of Python and Data Science comes as the nonprofit changes its mission to 'make CS and AI a core part of K-12 education' and launches a new national campaign with tech leaders to make CS and AI a graduation requirement. Prior to AI changing the education conversation, Code.org in 2021 boasted that it had lined up a consortium of tech giants, politicians, and educators to push its new $15 million Amazon-bankrolled Java AP CS A curriculum into K-12 classrooms. Just three years later, however, Amazon CEO Andy Jassy was boasting to investors that Amazon had turned to AI to automatically do Java coding that he claimed would have otherwise taken human coders 4,500 developer-years to complete.
Programming

Python Can Now Call Code Written in Chris Lattner's Mojo (modular.com) 26

Mojo (the programming language) reached a milestone today.

The story so far... Chris Lattner created the Swift programming language (and answered questions from Slashdot readers in 2017 on his way to new jobs at Tesla, Google, and SiFive). But in 2023, he'd created a new programming language called Mojo — a superset of Python with added functionality for high performance code that takes advantage of modern accelerators — as part of his work at AI infrastructure company Modular.AI.

And today Modular's product manager Brad Larson announced Python users can now call Mojo code from Python. (Watch for it in Mojo's latest nightly builds...) The Python interoperability section of the Mojo manual has been expanded and now includes a dedicated document on calling Mojo from Python. We've also added a couple of new examples to the modular GitHub repository: a "hello world" that shows how to round-trip from Python to Mojo and back, and one that shows how even Mojo code that uses the GPU can be called from Python. This is usable through any of the ways of installing MAX [their Modular Accelerated Xecution platform, an integrated suite of AI compute tools] and the Mojo compiler: via pip install modular / pip install max, or with Conda via Magic / Pixi.

One of our goals has been the progressive introduction of MAX and Mojo into the massive Python codebases out in the world today. We feel that enabling selective migration of performance bottlenecks in Python code to fast Mojo (especially Mojo running on accelerators) will unlock entirely new applications. I'm really excited for how this will expand the reach of the Mojo code many of you have been writing...

It has taken months of deep technical work to get to this point, and this is just the first step in the roll-out of this new language feature. I strongly recommend reading the list of current known limitations to understand what may not work just yet, both to avoid potential frustration and to prevent the filing of duplicate issues for known areas that we're working on.

"We are really interested in what you'll build with this new functionality, as well as hearing your feedback about how this could be made even better," the post concludes.

Mojo's licensing makes it free on any device, for any research, hobby or learning project, as well as on x86 or ARM CPUs or NVIDIA GPU.
Programming

Microsoft CEO Says Up To 30% of the Company's Code Was Written by AI (techcrunch.com) 149

Microsoft CEO Satya Nadella said that 20%-30% of code inside the company's repositories was "written by software" -- meaning AI -- during a fireside chat with Meta CEO Mark Zuckerberg at Meta's LlamaCon conference on Tuesday. From a report: Nadella gave the figure after Zuckerberg asked roughly how much of Microsoft's code is AI-generated today. The Microsoft CEO said the company was seeing mixed results in AI-generated code across different languages, with more progress in Python and less in C++.
Windows

Microsoft Brings Native PyTorch Arm Support To Windows Devices (neowin.net) 3

Microsoft has announced native PyTorch support for Windows on Arm devices with the release of PyTorch 2.7, making it significantly easier for developers to build and run machine learning models directly on Arm-powered Windows machines. This eliminates the need for manual compilation and opens up performance gains for AI tasks like image classification, NLP, and generative AI. Neowin reports: With the release of PyTorch 2.7, native Arm builds for Windows on Arm are now readily available for Python 3.12. This means developers can simply install PyTorch using a standard package manager like pip.

According to Microsoft: "This unlocks the potential to leverage the full performance of Arm64 architecture on Windows devices, like Copilot+ PCs, for machine learning experimentation, providing a robust platform for developers and researchers to innovate and refine their models."

Security

AI Hallucinations Lead To a New Cyber Threat: Slopsquatting 51

Researchers have uncovered a new supply chain attack called Slopsquatting, where threat actors exploit hallucinated, non-existent package names generated by AI coding tools like GPT-4 and CodeLlama. These believable yet fake packages, representing almost 20% of the samples tested, can be registered by attackers to distribute malicious code. CSO Online reports: Slopsquatting, as researchers are calling it, is a term first coined by Seth Larson, a security developer-in-residence at Python Software Foundation (PSF), for its resemblance to the typosquatting technique. Instead of relying on a user's mistake, as in typosquats, threat actors rely on an AI model's mistake. A significant number of packages, amounting to 19.7% (205,000 packages), recommended in test samples were found to be fakes. Open-source models -- like DeepSeek and WizardCoder -- hallucinated more frequently, at 21.7% on average, compared to the commercial ones (5.2%) like GPT 4. Researchers found CodeLlama ( hallucinating over a third of the outputs) to be the worst offender, and GPT-4 Turbo ( just 3.59% hallucinations) to be the best performer.

These package hallucinations are particularly dangerous as they were found to be persistent, repetitive, and believable. When researchers reran 500 prompts that had previously produced hallucinated packages, 43% of hallucinations reappeared every time in 10 successive re-runs, with 58% of them appearing in more than one run. The study concluded that this persistence indicates "that the majority of hallucinations are not just random noise, but repeatable artifacts of how the models respond to certain prompts." This increases their value to attackers, it added. Additionally, these hallucinated package names were observed to be "semantically convincing." Thirty-eight percent of them had moderate string similarity to real packages, suggesting a similar naming structure. "Only 13% of hallucinations were simple off-by-one typos," Socket added.
The research can found be in a paper on arXiv.org (PDF).
AI

OpenAI Unveils o3 and o4-mini Models (openai.com) 2

OpenAI has released two new AI models that can "think with images" during their reasoning process. The o3 and o4-mini models represent a significant advancement in visual perception, enabling them to manipulate images -- cropping, zooming, and rotating -- as part of their analytical process.

Unlike previous models, o3 and o4-mini can agentically use all of ChatGPT's tools, including web search, Python code execution, and image generation. This allows them to tackle multi-faceted problems by selecting appropriate tools based on the task at hand.

The models have set new state-of-the-art performance benchmarks across multiple domains. On visual tasks, o3 achieved 86.8% accuracy on MathVista and 78.6% on CharXiv-Reasoning, while o4-mini scored 91.6% on AIME 2024 competitions. In expert evaluations, o3 made 20% fewer major errors than its predecessor on complex real-world tasks. ChatGPT Plus, Pro, and Team users will see o3, o4-mini, and o4-mini-high in the model selector starting today, replacing o1, o3â'mini, and o3â'miniâ'high.
Programming

You Should Still Learn To Code, Says GitHub CEO (businessinsider.com) 45

You should still learn to code, says GitHub's CEO. And you should start as soon as possible. From a report: "I strongly believe that every kid, every child, should learn coding," Thomas Dohmke said in a recent podcast interview with EO. "We should actually teach them coding in school, in the same way that we teach them physics and geography and literacy and math and what-not." Coding, he added, is one such fundamental skill -- and the only reason it's not part of the curriculum is because it took "us too long to actually realize that."

Dohmke, who's been a programmer since the 90s, said he's never seen "anything more exciting" than the current moment in engineering -- the advent of AI, he believes, has made the field that much easier to break into, and is poised to make software more ubiquitous than ever. "It's so much easier to get into software development. You can just write a prompt into Copilot or ChatGPT or similar tools, and it will likely write you a basic webpage, or a small application, a game in Python," Dohmke said. "And so, AI makes software development so much more accessible for anyone who wants to learn coding."

AI, Dohmke said, helps to "realize the dream" of bringing an idea to life, meaning that fewer projects will end up dead in the water, and smaller teams of developers will be enabled to tackle larger-scale projects. Dohmke said he believes it makes the overall process of creation more efficient. "You see some of the early signs of that, where very small startups -- sometimes five developers and some of them actually only one developer -- believe they can become million, if not billion dollar businesses by leveraging all the AI agents that are available to them," he added.

Programming

AI Models Still Struggle To Debug Software, Microsoft Study Shows (techcrunch.com) 43

Some of the best AI models today still struggle to resolve software bugs that wouldn't trip up experienced devs. TechCrunch: A new study from Microsoft Research, Microsoft's R&D division, reveals that models, including Anthropic's Claude 3.7 Sonnet and OpenAI's o3-mini, fail to debug many issues in a software development benchmark called SWE-bench Lite. The results are a sobering reminder that, despite bold pronouncements from companies like OpenAI, AI is still no match for human experts in domains such as coding.

The study's co-authors tested nine different models as the backbone for a "single prompt-based agent" that had access to a number of debugging tools, including a Python debugger. They tasked this agent with solving a curated set of 300 software debugging tasks from SWE-bench Lite.

According to the co-authors, even when equipped with stronger and more recent models, their agent rarely completed more than half of the debugging tasks successfully. Claude 3.7 Sonnet had the highest average success rate (48.4%), followed by OpenAI's o1 (30.2%), and o3-mini (22.1%).

AI

Open Source Coalition Announces 'Model-Signing' with Sigstore to Strengthen the ML Supply Chain (googleblog.com) 10

The advent of LLMs and machine learning-based applications "opened the door to a new wave of security threats," argues Google's security blog. (Including model and data poisoning, prompt injection, prompt leaking and prompt evasion.)

So as part of the Linux Foundation's nonprofit Open Source Security Foundation, and in partnership with NVIDIA and HiddenLayer, Google's Open Source Security Team on Friday announced the first stable model-signing library (hosted at PyPI.org), with digital signatures letting users verify that the model used by their application "is exactly the model that was created by the developers," according to a post on Google's security blog. [S]ince models are an uninspectable collection of weights (sometimes also with arbitrary code), an attacker can tamper with them and achieve significant impact to those using the models. Users, developers, and practitioners need to examine an important question during their risk assessment process: "can I trust this model?"

Since its launch, Google's Secure AI Framework (SAIF) has created guidance and technical solutions for creating AI applications that users can trust. A first step in achieving trust in the model is to permit users to verify its integrity and provenance, to prevent tampering across all processes from training to usage, via cryptographic signing... [T]he signature would have to be verified when the model gets uploaded to a model hub, when the model gets selected to be deployed into an application (embedded or via remote APIs) and when the model is used as an intermediary during another training run. Assuming the training infrastructure is trustworthy and not compromised, this approach guarantees that each model user can trust the model...

The average developer, however, would not want to manage keys and rotate them on compromise. These challenges are addressed by using Sigstore, a collection of tools and services that make code signing secure and easy. By binding an OpenID Connect token to a workload or developer identity, Sigstore alleviates the need to manage or rotate long-lived secrets. Furthermore, signing is made transparent so signatures over malicious artifacts could be audited in a public transparency log, by anyone. This ensures that split-view attacks are not possible, so any user would get the exact same model. These features are why we recommend Sigstore's signing mechanism as the default approach for signing ML models.

Today the OSS community is releasing the v1.0 stable version of our model signing library as a Python package supporting Sigstore and traditional signing methods. This model signing library is specialized to handle the sheer scale of ML models (which are usually much larger than traditional software components), and handles signing models represented as a directory tree. The package provides CLI utilities so that users can sign and verify model signatures for individual models. The package can also be used as a library which we plan to incorporate directly into model hub upload flows as well as into ML frameworks.

"We can view model signing as establishing the foundation of trust in the ML ecosystem..." the post concludes (adding "We envision extending this approach to also include datasets and other ML-related artifacts.") Then, we plan to build on top of signatures, towards fully tamper-proof metadata records, that can be read by both humans and machines. This has the potential to automate a significant fraction of the work needed to perform incident response in case of a compromise in the ML world...

To shape the future of building tamper-proof ML, join the Coalition for Secure AI, where we are planning to work on building the entire trust ecosystem together with the open source community. In collaboration with multiple industry partners, we are starting up a special interest group under CoSAI for defining the future of ML signing and including tamper-proof ML metadata, such as model cards and evaluation results.

Python

Python's PyPI Finally Gets Closer to Adding 'Organization Accounts' and SBOMs (mailchi.mp) 1

Back in 2023 Python's infrastructure director called it "the first step in our plan to build financial support and long-term sustainability of PyPI" while giving users "one of our most requested features: organization accounts." (That is, "self-managed teams with their own exclusive branded web addresses" to make their massive Python Package Index repository "easier to use for large community projects, organizations, or companies who manage multiple sub-teams and multiple packages.")

Nearly two years later, they've announced that they're "making progress" on its rollout... Over the last month, we have taken some more baby steps to onboard new Organizations, welcoming 61 new Community Organizations and our first 18 Company Organizations. We're still working to improve the review and approval process and hope to improve our processing speed over time. To date, we have 3,562 Community and 6,424 Company Organization requests to process in our backlog.
They've also onboarded a PyPI Support Specialist to provide "critical bandwidth to review the backlog of requests" and "free up staff engineering time to develop features to assist in that review." (And "we were finally able to finalize our Terms of Service document for PyPI," build the tooling necessary to notify users, and initiate the Terms of Service rollout. [Since launching 20 years ago PyPi's terms of service have only been updated twice.]

In other news the security developer-in-residence at the Python Software Foundation has been continuing work on a Software Bill-of-Materials (SBOM) as described in Python Enhancement Proposal #770. The feature "would designate a specific directory inside of Python package metadata (".dist-info/sboms") as a directory where build backends and other tools can store SBOM documents that describe components within the package beyond the top-level component." The goal of this project is to make bundled dependencies measurable by software analysis tools like vulnerability scanning, license compliance, and static analysis tools. Bundled dependencies are common for scientific computing and AI packages, but also generally in packages that use multiple programming languages like C, C++, Rust, and JavaScript. The PEP has been moved to Provisional Status, meaning the PEP sponsor is doing a final review before tools can begin implementing the PEP ahead of its final acceptance into changing Python packaging standards. Seth has begun implementing code that tools can use when adopting the PEP, such as a project which abstracts different Linux system package managers functionality to reverse a file path into the providing package metadata.

Security developer-in-residence Seth Larson will be speaking about this project at PyCon US 2025 in Pittsburgh, PA in a talk titled "Phantom Dependencies: is your requirements.txt haunted?"

Meanwhile InfoWorld reports that newly approved Python Enhancement Proposal 751 will also give Python a standard lock file format.
AI

Two Teenagers Built 'Cal AI', a Photo Calorie App With Over a Million Users (techcrunch.com) 24

An anonymous reader quotes a report from TechCrunch: In a world filled with "vibe coding," Zach Yadegari, teen founder of Cal AI, stands in ironic, old-fashioned contrast. Ironic because Yadegari and his co-founder, Henry Langmack, are both just 18 years old and still in high school. Yet their story, so far, is a classic. Launched in May, Cal AI has generated over 5 million downloads in eight months, Yadegari says. Better still, he tells TechCrunch that the customer retention rate is over 30% and that the app generated over $2 million in revenue last month. [...]

The concept is simple: Take a picture of the food you are about to consume, and let the app log calories and macros for you. It's not a unique idea. For instance, the big dog in calorie counting, MyFitnessPal, has its Meal Scan feature. Then there are apps like SnapCalorie, which was released in 2023 and created by the founder of Google Lens. Cal AI's advantage, perhaps, is that it was built wholly in the age of large image models. It uses models from Anthropic and OpenAI and RAG to improve accuracy and is trained on open source food calorie and image databases from sites like GitHub.

"We have found that different models are better with different foods," Yadegari tells TechCrunch. Along the way, the founders coded through technical problems like recognizing ingredients from food packages or in jumbled bowls. The result is an app that the creators say is 90% accurate, which appears to be good enough for many dieters.
The report says Yadegari began mastering Python and C# in middle school and went on to build his first business in ninth grade -- a website called Totally Science that gave students access to unblocked games (cleverly named to evade school filters). He sold the company at age 16 to FreezeNova for $100,000.

Following the sale, Yadegari immersed himself in the startup scene, watching Y Combinator videos and networking on X, where he met co-founder Blake Anderson, known for creating ChatGPT-powered apps like RizzGPT. Together, they launched Cal AI and moved to a hacker house in San Francisco to develop their prototype.
AI

MCP: the New 'USB-C For AI' That's Bringing Fierce Rivals Together (arstechnica.com) 30

An anonymous reader quotes a report from Ars Technica: What does it take to get OpenAI and Anthropic -- two competitors in the AI assistant market -- to get along? Despite a fundamental difference in direction that led Anthropic's founders to quit OpenAI in 2020 and later create the Claude AI assistant, a shared technical hurdle has now brought them together: How to easily connect their AI models to external data sources. The solution comes from Anthropic, which developed and released an open specification called Model Context Protocol (MCP) in November 2024. MCP establishes a royalty-free protocol that allows AI models to connect with outside data sources and services without requiring unique integrations for each service.

"Think of MCP as a USB-C port for AI applications," wrote Anthropic in MCP's documentation. The analogy is imperfect, but it represents the idea that, similar to how USB-C unified various cables and ports (with admittedly a debatable level of success), MCP aims to standardize how AI models connect to the infoscape around them. So far, MCP has also garnered interest from multiple tech companies in a rare show of cross-platform collaboration. For example, Microsoft has integrated MCP into its Azure OpenAI service, and as we mentioned above, Anthropic competitor OpenAI is on board. Last week, OpenAI acknowledged MCP in its Agents API documentation, with vocal support from the boss upstairs. "People love MCP and we are excited to add support across our products," wrote OpenAI CEO Sam Altman on X last Wednesday.

MCP has also rapidly begun to gain community support in recent months. For example, just browsing this list of over 300 open source servers shared on GitHub reveals growing interest in standardizing AI-to-tool connections. The collection spans diverse domains, including database connectors like PostgreSQL, MySQL, and vector databases; development tools that integrate with Git repositories and code editors; file system access for various storage platforms; knowledge retrieval systems for documents and websites; and specialized tools for finance, health care, and creative applications. Other notable examples include servers that connect AI models to home automation systems, real-time weather data, e-commerce platforms, and music streaming services. Some implementations allow AI assistants to interact with gaming engines, 3D modeling software, and IoT devices.

Cloud

Microsoft Announces 'Hyperlight Wasm': Speedy VM-Based Security at Scale with a WebAssembly Runtime (microsoft.com) 18

Cloud providers like the security of running things in virtual machines "at scale" — even though VMs "are not known for having fast cold starts or a small footprint..." noted Microsoft's Open Source blog last November. So Microsoft's Azure Core Upstream team built an open source Rust library called Hyperlight "to execute functions as fast as possible while isolating those functions within a VM."

But that was just the beginning... Then, we showed how to run Rust functions really, really fast, followed by using C to [securely] run Javascript. In February 2025, the Cloud Native Computing Foundation (CNCF) voted to onboard Hyperlight into their Sandbox program [for early-stage projects].

[This week] we're announcing the release of Hyperlight Wasm: a Hyperlight virtual machine "micro-guest" that can run wasm component workloads written in many programming languages...

Traditional virtual machines do a lot of work to be able to run programs. Not only do they have to load an entire operating system, they also boot up the virtual devices that the operating system depends on. Hyperlight is fast because it doesn't do that work; all it exposes to its VM guests is a linear slice of memory and a CPU. No virtual devices. No operating system. But this speed comes at the cost of compatibility. Chances are that your current production application expects a Linux operating system running on the x86-64 architecture (hardware), not a bare linear slice of memory...

[B]uilding Hyperlight with a WebAssembly runtime — wasmtime — enables any programming language to execute in a protected Hyperlight micro-VM without any prior knowledge of Hyperlight at all. As far as program authors are concerned, they're just compiling for the wasm32-wasip2 target... Executing workloads in the Hyperlight Wasm guest isn't just possible for compiled languages like C, Go, and Rust, but also for interpreted languages like Python, JavaScript, and C#. The trick here, much like with containers, is to also include a language runtime as part of the image... Programming languages, runtimes, application platforms, and cloud providers are all starting to offer rich experiences for WebAssembly out of the box. If we do things right, you will never need to think about whether your application is running inside of a Hyperlight Micro-VM in Azure. You may never know your workload is executing in a Hyperlight Micro VM. And that's a good thing.

While a traditional virtual-device-based VM takes about 125 milliseconds to load, "When the Hyperlight VMM creates a new VM, all it needs do to is create a new slice of memory and load the VM guest, which in turn loads the wasm workload. This takes about 1-2 milliseconds today, and work is happening to bring that number to be less than 1 millisecond in the future."

And there's also double security due to Wasmtime's software-defined runtime sandbox within Hyperlight's larger VM...
Python

Codon Python Compiler Gets Faster - and Changes to Apache 2 License (usenix.org) 4

Slashdot reader rikfarrow summarizes an article they wrote for Usenix.org about the Open Source Python compiler Codon: In 2023 I tried out Codon. At the time I had difficulty compiling the scripts I most commonly used, but was excited by the prospect. Python is essentially single threaded and checks the shape (type) of each variable as it interprets scripts. Codon fixes types and compiles Python into compact, executable binaries that execute much faster.

Several things have changed with their latest release: I have successful compiles, the committers have added a compiled version of NumPy (high performance math algorithms), and changed their open source license to Apache 2.

"The other big news is that Exaloop, the company that is behind Codon, has changed their license to Apache 2..." according to the article, so "commercial use and derivations of Codon are now permitted without licensing."
Piracy

Malicious PyPI Package Exploited Deezer's API, Orchestrates a Distributed Piracy Operation (socket.dev) 24

A malicious PyPi package effectively turned its users' systems "into an illicit network for facilitating bulk music downloads," writes The Hacker News.

Though the package has been removed from PyPI, researchers at security platform Socket.dev say it enabled "coordinated, unauthorized music downloads from Deezer — a popular streaming service founded in France in 2007." Although automslc, which has been downloaded over 100,000 times, purports to offer music automation and metadata retrieval, it covertly bypasses Deezer's access restrictions... The package is designed to log into Deezer, harvest track metadata, request full-length streaming URLs, and download complete audio files in clear violation of Deezer's API terms... [I]t orchestrates a distributed piracy operation by leveraging both user-supplied and hardcoded Deezer credentials to create sessions with Deezer's API. This approach enables full access to track metadata and the decryption tokens required to generate full-length track URLs.

Additionally, the package routinely communicates with a remote server... to update download statuses and submit metadata, thereby centralizing control and allowing the threat actor to monitor and coordinate the distributed downloading operation. In doing so, automslc exposes critical track details — including Deezer IDs, International Standard Recording Codes, track titles, and internal tokens like MD5_ORIGIN (a hash used in generating decryption URLs) — which, when collected en masse, can be used to reassemble full track URLs and facilitate unauthorized downloads...

Even if a user pays for access to the service, the content is licensed, not owned. The automslc package circumvents licensing restrictions by enabling downloads and potential redistribution, which is outside the bounds of fair use...

"The malicious package was initially published in 2019, and its popularity (over 100,000 downloads) indicates wide distribution..."
AI

27-Year-Old EXE Became Python In Minutes. Is AI-Assisted Reverse Engineering Next? (adafruit.com) 150

Adafruit managing director Phillip Torrone (also long-time Slashdot reader ptorrone) shared an interesting blog post. They'd spotted a Reddit post "detailing how someone took a 27-year-old visual basic EXE file, fed it to Claude 3.7, and watched as it reverse-engineered the program and rewrote it in Python." It was an old Visual Basic 4 program they had written in 1997. Running a VB4 exe in 2024 can be a real yak-shaving compatibility nightmare, chasing down outdated DLLs and messy workarounds. So! OP decided to upload the exe to Claude 3.7 with this request:

"Can you tell me how to get this file running? It'd be nice to convert it to Python.">

Claude 3.7 analyzed the binary, extracted the VB 'tokens' (VB is not a fully-machine-code-compiled language which makes this task a lot easier than something from C/C++), identified UI elements, and even extracted sound files. Then, it generated a complete Python equivalent using Pygame. According to the author, the code worked on the first try and the entire process took less than five minutes...

Torrone speculates on what this might mean. "Old business applications and games could be modernized without needing the original source code... Tools like Claude might make decompilation and software archaeology a lot easier: proprietary binaries from dead platforms could get a new life in open-source too."

And maybe Archive.org could even add an LLM "to do this on the fly!"

Slashdot Top Deals