AI

Will Productivity Gains from AI-Generated Code Be Offset by the Need to Maintain and Review It? (zdnet.com) 95

ZDNet asks the million-dollar question. "Despite the potential for vast productivity gains from generative AI tools such as ChatGPT or GitHub Copilot, will technology professionals' jobs actually grow more complicated?" People can now pump out code on demand in an abundance of languages, from Java to Python, along with helpful recommendations. Already, 95% of developers in a recent survey from Sourcegraph report they use Copilot, ChatGPT, and other gen AI tools this way.

But auto-generating new code only addresses part of the problem in enterprises that already maintain unwieldy codebases, and require high levels of cohesion, accountability, and security.

For starters, security and quality assurance tasks associated with software jobs aren't going to go away anytime soon. "For programmers and software engineers, ChatGPT and other large language models help create code in almost any language," says Andy Thurai, analyst with Constellation Research, before talking about security concerns. "However, most of the code that is generated is security-vulnerable and might not pass enterprise-grade code. So, while AI can help accelerate coding, care should be taken to analyze the code, find vulnerabilities, and fix it, which would take away some of the productivity increase that AI vendors tout about."

Then there's code sprawl. The rollout of generative AI in coding has an analogy in cloud computing, which seemed to simplify application acquisition when it first arrived and now means a tangle of services to be managed. The relative ease of generating code via AI will contribute to an ever-expanding codebase — what the Sourcegraph survey authors refer to as "Big Code". A majority of the 500 developers in the survey are concerned about managing all this new code, along with code sprawl and its contribution to technical debt. Even before generative AI, close to eight in 10 developers say their codebase grew fivefold over the last three years, and a similar number struggle to understand existing code generated by others.

So, the productivity prospects for generative AI in programming are a mixed bag.

Programming

Google's Bard AI Can Now Write and Execute Code To Answer a Question 19

In a blog post on Wednesday, Google said Bard is getting better at logic and reasoning. "Google says that now when you ask Bard a 'computational' task like math or string manipulation, instead of showing the output of the language model, that language model will instead write a program, execute that program, and then show the output of that program to the user as an answer," reports Ars Technica. From the report: Google's blog post provides the example input of "Reverse the word 'Lollipop' for me." ChatGPT flubs this question and provides the incorrect answer "pillopoL," because language models see the world in chunks of words, or "tokens," and they just aren't good at this. Bard gets the output correct as "popilloL," but more interesting is that it also includes the Python code it wrote to answer the question. That's neat for programming-minded people to see under the hood, but wow, is that probably the scariest output ever for regular people. It's also not particularly relevant. Imagine if Gmail showed you a block of code when you just asked it to fetch email. It's weird. Just do the job you were asked to do, Bard.

Google likens an AI model writing a program to humans doing long division in that it's a different mode of thinking [...]. Google says this "writing code on the fly" method will also be used for questions like: "What are the prime factors of 15683615?" and "Calculate the growth rate of my savings." The company says, "So far, we've seen this method improve the accuracy of Bard's responses to computation-based word and math problems in our internal challenge datasets by approximately 30%." As usual, Google warns Bard "might not get it right" due to interpreting your question wrong or just, like all of us, writing code that doesn't work the first time. Bard is coding up answers on the fly right now if you want to give it a shot at bard.google.com.
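As a rough illustration of what "writing code on the fly" can look like, here is a minimal Python sketch of the kind of throwaway program a model might generate and run for the two example prompts above. It is illustrative only, not Bard's actual generated code:

```python
# Hypothetical programs for "Reverse the word 'Lollipop' for me" and
# "What are the prime factors of 15683615?"; not Bard's real output.

def reverse_word(word: str) -> str:
    return word[::-1]

def prime_factors(n: int) -> list[int]:
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

print(reverse_word("Lollipop"))   # popilloL
print(prime_factors(15683615))
```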
Programming

Stanford Golf Phenom Rose Zhang Turns Pro, Vows To 'Never Code Again' 75

theodp writes: Golf reports that amateur golf legend Rose Zhang will compete for the first time as a professional when she tees off in the first round of the Mizuho Americas Open Thursday. Golf news is rarely fodder for Slashdot discussion, but when the 20-year-old Stanford student (who plans to complete her degree after a leave of absence) was asked by Golf to identify her toughest class, she threw CS under the bus.

"CS 106A," Zhang replied, referring to a computer science course. "Currently and still trying to grind in that class. It's been a little unfortunate for me. I'm not a CS major. Will never code again after this class." Back in April, Zhang expressed some doubts about being able to juggle the demands of an already-renowned golf career and CS 106A. "I'll be super, super busy," Zhang said in an interview. "I'm planning on taking CS 106A. I don't know if it's a smart decision but it's kind of an essential intro CS class into Stanford so I'm going to try to navigate that, balance that out."

The Stanford Daily reports that CS 106A: Programming Methodology is an introductory programming course taken by 1,600+ students from all academic disciplines each year (2015 Slashdot post on CS 106A's growing pains). According to the syllabus, CS 106A "uses the Python programming language" and there's "no prior programming experience required," although the schedule indicates a lot of ground is covered for someone new to coding (the same could be said of Harvard's famed CS50).

Lest some take Zhang to task for the sin of stating programming is hard, consider that Stanford's CS 106A website suggests the same, reporting that the median score on the midterm exam was only 68%, despite a plethora of review materials and sessions. CS 106A students were offered the chance to submit formal 'regrade requests' to try to improve their midterm scores and can also vie for "a Jamba Juice gift card and 100% on the final exam" by entering a Python programming contest -- one prize will be awarded for "Aesthetic merit", another for "Algorithmic sophistication" (a number of runners-up will be awarded "a grade boost similar to getting a + on one of their assignments").
Python

PyPI is Reducing Stored IP Address Data (theregister.com) 10

The PyPI registry of open source Python packages "began evaluating ways to reduce the amount of identifying information that it stores," reports the Register, "even before the U.S. Justice Department came asking for data on suspect users."

But now, "the Python community package registry wants developers to understand that it's working to minimize the user data that it stores." The goal is not to be unable to respond to lawful requests for information; rather it's to store only the minimum amount of data necessary so as not to expose users to unnecessary privacy intrusion. Coincidentally, data minimization may prevent organizations from becoming a preferred source of on-demand surveillance: having excessive amounts of information about users invites legal demands, which staff then have to handle...

Mike Fiedler, a member of the PyPI admin team, said in a statement on Friday that the organization's effort to improve user privacy and security dates back to 2020. Since the receipt of the subpoenas in March and April, that effort has been reinvigorated.

Much of the concern focuses on IP address data, which gets stored in conjunction with web log access; user events such as logins; project events including uploads; events associated with recently introduced organizations; and administrative PyPI journal entries. According to Fiedler, PyPI was able to stop storing IP data for journal entries — an append-only transaction log — because these were only exposed to administrators... To obscure IP addresses, PyPI is salting them — adding an arbitrary value — and then hashing them — running the data through a one-way scrambling function that creates a value called a hash. This provides a way to store a reference to potentially identifying data without actually storing raw data... PyPI has been using its CDN provider Fastly to pass along a salted hash of the IP address for requests via a custom header, along with broad GeoIP data (the country and city where the user is located), and is using that instead of the raw IP address. In April, the registry adopted code changes for hashing and salting IP addresses for requests that PyPI handles directly in Warehouse, the web application that implements the official Python package index.
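For readers unfamiliar with the technique, here is a minimal sketch of the salt-and-hash idea described above. It illustrates the general approach, not PyPI's actual implementation, and the sample IP address is made up:

```python
import hashlib
import secrets

salt = secrets.token_bytes(16)   # the arbitrary value mixed into every hash

def obscure_ip(ip: str) -> str:
    # One-way hash of salt + IP: usable as a stable reference for abuse
    # analysis without storing the raw address.
    return hashlib.sha256(salt + ip.encode()).hexdigest()

print(obscure_ip("203.0.113.7"))
```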

And over the past few days, it has been replacing IP addresses in the PyPI user interface with geolocation data. PyPI still relies on IP address information to identify abuse — the creation of malicious packages, harassments, and so on — but Fiedler says even that is being looked at. "We're thinking about how to manage that without storing IP data, but we're not there yet," he said. Fiedler says the PyPI team will be weighing whether it can remove IP data from event history records after a period of time and whether the service can handle all its requests via CDN.

Python

Python 3.12 Brings New Features and Fixes (infoworld.com) 30

"The Python programming language releases new versions yearly, with a feature-locked beta release in the first half of the year and the final release toward the end of the year," writes InfoWorld.

So now Python 3.12 beta 1 has just been released, and InfoWorld compiled a list of its most significant new features. Some highlights:

- The widely used Linux profiler tool perf works with Python, but only returns information about what's happening at the C level in the Python runtime. Information about actual Python program functions doesn't show up. Python 3.12 enables an opt-in mode to allow perf to harvest details about Python programs (a sketch of the opt-in appears after this list)...

- Programs can run as much as an order of magnitude slower when run through a debugger or profiler. PEP 669 provides hooks for code object events that profilers and debuggers can attach to, such as the start or end of a function. A callback function could be registered by a tool to fire whenever such an event is triggered (a sketch appears after this list). There will still be a performance hit for profiling or debugging, but it'll be greatly reduced...

- Comprehensions, a syntax that lets you quickly construct lists, dictionaries, and sets, are now constructed "inline" rather than by way of temporary objects. The speedup for this has been clocked at around 11% for a real-world case and up to twice as fast for a micro-benchmark.

- Python's type-hinting syntax, added in Python 3.5, allows linting tools to catch a wide variety of errors ahead of time. With each new version, typing in Python gains features to cover a broader and more granular range of use cases... The type parameter syntax provides a cleaner way to specify types in a generic class, function, or type alias (a sketch appears after this list)...

- Every object in Python has a reference count that tracks how many times other objects refer to it, including built-in objects like None. PEP 683 allows objects to be treated as "immortal," so that they never have their reference count changed. Making objects immortal has other powerful implications for Python in the long run. It makes it easier to implement multicore scaling, and to implement other optimizations (like avoiding copy-on-write) that would have been hard to implement before.

- With earlier versions of Python, the base size of an object was 208 bytes. Objects have been refactored multiple times over the last few versions of Python to make them smaller, which doesn't just allow more objects to live in memory but helps with cache locality. As of Python 3.12, the base size of an object is now 96 bytes — less than half of what it used to be.
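To make the first item above concrete, here is a minimal sketch of the perf opt-in, assuming Python 3.12 on Linux with perf available; the busy() workload is hypothetical, and the same support can be enabled for an entire run with python -X perf or the PYTHONPERFSUPPORT=1 environment variable:

```python
import sys

# Assumes Python 3.12 on Linux: turn on the perf trampoline so that perf
# samples can be attributed to Python-level functions, not just the C runtime.
sys.activate_stack_trampoline("perf")

def busy():  # hypothetical workload to profile
    return sum(i * i for i in range(1_000_000))

busy()
sys.deactivate_stack_trampoline()
```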
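Next, a small sketch of the PEP 669 hooks, exposed as the sys.monitoring API in Python 3.12: a toy profiler registers a callback that fires whenever a Python function starts. The tool name and the work() function are made up for illustration:

```python
import sys

TOOL = sys.monitoring.PROFILER_ID
sys.monitoring.use_tool_id(TOOL, "tiny-profiler")   # claim a tool slot

def on_py_start(code, instruction_offset):
    # Fires at the start of every monitored Python function.
    print(f"entering {code.co_qualname}")

sys.monitoring.register_callback(TOOL, sys.monitoring.events.PY_START, on_py_start)
sys.monitoring.set_events(TOOL, sys.monitoring.events.PY_START)

def work():   # hypothetical function being profiled
    return sum(range(10))

work()   # the callback prints "entering work"
```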
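Finally, a short example of the new type parameter syntax (PEP 695); each of these declarations would have required explicit TypeVar boilerplate in earlier Python versions:

```python
# Python 3.12 generics without importing TypeVar or Generic.

type Pair[T] = tuple[T, T]             # generic type alias

class Stack[T]:                        # generic class
    def __init__(self) -> None:
        self.items: list[T] = []

    def push(self, item: T) -> None:
        self.items.append(item)

def first[T](items: list[T]) -> T:     # generic function
    return items[0]

print(first([1, 2, 3]))                # 1
```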

Python

PyPI Was Subpoenaed 31

The PyPI blog: In March and April 2023, the Python Software Foundation (PSF) received three (3) subpoenas for PyPI user data. All three subpoenas were issued by the United States Department of Justice. The PSF was not provided with context on the legal circumstances surrounding these subpoenas. In total, user data related to five (5) PyPI usernames were requested. The data request was:

"Names (including subscriber names, user names, and screen names);"
"Addresses (including mailing, residential addresses, business addresses, and email addresses);"
"Connection records;"
"Records of session times and durations, and the temporarily assigned network address (such as Internet Protocol addresses) associated with those sessions;"
"Length of service (including start date) and type of services utilized;"
"Telephone or instrument numbers (including the registration Internet Protocol address);"
"Means and source of payment of any such services (including any credit card or bank account number) and billing records;"
"Records of all Python Package Index (PyPI) packages uploaded by..." given usernames
"IP download logs of any Python Package Index (PyPI) packages uploaded by..." given usernames

The privacy of PyPI users is of utmost concern to PSF and the PyPI Administrators, and we are committed to protecting user data from disclosure whenever possible. In this case, however, PSF determined with the advice of counsel that our only course of action was to provide the requested data. I, as Director of Infrastructure of the Python Software Foundation, fulfilled the requests in consultation with PSF's counsel.

We have waited for the string of subpoenas to subside, though we were committed from the beginning to write and publish this post as a matter of transparency, and as allowed by the lack of a non-disclosure order associated with the subpoenas received in March and April 2023.
Python

Python's PyPI Package Repository Temporarily Halted New Signups, Citing 'Volume of Malicious Projects' (bleepingcomputer.com) 24

On Saturday PyPI, the official third-party registry of open source Python packages, "temporarily suspended new users from signing up, and new projects from being uploaded to the platform" reports BleepingComputer.

"The volume of malicious users and malicious projects being created on the index in the past week has outpaced our ability to respond to it in a timely fashion, especially with multiple PyPI administrators on leave," stated an incident notice posted by PyPI admins Saturday.

Hours ago they posted a four-word update: "Suspension has been lifted." No details were provided, but The Hacker News writes the incident "comes as software registries such as PyPI have proven time and time again to be a popular target for attackers looking to poison the software supply chain and compromise developer environments." Earlier this week, Israeli cybersecurity startup Phylum uncovered an active malware campaign that leverages OpenAI ChatGPT-themed lures to bait developers into downloading a malicious Python module capable of stealing clipboard content in order to hijack cryptocurrency transactions. ReversingLabs, in a similar discovery, identified multiple npm packages named nodejs-encrypt-agent and nodejs-cookie-proxy-agent in the npm repository that drop a trojan called TurkoRat.
AI

Google Colab Promises 'AI-Powered Coding, Free of Charge' (blog.google) 24

Google Colab hosts free cloud-based "executable documents" that, among other things, let you write and run code in your browser (in dozens of languages, including Python).

Over 7 million people, including students, already use Colab, according to a recent post on Google's blog, "and now it's getting even better with advances in AI [with] features like code completions, natural language to code generation and even a code-assisting chatbot."

Google says it will "dramatically increase programming speed, quality, and comprehension." Our first features will focus on code generation. Natural language to code generation helps you generate larger blocks of code, writing whole functions from comments or prompts. [For example: "import data.csv as a dataframe."] The goal here is to reduce the need for writing repetitive code, so you can focus on the more interesting parts of programming and data science. Eligible users in Colab will see a new "Generate" button in their notebooks, allowing them to enter any text prompt to generate code.
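For a sense of what that looks like, the code behind a prompt like the "import data.csv as a dataframe" example above typically amounts to a couple of lines of pandas. A hypothetical sketch of what a generated cell might contain:

```python
import pandas as pd

# Hypothetical output for the prompt "import data.csv as a dataframe";
# the actual generated code will vary.
df = pd.read_csv("data.csv")
df.head()   # preview the first rows (displayed automatically in a notebook cell)
```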

For eligible paid users, as you type, you'll see autocomplete suggestions.

We're also bringing the helpfulness of a chatbot directly into Colab. Soon, you'll be able to ask questions directly in Colab like, "How do I import data from Google Sheets?" or "How do I filter a Pandas DataFrame?"

Anyone with an internet connection can access Colab, and use it free of charge... Access to these features will roll out gradually in the coming months, starting with our paid subscribers in the U.S. and then expanding into the free-of-charge tier.

It's powered by Google's "next generation" machine-learning language model PaLM 2 (announced earlier this month), which "excels at popular programming languages like Python and JavaScript, but can also generate specialized code in languages like Prolog, Fortran and Verilog." Colab will use Codey, a family of code models built on PaLM 2... fine-tuned on a large dataset of high quality, permissively licensed code from external sources to improve performance on coding tasks. Plus, the versions of Codey being used to power Colab have been customized especially for Python and for Colab-specific uses.
Programming

'Mojo May Be the Biggest Programming Language Advance In Decades' (www.fast.ai) 126

Mojo is a new programming language developed by Modular that aims to address the performance and deployment limitations of Python in areas like AI model development. After a demo of Mojo prior to its launch, Jeremy Howard from the non-profit research group fast.ai said it feels like coding will never be the same again. Here's an excerpt from Howard's article: Modular is a fairly small startup that's only a year old, and only one part of the company is working on the Mojo language. Mojo development was only started recently. It's a small team, working for a short time, so how have they done so much? The key is that Mojo builds on some really powerful foundations. Very few software projects I've seen spend enough time building the right foundations, and as a result they tend to accrue mounds of technical debt. Over time, it becomes harder and harder to add features and fix bugs. In a well designed system, however, every feature is easier to add than the last one, is faster, and has fewer bugs, because the foundations each feature builds upon are getting better and better. Mojo is a well designed system.

At its core is MLIR (Multi-Level Intermediate Representation), which has already been developed for many years, initially kicked off by Chris Lattner at Google. He had recognized what the core foundations for an "AI era programming language" would need, and focused on building them. MLIR was a key piece. Just as LLVM made it dramatically easier for powerful new programming languages to be developed over the last decade (such as Rust, Julia, and Swift, which are all based on LLVM), MLIR provides an even more powerful core to languages that are built on it. Another key enabler of Mojo's rapid development is the decision to use Python as the syntax. Developing and iterating on syntax is one of the most error-prone, complex, and controversial parts of the development of a language. By simply outsourcing that to an existing language (which also happens to be the most widely used language today) that whole piece disappears! The relatively small number of new bits of syntax needed on top of Python then largely fit quite naturally, since the base is already in place.

The next step was to create a minimal Pythonic way to call MLIR directly. That wasn't a big job at all, but it was all that was needed to then create all of Mojo on top of that -- and work directly in Mojo for everything else. That meant that the Mojo devs were able to "dog-food" Mojo when writing Mojo, nearly from the very start. Any time they found something didn't quite work great as they developed Mojo, they could add a needed feature to Mojo itself to make it easier for them to develop the next bit of Mojo!
You can give Mojo a try here.
Google

Google Announces PaLM 2, Its Next Generation Language Model (blog.google) 6

Google, in a blog post: PaLM 2 is a state-of-the-art language model with improved multilingual, reasoning and coding capabilities.

Multilinguality: PaLM 2 [PDF] is more heavily trained on multilingual text, spanning more than 100 languages. This has significantly improved its ability to understand, generate and translate nuanced text -- including idioms, poems and riddles -- across a wide variety of languages, a hard problem to solve. PaLM 2 also passes advanced language proficiency exams at the "mastery" level.
Reasoning: PaLM 2's wide-ranging dataset includes scientific papers and web pages that contain mathematical expressions. As a result, it demonstrates improved capabilities in logic, common sense reasoning, and mathematics.
Coding: PaLM 2 was pre-trained on a large quantity of publicly available source code datasets. This means that it excels at popular programming languages like Python and JavaScript, but can also generate specialized code in languages like Prolog, Fortran and Verilog.

Even as PaLM 2 is more capable, it's also faster and more efficient than previous models -- and it comes in a variety of sizes, which makes it easy to deploy for a wide range of use cases. We'll be making PaLM 2 available in four sizes from smallest to largest: Gecko, Otter, Bison and Unicorn. Gecko is so lightweight that it can work on mobile devices and is fast enough for great interactive applications on-device, even when offline. This versatility means PaLM 2 can be fine-tuned to support entire classes of products in more ways, to help more people.

At I/O today, we announced over 25 new products and features powered by PaLM 2. That means that PaLM 2 is bringing the latest in advanced AI capabilities directly into our products and to people -- including consumers, developers, and enterprises of all sizes around the world. Here are some examples:

PaLM 2's improved multilingual capabilities are allowing us to expand Bard to new languages, starting today. Plus, it's powering our recently announced coding update.
Workspace features to help you write in Gmail and Google Docs, and help you organize in Google Sheets are all tapping into the capabilities of PaLM 2 at a speed that helps people get work done better, and faster.
Med-PaLM 2, trained by our health research teams with medical knowledge, can answer questions and summarize insights from a variety of dense medical texts. It achieves state-of-the-art results in medical competency, and was the first large language model to perform at "expert" level on U.S. Medical Licensing Exam-style questions. We're now adding multimodal capabilities to synthesize information like x-rays and mammograms to one day improve patient outcomes. Med-PaLM 2 will open up to a small group of Cloud customers for feedback later this summer to identify safe, helpful use cases.

Google

Google Drops Waitlist for AI Chatbot Bard, Expands To Over 180 Countries (theverge.com) 26

Google is adding a smorgasbord of new features to its AI chatbot Bard, including support for new languages (Japanese and Korean), easier ways to export text to Google Docs and Gmail, visual search, and a dark mode. Most significantly, the company is removing the waitlist for Bard and making the system available in English in 180 countries and territories. From a report: It's also promising future features like AI image generation powered by Adobe and integration with third-party web services like Instacart and OpenTable. Collectively, the news is a shot in the arm for Bard, which was released two months ago for select users in the US and UK. The chatbot -- which Google still stresses is an experiment and not a replacement to its search engine -- has compared poorly to rivals like OpenAI's ChatGPT and Microsoft's new Bing chatbot. Notably, Bard made a factual error in its first-ever public demo (though this problem is common to all such bots). Now, Google is adding a lot of new features as well as upgrading Bard to use its new PaLM 2 language model. This should improve its general answers and usability.

Google says the upgraded Bard is particularly good at tackling coding queries, including debugging and explaining chunks of code in more than 20 languages, so some of today's upgrades are focused on this use case. These include the new dark mode, improved citations for code (which will not only offer sources but also explain the snippets), and a new export button. This can already be used to send code to Google's Colab platform but will now also work with another browser-based IDE, Replit (starting with Python queries).

Python

Codon Compiler For Python Is Fast - but With Some Caveats (usenix.org) 36

For 16 years, Rik Farrow has been an editor for the long-running nonprofit Usenix. He's also been a consultant for 43 years (according to his biography at Usenix.org) — and even wrote the 1988 book Unix System Security: How to Protect Your Data and Prevent Intruders.

Today Farrow stopped by Slashdot to share his thoughts on Codon. rikfarrow writes: Researchers at MIT decided to build a compiler focused on speeding up genomics processing... Recently, they have posted their code on GitHub, and I gave it a test drive.
"Managed" languages produce code for a specific runtime (like JavaScript). Now Farrow's article at Usenix.org argues that Codon produces code "much faster than other managed languages, and in some cases faster than C/C++."

Codon-compiled code is faster because "it's compiled, variables are typed at compile time, and it supports parallel execution." But there are some important caveats: The "version of Python" part is actually an important point: the builders of Codon have built a compiler that accepts a large portion of Python, including all of the most commonly used parts — but not all... Duck typing means that the Codon compiler uses hints found in the source or attempts to deduce them to determine the correct type, and assigns that as a static type. If you wanted to process data where the type is unknown before execution, this may not work for you, although Codon does support a union type that is a possible workaround. In most cases of processing large data sets, the types are known in advance so this is not an issue...

Codon is not the same as Python, in that the developers have not yet implemented all the features you would find in Python 3.10, and this, along with duck typing, will likely cause problems if you just try to compile existing scripts. I quickly ran into problems, as I uncovered unsupported bits of Python, and, judging by the Issues section of their GitHub pages, so have other people.

Codon supports a JIT feature, so that instead of attempting to compile complete scripts, you can just add a @codon.jit decorator to functions that you think would benefit from being compiled or executed in parallel, becoming much faster to execute...
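A minimal sketch of that decorator-based approach, assuming the codon Python package is installed; the fib() function is a made-up example, and only the @codon.jit usage comes from the article:

```python
import codon

@codon.jit
def fib(n: int) -> int:
    # Hypothetical hot loop; the decorator asks Codon to compile it rather
    # than run it in the regular CPython interpreter.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fib(40))
```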

Whether your projects will benefit from experimenting with Codon will mean taking the time to read the documentation. Codon is not exactly like Python. For example, there's support for Nvidia GPUs included as well and I ran into a limitation when using a dictionary. I suspect that some potential users will appreciate that Codon takes Python as input and produces executables, making the distribution of code simpler while avoiding disclosure of the source. Codon, with its LLVM backend, also seems like a great solution for people wanting to use Python for embedded projects.

My uses of Python are much simpler: I can process millions of lines of nginx logs in seconds, so a reduction in execution time means little to me. I do think there will be others who can take full advantage of Codon.

Farrow's article also points out that Codon "must be licensed for commercial use, but versions older than three years convert to an Apache license. Non-commercial users are welcome to experiment with Codon."
Programming

Swift Creator's Company Builds New Programming Language 'Mojo' - a Python Superset (www.fast.ai) 82

While working at Apple, Chris Lattner designed Swift to "fully leverage the power of LLVM," and "led a team for a while at Google to try to move Swift out of its Apple comfort zone, to become a replacement for Python in AI model development." That's according to a blog post by Jeremy Howard, an advisor to Lattner's Modular AI (which he co-founded in 2022 to build a next-generation AI platform for developers).

"But sadly," Howard writes, Swift "did not receive the support it needed from either Apple or from Google, and it was not ultimately successful." And yet... [W]hilst at Google Chris did develop another project which became hugely successful: MLIR. MLIR is a replacement for LLVM's intermediate representation [or IR] for the modern age of many-core computing and AI workloads. It's critical for fully leveraging the power of hardware like GPUs, TPUs, and the vector units increasingly being added to server-class CPUs.

So, if Swift was "syntax sugar for LLVM", what's "syntax sugar for MLIR"? The answer is: Mojo! Mojo is a brand new language that's designed to take full advantage of MLIR. And also Mojo is Python.

Wait what?

OK let me explain. Maybe it's better to say Mojo is Python++. It will be (when complete) a strict superset of the Python language. But it also has additional functionality so we can write high performance code that takes advantage of modern accelerators...

Whereas Swift was a brand new language packing all kinds of cool features based on the latest research in programming language design, Mojo is, at its heart, just Python. This seems wise, not just because Python is already well understood by millions of coders, but also because after decades of use its capabilities and limitations are now well understood. Relying on the latest programming language research is pretty cool, but it's potentially dangerous speculation because you never really know how things will turn out...

A key trick in Mojo is that you can opt in at any time to a faster "mode" as a developer, by using "fn" instead of "def" to create your function. In this mode, you have to declare exactly what the type of every variable is, and as a result Mojo can create optimised machine code to implement your function. Furthermore, if you use "struct" instead of "class", your attributes will be tightly packed into memory, such that they can even be used in data structures without chasing pointers around. These are the kinds of features that allow languages like C to be so fast, and now they're accessible to Python programmers too — just by learning a tiny bit of new syntax...

I can't begin to describe all the little (and big!) ideas throughout Mojo's design and implementation — it's the result of Chris and his team's decades of work on compiler and language design and includes all the tricks and hard-won experience from that time — but what I can describe is an amazing result that I saw with my own eyes.

Mojo hasn't been released to the public yet (other than an online "playground" with a waitlist where they're "rolling out access slowly"). But the blog post notes that creating a programming language's syntax is usually complex, error-prone, and controversial — a problem Mojo neatly avoids by "outsourcing" its syntax to an existing language, "which also happens to be the most widely used language today."

And "As a compiled language, Mojo's deployment story is basically the same as C," the post argues. [That is, "you can literally just make the compiled program available for direct download. It can be just 100k or so in size, and will launch and run quickly."]

"This means that Mojo is far more than a language for AI/ML applications. It's actually a version of Python that allows us to write fast, small, easily-deployed applications that take advantage of all available cores and accelerators!"
Python

'Faster, Leaner' Python 3.12 Released Today with Improvements to Speed, Multiprocessing (infoworld.com) 53

Python 3.12 was released today, with improvements to speed and efficiency, reports InfoWorld. Core developers explained the improvements at this year's PyCon convention in Salt Lake City, Utah, including efforts to reduce Python's memory use, make the interpreter faster, and optimize compilation for more efficient code: Subinterpreters are a mechanism whereby the Python runtime can have multiple interpreters running together inside a single process, as opposed to each interpreter being isolated in its own process (the current multiprocessing mechanism)... While subinterpreters have been available in the Python runtime for some time now, they haven't had an interface for the end user. Also, the messy state of Python's internals hasn't allowed subinterpreters to be used effectively. With Python 3.12, core Python developer Eric Snow and his cohort cleaned up Python's internals enough to make subinterpreters useful, and they are adding a minimal module to the Python standard library called interpreters. This gives programmers a rudimentary way to launch subinterpreters and execute code on them.

Snow's own initial experiments with subinterpreters significantly outperformed threading and multiprocessing. One example, a simple web service that performed some CPU-bound work, maxed out at 100 requests per second with threads, and 600 with multiprocessing. But with subinterpreters, it yielded 11,500 requests per second, with little to no drop-off when scaled up beyond a single client. The interpreters module has very limited functionality right now, and it lacks robust mechanisms for sharing state between subinterpreters. But Snow believes a good deal more functionality will appear by Python 3.13, and in the interim developers are encouraged to experiment...
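As of the 3.12 release the public interpreters module described above had not yet landed in the standard library, but the private module that backs this work can be used to experiment. A minimal sketch, assuming CPython 3.12 and with the caveat that this internal API may change:

```python
# Uses CPython's private _xxsubinterpreters module, which underpins the
# planned public "interpreters" API; names and behavior are subject to change.
import _xxsubinterpreters as interpreters

interp_id = interpreters.create()     # a second interpreter inside this process
interpreters.run_string(interp_id, "print('hello from a subinterpreter')")
interpreters.destroy(interp_id)
```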

Python 3.11 introduced new bytecodes to the interpreter, called adaptive instructions. These instructions can be replaced automatically at runtime with versions specialized for a given Python type, a process called quickening. This saves the interpreter the step of having to look up what types the objects are, speeding up the whole process enormously. For instance, if a given addition operation regularly takes in two integers, that instruction can be replaced with one that assumes the operands are both integers... Python 3.12 has more adaptive specialization opcodes...
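One way to watch specialization happen is the dis module, whose adaptive flag (added in 3.11) shows the specialized instructions the interpreter has swapped in once a function has been warmed up with consistently typed operands. A small sketch with a made-up function:

```python
import dis

def add(a, b):
    return a + b

# Call the function enough times with int operands for the adaptive
# interpreter to specialize its generic BINARY_OP instruction.
for _ in range(1_000):
    add(1, 2)

# adaptive=True shows the specialized opcodes currently in place
# rather than the generic ones.
dis.dis(add, adaptive=True)
```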

And starting with Python 3.12, object headers now use 96 bytes, which InfoWorld reports is "slightly less than half of what it was before."
AI

Nvidia Releases a Toolkit To Make Text-Generating AI 'Safer' (techcrunch.com) 53

An anonymous reader quotes a report from TechCrunch: In pursuit of "safer" text-generating models, Nvidia today released NeMo Guardrails, an open source toolkit aimed at making AI-powered apps more "accurate, appropriate, on topic and secure." Jonathan Cohen, the VP of applied research at Nvidia, says the company has been working on Guardrails' underlying system for "many years" but just about a year ago realized it was a good fit for models along the lines of GPT-4 and ChatGPT. "We've been developing toward this release of NeMo Guardrails ever since," Cohen told TechCrunch via email. "AI model safety tools are critical to deploying models for enterprise use cases."

Guardrails includes code, examples and documentation to "add safety" to AI apps that generate text as well as speech. Nvidia claims that the toolkit is designed to work with most generative language models, allowing developers to create rules using a few lines of code. Specifically, Guardrails can be used to prevent -- or at least attempt to prevent -- models from veering off topic, responding with inaccurate information or toxic language and making connections to "unsafe" external sources. Think keeping a customer service assistant from answering questions about the weather, for instance, or a search engine chatbot from linking to disreputable academic journals. "Ultimately, developers control what is out of bounds for their application with Guardrails," Cohen said. "They may develop guardrails that are too broad or, conversely, too narrow for their use case."

While companies like Zapier are using Guardrails to add a layer of safety to their generative models, Nvidia acknowledges that the toolkit isn't perfect; it won't catch everything, in other words. Cohen also notes that Guardrails works best with models that are "sufficiently good at instruction-following," a la ChatGPT, and that use the popular LangChain framework for building AI-powered apps. That disqualifies some of the open source options out there. And -- effectiveness of the tech aside -- it must be emphasized that Nvidia isn't necessarily releasing Guardrails out of the goodness of its heart. It's a part of the company's NeMo framework, which is available through Nvidia's enterprise AI software suite and its NeMo fully managed cloud service. Any company can implement the open source release of Guardrails, but Nvidia would surely prefer that they pay for the hosted version instead.

Open Source

Python's PyPI Will Sell 'Organization Accounts' to Corporate Projects to Fund Staff (pypi.org) 14

Last year Python's massive PyPI repository of pre-written software packages had 235.7 billion downloads — a 57% annual growth in its download counts and bandwidth. So now Python's nonprofit Python Software Foundation has an announcement.

Their director of infrastructure said today that they're rolling out "the first step in our plan to build financial support and long-term sustainability of PyPI, while simultaneously giving our users one of our most requested features: organization accounts." Organizations on PyPI are self-managed teams, with their own exclusive branded web addresses. Our goal is to make PyPI easier to use for large community projects, organizations, or companies who manage multiple sub-teams and multiple packages.

We're making organizations available to community projects for free, forever, and to corporate projects for a small fee. Additional priority support agreements will be available to all paid subscribers, and all revenue will go right back into PyPI to continue building better support and infrastructure for all our users... Having more people using and contributing to Python every year is a fantastic problem to have, but it is one we must increase organizational capacity to accommodate. Increased revenue for PyPI allows it to become a staffed platform that can respond to support requests and attend to issues in a timeframe that is significantly faster than what our excellent (but thinly spread) largely volunteer team could reasonably handle.

We want to be very clear — these new features are completely optional. If features for larger projects don't sound like something that would be useful to you as a PyPI maintainer, then there is no obligation to create an organization and absolutely nothing about your PyPI experience will change for you.

We look forward to discussing what other features PyPI users would like to see tackled next...

Google

Google's Bard AI Chatbot Can Now Help You Code and Create Functions For Google Sheets (theverge.com) 18

Google is updating its Bard AI chatbot to help developers write and debug code. Rivals like ChatGPT and Bing AI have supported code generation, but Google says it has been "one of the top requests" it has received since opening up access to Bard last month. From a report: Bard can now generate code, debug existing code, help explain lines of code, and even write functions for Google Sheets. "We're launching these capabilities in more than 20 programming languages including C++, Go, Java, Javascript, Python and Typescript," explains Paige Bailey, group product manager for Google Research, in a blog post. You can ask Bard to explain code snippets or explain code within GitHub repos similar to how Microsoft-owned GitHub is implementing a ChatGPT-like assistant with Copilot. Bard will also debug code that you supply or even its own code if it made some errors or the output wasn't what you were looking for.
Programming

Undercutting Microsoft, Amazon Offers Free Access to Its AI Coding Assistant 'CodeWhisperer' (theverge.com) 45

Amazon is making its AI-powered coding assistant CodeWhisperer free for individual developers, reports the Verge, "undercutting the $10 per month pricing of its Microsoft-made rival." Amazon launched CodeWhisperer as a preview last year, which developers can use within various integrated development environments (IDEs), like Visual Studio Code, to generate lines of code based on a text-based prompt....

CodeWhisperer automatically filters out any code suggestions that are potentially biased or unfair and flags any code that's similar to open-source training data. It also comes with security scanning features that can identify vulnerabilities within a developer's code, while providing suggestions to help close any security gaps it uncovers. CodeWhisperer now supports several languages, including Python, Java, JavaScript, TypeScript, and C#, as well as Go, Rust, PHP, Ruby, Kotlin, C, C++, Shell scripting, SQL, and Scala.

Here's how Amazon's senior developer advocate pitched the usefulness of their "real-time AI coding companion": Helping to keep developers in their flow is increasingly important as, facing increasing time pressure to get their work done, developers are often forced to break that flow to turn to an internet search, sites such as StackOverflow, or their colleagues for help in completing tasks. While this can help them obtain the starter code they need, it's disruptive as they've had to leave their IDE environment to search or ask questions in a forum or find and ask a colleague — further adding to the disruption. Instead, CodeWhisperer meets developers where they are most productive, providing recommendations in real time as they write code or comments in their IDE. During the preview we ran a productivity challenge, and participants who used CodeWhisperer were 27% more likely to complete tasks successfully and did so an average of 57% faster than those who didn't use CodeWhisperer....

It provides additional data for suggestions — for example, the repository URL and license — when code similar to training data is generated, helping lower the risk of using the code and enabling developers to reuse it with confidence.

EU

Python Foundation Raises Concerns Over EU's Proposed Cybersecurity Rules (theregister.com) 40

The Python Software Foundation is "concerned that proposed EU cybersecurity laws will leave open source organizations and individuals unfairly liable for distributing incorrect code," according to the Register. The PSF reviewed the EU's proposed "Cyber Resilience Act" and "Product Liability Act" and reports "issues that put the mission of our organization and the health of the open-source software community at risk."

From the Register's report: "If the proposed law is enforced as currently written, the authors of open-source components might bear legal and financial responsibility for the way their components are applied in someone else's commercial product," the PSF said in a statement shared on Tuesday by executive director Deb Nicholson. "The existing language makes no differentiation between independent authors who have never been paid for the supply of software and corporate tech behemoths selling products in exchange for payments from end-users...."

The PSF argues the EU lawmakers should provide clear exemptions for public software repositories that serve the public good and for organizations and developers hosting packages on public repositories. "We need it to be crystal clear who is on the hook for both the assurances and the accountability that software consumers deserve," the PSF concludes. The PSF is asking anyone who shares its concerns to convey that sentiment to an appropriate EU Member of Parliament by April 26, while amendments focused on protecting open source software are being considered.

Bradley Kuhn, policy fellow at the Software Freedom Conservancy, told The Register that the free and open source (FOSS) community should think carefully about the scope of the exemptions being sought. "I'm worried that many in FOSS are falling into a trap that for-profit companies have been trying to lay for us on this issue," he said. "While it seems on the surface that a blanket exception for FOSS would be a good thing for FOSS, in fact, this an attempt for companies to get the FOSS community to help them skirt their ordinary product liability. For profit companies that deploy FOSS should have the same obligations for security and certainty for their users as proprietary software companies do."

The article points out that numerous tech organizations are urging clarifications in the proposed regulations, including NLnet Labs and the Eclipse Foundation.
Security

Google's Free Assured Open Source Software Service Hits General Availability (techcrunch.com) 24

An anonymous reader shares a report: About a year ago, Google announced its Assured Open Source Software (Assured OSS) service, a service that helps developers defend against supply chain security attacks by regularly scanning and analyzing some of the world's most popular software libraries for vulnerabilities. Today, Google is launching Assured OSS into general availability with support for well over a thousand Java and Python packages -- and while Google didn't initially disclose pricing when it first announced the service, the company has now revealed that it will be available for free.

Software development has long depended on third-party libraries (which are often maintained by only a single developer), but it wasn't until the industry got hit with a number of high-profile exploits that everyone (including the White House) perked up and started taking software supply chain security seriously. Now, you can't attend an open source conference without hearing about Software Bills of Materials (SBOMs), artifact registries and similar topics. It's no surprise then that Google, which has long been at the forefront of releasing open-source products, launched a service like Assured OSS.

Google promises that it will constantly keep these libraries up to date (without creating forks) and continuously scan for known vulnerabilities, do fuzz tests to discover new ones and then fix these issues and contribute these fixes back upstream. The company notes that when it first launched the service with around 250 Java libraries, it was responsible for discovering 48% of the new CVEs for these libraries and subsequently addressing them.
