PyPI is responsible for distributing malware

Installing packages from PyPI is a dangerous proposition. In July 2020, the request package (note the typo: request, not requests) was downloaded over 10,000 times. It was malware that installed a daemon in .bashrc, giving the attacker a remote shell on the machine. PyPI has always been vulnerable to typosquatting attacks but displays no warning to its users. A system administrator with healthy security discipline hosted the “request” malware for 20 days in August 2020 and discovered it only by chance. How many machines are still compromised? Why is there no information on PyPI about this malware and all the others? Even though there is every reason to suspect that such attacks happen on a weekly basis and target hundreds of thousands of users, the problem is almost never discussed in public.

When contacted about a malicious package, the only answer from the Python security team or the people responsible for PyPI is that you can’t trust anything on PyPI. However, there is no such warning on https://pypi.org and no documentation on how to verify that a package can be trusted. If PyPI did not already know about the malicious package, they will remove it. But there will be no security warning stating which malicious package was published or when. For instance, a dozen malicious packages were discovered in 2018 and removed from PyPI, but there is no way to know when they were published or how long they stayed available.

Consequently, as a PyPI user who wants to avoid malware, I’m responsible for:

  • not making typos when running pip install somepackage, even for test purposes
  • verifying that each package does not contain malware (there is no documentation on that topic)
  • keeping track of all installed packages for forensic analysis (because there is no public database of known malware distributed via PyPI)
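
To make the first item concrete, here is a minimal sketch of the kind of check a user has to improvise before running pip install. It compares a requested name against a list of names the user actually intends to use and flags near misses. The list KNOWN_PACKAGES, the function name lookalike_of and the 0.8 similarity cutoff are all assumptions made for illustration; nothing like this is provided by PyPI or pip.

```python
import difflib

# Hypothetical allowlist: the package names you really mean to install.
KNOWN_PACKAGES = ["requests", "numpy", "django", "flask"]


def lookalike_of(name, known=KNOWN_PACKAGES, cutoff=0.8):
    """Return the known name that `name` closely resembles, or None.

    None is returned both when `name` is exactly on the list and when
    nothing on the list is similar; only near misses are reported.
    """
    normalized = name.strip().lower()
    if normalized in known:
        return None
    # get_close_matches ranks candidates by difflib's similarity ratio.
    matches = difflib.get_close_matches(normalized, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for candidate in ("requests", "request"):
        target = lookalike_of(candidate)
        if target is not None:
            print(f"'{candidate}' looks like a typo of '{target}'")
```

Such a check only catches typos of names already on the list, and it says nothing about whether the correctly spelled package is itself trustworthy.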

This is a very heavy burden.
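
The record keeping in the last item can at least be partly automated. The following is a minimal sketch, assuming that a timestamped list of whatever the current Python environment has installed is good enough for later forensic analysis. The function names and the log file name are hypothetical, and the snapshot only sees packages visible to the interpreter that runs it.

```python
import json
from datetime import datetime, timezone
from importlib import metadata


def installed_snapshot():
    """Return the installed distributions and versions with a UTC timestamp."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries with broken or missing metadata
            packages[name] = dist.metadata.get("Version", "unknown")
    return {
        "taken_at": datetime.now(timezone.utc).isoformat(),
        "packages": dict(sorted(packages.items())),
    }


def append_snapshot(path="installed-packages.log"):
    """Append one JSON line per snapshot (hypothetical log file name)."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(json.dumps(installed_snapshot()) + "\n")
```

Calling append_snapshot() after every pip install leaves a history that can be searched once a package name is reported as malicious, which is exactly the work a public database would make unnecessary.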

The Debian GNU/Linux distribution maintains a software repository that is similar to PyPI in the sense that it contains packages, with one major difference: it is effectively protected against malware. All packages are signed by their authors, and the signature can be verified through a web of trust when they are installed.

As a Debian GNU/Linux user, my responsibility to avoid malware is:

  • nothing

Ideally PyPI will provide a similar security feature in the future, but there is currently no work in this direction (note that PEP 458, “Secure PyPI downloads with signed repository metadata”, is being worked on but is not about package authors signing their packages). As a result, downloading a package from PyPI is either very time-consuming (because the package needs to be carefully analyzed) or a security risk, because it can contain malware about which no information will ever be released on PyPI, even after it is discovered.

This is a large-scale problem because PyPI is implicitly trusted by hundreds of thousands of users who are not informed about the risks they take. Over the past decade, the people working on PyPI and the organizations financing their work have repeatedly made a conscious choice to work on something other than protecting users from malware. It may be OK to do so occasionally, but after such an extended period of time it becomes a pattern. PyPI is responsible for distributing malware and urgently needs to readjust its priorities to resolve the problem.