Show HN: Pipask – safer pip without compromising convenience

52 points by Feynmanix 2 months ago

Pipask is a drop-in replacement for pip that addresses a serious security flaw: standard pip executes arbitrary code from source distributions during dependency resolution, without warning or consent.

Pipask retrieves metadata through PyPI's JSON API first, then checks repository popularity, download counts, package age, and known vulnerabilities before allowing installation. It presents you with a pretty report and asks for your consent before installing, giving you control over what code runs on your system.

More details in the intro blog post: https://medium.com/data-science-collective/pipask-know-what-...
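
A minimal sketch of what reading metadata through PyPI's JSON API can look like, without running any package code. This is not pipask's actual implementation: it assumes the `https://pypi.org/pypi/<name>/json` endpoint and its `info.requires_dist` and `releases` fields, and the helper names and the sample payload are invented for illustration.

```python
import json
from datetime import datetime


def json_api_url(name: str) -> str:
    """URL of PyPI's JSON API document for a project."""
    return f"https://pypi.org/pypi/{name}/json"


def summarize(payload: dict) -> dict:
    """Extract declared dependencies and the earliest upload time from a response."""
    info = payload["info"]
    uploads = [
        datetime.fromisoformat(file_entry["upload_time"])
        for files in payload.get("releases", {}).values()
        for file_entry in files
    ]
    return {
        "name": info["name"],
        # requires_dist can be null in the API response, so normalize to a list.
        "requires_dist": info.get("requires_dist") or [],
        "first_upload": min(uploads) if uploads else None,
    }


# Hand-written sample in the shape of a JSON API response (not real data).
SAMPLE = json.loads("""
{
  "info": {"name": "example-pkg", "version": "1.1.0", "requires_dist": ["requests>=2.0"]},
  "releases": {
    "1.0.0": [{"upload_time": "2023-01-05T10:00:00"}],
    "1.1.0": [{"upload_time": "2024-03-01T12:30:00"}]
  }
}
""")

summary = summarize(SAMPLE)
print(json_api_url("example-pkg"))
print(summary["requires_dist"], summary["first_upload"])
```

In a real tool the payload would come from an HTTP request to that URL; the sketch only shows that dependencies and package age can be read as plain data.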

the__alchemist 2 months ago

I speculate that the group of users that are both A: Willing to install something beyond Pip for package management, and B: aren't willing to install a proper dependency resolver (uv, etc) is small.

If you are willing to use a third-party package tool (big leap), I think it's a small step to use one that fixes all of pip's limitations, vice a single one.

Feynmanix 2 months ago

Perhaps it's not clear from my description above, but I'm afraid the flaw is in the Python package ecosystem itself rather than pip. I'm not very familiar with uv, but from what I can tell from the documentation, it needs to execute the same steps as pip to resolve metadata, as this is required by various PEPs. (You can have a look at the diagram in the linked blog post https://medium.com/data-science-collective/pipask-know-what-...).
But I also get your point - advanced users who care about security may not be using pip. Implementing the functionality as a plugin for uv or poetry is actually the next step I'm considering, if people find the concept of pipask useful. What do you think?
- WhatIsDukkha 2 months ago
  
  I simply wouldn't use this as is, but I would like it if it was a uv plugin; poetry seems like a dead end in 2025 to me.
- sneak 2 months ago
  
  Why don’t we update pypi to require publishing dependency metadata along with packages, so that the deps can be resolved without running code?
  
  zahlman 2 months ago
  
  Dependency metadata is published within packages - both for wheels (prebuilt) and sdists (source that requires a build step - these projects often include non-Python code, but you can still request an entirely pointless build step for a Python-only project. But please read https://pradyunsg.me/blog/2022/12/31/wheels-are-faster-pure-... and don't do that.)
  PyPI even extracts it in most cases, to my understanding, so that installers can solve for the right versions of dependencies without downloading entire packages. (But the metadata for the project is still fragmented across multiple per-distribution files.)
  However, source distributions are still allowed to have metadata - including dependencies - marked as "dynamic" (i.e., declaring that they'll be determined during the build process). This is rarely necessary (and probably happens much more often than is necessary), but a very complex project might for example have different dependencies based on specific details of the user's environment that aren't currently expressible in the existing environment markers (see https://peps.python.org/pep-0508/).
  My experience with the Python ecosystem has been that it tries a bit too hard to make absolutely everyone happy (despite all the times that quite a few people end up unhappy). Today's concessions to backward compatibility always seem to make tomorrow's even harder to implement.
  
  pseufaux 2 months ago
  
  Isn't this what pyproject.toml solves? Genuine question as I am blissfully unaware of the intricacies of dependency resolution.
  
  zahlman 2 months ago
  
  Short version, although you already got an answer:
  If everyone had to use it, and everyone were only allowed to use "static" dependencies determined ahead of time, yes. But:
  * legacy projects that don't use pyproject.toml are still supported
  * it's possible to publish an "sdist" source package that's built on the user's machine (for example, because it includes C code that's highly machine specific and needs to be custom built for some reason; or because the user wants to build it locally in order to link against large, locally available libraries instead of using a massive wheel that copies them)
  * When something is built locally, it's permissible to determine the dependencies during that build process (and in some rare cases, that may be another reason why an sdist gets used - the user's environment needs to be inspected in order to figure out what dependencies to fetch)
  * Even if it did work, `pyproject.toml` is really more like "source code" for the metadata (about dependencies and other things). The real metadata is a file called `PKG-INFO` when it appears in an sdist, or `METADATA` in a wheel. The format is based on email headers (yes, really).
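
To make the last two bullets concrete, here is a small sketch that reads `PKG-INFO` text with the standard library's email parser. It assumes the `Dynamic` field introduced with core metadata version 2.2 (PEP 643); the function name, the "trustworthy" rule as written, and the sample text are illustrative assumptions, not how pip or any other installer is implemented.

```python
from email.parser import Parser

# Hand-written example of an sdist's PKG-INFO (not taken from a real project).
PKG_INFO = """\
Metadata-Version: 2.2
Name: example-pkg
Version: 1.0.0
Requires-Dist: requests>=2.0
Requires-Dist: tomli; python_version < "3.11"
Dynamic: License
"""


def read_dependencies(text: str):
    """Return (requirements, trustworthy) for an sdist's core metadata text.

    `trustworthy` is False when the dependencies may still change at build
    time: the metadata predates version 2.2, or Requires-Dist is marked Dynamic.
    """
    msg = Parser().parsestr(text)
    requirements = msg.get_all("Requires-Dist", [])
    version = tuple(int(part) for part in msg.get("Metadata-Version", "1.0").split("."))
    dynamic = {value.strip().lower() for value in msg.get_all("Dynamic", [])}
    trustworthy = version >= (2, 2) and "requires-dist" not in dynamic
    return requirements, trustworthy


reqs, static = read_dependencies(PKG_INFO)
print(reqs, static)
```

The same parsing works on a wheel's `METADATA` file, where the dynamic question does not arise because the build has already happened.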
  
  Feynmanix 2 months ago
  
  Have a look at the diagram in the accompanying blog post https://medium.com/data-science-collective/pipask-know-what-... , it explains how the process works.
  In short, you can get metadata from pyproject.toml, but (a) it can still involve executing code due to PEP 517 hooks, and (b) a malicious package would use the legacy setup.py to get their code executed.
  
  pseufaux 2 months ago
  
  That's a super helpful diagram. Saved it for later in case I have to explain to someone else. Thank you. I can see why something like pipask would be helpful. I saw in another comment that you are looking to make a uv plugin. I'll be on the lookout for that getting released!
zahlman 2 months ago

Pip is "a proper dependency resolver". It just perhaps doesn't have the best heuristics for performance, but they're working on that.
What pip isn't is a workflow tool for developers, or "project manager" (terminology uv uses in its marketing; "package manager" seems to be not well enough defined to argue about). Pip doesn't install Python, create or manage virtual environments (or real ones for the separately installed Python versions), upload your packages to PyPI, view a lockfile as a way to keep track of an environment (although they have just added experimental support for creating PEP 751 lockfiles and are planning support for installing from them), do one-off runs of Python applications (by installing them in an ephemeral environment, possibly installing PEP 723 inline-specified dependencies), define or manage "workspaces", have its own `[tool.pip]` section in pyproject.toml, or possibly other things I forgot.
But it absolutely does determine the dependencies of the packages you're currently asking to install, transitively, attempt to figure out a compatible set of versions, and figure out which actual build artifacts to use (i.e., "resolve dependencies"). Its logic for doing so has even been extracted and made available as the `resolvelib` package (https://pypi.org/project/resolvelib/).
My own project, PAPER, is scoped to fix pip's problems and also do basic environment management (so as to install applications or do temporary runs). The point is to satisfy the needs of Python users, while not explicitly catering to developers. (I'll probably separately offer some useful developer scripts that leverage the functionality.)
I also, incidentally, intend to allow for this sort of prompt during the install procedure. (Although the planned default is to refuse sdists entirely.)
- Feynmanix 2 months ago
  
  Do you have a link where I can learn more about PAPER?
  
  zahlman 2 months ago
  
  Ah, I misread your OP - the only part I really planned to support is "this package requires building from source, and may run arbitrary code now (not just after installation) to do so; are you sure you want to proceed?". Although the other stuff certainly seems worthwhile - I think it would be easier to plug into my design than into pip. Especially since I'm explicitly creating an internal API first and wrapping a separate CLI package around that.
  The project is still in quite early development stages, and not yet really usable for anything - I've been struggling to make myself sit down and implement even simple things, just personal issues. But the repository is at https://github.com/zahlman/paper and I am planning a Show HN when it seems appropriate. Hopefully the existing code at least gives an idea of the top-level design. I also described it a bit in https://news.ycombinator.com/item?id=43825508 .
  I've also written a few posts on my blog (https://zahlman.github.io) about design problems with pip, and going forward I'll be continuing that, along with giving a basic overview of the PAPER design and explaining how it addresses the problems I've identified.
happytoexplain 2 months ago

Isn't pip third party? Which makes your point stronger.
- zahlman 2 months ago
  
  Yes and no.
  Pip is nominally developed by separate people and isn't part of the standard library. However, it does ship with Python by default (Debian-based Linux distributions go out of their way to remove it), in the form of a wheel vendored within the standard library folders. The standard library module `ensurepip` is used to install that wheel - it bootstraps Pip's own code from within that wheel. This is also used indirectly by default when you create a new venv with the standard library `venv`.
  (The reason uv can create environments quickly is that it skips that part, while otherwise following nominally the same logic. You can get the same effect by passing `--without-pip` to the `python -m venv` invocation, and it's actually faster (on my machine at least) than using uv. However, you then need to understand how to use pip cross-environment (it wasn't designed for that from the start, but modern pip offers support that's only a little bit buggy). I discuss this on my blog in https://zahlman.github.io/posts/2025/01/07/python-packaging-... .)
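
The `--without-pip` route described above is also available from Python through the standard library's `venv.create(..., with_pip=False)`. A minimal sketch (the helper name and directory name are arbitrary):

```python
import tempfile
import venv
from pathlib import Path


def make_bare_env(env_dir: Path) -> Path:
    """Create a virtual environment without bootstrapping pip into it."""
    # Same effect as `python -m venv --without-pip <env_dir>`: ensurepip is skipped.
    venv.create(env_dir, with_pip=False)
    return env_dir


with tempfile.TemporaryDirectory() as tmp:
    env = make_bare_env(Path(tmp) / "bare-env")
    has_cfg = (env / "pyvenv.cfg").is_file()
    # Nothing named "pip" should exist anywhere inside the new environment.
    has_pip = any(env.rglob("pip"))
    print(has_cfg, has_pip)
```

Installing into such an environment then has to be done by a pip that lives elsewhere; recent pip releases have a `--python` option for that.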

scsh 2 months ago

I like the idea of having vuln reporting in the installation step. Looking at the examples provided though, I think the vulnerability reporting could use a bit more information.

Using the fastapi example, it points to CVE-2024-24762 which, if you're looking at the NIST or CVE pages for it, doesn't give the clearest info for how to resolve.

Maybe consider linking to advisories in the Python Packaging Advisory Database when possible, like pip-audit does. https://osv.dev/vulnerability/PYSEC-2024-38 is a lot clearer that fastapi is affected and which version fixed the vulnerability.

Feynmanix 2 months ago

It's not visible on the screenshot for some reason, but if you run the latest version, you'll notice a little underline under the CVE mention. It's actually a hyperlink (Cmd+click in iTerm2) that leads to https://osv.dev/vulnerability/CVE-2024-24762 where you can find out more.
Or are you saying you'd rather it leads to https://osv.dev/vulnerability/PYSEC-2024-38 rather than https://osv.dev/vulnerability/CVE-2024-24762 ?
- scsh 2 months ago
  
  Yes, in this particular case, where I'm trying to install fastapi, I'd rather it direct me to https://osv.dev/vulnerability/PYSEC-2024-38 which is more fastapi-specific and mentions that the fixed version of fastapi is 0.109.1. Or even better, give the link and print the fixed version from the advisory yaml https://github.com/pypa/advisory-database/blob/main/vulns/fa...
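
As a sketch of "print the fixed version from the advisory": records in the OSV schema list `fixed` events under `affected[].ranges[].events[]`, so the version can be read without parsing prose. The function name is hypothetical and the record below is hand-trimmed to the fields used here; consult the real advisory for the full data.

```python
def fixed_versions(osv_record: dict, package: str) -> list:
    """Collect every 'fixed' version listed for `package` in an OSV-style record."""
    fixed = []
    for affected in osv_record.get("affected", []):
        name = affected.get("package", {}).get("name", "")
        if name.lower() != package.lower():
            continue
        for version_range in affected.get("ranges", []):
            for event in version_range.get("events", []):
                if "fixed" in event:
                    fixed.append(event["fixed"])
    return fixed


# Hand-trimmed record in the OSV schema's shape (illustrative, not the full advisory).
RECORD = {
    "id": "PYSEC-2024-38",
    "affected": [
        {
            "package": {"ecosystem": "PyPI", "name": "fastapi"},
            "ranges": [
                {"type": "ECOSYSTEM", "events": [{"introduced": "0"}, {"fixed": "0.109.1"}]}
            ],
        }
    ],
}

print(RECORD["id"], fixed_versions(RECORD, "fastapi"))
```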
  
  Feynmanix 2 months ago
  
  I'll have a look at that
- simonw 2 months ago
  
  Can it spit out a visible URL for those of us who use the default macOS terminal app?
  
  Feynmanix 2 months ago
  
  Yes, I can! Will be in the next release
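
For context, clickable terminal links are commonly emitted with the OSC 8 escape sequence, which terminals without support do not turn into links, hence the request for a visible URL. A minimal sketch of a fallback follows; it is not pipask's code, and whether the terminal supports hyperlinks is left to the caller because there is no standard way to detect it.

```python
ESC = "\x1b"


def render_link(text: str, url: str, hyperlinks_supported: bool) -> str:
    """Return an OSC 8 hyperlink, or plain text with the URL spelled out."""
    if hyperlinks_supported:
        # OSC 8 ; params ; URL ST  <text>  OSC 8 ; ; ST
        return f"{ESC}]8;;{url}{ESC}\\{text}{ESC}]8;;{ESC}\\"
    return f"{text} ({url})"


url = "https://osv.dev/vulnerability/CVE-2024-24762"
print(render_link("CVE-2024-24762", url, hyperlinks_supported=False))
```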

omneity 2 months ago

This looks great, congrats on the release! So what are the potential risks/downsides?

As in what’s the tradeoff that is being made when relying on pipask?

Feynmanix 2 months ago

Thanks! Good question. I think the main downsides are:
- installation takes a few more seconds to do the checks
- you need to trust me, a random person from the internet
- if there are any subtle differences between pip versions, the checks may be done for different versions than will actually be installed (I've done my best to prevent this for pip versions 22.2 to current latest), or if I missed any bugs, you may get an error you wouldn't get with pip
The current version is also interactive only - requires user confirmation, though I'm open to adding a non-interactive mode in the future.
- omneity 2 months ago
  
  Thank you for the answers. Sounds reasonable in my book (what's one more internet stranger to trust, hah).
  I'll give it a try!

zahlman 2 months ago

So, this is a fork of pip that adds the described checks to the UI? Looks like it also doesn't vendor dependencies like pip does, which is probably fine - you'll have to use something (like pip) to bootstrap it, but this doesn't have the special-case requirements that motivate that design choice for pip (like being bootstrapped via standard library `ensurepip`).

Feynmanix 2 months ago

Yes, the reason I had to fork pip was that the dependency resolution logic is too complex and I couldn't recreate it from scratch with fidelity.
You're right I don't vendor dependencies, and I hope to get away with it exactly because I don't have the bootstrapping problem. In practice, you want to install pipask with pipx so that the dependencies don't mess with your local environment.

roywiggins 2 months ago

Seems like a natural fit for plugging into vibe coding tools that would otherwise cheerfully pip install whatever.

https://www.techradar.com/pro/security/ai-hallucinated-names...

Feynmanix 2 months ago

Great point! If you alias pip to pipask in your .*rc file, then this should already work out of the box for some tools, but there may be problems such as the need for non-interactive flows and configuring failure thresholds.
I'll think about this use case more!
- roywiggins 2 months ago
  
  It occurs to me that if people are executing pip over requirements.txt outputs, it would work and be very helpful. But if they're giving LLM agents shell access directly, probably the main problem is going to be finding a way for pipask to try to confirm that it's talking to a human and not just the LLM again (impossible in general but still)... probably out of scope though!
- omneity 2 months ago
  
  Seconding this use case! I see pipask helping greatly as another safety net when using coding agents.

ashishbijlani 2 months ago

Plug: I've been building a similar tool: https://github.com/ossillate-inc/packj

Packj uses static+dynamic code/behavioral analysis to scan for indicators of compromise (e.g., spawning of shell, use of SSH keys, network communication, use of decode+eval, etc). It also checks for several metadata attributes to detect impersonating packages (typo squatting).

Feynmanix 2 months ago

Thanks, I'll have a look, possibly add a link to it

ATechGuy 2 months ago

Looks like a useful tool. Congrats on shipping! Many packages are installed automatically in environments like CI/CD pipelines or Dockerfiles, where interactive review and consent aren't possible. How do you plan to handle such scenarios?

Feynmanix 2 months ago

Ideally, you should use lockfiles for your CI/CD or docker. To create or update the lockfile, a developer needs to install dependencies manually first (as in `pip install X` -> `pip freeze`), at which point the checks would be executed and the user would consent.
That said, it's pretty uncommon to use lockfiles with pip, so I'm considering creating something like a plugin for poetry or uv, if there is demand?
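
For the `pip install X` -> `pip freeze` workflow, CI only installs what a developer reviewed if every entry in the file is pinned to an exact version. A simplified sketch of such a check, with a hypothetical function name; it deliberately ignores much of the requirements-file syntax (hashes, line continuations, URLs) and flags anything it does not recognize:

```python
import re

# name, optional [extras], "==", an exact version (no wildcard), optional marker
PINNED = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?\s*==\s*[^\s;*]+\s*(;.*)?$"
)


def unpinned_lines(requirements_text: str) -> list:
    """Return the requirement lines that are not of the form name==version."""
    problems = []
    for raw in requirements_text.splitlines():
        line = raw.split(" #", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("#"):
            continue
        if not PINNED.match(line):
            problems.append(line)
    return problems


# Hand-written sample file contents (illustrative only).
SAMPLE = """\
# produced by hand for illustration
fastapi==0.109.1
requests>=2.0
uvicorn[standard]==0.27.0  # pinned, with an extra
-r other.txt
"""

print(unpinned_lines(SAMPLE))
```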
- zahlman 2 months ago
  
  Quite a few people use requirements.txt files with pip actually. I've seen many projects that even expect end users to do so. You might not notice - exactly because they aren't packaging for PyPI.
  
  Feynmanix 2 months ago
  
  But before committing requirements.txt to git, they still run install locally, right?
  
  zahlman 2 months ago
  
  Sure, they presumably have a local dev environment where they install dependencies to test their own code.
  But there are a lot of possible workflows around that. Some people might separately install things one at a time according to what they appear to need as they're developing, and then use `pip freeze` to create the `requirements.txt` file. Others might edit `requirements.txt` directly, and repeatedly re-create their environment based off that. Still others might involve any number of tools here, such as pip-tools (https://pypi.org/project/pip-tools/), pipenv (https://pypi.org/project/pipenv/), etc.
  
  Feynmanix 2 months ago
  
  As long as they run `pip install` locally at any point in their process before pushing to the repo, they should get the opportunity to see the pipask report.
  
  zahlman 2 months ago
  
  True. I was only trying to address "it's pretty uncommon to use lockfiles with pip". I should have quoted it in my first post.