Maybe Python's import/module system makes sense to someone, but it's caused me a lot of trouble. I'm not even sure exactly why, but I run into weird issues often.
Like the fact that the imports are tied to the working directory things are expected to be run from is just very weird. There should just be one project root, and all imports can then import via full paths from the root or paths relative to the file's location. Tying it to the working directory is incredibly confusing. The same imports can work when run from one directory and fail from another.
In a project I have I need to run it like `python3 -m part2.week2` instead of `python3 part2/week2.py` or the imports will fail. I'll probably never understand this system :)
Part of it might just be some implicit complexity in module systems as a whole that I don't get, not just python. Rust has some confusing situations too, although Java and Go seem to have more straightforward systems.
The module search path (sys.path) is the thing you need to know about.
With `python -m part2.week2`, by default `.` is added at the start.
With `python part2/week2.py`, `part2` (the script directory) is added at the start (instead of `.`).
So in the former case, `week2.py` is able to do something like `from part2 import ...` but not in the latter case.
But this should work: `PYTHONPATH=. python part2/week2.py`.
https://docs.python.org/3/tutorial/modules.html#the-module-s...
https://docs.python.org/3/library/sys_path_init.html#sys-pat...
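To make that concrete, here's a rough sketch (hypothetical project at /home/you/project, run from the project root; the exact form of the entries varies a little between Python versions):
$ cat part2/week2.py
import sys
print(sys.path[0])
$ python3 -m part2.week2         # module mode: the current working directory leads sys.path
/home/you/project
$ python3 part2/week2.py         # script mode: the script's own directory leads sys.path instead
/home/you/project/part2
That's why `from part2 import ...` (or a relative import) can resolve in the first case but not the second.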
>Part of it might just be some implicit complexity in module systems as a whole that I don't get, not just python.
I won't speak about other languages, but Python's system is indeed very complex - you just aren't usually exposed to most of the complexity, because it mainly takes the form of optional hooks. (Thanks for the reminder that I need to do a blog post about this some time - once I've figured out how to structure everything I want to say.)
js2 gave you the how-to; so here's the detailed explanation. You might not want to try to tackle it all at once.
Importantly, Python modules don't necessarily map one-to-one to `.py` files. They can be `import`ed from a precompiled `.pyc` bytecode file, which could be either in the searched folder directly or in a `__pycache__` subdirectory (the modern default). Or they can be loaded from a `.py` file within an archive, where `sys.path` contains the path and filename of the archive rather than just a folder path. Or they can come from within the Python interpreter executable itself - and that's not just for the builtins. Or they can be anything else that ends up creating a `module` object, as long as you write the necessary custom importer.
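The archive case, for example, boils down to something like this (a minimal sketch; `bundle.zip` and `mypkg` are made-up names):
import sys
sys.path.insert(0, "bundle.zip")   # a zip archive whose root contains mypkg/__init__.py
import mypkg                       # handled by the zipimport machinery - no mypkg/ folder on disk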
Aside from the well-known `sys.path`, there is a `sys.meta_path` which contains objects that represent ways to import modules:
Python 3.12.4 (main, Jun 24 2024, 03:28:13) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.meta_path
[<class '_frozen_importlib.BuiltinImporter'>, <class '_frozen_importlib.FrozenImporter'>, <class '_frozen_importlib_external.PathFinder'>]
(IIRC: the `PathFinder` delegates to a separate loader which may be either a .py file loader, .pyc file loader, .zip archive loader etc. The `FrozenImporter` is for the non-"builtin" modules within the interpreter.)
>the fact that the imports are tied to the working directory things are expected to be run from
It doesn't actually work that way. It is tied to sys.path - which, by default, is initialized to include the working directory or an appropriate equivalent. See https://docs.python.org/3/library/sys_path_init.html for details.
>There should just be one project root
And what about when you want to import things that aren't within the project? How will Python know whether it should look within the project or somewhere else?
>and all imports can then import via full paths from the root or paths relative to the file's location.
Aside from the fact that "path" might not be as meaningful as you expect for every import, this is inconvenient for people who like src layout. The point of the dotted-path notation is that you get to use a symbolic name for the module which reflects the logical rather than physical structure of your project. Plus, you get to use relative imports without being limited by the clunky relative-path syntax. (It's important not to have that limitation, because any given package can be split across different folders in unrelated locations. In fact, that's the precise purpose of the "namespace package" feature discussed in TFA, and in more detail in the PEP (https://peps.python.org/pep-0420/).)
If you really mean that any valid file path should work, then the "one project root" no longer has meaning, and also it sounds like you're now on the hook for knowing the absolute path to the standard library (and to the `site-packages` directory for third-party libraries). As I'm sure you can imagine, that's not great for portability.
Many people seem to think they want an `import` statement that uses file paths instead. Generally they don't understand what they'd be giving up.
If you want it to work both ways, then you have to define the package structure semantics when a path is used, so that modules imported both ways can properly interoperate. It's not a simple task with a clear and unambiguous specification - if it were, someone would have implemented it already.
It is possible to import "dynamically" by directly telling the import system, at runtime, what file to use (you can tell it what loader to use, too). See for example https://stackoverflow.com/questions/67631.
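In outline, that looks something like the documented `importlib.util` recipe below (the module name and file path here are just examples):
import importlib.util
import sys

spec = importlib.util.spec_from_file_location("mymod", "/tmp/example/mymod.py")
mymod = importlib.util.module_from_spec(spec)
sys.modules["mymod"] = mymod       # optional: lets other code do a plain `import mymod` afterwards
spec.loader.exec_module(mymod)     # run the file's code to populate the module object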
>In a project I have I need to run it like `python3 -m part2.week2` instead of `python3 part2/week2.py` or the imports will fail. I'll probably never understand this system :)
There are many details in the system, but the top-level understanding is quite simple. `-m` means to run the code as a module, i.e., to figure out what package it's in (by importing its ancestors first) and to set its `__name__` to the actual name of the module (so that code inside `if __name__ == '__main__':` will not run). Since you're specifying a qualified module name (i.e., the package is a namespace), you use the dot notation. Since you're not specifying a path, Python uses its own internal logic to decide where the code is and what form it takes. Since you're "importing" under the hood, Python will leave a bytecode cache behind by default.
Giving a path to a file means to run the code as a script - i.e., Python still does the usual file I/O, bytecode compilation if not cached, and execution of the top-level code; but it sets `__name__` to the special value `'__main__'` (so that "guarded" code does run), and it doesn't import any parent packages (because there aren't any) and thus relative imports can't work. Since you're specifying a path, you're directly telling Python where the code file is; Python only uses its internal logic to determine how to load it. Since this file is a "script" that represents the conceptual top level of whatever you're doing, it's not treated as an import, and so Python doesn't create a bytecode cache.
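The cache difference is visible on disk, too (a sketch, assuming the `part2/week2.py` layout from upthread and that `week2.py` imports no other local modules):
$ python3 part2/week2.py         # run as a script: the __main__ file itself is never cached
$ ls part2/
week2.py
$ python3 -m part2.week2         # run as a module: loaded via the import machinery, so bytecode is cached
$ ls part2/
__pycache__  week2.py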
>`-m` means to run the code as a module, i.e., to figure out what package it's in (by importing its ancestors first) and to set its `__name__` to the actual name of the module (so that code inside `if __name__ == '__main__':` will not run).
This is only partially true. The code will be executed as a module in some senses, but importantly `__name__` will be set to `__main__` for this module. This is what enables e.g. `python -m http.server` to have a different effect to `import http.server`.
I also disagree with some other parts of this. For example, if you were okay with breaking changes, it's quite easy to differentiate between local imports (any import path that begins with `.`) and imports from the package store (all other imports). This is how imports are generally handled in JavaScript, for example. Looking for the package root also doesn't have to be that complicated if you have a well-known project structure or a well-known metadata file that can specify what the project should look like. Again, look at JavaScript and the `package.json` file, where all of the module entry points can be defined explicitly.
I understand why Python doesn't work like this, and why it probably never can at this point, but that doesn't make it a good system, and it doesn't mean that the quirks are there for a good reason. IIRC, there's some quite clever logic built into the import system just so that you can import a third-party Flask plugin from `flask.plugins.*`, which is cool but also completely unnecessary complexity that makes the whole system more difficult to comprehend for almost no benefit.
If there were ever to be a Python 4, I wish it would focus solely on the necessary breaking changes to get Python packaging back into a manageable state: explicit and mandatory metadata, one fully-featured blessed package manager (and not the anemic, it-just-installs-things approach of pip, but a genuine package manager with lock files, workspaces, proper dependency groups, etc), and clear, obvious imports that can be explained to an everyday user of Python without them having to resort to badly-curated SO answers.
But unfortunately I think that ship has sailed.
Thanks for the informal invitation to write even more on one of my favourite topics. ;)
>but importantly `__name__` will be set to `__main__` for this module
Correct. Should have tested that, sorry - I don't use the feature often.
>It's quite easy to differentiate between local imports (any import path that begins with `.`) and imports from the package store (all other imports). This is how imports are generally handled in JavaScript, for example.
JavaScript generally runs in a fundamentally different environment, and it has the luxury of letting the environment define these semantics (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...). But the problem I was trying to describe was more of a design problem. As far as I know, JavaScript doesn't have to worry about the hierarchical package structure thing. When you use an import for a name like `foo.bar.baz`, it's easy to see the package structure. If you import a path like, I don't know, `file://./foo/bar/baz.py`, you have to decide whether `foo` and `bar` are packages or just containing folders (the problem that the original article is about). You also have to decide whether that's relative to the current file or the project root (hopefully not the current working directory!) You have to decide whether you're forbidding absolute paths for "security" or allowing them for "convenience", and you realistically need to support the symbolic-name syntax as well anyway. And you either require the user to know the actual type of the source, or have to come up with some pseudo-path thing that allows for discovering other sources.
>I understand why Python doesn't work like this, and why it probably never can at this point, but that doesn't make it a good system
There are definitely things I don't like about this system, but it demonstrably solves the important problems, and can be worked with more straightforwardly (and understood more readily) than a lot of users seem to think. There's a lot of history to this - see e.g. https://stackoverflow.com/a/75068706/523612 - which indeed has a lot to do with "why it probably never can", but I think a bigger issue is that Python doesn't actually expect users to create a "project", and thus there isn't necessarily a good place to put a marker file. Also, "project root" and "package root" are different concepts, especially if you use src layout (as I do nowadays).
>If there were ever to be a Python 4, I wish it would focus solely on the necessary breaking changes to get Python packaging back into a manageable state
I used to think along these lines, too. But nowadays, the idea of a greenfield, Python-inspired language project makes a lot more sense to me. I expect to be blogging a fair bit about my own design in 2025.
That said, it seems pretty clear that the Python community sees the language and the packaging ecosystem as more or less separate concerns that can be worked on separately. (Notwithstanding the special status of Pip - whereby it's still distributed with Python, bootstrapped by other standard library modules and commonly expected to be copied into every environment, even though it's separately developed and versioned, doesn't have a formal programmatic API and is perfectly capable of installing into environments other than its host.) The current attitude is still that there will never be a Python 4 - "Python 3 is the brand", PEP 2026 proposes dropping the pretense and making it use real calver, etc. But I don't think that means they'll never drop support for legacy builds. (Of course, they'll never drop support for `setup.py` - because that's Setuptools' responsibility now; PEP 517 allows the build backend to make its own design decisions, and building non-Python code and integrating it with Python is not at all straightforward. But they can eventually require `pyproject.toml`.)
>explicit and mandatory metadata
Yes, this is embarrassing. They've supposedly been working on it, and I've been following the conversation in some detail. But the community's over-accommodating attitude towards backwards compatibility is a serious handicap here. I have a special "celebration" planned for the official Pip 24.3 release, which should be soon (apparently .dev0 is available).
>one fully-featured blessed package manager (and not the anemic, it-just-installs-things approach of pip, but a genuine package manager...
A lot of people say this and I haven't understood it. To me, it doesn't make a difference whether the various tasks a "package manager" does are separate applications or separate subcommands of the same application. The "just-installs-packages" thing is actually pretty complex already (it still includes dependency resolution, for example, and at least in Python it includes orchestrating local builds). And for the rest of the packaging loop, there is poor agreement on what the expected set of tasks is. The competing third-party options are competing for a reason.
I have a lot more thoughts about this topic - not yet very organized ones.
>clear, obvious imports that can be explained to an everyday user of Python without them having to resort to badly-curated SO answers.
Honestly, I blame SO for this (saying this as an SO curator with a special interest in the topic). The old Q&A is full of randomly-scoped Qs that covered the individual needs of some random individuals, rather than coherently dividing up the problem space. The answers are very often just bad despite huge upvote totals - partly because things have changed over time, partly because some myths are extremely persistent, and partly because everyone wants to share their personal experiences.
At the time a lot of this content was being written, SO hadn't really fully established its "sense of self" yet - there doesn't appear to have been a core community on the meta site that really understood what the format is capable of, what Stack Exchange sites ought to be, or how to achieve those goals. (Try digging into the history of how the standard reasons for closing questions have changed over time.) It took until 2014 or so, when new question volume was at its peak, for people to get frustrated enough to actually start defining those things properly.
Anyway, people still ask lots of "new" questions about Python that should have a straightforward answer and be recognized as common duplicates - but it's hell finding proper duplicate targets for them. There is maddeningly little community will to do anything about it, even though the problem is well recognized (https://meta.stackoverflow.com/questions/423857). It's a big part of why I'm trying to start over on Codidact.
> Python's system is indeed very complex - you just aren't usually exposed to most of the complexity
Indeed, I've been using Python since 1.5.2 as pretty much my daily language (I code in a lot of other languages too, but Python is my strong suit) and I've never noticed sys.meta_path despite it having been there for over 20 years:
https://peps.python.org/pep-0302/
__init__.py isn't exactly optional. The article sort of covers it, but if you don't include it, you are creating a namespace package, which is not the same as a regular package.
As an example, see
https://pypi.org/project/mir.msmtpq/
https://pypi.org/project/mir.sitemap/
These use the `mir` namespace package to group separate packages. It's quite helpful, as Python does not otherwise provide namespacing for imports the way Java or Go do.
But likewise, you don't want to do that for regular packages as you're opening yourself up to getting clobbered by a similarly named package.
It's somewhat comparable to .PHONY in Makefiles. You could omit them. But you really shouldn't.
>But likewise, you don't want to do that for regular packages as you're opening yourself up to getting clobbered by a similarly named package.
(I'm guessing the example is your own code? Also, I guess you meant "regular" in the ordinary English sense, but in the terminology of PEP 420, a "regular package" has `__init__.py` files by definition.)
In principle, yes - you could get clobbered by such a package, in a different location on the system, that nevertheless uses the same name and also happens to be in `sys.path` for some reason. In principle, `__init__.py` files protect against this.
But in practice, you would get clobbered anyway, because of how Pip works:
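(Roughly speaking, you end up with something like this on disk - a hypothetical sketch, not literal Pip output:)
site-packages/
    mir/                  <- one shared folder, with no __init__.py of its own
        ...files from the mir.msmtpq wheel...
        ...files from the mir.sitemap wheel...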
Only one `mir` folder is created, and Pip blindly writes files from both wheels into that folder. Although namespace packages enable Python to split the package across separate folder hierarchies, that functionality isn't actually depended on. I can create and install my own conflicting wheel, and it will happily overwrite (by default! Without warning!) existing files, add `__init__.py` to the base `mir` folder, etc.
> Now, it’s immediately clear that component_a and component_b are packages, and services is just a directory. It also makes clear that “scripts” isn’t a package at all, and my_script.py isn’t something you should be importing. __init__.py files help developers understand the structure of your codebase.
The problem with this theory is that the presence of an __init__.py in a new project I've never seen before doesn't necessarily mean anything—it could just have been sprinkled superstitiously because no one really understands what exactly these files are for.
Then later—once I'm familiar enough with the project to know that the developers are using the marker file strictly to indicate modules—I'm already familiar with the directory structure and the marker files aren't doing anything for me.
I'd rather see people in the situation above make the module structure clear in the actual folder hierarchy and naming decisions than have them sprinkle __init__.py and count on readers to understand they were being intentional and not superstitious.
>The problem with this theory is that the presence of an __init__.py in a new project I've never seen before doesn't necessarily mean anything—it could just have been sprinkled superstitiously because no one really understands what exactly these files are for.
This wouldn't be a problem, and `__init__.py` files would unambiguously have the intended meaning, if cargo-cult programmers didn't get to overwhelm the explanation of the import system in early Stack Overflow questions on the topic (and create a huge mess of awkwardly-overlapping Q&As that are popular but don't work well as duplicate targets).
Just the title of this article conveys information that many Python programmers seem completely unaware of. The overwhelming majority of recommendations for `sys.path` hacks are completely unnecessary. Major real-world projects like Tensorflow can span hundreds of kloc without using them in the main codebase (you might see one or two uses to help Pytest or Sphinx along).
>I'd rather see people in the situation above make the module structure clear in the actual folder hierarchy and naming decisions
There are many ways to do this, and people like having a standard. Also, as TFA explains, tooling can automatically detect the presence of `__init__.py` files and doesn't have to assign any special meaning to your naming decisions.
I hate init py files the same way I hate Java’s verbosity.
If it looks like a package, it’s a package.
If I want to be explicit, I’ll write something in C.
There's nothing better than verbosity on large projects, especially in rarely-touched areas (build/test/deploy infra).
The perfect anti-pattern example is sbt. It shortened commands to unintelligible codes, even though you might need to write them maybe once per year.
And by the way, you never write full lines anyway; autocomplete knows what to write using just 2-3 keystrokes, even without any AI. So verbosity doesn't affect writing speed anyway.
> If I want to be explicit, I’ll write something in C.
Absolutely love those explicit void pointers.
The problem is that packages can also look very different. For example, Python can import an entire package hierarchy from a zip file. (This is essential to how `zipapp` works; and under limited circumstances, a wheel file can also be used this way - which plays an essential role in how Pip is bootstrapped.)
> For example, Python can import an entire package hierarchy from a zip file.
As could Python's contemporaries back in the day, JAR extensions notwithstanding. That fact does not justify bad design.
__init__.py distinguishes (for the import machinery as well as human readers) standard packages from implicit namespace packages, both of which "look like" (and are!) packages.
Fwiw, I never use them in Python if I don't need to put anything inside them. They're optional for a reason.
They are "optional" because support was added for implicit namespace packages, but those have different semantics (particularly, placing another thing that resolves as a namespace package with the same name elsewhere on sys.path adds to the same package; as a result, the import machinery can't stop scanning sys.path when it finds a namespace package, it has to continue scanninng to see if there are more parts to add or if there is a standard package that takes precedence, so leaving off __init__.py for what is intended as a standard package, as well as opening up potentially incorrect behavior, adds a startup cost.)
From the zen of python: ‘Explicit is better than implicit.’
duck typing has entered the room
Also treating non-boolean objects with a boolean meaning. Python is fine and all, but the Zen of Python has always been a joke that the language doesn't bother to follow.
It's called "the Zen of Python", and not e.g. "the laws of Python", for a reason. Not everyone is, as Tim Peters put it, Dutch.
(Although I should also say: PEP 20 reflects a 20-year-old understanding of Python's design paradigm and of the lessons learned from developing it, and Tim Peters was highly deferential to Guido van Rossum when publishing it. In fact, GvR was invited to add the supposedly missing 20th principle, but never did - IMHO, this refusal is very much in the Zen spirit. Anyway, the language - and standard library - have changed a lot since then.)
> It's called "the Zen of Python", and not e.g. "the laws of Python"
Agreed. The pythonic zen is an ideal not always attained.
The post is overrated. The files are optional. I use them only if I need to put something inside them. As for imports, they work fine. And as for static analyzers, they work for me, as in I don't work for them.
Fundamentally it's an argument about style, which tries to make some default assumptions about a programmer's needs - so of course some people won't find it very convincing. FWIW, I'm in your camp for the most part, but I do very much appreciate the other view.
Per the example (the services/component_b/child example) in the article, are you supposed to put __init__.py in the sub-folder of a module too?
I'm asking because I rarely see this be done (nor have I).
Yes, it goes in every "regular package" folder (a folder without one creates a "namespace package" - or rather, a "portion" thereof; this allows for a parallel file hierarchy somewhere else that holds another part of the same package).
Python represents packages as `module` objects - the exact same type, not even a subclass. These are created to represent either the folder or an `__init__.py` file.
The file need not be empty and in fact works like any other module, with the exception of some special meaning given to certain attributes (in particular, `__all__`, which controls the behaviour of star-imports). Notably, you can use this to force modules and subpackages within the package to load when the package is imported, even if the user doesn't request them (`collections.abc` does this); and you can make some other module appear to be within the package by assigning it as an attribute (`os.path` works this way; when `os` is imported, some implementation module such as `ntpath` or `posixpath` is chosen and assigned).
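A sketch of the kind of `__init__.py` being described (everything here is a hypothetical package; only the general pattern matters):
# pkg/__init__.py
import sys

from . import subpkg                        # force pkg.subpkg to load whenever pkg is imported,
                                            # even if the user never asks for it explicitly

from . import _fast_impl as engine          # expose an internal module under a friendlier name...
sys.modules[__name__ + ".engine"] = engine  # ...and register it, so `import pkg.engine` also works
                                            # (essentially the os.path trick)

__all__ = ["subpkg", "engine"]              # controls what `from pkg import *` hands out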
Yes, any folder that contains files that are expected to be imported (or other folders that contain such files).
I have been writing python for decades and had no idea they were made optional. Amazing.
I don't know why PEP 420 didn't make one file at the root of the package that describes the structure of all the sub-directories, rather than going the controversial, lazy route of removing __init__.py. That way you get the explicitness and avoid the littering of empty marker files.
This is spelled out in the PEP (I’m the author). There isn’t one “portion” (a term defined in the PEP) to own that file. And it’s also not a requirement that all portions be installed in the same directory. I’m sorry you feel this was lazy, it was a lot of work spanning multiple attempts over multiple years.
I appreciate your work on PEP 420; I’ve benefited from it personally as a Python user. Thank you for a job well done.
Thank you for the kind words.
JavaScript tooling requires index files for everything, which makes development slow, particularly when you want to iterate fast or create many files with single output.
I think it makes sense to make the compiler or script loader rely on just the files and their contents. Either way you're already defining everything, so why create an additional, redundant set of definitions?
> JavaScript tooling requires index files for everything
This just isn't true. I've never encountered tooling that forces you to have these by default. If it's enforced, it's by rules defined in your project or by some unusual tool.
What do you mean by index files? It might depend on the bundler, but I haven’t heard of index.js/index.ts files being a hard requirement for a directory to be traversable in most tooling.
> JavaScript tooling requires index files for everything
You mean barrel files? Those are horrible kludges used by lazy people to pretend they're hiding implementation details, generate arbitrary accidental circular imports, and end up causing absolute hell if you're using any sort of naive transpiling/bundling tooling/plugin/adapter.
They aren't necessarily empty.
then why was it made optional? implicit is better than explicit?
Because sometimes implicit is best, other times explicit is best. Good to have options as different problems require different solutions.
Java does this way better.
Agreed.
Fie on "Implicit namespace package". If only because making "implicit" explicit is linguistically pointless in that 3-word phrase.
Either "namespace" or "package" is also pointless linguistically. Noun-noun names ("namespace package") in programming are always a smell. Meh, it's a job career that pays the bills rent.
Maybe "namespace" (no dunder init) vs "package" (dunder init) would have saved countless person-years of confusion? Packages and "implicit namespace packages" are not substitutes for one another (fscking parent relative imports!) so there's no reason they need the same nouns.
>If only because making "implicit" explicit is linguistically pointless in that 3-word phrase.
"Implicit" isn't part of the formal terminology PEP 420 introduces. It's just in the title and some other passing descriptive mentions. (The PEP author has posted ITT, so you could probably ask for details.)
>Either "namespace" or "package" is also pointless linguistically.
"Namespace package" distinguishes from "regular package". The two words are not at all synonyms. In Python, "namespace" could also plausibly refer to the set of attributes of an object (actually how package namespacing is implemented: the package is a `module` object, and its contained modules and subpackages are attributes), keys of a dictionary, or names in a variable scope (e.g. "the global namespace" - which, in turn, gets reflected as a dictionary by `globals()`). Meanwhile, a package in a running program is about more than the namespacing it provides: it has additional magic like `__path__`. And in a broader development context, "package" could refer to a distribution package you get from PyPI, which might contain the code for zero or more "import packages" (yes, that is also quasi-standard terminology: https://packaging.python.org/en/latest/discussions/distribut...).
>Packages and "implicit namespace packages" are not substitutes for one another (fscking parent relative imports!)
Yes, they are. Both are modeled with objects of the same type in Python, created following the same rules. The absence of `__init__.py` is not why your relative imports fail. They fail because the parent package hasn't been loaded (and thus its `__path__` can't be consulted), which happens because:
1. you've tried to run the child directly, rather than entering the package via an absolute import (from a driver script or by using `-m` - see https://stackoverflow.com/questions/11536764); or
2. you're expecting the leading `.`s in a relative import to ascend through the file hierarchy, but it doesn't work that way - they ascend through the loaded package hierarchy (https://stackoverflow.com/questions/30669474).
(The SO references are admittedly not great - they're full of bad answers from people who didn't properly understand the topic but managed to get something working. Hopefully I'll have much better Q&A about this topic up on Codidact eventually.)
I do relative imports without `__init__.py` all the time. Here's a demo:
$ mkdir package
$ mkdir package/subpackage
$ cat > package/parent.py
print("hello from parent")
$ cat > package/subpackage/child.py
from .. import parent
print("hello from child")
$ python package/subpackage/child.py
[traceback omitted]
ImportError: attempted relative import with no known parent package
$ python -m package.subpackage.child
hello from parent
hello from child
$ cd package/
$ python -m subpackage.child
[traceback omitted]
ImportError: attempted relative import beyond top-level package
Adding `__init__.py` files has no effect on this.
Thank you for the extensive response.
Paraphrasing, "That's not the name it's just the title and used repeatedly therein" seems to cause more than a little confusion.
The extensive response confirms that the words are awfully overloaded in subtle ways, and that using "namespace" as an adjective was a poor choice.
As for your example, thank you. I forget the specific edge case that trips me up here. But, rest assured, I will re-encounter it.
>Paraphrasing, "That's not the name it's just the title and used repeatedly therein" seems to cause more than a little confusion.
The phrase "implicit namespace packages" is only used once within the prose of the PEP. But also, the title of the PEP is certainly a separate thing from the name of the feature.
Similarly, nobody says that a project following modern packaging standards is using "A build-system independent format for source trees" (which would make it sound as if there were more than one relevant such format), the title of PEP 517. Instead they say that it's a `pyproject.toml`-based project.
>The extensive response confirms that the words are awfully overloaded in subtle ways,
I agree, basically. This happens all the time in programming, of course. "Package" in the Python ecosystem is perhaps not as bad as, say, `static` in the C++ language; but it's bad and I really wish there were a reasonable way to fix it.
On the other hand, "namespace" here isn't meant as Python-specific jargon. It isn't really meant that way anywhere else, either (e.g. people saying "global namespace" should normally really be saying "global scope"). It's the language of computer science, in the abstract (https://en.wikipedia.org/wiki/Namespace). So of course it ends up referring to all kinds of things (in multiple categories: data types, objects which are instances of those data types, file systems...) which implement the concept of namespacing.
>But, rest assured, I will re-encounter it.
Whenever I browse HN I mainly look for posts about Python specifically; so if for example you ever have an Ask HN about it there's a good chance I can help.
Thank you.