zaptheimpaler 8 hours ago

Maybe Python's import/module system makes sense to someone, but it's caused me a lot of trouble. I'm not even sure exactly why, but I run into weird issues often.

Like the fact that imports are tied to the working directory that things are expected to be run from is just very weird. There should just be one project root, and all imports could then use full paths from the root or paths relative to the file's location. Tying it to the working directory is incredibly confusing. The same imports can work when run from one directory and fail from another.

In one of my projects, I need to run it like `python3 -m part2.week2` instead of `python3 part2/week2.py` or the imports will fail. I'll probably never understand this system :)

Part of it might just be some implicit complexity in module systems as a whole that I don't get, not just Python. Rust has some confusing situations too, although Java and Go seem to have more straightforward systems.

  • js2 8 hours ago

    The module search path (sys.path) is the thing you need to know about.

    With `python -m part2.week2`, by default `.` is added at the start.

    With `python part2/week2.py`, `part2` (the script directory) is added at the start (instead of `.`).

    So in the former case, `week2.py` is able to do something like `from part2 import ...` but not in the latter case.

    But this should work: `PYTHONPATH=. python part2/week2.py`.

    https://docs.python.org/3/tutorial/modules.html#the-module-s...

    https://docs.python.org/3/library/sys_path_init.html#sys-pat...
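
    To see it concretely, here's a sketch (the contents of `part2/week2.py` below are hypothetical, just to inspect the search path):

        # part2/week2.py -- hypothetical contents, just to inspect the search path
        import sys
        print("first search-path entry:", sys.path[0])

        import part2                          # only works if the project root is on sys.path
        print("found part2 at:", list(part2.__path__))

        # Run from the project root:
        #   python3 -m part2.week2              -> sys.path[0] is the current directory; `import part2` works
        #   python3 part2/week2.py              -> sys.path[0] is the part2/ directory itself; `import part2` fails
        #   PYTHONPATH=. python3 part2/week2.py -> '.' is prepended to sys.path; works again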

  • zahlman 7 hours ago

    >Part of it might just be some implicit complexity in module systems as a whole that I don't get, not just Python.

    I won't speak about other languages, but Python's system is indeed very complex - you just aren't usually exposed to most of the complexity, because it mainly takes the form of optional hooks. (Thanks for the reminder that I need to do a blog post about this some time - once I've figured out how to structure everything I want to say.)

    js2 gave you the how-to; so here's the detailed explanation. You might not want to try to tackle it all at once.

    Importantly, Python modules don't necessarily map one-to-one to `.py` files. They can be `import`ed from a precompiled `.pyc` bytecode file, which could be either in the searched folder directly or in a `__pycache__` subdirectory (the modern default). Or they can be loaded from a `.py` file within an archive, where `sys.path` contains path and filename of the archive rather than just a folder path. Or they can come from within the Python interpreter executable itself - and that's not just for the builtins. Or they can be anything else that ends up creating a `module` object, as long as you write the necessary custom importer.

    Aside from the well-known `sys.path`, there is a `sys.meta_path` which contains objects that represent ways to import modules:

        Python 3.12.4 (main, Jun 24 2024, 03:28:13) [GCC 11.4.0] on linux
        Type "help", "copyright", "credits" or "license" for more information.
        >>> import sys
        >>> sys.meta_path
        [<class '_frozen_importlib.BuiltinImporter'>, <class '_frozen_importlib.FrozenImporter'>, <class '_frozen_importlib_external.PathFinder'>]
    
    (IIRC: the `PathFinder` delegates to a separate loader, which may be a `.py` file loader, a `.pyc` file loader, a zip archive loader, etc. The `FrozenImporter` is for the non-"builtin" modules within the interpreter.)
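
    You can poke at this machinery yourself with `importlib.util.find_spec`, which asks those finders to resolve a name without executing the module; the exact paths and object reprs below will differ per system:

        >>> from importlib.util import find_spec
        >>> find_spec("sys").loader          # a built-in module: handled by BuiltinImporter
        <class '_frozen_importlib.BuiltinImporter'>
        >>> spec = find_spec("json")         # a stdlib package: PathFinder hands it to a source-file loader
        >>> spec.loader
        <_frozen_importlib_external.SourceFileLoader object at 0x7f...>
        >>> spec.origin                      # the actual file it resolved to
        '/usr/lib/python3.12/json/__init__.py'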

    >the fact that imports are tied to the working directory that things are expected to be run from

    It doesn't actually work that way. It is tied to sys.path - which, by default, is initialized to include the working directory or an appropriate equivalent. See https://docs.python.org/3/library/sys_path_init.html for details.

    >There should just be one project root

    And what about when you want to import things that aren't within the project? How will Python know whether it should look within the project or somewhere else?

    >and all imports could then use full paths from the root or paths relative to the file's location.

    Aside from the fact that "path" might not be as meaningful as you expect for every import, this is inconvenient for people who like the src layout. The point of the dotted-path notation is that you get to use a symbolic name for the module which reflects the logical rather than physical structure of your project. Plus, you get to use relative imports without being limited by the clunky relative-path syntax. (It's important not to have that limitation, because any given package can be split across different folders in unrelated locations. In fact, that's the precise purpose of the "namespace package" feature discussed in TFA, and in more detail in the PEP: https://peps.python.org/pep-0420/.)
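
    As a small sketch of that dotted-path and relative-import syntax (the layout and names here are hypothetical):

        # src/mypkg/cli.py, in a hypothetical layout:
        #   src/mypkg/{__init__.py, core.py, cli.py, sub/{__init__.py, helpers.py}}
        from . import core              # sibling module, referred to by logical name, not file path
        from .core import run           # or pull a specific name out of it
        from .sub import helpers        # subpackages work the same way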

    If you really mean that any valid file path should work, then the "one project root" no longer has meaning, and also it sounds like you're now on the hook for knowing the absolute path to the standard library (and to the `site-packages` directory for third-party libraries). As I'm sure you can imagine, that's not great for portability.

    Many people seem to think they want an `import` statement that uses file paths instead. Generally they don't understand what they'd be giving up.

    If you want it to work both ways, then you have to define the package structure semantics when a path is used, so that modules imported both ways can properly interoperate. It's not a simple task with a clear and unambiguous specification - if it were, someone would have implemented it already.

    It is possible to import "dynamically" by directly telling the import system, at runtime, what file to use (you can tell it what loader to use, too). See for example https://stackoverflow.com/questions/67631.
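
    A minimal sketch of that approach using the stdlib `importlib.util` (the module name and file path here are made up):

        import importlib.util

        # Load a module from an explicit file path, bypassing the sys.path search.
        spec = importlib.util.spec_from_file_location("week2", "part2/week2.py")
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)   # executes the file's top-level code
        print(module.__name__)            # 'week2' -- whatever name you chose above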

    >In one of my projects, I need to run it like `python3 -m part2.week2` instead of `python3 part2/week2.py` or the imports will fail. I'll probably never understand this system :)

    There are many details in the system, but the top-level understanding is quite simple. `-m` means to run the code as a module, i.e., to figure out what package it's in (by importing its ancestors first) and to set its `__name__` to the actual name of the module (so that code inside `if __name__ == '__main__':` will not run). Since you're specifying a qualified module name (i.e., the package is a namespace), you use the dot notation. Since you're not specifying a path, Python uses its own internal logic to decide where the code is and what form it takes. Since you're "importing" under the hood, Python will leave a bytecode cache behind by default.

    Giving a path to a file means to run the code as a script - i.e., Python still does the usual file I/O, bytecode compilation if not cached, and execution of the top-level code; but it sets `__name__` to the special value `'__main__'` (so that "guarded" code does run), and it doesn't import any parent packages (because there aren't any) and thus relative imports can't work. Since you're specifying a path, you're directly telling Python where the code file is; Python only uses its internal logic to determine how to load it. Since this file is a "script" that represents the conceptual top level of whatever you're doing, it's not treated as an import, and so Python doesn't create a bytecode cache.

    • MrJohz 4 hours ago

      > `-m` means to run the code as a module, i.e., to figure out what package it's in (by importing its ancestors first) and to set its `__name__` to the actual name of the module (so that code inside `if __name__ == '__main__':` will not run).

      This is only partially true. The code will be executed as a module in some senses, but importantly `__name__` will be set to `__main__` for this module. This is what enables e.g. `python -m http.server` to have a different effect to `import http.server`.
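
      A small sketch of the difference, using a hypothetical package `pkg` containing `__init__.py` and `tool.py`:

          # pkg/tool.py
          if __name__ == "__main__":
              # Reached under both invocations; under -m the real dotted name survives on __spec__.
              spec = globals().get("__spec__")        # None when run as a plain script
              print(__name__, spec.name if spec else None)
              # python -m pkg.tool   ->  __main__ pkg.tool
              # python pkg/tool.py   ->  __main__ None   (plain script: no spec, no parent package)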

      I also disagree with some other parts of this as well. For example, if you were okay with breaking changes, it's quite easy to differentiate between local imports (any import path that begins with `.`) and imports from the package store (all other imports). This is how imports are generally handled in JavaScript, for example. Looking for the package root also doesn't have to be that complicated if you have a well-known project structure or a well-known metadata file that specifies what the project should look like. Again, look at JavaScript and the `package.json` file, where all of the module entry points can be defined explicitly.

      I understand why Python doesn't work like this, and why it probably never can at this point, but that doesn't make it a good system, and it doesn't mean that the quirks are there for a good reason. IIRC, there's some quite clever logic built into the import system just so that you can import a third-party Flask plugin from `flask.plugins.*`, which is cool but also completely unnecessary complexity that makes the whole system more difficult to comprehend for almost no benefit.

      If there were ever to be a Python 4, I wish it would focus solely on the necessary breaking changes to get Python packaging back into a manageable state: explicit and mandatory metadata, one fully-featured blessed package manager (and not the anemic, it-just-installs-things approach of pip, but a genuine package manager with lock files, workspaces, proper dependency groups, etc), and clear, obvious imports that can be explained to an everyday user of Python without them having to resort to badly-curated SO answers.

      But unfortunately I think that ship has sailed.

Ferret7446 3 hours ago

__init__.py isn't exactly optional. The article sort of covers it, but if you don't include it, you are creating a namespace package, which is not the same as a regular package.

As an example, see

https://pypi.org/project/mir.msmtpq/

https://pypi.org/project/mir.sitemap/

These use the `mir` namespace package to group separate packages. It's quite helpful, as Python does not otherwise provide namespacing for imports the way Java or Go do.
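
Roughly, once both distributions are installed into the same environment, the layout looks something like this (simplified sketch):

    site-packages/
        mir/                      # no __init__.py anywhere under mir/ -- it's a namespace package
            msmtpq/
                __init__.py       # regular package installed by the mir.msmtpq distribution
            sitemap/
                __init__.py       # regular package installed by the mir.sitemap distribution

    # Both are then importable under the shared prefix:
    #   import mir.msmtpq
    #   import mir.sitemap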

But likewise, you don't want to do that for regular packages as you're opening yourself up to getting clobbered by a similarly named package.

It's somewhat comparable to .PHONY declarations in Makefiles. You could omit them. But you really shouldn't.

lolinder 8 hours ago

> Now, it’s immediately clear that component_a and component_b are packages, and services is just a directory. It also makes clear that “scripts” isn’t a package at all, and my_script.py isn’t something you should be importing. __init__.py files help developers understand the structure of your codebase.

The problem with this theory is that the presence of an __init__.py in a new project I've never seen before doesn't necessarily mean anything—it could just have been sprinkled superstitiously because no one really understands what exactly these files are for.

Then later—once I'm familiar enough with the project to know that the developers are using the marker file strictly to indicate modules—I'm already familiar with the directory structure and the marker files aren't doing anything for me.

I'd rather see people in the situation above make the module structure clear in the actual folder hierarchy and naming decisions than have them sprinkle __init__.py and count on readers to understand they were being intentional and not superstitious.

  • zahlman 7 hours ago

    >The problem with this theory is that the presence of an __init__.py in a new project I've never seen before doesn't necessarily mean anything—it could just have been sprinkled superstitiously because no one really understands what exactly these files are for.

    This wouldn't be a problem, and `__init__.py` files would unambiguously have the intended meaning, if cargo-cult programmers didn't get to overwhelm the explanation of the import system in early Stack Overflow questions on the topic (and create a huge mess of awkwardly-overlapping Q&As that are popular but don't work well as duplicate targets).

    Just the title of this article conveys information that many Python programmers seem completely unaware of. The overwhelming majority of recommendations for `sys.path` hacks are completely unnecessary. Major real-world projects like Tensorflow can span hundreds of kloc without using them in the main codebase (you might see one or two uses to help Pytest or Sphinx along).

    >I'd rather see people in the situation above make the module structure clear in the actual folder hierarchy and naming decisions

    There are many ways to do this, and people like having a standard. Also, as TFA explains, tooling can automatically detect the presence of `__init__.py` files and doesn't have to assign any special meaning to your naming decisions.

e-dant 9 hours ago

I hate init py files the same way I hate Java’s verbosity.

If it looks like a package, it’s a package.

If I want to be explicit, I’ll write something in C.

  • deepsun 8 hours ago

    There's nothing better than verbosity on large projects, especially in rarely-touched areas (build/test/deploy infra).

    The perfect anti-pattern example is sbt. It shortened commands to unintelligible codes, even though you only need to write them maybe once per year.

    And by the way, you never write full lines anyway: autocomplete knows what to write from just 2-3 keystrokes, even without any AI. So verbosity doesn't really affect writing speed.

  • throwup238 8 hours ago

    > If I want to be explicit, I’ll write something in C.

    Absolutely love those explicit void pointers.

  • dragonwriter 4 hours ago

    __init__.py distinguishes (for the import machinery as well as human readers) standard packages from implicit namespace packages, both of which "look like" (and are!) packages.

  • zahlman 7 hours ago

    The problem is that packages can also look very different. For example, Python can import an entire package hierarchy from a zip file. (This is essential to how `zipapp` works; and under limited circumstances, a wheel file can also be used this way - which plays an essential role in how Pip is bootstrapped.)
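
    You can try that zip behaviour in a few lines (a sketch; the archive and package names are made up):

        import sys, zipfile

        # Build a tiny archive containing a one-module package, then import from it.
        with zipfile.ZipFile("bundle.zip", "w") as zf:
            zf.writestr("mypkg/__init__.py", "GREETING = 'hello from inside a zip'\n")

        sys.path.insert(0, "bundle.zip")   # a zip archive can sit on sys.path like a directory
        import mypkg
        print(mypkg.GREETING)              # hello from inside a zip
        print(mypkg.__file__)              # a path pointing inside bundle.zip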

    • RhysU 7 hours ago

      > For example, Python can import an entire package hierarchy from a zip file.

      As could Python's contemporaries back in the day, JAR extensions notwithstanding. That fact does not justify bad design.

  • OutOfHere 8 hours ago

    Fwiw, I never use them in Python if I don't need to put anything inside them. They're optional for a reason.

    • dragonwriter 3 hours ago

      They are "optional" because support was added for implicit namespace packages, but those have different semantics (particularly, placing another thing that resolves as a namespace package with the same name elsewhere on sys.path adds to the same package; as a result, the import machinery can't stop scanning sys.path when it finds a namespace package, it has to continue scanninng to see if there are more parts to add or if there is a standard package that takes precedence, so leaving off __init__.py for what is intended as a standard package, as well as opening up potentially incorrect behavior, adds a startup cost.)

joshdavham 9 hours ago

From the zen of python: ‘Explicit is better than implicit.’

  • ranger_danger 8 hours ago

    duck typing has entered the room

    • bigstrat2003 8 hours ago

      Also treating non-boolean objects with a boolean meaning. Python is fine and all, but the Zen of Python has always been a joke that the language doesn't bother to follow.

      • zahlman 7 hours ago

        It's called "the Zen of Python", and not e.g. "the laws of Python", for a reason. Not everyone is, as Tim Peters put it, Dutch.

        (Although I should also say: PEP 20 reflects a 20-year-old understanding of Python's design paradigm and of the lessons learned from developing it, and Tim Peters was highly deferential to Guido van Rossum when publishing it. In fact, GvR was invited to add the supposedly missing 20th principle, but never did - IMHO, this refusal is very much in the Zen spirit. Anyway, the language - and standard library - have changed a lot since then.)

        • joshdavham 5 hours ago

          > It's called "the Zen of Python", and not e.g. "the laws of Python"

          Agreed. The pythonic zen is an ideal not always attained.

OutOfHere 8 hours ago

The post is overrated. The files are optional. I use them only if I need to put something inside them. As for imports, they work fine. And as for static analyzers, they work for me, as in I don't work for them.

  • zahlman 7 hours ago

    Fundamentally it's an argument about style, which tries to make some default assumptions about a programmer's needs - so of course some people won't find it very convincing. FWIW, I'm in your camp for the most part, but I do very much appreciate the other view.

thrdbndndn 8 hours ago

Per the example (the services/component_b/child example) in the article, are you supposed to put __init__.py in the sub-folders of a package too?

I'm asking because I rarely see this done (nor have I done it myself).

  • zahlman 7 hours ago

    Yes, it goes in every "regular package" folder (a folder without one creates a "namespace package" - or rather, a "portion" thereof; this allows for a parallel file hierarchy somewhere else that holds another part of the same package).

    Python represents packages as `module` objects - the exact same type, not even a subclass. These are created to represent either the folder or an `__init__.py` file.

    The file need not be empty and in fact works like any other module, with the exception of some special meaning given to certain attributes (in particular, `__all__`, which controls the behaviour of star-imports). Notably, you can use this to force modules and subpackages within the package to load when the package is imported, even if the user doesn't request them (`collections.abc` does this); and you can make some other module appear to be within the package by assigning it as an attribute (`os.path` works this way; when `os` is imported, some implementation module such as `ntpath` or `posixpath` is chosen and assigned).
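
    For instance, a hypothetical `mypkg/__init__.py` (every name below is made up except `json`) might look like:

        # mypkg/__init__.py
        from . import core               # force this submodule to load whenever mypkg is imported
        from .core import main_api       # re-export a name so callers can write mypkg.main_api
        import json as config_parser     # expose another module as an attribute: mypkg.config_parser

        __all__ = ["core", "main_api"]   # controls what `from mypkg import *` brings in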

  • bastawhiz 8 hours ago

    Yes, any folder that contains files that are expected to be imported (or other folders that do the same).

bagels 9 hours ago

I have been writing python for decades and had no idea they were made optional. Amazing.

CrendKing 9 hours ago

I don't know why PEP 420 didn't make one file at the root of the package that describes the structure of all the sub-directories, rather than going the controversial, lazy route of removing __init__.py. That way you get the explicitness and avoid the littering of empty marker files.

  • ericvsmith 9 hours ago

    This is spelled out in the PEP (I’m the author). There isn’t one “portion” (a term defined in the PEP) to own that file. And it’s also not a requirement that all portions be installed in the same directory. I’m sorry you feel this was lazy, it was a lot of work spanning multiple attempts over multiple years.

    • sevensor 8 hours ago

      I appreciate your work on pep 420; I’ve benefited from it personally as a Python user. Thank you for a job well done.

      • ericvsmith 7 hours ago

        Thank you for the kind words.

    • cyanydeez 8 hours ago

      JavaScript tooling requires index files for everything, which makes development slow, particularly when you want to iterate fast or create many files with a single output.

      I think it makes sense to have the compiler or script loader rely on just the files and their contents. Either way you're already defining everything, so why create an additional, redundant set of definitions?

      • bastawhiz 8 hours ago

        > JavaScript tooling requires index files for everything

        This just isn't true. I've never encountered tooling that forces you to have these by default. If it's enforced, it's by rules defined in your project or by some unusual tool.

      • nophunphil 8 hours ago

        What do you mean by index files? It might depend on the bundler, but I haven’t heard of index.js/index.ts files being a hard requirement for a directory to be traversable in most tooling.

      • ElectricalUnion 8 hours ago

        > JavaScript tooling requires index files for everything

        You mean barrel files? Those are horrible kludges used by lazy people to pretend they're hiding implementation details, generate arbitrary accidental circular imports, and end up causing absolute hell if you're using any sort of naive transpiling/bundling tooling/plugin/adapter.

thelastparadise 9 hours ago

Java does this way better.

  • RhysU 8 hours ago

    Agreed.

    Fie on "Implicit namespace package". If only because making "implicit" explicit is linguistically pointless in that 3-word phrase.

    Either "namespace" or "package" is also pointless linguistically. Noun-noun names ("namespace package") in programming are always a smell. Meh, it's a job career that pays the bills rent.

    Maybe "namespace" (no dunder init) vs "package" (dunder init) would have saved countless person-years of confusion? Packages and "implicit namespace packages" are not substitutes for one another (fscking parent relative imports!) so there's no reason they need the same nouns.

cuteboy19 9 hours ago

Then why was it made optional? Implicit is better than explicit?

  • diggan 9 hours ago

    Because sometimes implicit is best, other times explicit is best. Good to have options as different problems require different solutions.