Wow this book is a goldmine for architecture patterns. I love how easy it is to get into a topic and quickly grasp it.
Having said that, from a practical, experience-based standpoint, using some of these patterns can really spiral into increased complexity and performance issues in Python, especially when you use opinionated frameworks like Django, which already uses the ActiveRecord pattern.
I’ve been in companies big and small using Python, both using and ignoring architectural patterns. Turns out that at all the big ones (n=3) with strict architectural pattern usage, the code, although “clean”, was waaaay too complex and unnecessarily slow at tasks that at first glance should have been simple.
Whereas at the big companies that didn’t care for these, although the code was REALLY ugly in some places (huge if-else files/functions, huge Django models with all business logic implemented in them), I was at my most productive: the code was ugly, but I could read it, understand it, and modify the 1000 lines of if-else statements.
Maybe this says something about me more than the code, but I hate to admit I was more productive at the non-clean-code companies. And don’t get me started on the huge number of discussions they avoided about what’s clean or not.
I think one of the biggest problems I encounter whenever I hear that a project follows strict architectural patterns essentially boils down to too many obfuscated abstractions that hide what is going on, or force you to jump through too many layers to accomplish tasks.
Many files/functions/classes need to be updated to accomplish even simple tasks because somebody made a decision that you aren't allowed to do X or Y thing without creating N other things.
But in those companies that didn't care about architectural patterns, it's very likely that while there was more ugly code in certain places, it resulted in code with less indirection, more contained to a single area/unit or the task at hand, making it easier for people to jump in and understand. I see so many people who create function after function in file after file to abstract away functionality, when I'd honestly rather have a 100-line function or method that I can easily jump around and edit/debug than many tiny functions all in separate areas.
Not to say having some abstractions is bad, but the more I work in this field the more I realize that the fewer abstractions there are, the easier it is to reason about singular units/features in code. I've basically landed on: abstract away the really hard stuff, but stop abstracting the things that are simple.
I've come to a similar conclusion - just write the damn logic inline, and only decouple the parts which would make the whole thing difficult to test. Test decoupled parts thoroughly, but in isolation.
I found the book's use of modeling how to pilot an alien starship to be a little misleading, because a starship is a highly engineered product that functions in large part as a control mechanism for software. It comes with a clean design model already available for you to discover and copy.
Domain modeling should not be about copying the existing model -- it should be about improving on it using all the advantages software has over the physical and social technologies the new software product is meant to replace.
People are smart, and in most projects, there are key aspects of the existing domain model that are excellent abstractions that can and should be part of the new model. It's important to understand what stakeholders are trying to achieve with their current system before attempting to replace it.
But the models used in the business and cultural world are often messy, outdated and unoptimized for code. They rely on a human to interpret the edge cases and underspecified parts. We should treat that as inspiration, not the end goal.
> I found the book's use of modeling how to pilot an alien starship to be a little misleading, because a starship is a highly engineered product that functions in large part as a control mechanism for software. It comes with a clean design model already available for you to discover and copy.
Doctor Who fans will note that TARDIS craft seem to follow a different design: they regularly reconfigure themselves to fit their pilot, don't have controls laid out in any sensible fashion, and there's at least one reference to how they're "grown, not built". Then again they were also meant to be piloted by a crew and are most likely sentient, so it's also possible that due to the adaptations, the Doctor's TARDIS is just as eccentric as he is.
It's not like Doctor Who is "hard" sci-fi tho, it's basically Peter Pan in Space.
Strict architectural pattern usage requires understanding the domain, and understanding the patterns. If you have both, navigating the codebase will be intuitive. If you don't, you'll find 1000 LOC functions easier to parse.
That's the problem: if you are working at a company that has mostly juniors (one or two years of programming), it is better not to implement too-complicated patterns, otherwise your day will be filled with explaining what a Factory is.
>I’ve been in companies big and small using Python, both using and ignoring architectural patterns. Turns out that at all the big ones (n=3) with strict architectural pattern usage, the code, although “clean”, was waaaay too complex and unnecessarily slow at tasks that at first glance should have been simple.
The problem with "strict architectural pattern usage" is that people think that a specific implementation, as listed in the reference, is "the pattern".
"The pattern" is the thought process behind what you're doing, and the plan for working with it, and the highest-level design of the API you want to offer to the rest of the code.
A state machine in Python, thanks to functions being objects, can often just be a group of functions that return each other, and an iteration of "f = f(x)". Sometimes people suggest using a Borg pattern in Python rather than a Singleton, but often what you really want is to just use the module. `sys` is making it a singleton for you already. "Dependency injection" is often just a fancy term for passing an argument (possibly another function) to a function. A Flyweight isn't a thing; it's just the technique of interning. The Command pattern described in TFA was half the point of Jack Diederich's famous rant (https://www.youtube.com/watch?v=o9pEzgHorH0); `functools.partial` is your friend.
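For example, the state-machine and Command points might look like this in practice (a minimal sketch; all names invented):

```python
from functools import partial

# A turnstile as a group of functions that return each other:
# each state consumes one event and returns the next state.
def locked(event):
    return unlocked if event == "coin" else locked

def unlocked(event):
    return locked if event == "push" else unlocked

state = locked
for event in ["coin", "push", "push"]:
    state = state(event)  # the "f = f(x)" iteration
assert state is locked

# Command "pattern" without a Command class: functools.partial
# freezes the arguments so the call can be stored and invoked later.
def send_email(to, subject):
    print(f"emailing {to}: {subject}")

command = partial(send_email, "ops@example.com", "disk full")
command()
```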
> Maybe this says something about me more than the code, but I hate to admit I was more productive at the non-clean-code companies.
I think you've come to draw a false dichotomy because you just haven't seen anything better. Short functions don't require complex class hierarchies to exist. They don't require classes to exist at all.
Object-oriented programming is about objects, not classes. If it were about classes, it would be called class-oriented programming.
I love this book but yes, you really need to understand when it makes sense to apply these patterns and when not to. I think of these kinds of architectural patterns like I think of project management. They both add an overhead, and both get a bad rap because if they are used indiscriminately, you will have many cases where the overhead completely dominates any value you get from applying them. However, when used judiciously they are critical to the success of the project.
For example, if I am standing up a straightforward calendar REST API, I am not going to have a complicated architecture. However, these kinds of patterns, especially an adherence to a ports-and-adapters architecture, have been critical for me in building trading systems that can switch seamlessly between simulation and production modes. In those cases I am really sure I will need to easily swap simulators for real trading engines, or historical event feeds for real-time feeds, and it's necessary that the business logic not have dual implementations to keep in sync.
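To make that concrete, here's a rough sketch of the kind of seam I mean (the Protocol and class names are invented for illustration):

```python
from typing import Protocol

class MarketFeed(Protocol):          # the "port"
    def next_event(self) -> dict: ...

class RealtimeFeed:                  # production adapter
    def next_event(self) -> dict:
        raise NotImplementedError("read from the live exchange here")

class HistoricalFeed:                # simulation adapter
    def __init__(self, events: list[dict]):
        self._events = iter(events)
    def next_event(self) -> dict:
        return next(self._events)

def run_strategy(feed: MarketFeed) -> None:
    # The business logic sees only the port, so swapping sim for prod
    # is a one-line change at the composition root.
    event = feed.next_event()
    print(event)

run_strategy(HistoricalFeed([{"symbol": "AAPL", "price": 180.0}]))
```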
My experience matches this. It's so liberating as well. I find it easier to internalise such code in my head compared to abstraction-soup. As you can imagine, I like golang.
Me three. I'm even happy to refactor code into a form where there's less repetition and perhaps more parametrised functions, etc.
Finding my way around a soup of ultra abstracted Matryoshka ravioli is my least favourite part of programming. Instead of simplifying things, now I need to consult 12 different objects spread over as many files before I can create a FactoryFactory.
This has been my experience in working with any kind of dogmatic structure or pattern in any language. It seems that the architecture astronauts have missed the point: making the code easier to understand for future developers without context, and providing some certainty that modifications behave as expected.
Here's an example of how things can go off the rails very quickly:
Rule 1: Functions should be short (no longer than 50 lines).
Rule 2: Public functions should be implemented with an interface (so they can be mocked).
Now as a developer who wants to follow the logic of the program, you have to constantly "go to definition" on function calls on interfaces, then "go to implementation" to find the behavior. This breaks your train of thought / flow state very quickly.
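A contrived sketch of where those two rules lead (module layout and names invented):

```python
from abc import ABC, abstractmethod

# orders/interface.py -- what "go to definition" lands on
class OrderService(ABC):
    @abstractmethod
    def place_order(self, item_id: int) -> None: ...

# orders/impl.py -- a second jump ("go to implementation") to find the behavior
class OrderServiceImpl(OrderService):
    def place_order(self, item_id: int) -> None:
        print(f"placing order for item {item_id}")
```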
Now let's amp it up to another level of suck: replace the interface with a microservice API (gRPC). Now you have to tab between multiple completely different repos to follow the logic of the program. And when opening a new repo, which has its own architectural layers, you have to browse around just to find the implementation of the function you're looking for.
These aren't strawmen either... I've seen these patterns in place at multiple companies, and at this point I yearn for a 1000 line function with all of the behavior in 1 place.
> Turns out that at all the big ones (n=3) with strict architectural pattern usage, the code, although “clean”, was waaaay too complex and unnecessarily slow at tasks that at first glance should have been simple.
My last job had a Python codebase just like this. Lots of patterns, implemented by people who wanted to do things "right," and it was a big slow mess. You can't get away with nearly as much in Python (pre-JIT, anyway) as you can in a natively compiled language or a JVM language. Every layer of indirection gets executed in the interpreter every single time.
What bothers me about this book and other books that are prescriptive about application architecture is that they push people towards baking in all the complexity right at the start, regardless of requirements, instead of adding complexity in response to real demands. You end up implementing both the complexity you need now and the complexity you don't need. You implement the complexity you'll need in two years if the product grows, and you place that complexity on the backs of the small team you have now, at the cost of functionality you need to make the product successful.
To me, that's architectural malpractice. Even worse, it affects how the programmers on your team think. They start thinking that it's always a good idea to make code more abstract. Your code gets bloated with ghosts of dreamed-of future functionality, layers that could hypothetically support future needs if those needs emerged. A culture of "more is better" can really take off with junior programmers who are eager to do good work, and they start implementing general frameworks on top of everything they do, making the codebase progressively more complex and harder to work in. And when a need they anticipated emerges in reality, the code they wrote to prepare for it usually turns out to be a liability.
Looking back on the large codebases I've worked with, they all have had areas where demands were simple and very little complexity was needed. The ones where the developers accepted their good luck and left those parts of the codebase simple were the ones that were relatively trouble-free and could evolve to meet new demands. The ones where the developers did things "right" and made every part of the codebase equally complex were overengineered messes that struggled under their own weight.
My preferred definition of architecture is the subset of design decisions that will be costly to change in the future. It follows that a goal of good design is minimizing architecture, avoiding choices that are costly to walk back. In software, the decision to ignore a problem you don't have is very rarely an expensive decision to undo. When a problem arises, it is almost always cheaper and easier to start from scratch than to adapt a solution that was created when the problem existed only in your head. The rare exceptions to this are extremely important, and from the point of view of optics, it always looks smarter and more responsible to have solved a problem incorrectly than not to have solved it at all, but we shouldn't make the mistake of identifying our worth and responsibility solely with those exceptions.
> What bothers me about this book and other books that are prescriptive about application architecture is that they push people towards baking in all the complexity right at the start, regardless of requirements, instead of adding complexity in response to real demands.
The trouble is, if you strictly wait until it's time, then basically everything requires some level of refactoring before you can implement it.
The dream is that new features are just new code, rather than refactoring and modifying existing code. Many people are already used to this idea: if you add a new "view" in a web app, you don't have to touch any other view, nor do you have to touch the URL routing logic. I just think more people are comfortable depending on frameworks for this kind of stuff rather than implementing it themselves.
The trouble is a framework can't know about your business. If you need pluggable validation layers or something you might have to implement it yourself.
The downside, of course, is we're not always great at seeing ahead of time where the application will need to be flexible and grow. So you could build this into everything, leading to unnecessarily complicated code, or nothing, leading to constant refactors which will get worse and worse as the codebase grows.
Your approach can work if developers actually spot what's happening early and do what's necessary when the time comes. Unfortunately, in my experience people follow by example, and the frog can boil for a long time before people start to realise that their time is spent mostly doing large refactors because the code just doesn't support the kind of flexibility and extensibility they need.
> The dream is that new features are just new code, rather than refactoring and modifying existing code
I don't just mean new features. I mean new cross-cutting capabilities. I mean emitting metrics from an application that has never emitted metrics. I also mean adding new dimensions to existing capabilities, like adding support for a second storage backend to an application that has only ever supported one database.
These are changes that I was always taught were important to anticipate. If you don't plan ahead, it'll be near impossible to add later, right? After a couple of decades of working on real-life codebases, seeing the work that people pour into anticipating future needs, making things pluggable, all that stuff, seeing exactly how helpful that kind of up-front speculative work turns out to be in practice when a real need arises, and comparing it to the work required to add something to a codebase that was never prepared for it, I have become a staunch advocate for skipping almost all of it.
> Unfortunately in my experience people follow by example and the frog can boil for a long time before people start to realise that their time is spent mostly doing large refactors because the code just doesn't support the kind of flexibility and extensibility they need
If the engineers are doing large refactors, what in the world could they be doing besides adding the "kind of flexibility and extensibility they need?"
One thing to keep in mind when you compare two options is that unless the options involve different hiring strategies, the people executing them will be the same. If you have developers doing repeated large refactors without being able to make the codebase serve the current needs staring them in the face, what do you think will happen if you ask them to prepare a codebase for uncertain future needs? It's a strictly harder problem, so they will do a worse job, or at least no better.
Patterns and abstractions have a HUGE cost in Python. They can be zero-cost in C++ thanks to the compiler, or very low-cost thanks to the JVM JIT, but in Python the cost is very significant, especially once you start adding I/O ops or network calls.
Some parts of this book are extremely useful, especially when it's talking about concepts that are more general than Python or any other specific language -- such as event-driven architecture, commands, CQRS etc.
That being said, I have a number of issues with other parts of it, and I have seen how dangerous it can be when inexperienced developers take it as gospel and try to implement everything at once (which is a common problem with any collection of design patterns like this).
For example, repository is a helpful pattern in general; but in many cases, including the examples in the book itself, it is huge overkill that adds complexity with very little benefit. Even more so as they're using SQLAlchemy, which is a "repository" in its own right (or, more precisely, a relational database abstraction layer with an ORM added on top).
Similarly, service layers and unit of work are useful when you have complex applications that cover multiple complex use cases; but in a system consisting of small services with narrow responsibilities, they quickly become bloated. And don't even get me started on dependency injection in Python.
The essential thing about design patterns is that they're tools like any other, and developers should understand when to use them, and even more importantly when not to use them. This book has some advice in that direction, but in my opinion it should be more prominent and placed upfront rather than at the end of each chapter.
Could you explain how the repository pattern is "huge overkill that adds complexity with very little benefit"? I find it a very lightweight pattern and would recommend always using it when database access is needed, to clearly separate concerns.
In the end, it's just making sure that all database access for a specific entity all goes through one point (the repository for that entity). Inside the repository, you can do whatever you want (run queries yourself, use ORM, etc).
A lot of the stuff written in the article under the section Repository pattern has very little to do with the pattern, and much more to do with all sorts of Python, Django, and SQLAlchemy details.
In theory it's a nice abstraction, and the benefit is clear. In practice, your repository likely ends up forwarding its arguments one-for-one to SQLAlchemy's select() or session.query().
That's aside from their particular example of SQLAlchemy sessions, which is extra weird because a Session is already a repository, more or less.
I mean, sure, there's a difference between your repository for your things and types you might consider foreign, in theory, but how theoretical are we going to get? For what actual gain? How big of an app are we talking?
You could alias Repository = Session, or define a simple protocol with stubs for some of Session's methods, just for typing, and you'd get the same amount of theoretical decoupling with no extra layer. If you want to test without a database, don't bind your models to a session. If you want to use a session anyway but still not touch the database, replace your Session's scopefunc and your tested code will never know the difference.
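Sketched out, the typing-only version might look something like this (a sketch, assuming SQLAlchemy 1.4+'s Session.add/Session.get):

```python
from typing import Optional, Protocol

# Option 1: just alias it.
# Repository = Session

# Option 2: a protocol stubbing only the Session methods the code actually uses.
class Repository(Protocol):
    def add(self, instance: object) -> None: ...
    def get(self, entity: type, ident: object) -> Optional[object]: ...

def allocate(repo: Repository) -> None:
    ...  # typed against the protocol; a real Session satisfies it structurally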
It's not a convincing example.
Building your repository layer over theirs, admittedly you stop the Query type from leaking out. But then you implement essentially the Query interface in little bits for use in different layers, just probably worse, and lacking twenty years of testing.
Thanks, that makes a lot of sense. I don't have a whole bunch of experience with SQLAlchemy itself. In general, I prefer not to use ORMs but just write queries and map the results into value objects. That work I would put into a Repository.
Also in my opinion it's important to decouple the database structure from the domain model in the code. One might have a Person type which is constructed by getting data from 3 tables. A Repository class could do that nicely: maybe run a join query and a separate query, combine the results together, and return the Person object. ORMs usually tightly couple with the DB schema, which might create the risk of coupling the rest of the application as well (again, I don't know how flexible SQLAlchemy is in this).
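For example, a rough sketch of that kind of repository, with invented table/column names and a plain DB-API connection (e.g. sqlite3):

```python
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    emails: list[str]
    roles: list[str]

class PersonRepository:
    def __init__(self, conn):
        self._conn = conn  # a DB-API connection

    def get(self, person_id: int) -> Person:
        rows = self._conn.execute(
            "SELECT p.name, r.role FROM people p"
            " JOIN roles r ON r.person_id = p.id WHERE p.id = ?",
            (person_id,),
        ).fetchall()
        emails = self._conn.execute(
            "SELECT address FROM emails WHERE person_id = ?", (person_id,)
        ).fetchall()
        # Combine a join query and a separate query into one domain object.
        return Person(
            name=rows[0][0],
            emails=[e[0] for e in emails],
            roles=[r[1] for r in rows],
        )
```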
There could be some value in hiding SQLAlchemy, in case one would ever like to replace it with a better alternative. I don't have enough experience with Python to understand if that ever will be the case though.
All in all, trade-offs are always important to consider. A tiny microservice consisting of a few functions: just do whatever. A growing modulith with various evolving domains which have not been fully settled yet: put some effort into decoupling and separating concerns.
I've used SQLAlchemy in a biggish project. We had many problems; the worst ones were around session scoping and the DB hitting session limits, but we had issues around the models too.
The argument for hiding SQLAlchemy has nothing to do with "what if we change the DB"; that happens approximately never, and even if it does, you have some work to do, so do it at the time. YAGNI.
The argument is that SA models are funky things with lazy loading. IIRC, that's the library where the metaclasses have metaclasses! It's possible to accidentally call the DB just by accessing a property.
It can be a debugging nightmare. You can have data races. I remember shouting at the code, "I've refreshed the session you stupid @#£*"
The responsible thing to do is flatten them to, say, a pydantic DTO. Then you can chuck them about willy-nilly. Your type checker will highlight a DTO problem that an SA model would have slipped underneath your nose.
The difficulty you have following that is that, when you have nested models, you need to know in advance what fields you want so you don't overfetch. I guess you're thinking "duh, I handcraft my queries" and my goodness I see the value of that approach now. However, SA still offers benefits even if you're doing this more tightly-circumscribed fetch-then-translate approach.
This is partly how I got from the eager junior code golf attitude to my current view, which is, DO repeat yourself, copy-paste a million fields if you need, don't sweat brevity, just make a bunch of very boring data classes.
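The flattening step is roughly this (a sketch, assuming Pydantic; field names invented):

```python
from pydantic import BaseModel

class AuthorDTO(BaseModel):
    id: int
    name: str
    book_titles: list[str]

def to_dto(author) -> AuthorDTO:
    # Touch every lazy attribute exactly once, while the session is alive,
    # then hand around a plain, inert, type-checked object.
    return AuthorDTO(
        id=author.id,
        name=author.name,
        book_titles=[b.title for b in author.books],
    )
```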
Just a heads-up if you haven't seen it: Overriding lazy-loading options at query time can help with overfetching.
```python
from sqlalchemy import select
from sqlalchemy.orm import raiseload, relationship

class Author(Model):
    books = relationship(..., lazy='select')

fetch_authors = select(Author).options(raiseload(Author.books))
```
Anything that gets its Authors with fetch_authors will get instances that raise instead of doing a SELECT for the books. You can throw that in a smoke test and see if there's anything sneaking a query. Or if you know you never want to lazy-load, relationship(..., lazy='raise') will stop it at the source.
SQLModel is supposed to be the best of both Pydantic and SQLAlchemy, but by design an SQLModel entity backed by a database table doesn't validate its fields on creation, which is the point of Pydantic.
I can't take a position without looking under the hood, but what concerns me is "SqlModel is both a pydantic model and an SA model", which makes me think it may still have the dynamic unintended-query characteristics that I'm warning about.
I seem to recall using SqlModel in a pet project and having difficulty expressing many-to-many relationships, but that's buried in some branch somewhere. I recall liking the syntax more than plain SA. I suspect the benefits of SqlModel are syntactical rather than systemic?
"Spaghetti" is an unrelated problem. My problem codebase was spaghetti, and that likely increased the problem surface, but sensible code doesn't eliminate the danger
I mean that from the point of view of YAGNI for a small app. For a big one, absolutely, you will find the places where the theoretical distinctions suddenly turn real. Decoupling your data model from your storage is a real concern and Session on its own won't give you that advantage of a real repository layer.
SQLAlchemy is flexible, though. You can map a Person from three tables if you need to. It's a data mapper, then a separate query builder on top, then a separate ORM on top of that, and then Declarative which ties them all together with an ActiveRecord-ish approach.
> I prefer not to use ORMs but just write queries and map the results into value objects. That work I would put into a Repository.
Yep, I hear ya. Maybe if they'd built on top of something lower-level like stdlib sqlite3, it wouldn't be so tempting to dismiss as YAGNI. I think my comment sounded more dismissive than I really meant.
SQLAlchemy Session is actually a unit of work (UoW), which they also build on top. By the end of the book they are using their UoW to collect and dispatch events emitted by the services. How would they have done that if they just used SQLAlchemy directly?
You might argue that they should have waited until they wanted their own UoW behaviour before actually implementing it, but that means by the time they need it they need to go and modify potentially hundreds of bits of calling code to swap out SQLAlchemy for their own wrapper. Why not just build it first? The worst that happens is it sits there being mostly redundant. There have been far worse things.
The tricks you mention for the tests might work for SQLAlchemy, but what if we're not using SQLAlchemy? The repository pattern works for everything. That's what makes it a pattern.
I understand not everyone agrees on what "repository" means. The session is a UoW (at two or three levels) and also a repository (in the sense of object-scoped persistence) and also like four other things.
I'm sort of tolerant of bits of Session leaking into things. I'd argue that its leaking pieces are the application-level things you'd implement, not versions of them from the lower layers that you need to wrap.
When users filter data and their filters go from POST submissions to some high-level Filter thing I'd pass to a repository query, what does that construct look like? Pretty much Query.filter(). When I pick how many things I want from the repository, it's Query.first() or Query.one(), or Query.filter().filter().filter().all().
Yes, it's tied to SQL, but only in a literal sense. The API would look like that no matter what, even if it wasn't. When the benefit outweighs the cost, I choose to treat it like it is the thing I should have written.
It isn't ideal or ideally correct, but it's fine, and it's simple.
You seem to have stopped reading my comment after the first sentence. I asked some specific questions about how you would do what they did if you just use SQLAlchemy as your repository/UoW.
In my experience, both SQL and real-world database schema are each complex enough beasts that to ensure everything is fetched reasonably optimally, you either need tons of entity-specific (i.e. not easily interface-able) methods for every little use case, or you need to expose some sort of builder, at which point why not just use the query builder you're almost certainly already calling underneath?
Repository patterns are fine for CRUD but don't really stretch to those endpoints where you really need the query with the two CTEs and the four joins onto a query selecting from another query based on the output of a window function.
Repository pattern is useful if you really feel like you're going to need to switch out your database layer for something else at some point in the future, but I've literally never seen this happen in my career ever. Otherwise, it's just duplicate code you have to write.
I’ve seen it, but of course there was no strict enforcement of the pattern so it was a nightmare of leakage and the change got stuck half implemented, with two databases in use.
What is the alternative that you use, how do you provide data access in a clean, separated, maintainable way?
I have seen it a lot in my career, and have used it a lot. I've never used it in any situation to switch out a database layer for something else. It seems like we have very different careers.
I also don't really see how it duplicates code. At the basic level, it's practically nothing more than putting database access code in one place rather than all over the place.
What we are talking about is a "transformation" or "mapper" layer isolating your domain entities from the persistence layer. If this is what we call "Repository" then yes, I absolutely agree with you -- this is the right approach to this problem. But if the "Repository pattern" means a complex structure of abstract and concrete classes and inheritance trees -- as I have usually seen it implemented -- then it is usually overkill and rarely a good idea.
Thanks. In my mind, anything about complex structures of (abstract) classes and/or inheritance trees has nothing to do with a Repository pattern.
As I understand it, Repository pattern is basically a generalization of the Data Access Object (DAO) pattern, and sometimes treated synonymously.
The way I mean it and implement it is basically: for each entity, have a separate class to provide the database access. E.g. you have a Person (not complex at all, simply a value object) and a PersonRepository to get, update, and delete Person objects.
Then based on the complexity and scope of the project, Person either maps 1-to-1 to e.g. a database table or stored object/document, or it is a somewhat more complex object in the business domain and the repository could be doing a little more work to fetch and construct it (e.g. perhaps some joins or more than one query for some data).
> for each entity, have a separate class to provide the database access
Let me correct you: for each entity that needs database access. This is why I'm talking about layers here: sometimes entities are never persisted directly, but only as "parts" or "relations" of other entities; in other cases you might have a very complex persistence implementation (e.g. some entities are stored in a RDB, while others in a filesystem) and there is no clear mapping.
I recommend approaching this from the perspective of each domain entity individually; "persistability" is essentially just another property which might or might not apply in each case.
Naturally, Repository is a pattern for data(base) access, so it should have nothing to do with objects that are not persisted. I used "entity" as meaning a persisted object. That was not very clear, sorry.
Well, again, that is not completely straightforward - what exactly is a "persisted object"? We have two things here that are usually called entities:
1. The domain entities, which are normally represented as native objects in our codebase. They have no idea whether they need to be persisted and how.
2. The database entities, which are - in RDBs at least - represented by tables.
It is not uncommon that our entities of the first type can easily be mapped 1:1 to our entities of the second type - but that is far from guaranteed. Even if this is the case, the entities will be different because of the differences between the two "worlds": for example, Python's integer type doesn't have a direct equivalent in, say, PostgreSQL (it has to be converted into smallint, integer, bigint or numeric).
In my "correction" above I was talking about the domain entities, and my phrasing that they "need database access" is not fully correct; it should have been "need to be persisted", to be pedantic.
I rarely mock a repository. Mocking the database is nice for unit testing, and it's also a lot faster than using a real DB, but the DB and the DB-application interface are some of the hottest spots for bugs: using a real DB (same engine as prod) gives me a whole lot more confidence that my code actually works. It's probably the thing I'm least likely to mock out, despite that making tests more difficult to write and quite a bit slower.
I had a former boss who strongly pushed my team to use the repository pattern for a microservice. The team wanted to try it out since it was new to us and, like the other commenters are saying, it worked but we never actually needed it. So it just sat there as another layer of abstraction, more code, more tests, and nothing benefited from it.
Anecdotally, the project was stopped after nine months because it took too long. The decision to use the repository pattern wasn't the straw that broke the camel's back, but I think using patterns that were more complicated than the usecase required was at the heart of it.
Could you give me some insights what the possible alternative was that you would have rather seen?
I am either now learning that the Repository pattern is something different than what I understand it to be, or there is misunderstanding here.
I cannot understand how (basically) tucking away database access code in a repository can lead to complicated code, long development times, and the entire project failing.
Your understanding of the repository pattern is correct. It's the other people in this thread that seem to have misunderstood it and/or implemented it incorrectly. I use the repository pattern in virtually every service (when appropriate) and it's incredibly simple, easy to test and document, and easy to teach to coworkers. Because most of our services use the repository pattern, we can jump into any project we're not familiar with and immediately have the lay of the land, knowing where to go to find business logic or make modifications.
One thing to note -- you stated in another comment that the repository pattern is just for database access, but this isn't really true. You can use the repository pattern for any type of service that requires fetching data from some other location or multiple locations -- whether that's a database, another HTTP API, a plain old file system, a gRPC server, an ftp server, a message queue, an email service... whatever.
This has been hugely helpful for me as one of the things my company does is aggregate data from a lot of other APIs (whois records, stuff of that nature). Multiple times we've had to switch providers due to contract issues or because we found something better/cheaper. Being able to swap out implementations was incredibly helpful because the business logic layer and its unit tests didn't need to be touched at all.
Before I started my current role, the team had been using Kafka for message queues. There was a huge initiative to switch over to Rabbit, and it was extremely painful ripping out all the Kafka stuff and replacing it with Rabbit stuff; it took forever, and we still have issues with how the switch was executed to this day, years later. If we'd been using the repository pattern, the switch would've been a piece of cake.
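In other words, a seam like this hypothetical one would have localised the switch (all names invented):

```python
from typing import Protocol

class MessageQueue(Protocol):
    def publish(self, topic: str, payload: bytes) -> None: ...

class KafkaQueue:
    def publish(self, topic: str, payload: bytes) -> None:
        ...  # Kafka client calls live only here

class RabbitQueue:
    def publish(self, topic: str, payload: bytes) -> None:
        ...  # Rabbit client calls live only here

def handle_order(queue: MessageQueue) -> None:
    # Business logic never imports a broker library directly,
    # so swapping brokers touches only one adapter.
    queue.publish("orders", b'{"id": 1}')
```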
Thanks. I was starting to get pretty insecure about it. I don't actually know why in my brain it was tightly linked to only database access. It makes perfect sense to apply it to other types of data retrieval too. Thanks for the insights!
> And don't even get me started on dependency injection in Python.
Could I get you started? Or could you point me to a place to get myself started? I primarily code in Python and I've found dependency injection, by which I mean giving a function all the inputs it needs to calculate via parameters, is a principle worth designing projects around.
> I have seen how dangerous it can be when inexperienced developers take it as a gospel and try to implement everything at once
This book explicitly tells you not to do this.
> Similarly, service layers and unit of work are useful when you have complex applications that cover multiple complex use cases; but in a system consisting of small services with narrow responsibilities, they quickly become bloated. And don't even get me started on dependency injection in Python.
I have found service layers and DI really helpful for writing functional programs. I have some complex image-processing scripts in Python that I can use as plug-ins with a distributed image-processing service in Celery. Service layers and DI just take code from:
```python
dependency.do_thing(params)
```
To:
```python
do_thing(dependency, params)
```
Which ends up being a lot more testable. I can run image processing tasks in a live deployment with all of their I/O mocked, or I can run real image processing tasks on a mocked version of Celery. This lets me test all my different functions end-to-end before I ever do a full deploy. Also using the Result type with service layer has helped me propagate relevant error information back to the web client without crashing the program, since the failure modes are all handled in their specific service layer function.
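Concretely, the test side looks something like this (a minimal sketch; names invented, not my actual pipeline):

```python
class FakeImageStore:
    """In-memory stand-in for the real I/O dependency."""
    def __init__(self):
        self.saved = {}
    def save(self, key: str, data: bytes) -> None:
        self.saved[key] = data

def process_image(store, key: str, data: bytes) -> None:
    # All I/O goes through the injected dependency.
    store.save(key, data.upper())  # .upper() stands in for real processing

def test_process_image():
    store = FakeImageStore()
    process_image(store, "cat.png", b"abc")
    assert store.saved["cat.png"] == b"ABC"
```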
I just rolled my own mocked Celery objects. I have mocked Groups, Chords, Chains, and Signatures, mocked Celery backend, and mocked dispatch of tasks. Everything runs eagerly because it's all just running locally in the same thread, but the workflow still runs properly-- the output of one task is fed into the next task, the tasks are updated, etc.
I actually pass Celery and the functions like `signature`, `chain`, etc as a tuple into my service layer functions.
It's mostly just to test that the piping of the workflow is set up correctly so I don't find out that my args are swapped later during integration tests.
This was my takeaway too. It’s interesting to see the patterns. It would be helpful to have some guidance upfront about the situations in which they are most useful to implement. If a pattern is a tool, then steering me towards when it’s best used or best avoided would be helpful. I do appreciate that the pros and cons sections get to this point, so perhaps it’s just ordering and emphasis.
That said, having built a small web app to enable a new business, and learning Python along the way to get there, this provided me with some ideas for patterns I could implement to simplify things (but others I think I’ll avoid).
> That being said, I have a number of issues with other parts of it, and I have seen how dangerous it can be when inexperienced developers take it as gospel and try to implement everything at once (which is a common problem with any collection of design patterns like this).
Robert Martin is one of those examples: he did billions in damage by brainwashing inexperienced developers with his gaslighting garbage like "Clean Code".
Software engineering is not a hard science, so there is almost never a silver bullet and everything is trade-offs; people who claim to know the one true way are subcriminal psychopaths or noobs.
"Clean Code" has lots of useful tips and techniques.
When people criticize it, they pick a concept from one or two pages out of the hundreds and use it to dismiss the whole book. That is a worse mistake than introducing concepts that may be footguns in some situations.
Becoming an experienced engineer is learning how, when and where to apply tools from your toolkit.
I’m a Typescript dev but this book is one of my favorite architecture books, I reference it all the time. My favorite pattern is the fake unit of work/service patterns for testing, I use this religiously in all my projects for faking (not mocking!!) third party services. It also helped me with dilemmas around naming, eg it recommends naming events in a very domain specific way rather than infrastructure or pattern specific way (eg CART_ITEM_BECAME_UNAVAILABLE is better than USER_NOTIFICATION). Some of these things are obvious but tedious to explain to teammates, so the fact that cosmic python is fully online makes it really easy to link to. Overall, a fantastic and formative resource for me!
That book is in a similar place in my heart. I barely used Python in my professional life, yet it's a book I often come back to even when I'm using a different language. It's also great that the book is available both online and in paper form.
I grew tired of the forced OOP mindset, where you have to enforce encapsulation and inheritance on everything, where you only have private fields which are set through methods.
I grew tired of SOLID, clean coding, clean architecture, GoF patterns and Uncle Bob.
I grew tired of the Kingdom of Nouns and of FizzBuzz Enterprise Editions.
I now follow imperative or functional flows with as little OOP as possible.
In the rare cases I use Python (not because I don't want to, but because I mainly use .NET at work) I want the experience to be free of objects and patterns.
I am not trying to say that this book doesn't have value. It does. It's useful to learn some patterns. But don't try to fit everything into real-life programming. Don't make everything about patterns, objects and SOLID.
My favourite model is to write as many pure functions as possible, then as many functions of 1-4 parameters that interact with the outside world, and only then create domain objects to wrap those - it keeps the unrelated complexity out of the domain, and I can also reuse those functions without having to create the entire object that I don't always need.
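Roughly this shape, as an invented example:

```python
# 1. Pure functions: no I/O, trivially testable.
def total_price(prices: list[float], tax: float) -> float:
    return sum(prices) * (1 + tax)

# 2. Small functions that touch the outside world.
def fetch_prices(conn, order_id: int) -> list[float]:
    rows = conn.execute(
        "SELECT price FROM items WHERE order_id = ?", (order_id,)
    )
    return [row[0] for row in rows]

# 3. Domain object created last, as a thin wrapper over the functions,
#    which all remain reusable without it.
class Order:
    def __init__(self, conn, order_id: int):
        self._conn = conn
        self._id = order_id

    def total(self, tax: float) -> float:
        return total_price(fetch_prices(self._conn, self._id), tax)
```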
I am not convinced that domain-driven design works. Objects don't model the real world well. Why should we think DDD models the real world or a business well? And why do we even need to model something?
Computers are different than humans.
I think we should be pragmatic and come up with the best solution in terms of money/time/complexity, not try to mimic human thought using computers.
After all a truck isn't mimicking horse and carriage. A plane isn't mimicking a bird.
> I am not convinced that domain-driven design works. Objects don't model the real world well.
DDD does not necessitate OOP. You can do DDD in functional languages. I think there's a whole F# book about it. So I think you can conclude that OOP doesn't model the world well, which may or may not be true, but I think it's not valid to extend the conclusion to DDD. Is there a DDD-inherent reason why you think DDD does not work?
> And why do we even need to model something?
Well I suppose it depends on your definition of "model", but aren't you by necessity modeling something when you write software? Somewhere in your software there is a user thingy/record/object/type/dictionary/struct which is an imperfect and incomplete representation of a real life person. I.e., a model.
DDD isn't about objects. It's just about modelling the domain (real world) using the tools available to you. Some things are best modelled by objects, some are best modelled by functions or other constructs.
The real point is to establish a common language to talk about the domain. This is enormously powerful. Have you ever worked with people who don't speak your language? Everything takes 3x as long, as ideas aren't communicated properly and things get lost in translation.
It's the same with code. If your code is all written in the language of the computer then you'll be translating business language into computer language. This takes longer and it's more error prone. The business people can't check your work because they don't know how to code and the coders can't check because they don't know the business.
The point of building abstractions is it empowers you to write code in a language that is much closer to the language of the domain experts. DDD is one take but the basic idea goes all the way back to things like SICP.
Good coders learn enough about the business to check the code. Domain experts can look at the running software.
With DDD you just get a third model that is neither the domain, nor the software, and both the domain experts and the programmers will have to work extra to maintain and understand it.
Worse, people often try to build this model up front, which means it will be wrong, hard to implement, and will probably get thrown away if you actually want to ship anything.
I think people are getting triggered by the word domain, and conflating it with a particular cargo cult called DDD. It's the same with agile - the one that's implemented is usually the cargo cult version that charges you the cost of the "official" process without the benefits.
I meant domain modelling in the simplest sense: I created three objects, the market, the trading strategy and a simulated version of the market. That's it. No paperwork, no forms filled in triplicate.
IME mention of DDD has been a pretty solid predictor of "this project will produce nothing but documents for months and if it gets to the implementation stage complexity will spiral out of control. It probably won't ship."
It's possible that it works for some people, but I've seen the above scenario play out too many times to see DDD as anything but a red flag
At its core, objects are just containers for properties, and exploiting that fact leads to more easily understood systems than the alternative.
For example, at work I'm currently refactoring a test for parsing the CSV output of a system; as it stands it depends on hardcoded array indexes, which makes the thing a mess. Defining a few dataclasses here and there to model each entry of the CSV file, and then writing the test with said objects has made the test much more pleasant and easily understood.
This is how I treat it as well by now. I'm not even certain if it's object-oriented, or hybrid, or whatever.
But in a small project at work that's mostly about orchestrating infrastructure components around a big central configuration, it's just that. You define some dataclasses, or in our case Pydantic models, to parse APIs or JSON configurations, because then you can use type-aheads based on types and it's more readable.
And then, if you notice you end up having a lot of "compute_some_thing(some_object)", you just make it an @property on that object, so you can use "some_object.some_thing".
If you often move attributes of some foo-object to create some bar-object, why not introduce an @classmethod Foo.from_bar or @property Bar.as_foo? This also makes the relationship between Foo and Bar more explicit.
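A minimal sketch of the kind of thing described above (Foo/Bar and their fields are placeholders):

```python
from dataclasses import dataclass

@dataclass
class Bar:
    w: int
    h: int

@dataclass
class Foo:
    width: int
    height: int

    @classmethod
    def from_bar(cls, bar: Bar) -> "Foo":
        # Makes the Foo<->Bar relationship explicit at the definition site.
        return cls(width=bar.w, height=bar.h)

    @property
    def area(self) -> int:
        # compute_area(foo) becomes foo.area
        return self.width * self.height
```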
OOP doesn't have to be the horrible case some legacy Java Enterprise Projects make it to be. It can also be a bunch of procedural code with a few quaint objects in between, encapsulating closely related ideas of the domain.
The model is for me. I wrote a crude BTC trading bot, and I had three major objects: Market, TradingStrategy, and SimulatedMarket, which used historical data.
I did not model individual transactions, fees, etc as objects. The idea was that I encapsulated all the stuff that's relevant to a strategy in one object, and the market object would give me things like get_price(), buy(), sell(), which would also be available in the simulated version.
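Roughly, as a sketch:

```python
class Market:
    def get_price(self, symbol: str) -> float:
        raise NotImplementedError("live exchange call goes here")
    def buy(self, symbol: str, amount: float) -> None:
        raise NotImplementedError
    def sell(self, symbol: str, amount: float) -> None:
        raise NotImplementedError

class SimulatedMarket:
    def __init__(self, history: list[float]):
        self._prices = iter(history)
    def get_price(self, symbol: str) -> float:
        return next(self._prices)       # replay historical data
    def buy(self, symbol: str, amount: float) -> None:
        pass                            # record the fill locally
    def sell(self, symbol: str, amount: float) -> None:
        pass

class TradingStrategy:
    def __init__(self, market):
        self.market = market            # works with either market, unchanged
```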
And if you can encapsulate the 3 different domains well, then if you switch brokers you shouldn't have any external dependencies to change. Make your functions/domains deep.
- Reimplement SQLAlchemy models (we'll call it a "repository")
- Reimplement SQLAlchemy sessions ("unit of work")
- Add a "service layer" that doesn't even use the models -- we unroll all the model attributes into separate function parameters because that's less coupled somehow
- Scatter everything across a message bus to remove any hope of debugging it
- AND THIS IS JUST FOR WRITES!
- For reads, we have a separate fucking denormalized table that we query using raw SQL. (Seriously, see Chapter 12)
Hey, let's see how much traffic MADE.com serves. 500k total visits from desktop + mobile last month works out to... 12 views per MINUTE.
Gee, I wish my job was cushy enough that I could spend all day writing about "DDD" with my thumb up my ass.
I've made it through about 75% of the book and have never gotten the sense that they think everything discussed in the book is something you should always do. Each pattern discussed has a summary of pros and cons. While they may be a bit lacking, they clearly articulate the fact that you should be thinking whether or not the pattern matches the application's needs.
I don't think there are many applications that will require everything in the book, but there are certainly many applications that could apply one or more of the patterns discussed.
OK so show us how to write software for a complex business properly. Oh, I see, it's a throwaway account. This is just drive-by negativity with zero value.
I started writing python professionally a few years ago. Coming from Kotlin and TypeScript, I found the language approachable but I was struggling to build things in an idiomatic fashion that achieved the loose coupling and testability that I was used to. I bought this book after a colleague recommended it and read it cover to cover. It really helped me get my head around ways to manage complexity in non trivial Python codebases. I don’t follow every pattern it recommends, but it opened my eyes to what’s possible and how to apply my experience in other paradigms to Python without it becoming “Java guy does Python”.
Truly one of the great Python programming books. The one thing I found missing was static typing in the code, but that was a deliberate decision by the authors.
Haven’t read the book, so I don’t know exactly what position they’re taking there, but type checking has done more to improve my Python than any amount of architectural advice. How hard it is to type hint your code is a very good gauge of how hard it will be to understand it later.
My experience is that once people have static typing to lean on they focus much less on the things that in my view are more crucial to building clean, readable code: good, consistent naming and small chunks.
Just the visual clutter of adding type annotations can make the code flow less immediately clear and then due to broken windows syndrome people naturally care less and less about visual clarity.
So far off from what actually happens. The type annotations provide easy scaffolding for understanding what the code does in detail when reading, making code flow and logic less ambiguous. Reading Python functions in isolation, you might not even know what data/structure you’re getting as input… if there’s something that muddles up immediate clarity, it’s ambiguity about what data the code is operating on.
I disagree strongly, based on 20 years of using Python without annotations and ~5 years of seeing people ask questions about how to do advanced things with types. And based on reading Python code, and comparing that to how I feel when reading code in any manifest-typed language.
>Reading Python functions in isolation, you might not even know what data/structure you’re getting as input
I'm concerned with what capabilities the input offers, not the name given to one particular implementation of that set of capabilities. If I have to think about it in any more detail than "`ducks` is an iterable of Ducklike" (n.b.: a code definition for an ABC need not actually exist; it would be dead code that just complicates method resolution) I'm trying to do too much in that function. If I have to care about whether the iterable is a list or a string (given that length-1 strings satisfy the ABC), I'm either trying to do the wrong thing or using the wrong language.
> if there’s something that muddles up immediate clarity it’s ambiguity about what data code is operating on.
There is no ambiguity. There is just disregard for things that don't actually matter, and designing to make sure that they indeed don't matter.
>I'm concerned with what capabilities the input offers, not the name given to one particular implementation of that set of capabilities. If I have to think about it in any more detail than "`ducks` is an iterable of Ducklike" (n.b.: a code definition for an ABC need not actually exist; it would be dead code that just complicates method resolution) I'm trying to do too much in that function. If I have to care about whether the iterable is a list or a string (given that length-1 strings satisfy the ABC), I'm either trying to do the wrong thing or using the wrong language.
You can specify exactly that and no more, using the type system:
```python
from collections.abc import Iterable

def foo(ducks: Iterable[Ducklike]) -> None:
    ...
```
If you are typing it as list[Duck] you're doing it wrong.
I understand that. The point is that I gain no information from it, and would need more complex typing to gain information.
I keep seeing people trying to wrap their heads around various tricky covariance-vs-contravariance things (I personally can never remember which is which), or trying to make the types check for things that just seem blatantly unreasonable to me. And it takes up a lot of discussion space in my circles, because two or more people will try to figure it out together.
No, you do gain information from it: that the function takes an Iterable[Ducklike].
Moreover, now you can tell this just from the signature, rather than needing to discover it yourself by reading the function body (and maybe the bodies of the functions it calls, and so on ...). Being able to reason about a function without reading its implementation is a straightforward win.
>I already had that information. I understand my own coding style.
Good for you, but you're not the only person working on the codebase, surely.
>My function bodies are generally only a few lines, but my reasoning here is based on the choice of identifier name.
Your short functions still call other functions which call other functions which call other functions. The type will not always be obvious from looking at the current function body; often all a function does with an argument is forward it along untouched to another function. You often still need to jump through many layers of the call graph to figure out how something actually gets used.
An identifier name can't be as expressive as a type without sacrificing concision, and can't be checked mechanically. Why not be precise, why not offload some mental work onto the computer?
>Yes, it takes discipline, but it's the same kind of discipline as adding type annotations.
No, see, this is an absolutely crucial point of disagreement:
Adding type annotations is not "discipline"!
Or at least, not the same kind of discipline as remembering the types myself and running the type checker in my head. The type checker is good because it relieves me of the necessity of discipline, at least wrt types.
Discipline consumes scarce mental effort. It doesn't scale as project complexity grows, as organizations grow, and as time passes. I would rather spend my limited mental effort on higher level things; making sure types match is rote clerical work, entirely suitable to a machine.
The language of "discipline" paints any mistake as a personal/moral failure of an individual. It's the language of a blame-culture.
> Good for you, but you're not the only person working on the codebase, surely.
I actually am. But I've also read plenty of non-annotated Python code from strangers without issue. Including the standard library, random GitHub projects I gave a PR to fix some unidiomatic expression (defense in depth by avoiding `eval` for example), etc. When the code of others is type-annotated, I often find it just as distracting as all the "# noqa: whatever" noise not designed to be read by humans.
And long functions are vastly more mentally taxing.
> often all a function does with an argument is forward it along untouched to another function. You often still need to jump through many layers of the call graph to figure out how something actually gets used.
Yes, and I find from many years of personal experience that this doesn't cause a problem. I don't need to "figure out how something actually gets used" in order to understand the code. That's the point of organizing it this way. This is also one of the core lessons of SICP as I understood it. The dynamic typing of LISP is not an accident.
> An identifier name can't be as expressive as a type without sacrificing concision
On the contrary: it is not restricted to referring to abstractions that were explicitly defined elsewhere.
> Why not be precise, why not offload some mental work onto the computer?
When I have tried to do it, I have found that the mental work increased.
> No, see, this is an absolutely crucial point of disagreement
In my experience, arguing co versus contra is a sign you are working with dubious design decisions in the first place. Mostly this is where I punch a hole in the type system and use Any. That way I can find all the bad architectural decisions in the codebase using grep.
IMO this is the source of much of the demand for type hints in Python. People don't want to write idiomatic Python, they want to write Java - but they're stuck using Python because of library availability or an existing Python codebase.
So, they write Java-style code in Python. Most of the time this means heavy use of type hints and an overuse of class hierarchies (e.g. introducing abstract classes just to satisfy the type checker) - which in my experience leads to code that's twice as long as it should be. But recently I heard more extreme advice - someone recommended "write every function as a member of a class" and "put every class in its own file".
I’d say I use type hints to write Python that looks more like OCaml. Class hierarchies shallow to nonexistent. Abundant use of sum types. Whenever possible using Sequence, Mapping, and Set rather than list, dict, or set (as these interfaces don’t include mutation, even if the collection itself is mutable). Honestly, if you’re heavily invested in object-oriented modeling in Python, you’re doing it wrong. What a headache.
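Roughly this kind of thing (a toy sketch with invented names):

```python
# Frozen dataclasses as a sum type, pattern matching instead of method
# dispatch, and read-only collection interfaces. Requires Python 3.10+.
from collections.abc import Sequence
from dataclasses import dataclass

@dataclass(frozen=True)
class Card:
    number: str

@dataclass(frozen=True)
class BankTransfer:
    iban: str

PaymentMethod = Card | BankTransfer  # a sum type

def describe(method: PaymentMethod) -> str:
    match method:
        case Card(number=n):
            return f"card ending {n[-4:]}"
        case BankTransfer(iban=i):
            return f"transfer from {i}"

def describe_all(methods: Sequence[PaymentMethod]) -> list[str]:
    # Sequence rather than list: signals we won't mutate the argument
    return [describe(m) for m in methods]
```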
This is totally not how I use typed Python. I eschew classes almost entirely, save for immutable dataclasses. I don't use inheritance at all. Most of the code is freestanding pure functions.
I can remember in the mid-00s especially, Python gurus were really fond of saying "Python is not Java". But `unittest` was "inspired by" JUnit and `logging` looks an awful lot like my mental image of Log4J of the time.
> But recently I heard more extreme advice - someone recommended "write every function as a member of a class" and "put every class in its own file".
Not coincidentally, these are two of my least favorite parts of the standard library. Logging especially makes me grumpy, with its hidden global state and weird action at a distance. It’s far too easy to use logging wrong. And unittest just feels like every other unit testing framework from that era, which is to say, vastly overcomplicated for what it does.
Exactly my experience. I call Python a surprise-typed language. You might write a function assuming its input is a list, but then somebody passes it a string, you can iterate over it so the function returns something, but not what you would have expected, and things get deeply weird somewhere else in your codebase as a result. Surprise!
Type checking on the other hand makes duck typing awesome. All the flexibility, none of the surprises.
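A toy version of the surprise (function name invented):

```python
# The function "works" on a string because strings are iterable,
# just not in the way the caller meant.
def tag_all(items):
    return [f"<{item}>" for item in items]

print(tag_all(["ab", "cd"]))  # ['<ab>', '<cd>'] -- intended
print(tag_all("ab"))          # ['<a>', '<b>']   -- surprise!
```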
This is because of Python's special handling of iteration and subscripting for strings (so as to avoid having a separate character type), not because of the duck typing. In ordinary circumstances (unless, say, you need to be careful about a base case for recursion - but that would cause a local fault and not "deep weirdness at a distance"), the result is completely intuitive (e.g. you ask it to add each element of a sequence to some other container, and it does exactly that), and I've written code that used these properties very intentionally.
If you passed a string expecting it to be treated as an atomic value rather than as a sequence (i.e. you made a mistake and want a type checker to catch it for you), there are many other things you can do to avoid creating that expectation in the first place.
Annotations can and should be checked. If I change a parameter type, other code using the function will now show errors. That won't happen with just documentation.
Unfortunately Python’s type system is unsound. It’s possible to pass all the checks and yet still have a function annotated `int` that returns a `list`.
True and irrelevant. Type annotations catch whole swathes of errors before they cause trouble and they nudge me into writing clearer code. I know they’re not watertight. Sometimes the type checker just can’t deal with a type and I have to use Any or a cast. Still better than not using them.
Do you mean that you're allowed to only use types where you want to, which means maybe the type checker can't check in cases where you haven't hinted enough, or is there some problem with the type system itself?
The type system itself is unsound. For example, this code passes `mypy --strict`, but prints `<class 'list'>` even though `bar` is annotated to return an `int`:
```python
i: int | list[int] = 0

def foo() -> None:
    global i
    i = []

def bar() -> int:
    if isinstance(i, int):
        foo()
        return i
    return 0

print(type(bar()))
```
- Don't write unsound code? There's no way to know until you run the program and find out your `int` is actually a `list`.
- Don't assume type annotations are correct? Then what's the point of all the extra code to appease the type checker if it doesn't provide any guarantees?
You may as well argue that unit tests are pointless because you could cheat by making the implementations return just the hardcoded values from the test cases.
It doesn't. There are cases where the type-checker can't know the type (e.g. json.load has to return Any), but there are tools in the language to reduce how much that happens. If you commit to a fully strictly-typed codebase, it doesn't happen often.
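For example, one common way to contain json.load's Any at the boundary (a sketch; names invented):

```python
# Validate once at the edge, then the rest of the codebase sees a precise type.
import json
from typing import TypedDict, cast

class Config(TypedDict):
    host: str
    port: int

def load_config(path: str) -> Config:
    with open(path) as f:
        data = json.load(f)  # typed as Any
    if not isinstance(data, dict):
        raise ValueError("config must be a JSON object")
    return cast(Config, data)  # one cast at the edge; strictly typed from here on
```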
Yeah, people from statically typed languages sometimes can't understand how dynamically typed languages can even work. How can I do anything if I don't know what type to pass?! Because we write functions like "factorial(number)" instead of "int fac(int n)".
Yup! I'm also hopeful that the upcoming type-checker from Astral will be an improvement over Mypy. I've found that Mypy's error messages are sometimes hard to reason about.
Some examples use dataclasses, which force type annotations.
Python does not support static typing. Type annotations don't affect runtime behavior (unless you use metaprogramming that inspects them, as dataclasses do), and annotation-based tooling cannot force Python to reject the code; it only offers diagnostics.
I have this on my shelf. It's a small volume, similar to K&R, and like that book mine is showing visible signs of wear as I've thumbed through it a lot.
Oh neat, I read the paperback of this book maybe two and a half or three years ago. I enjoyed it quite a bit. They do a good job of keeping tests a first-class topic and consistently updating them with each addition. Some older architecture books don't treat testing as high a priority. I've found that having tests ready, easy to write, and easy to update makes the development process more enjoyable for me, since there's less manually running the code to check for issues - a tighter feedback loop, I guess.
I will say that some of the event oriented parts of this book were very interesting, but didn't seem as practical to implement in my current work.
This was a great read and summary! About three years ago, I worked in a C#/.NET DDD environment, and now revisiting these concepts in Python really distills the essential parts. As I said, great read — highly recommend it if you're also into this kind of stuff.
I don’t understand the need for most of the patterns described in this book. Why abstract away SQLAlchemy using a Repository when it is already an abstraction over the database? What’s the purpose of the unit of work? To me, hand-rolling SQL is much more maintainable than this abstraction soup.
Even though most people might think of web architectures when it comes to this book, we used this to design an architecture for an AI that optimises energy efficiency in a manufacturing factory.
Is it easy to transpose to other types of architectures, or does it lean heavily toward web development? Thank you for sharing, your project sounds really interesting by the way! :)
One of the key points of the book (and DDD in general) is that the web stuff is just a detail at the edge of an application. You should be able to replace the web bit (for which they use Flask) with any other entry point. In fact, they do this by having an event subscriber entry point and IIRC a CLI entry point. The whole point is it all uses the same core code implementing the domain logic.
Unfortunately https://www.cosmicpython.com/book/ does give a 404 - this is a very bad architectural choice for web applications. I hope their other tips are better.
I think you just discovered software engineering, which at its best makes intelligent tradeoffs to optimise use of resources to meet needs.
But the models used in the business and cultural world are often messy, outdated and unoptimized for code. They rely on a human to interpret the edge cases and underspecified parts. We should treat that as inspiration, not the end goal.
> I found the book's use of modeling how to pilot an alien starship to be a little misleading, because a starship is a highly engineered product that functions in large part as a control mechanism for software. It comes with a clean design model already available for you to discover and copy.
Doctor Who fans will note that TARDIS craft seem to follow a different design: they regularly reconfigure themselves to fit their pilot, don't have controls laid out in any sensible fashion, and there's at least one reference to how they're "grown, not built". Then again they were also meant to be piloted by a crew and are most likely sentient, so it's also possible that due to the adaptations, the Doctor's TARDIS is just as eccentric as he is.
It's not like Doctor Who is "hard" sci-fi tho, it's basically Peter Pan in Space.
Strict architectural pattern usage requires understanding the domain, and understanding the patterns. If you have both, navigating the codebase will be intuitive. If you don't, you'll find 1000 LOC functions easier to parse.
That's the problem: if you are working in a company that has mostly juniors (one or two years of programming), it is better not to implement overly complicated patterns, otherwise your day will be filled with explaining what a Factory is.
Yes, if you have a PhD in the organization's domain model, anything is possible. Any mess of code, whether "clean" or not, can be reasoned through.
The problem is this takes years of on-site experience to attain this level of domain understanding.
>I’ve been in companies big and small using Python, both using and ignoring architectural patterns. Turns out all the big ones with strict architectural (n=3) pattern usage, although “clean”, the code is waaaay to complex and unnecessarily slow in tasks that at first glance should had been simple.
The problem with "strict architectural pattern usage" is that people think that a specific implementation, as listed in the reference, is "the pattern".
"The pattern" is the thought process behind what you're doing, and the plan for working with it, and the highest-level design of the API you want to offer to the rest of the code.
A state machine in Python, thanks to functions being objects, can often just be a group of functions that return each other, and an iteration of "f = f(x)". Sometimes people suggest using a Borg pattern in Python rather than a Singleton, but often what you really want is to just use the module. `sys` is making it a singleton for you already. "Dependency injection" is often just a fancy term for passing an argument (possibly another function) to a function. A Flyweight isn't a thing; it's just the technique of interning. The Command pattern described in TFA was half the point of Jack Diederich's famous rant (https://www.youtube.com/watch?v=o9pEzgHorH0); `functools.partial` is your friend.
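To illustrate the state-machine point, a minimal sketch (a made-up turnstile):

```python
# Each state is a plain function that returns the next state,
# driven by the "f = f(x)" iteration.
def locked(event: str):
    return unlocked if event == "coin" else locked

def unlocked(event: str):
    return locked if event == "push" else unlocked

state = locked
for event in ["push", "coin", "push"]:
    state = state(event)

print(state.__name__)  # locked
```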
> Maybe this says something about me more than the code but I hate to admit I was more productive in the non clean code companies.
I think you've come to draw a false dichotomy because you just haven't seen anything better. Short functions don't require complex class hierarchies to exist. They don't require classes to exist at all.
Object-oriented programming is about objects, not classes. If it were about classes, it would be called class-oriented programming.
I love this book but yes, you really need to understand when it makes sense to apply these patterns and when not to. I think of these kinds of architectural patterns like I think of project management. They both add an overhead, and both get a bad rap because if they are used indiscriminately, you will have many cases where the overhead completely dominates any value you get from applying them. However, when used judiciously they are critical to the success of the project.
For example, if I am standing up a straightforward calendar REST API, I am not going to have a complicated architecture. However, these kinds of patterns, especially adherence to a ports-and-adapters architecture, have been critical for me in building trading systems that can switch seamlessly between simulation and production modes. In those cases I am really sure I will need to swap simulators for real trading engines, or historical event feeds for real-time feeds, and it's necessary that the business logic not have dual implementations to keep in sync.
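A minimal sketch of what that port looks like (names invented; the real systems are richer):

```python
# The business logic depends only on the port, so the simulated and
# production adapters share a single strategy implementation.
from typing import Protocol

class MarketPort(Protocol):
    def price(self, symbol: str) -> float: ...
    def buy(self, symbol: str, qty: int) -> None: ...

class SimulatedMarket:
    def __init__(self, prices: dict[str, float]) -> None:
        self.prices = prices
        self.orders: list[tuple[str, int]] = []

    def price(self, symbol: str) -> float:
        return self.prices[symbol]

    def buy(self, symbol: str, qty: int) -> None:
        self.orders.append((symbol, qty))

def strategy(market: MarketPort) -> None:
    if market.price("BTC") < 30_000:
        market.buy("BTC", 1)
```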
My experience matches this. It's so liberating as well. I find it easier to internalise such code in my head compared to abstraction-soup. As you can imagine, I like golang.
Me three. I'm even happy to refactor code into a form where there's less repetition and perhaps more parametrised functions, etc.
Finding my way around a soup of ultra abstracted Matryoshka ravioli is my least favourite part of programming. Instead of simplifying things, now I need to consult 12 different objects spread over as many files before I can create a FactoryFactory.
This has been my experience in working with any kind of dogmatic structure or pattern in any language. It seems that the architecture astronauts have missed the point: making the code easier to understand for future developers without context, and providing some certainty that modifications behave as expected.
Here's an example of how things can go off the rails very quickly: Rule 1: Functions should be short (no longer than 50 lines). Rule 2: Public functions should be implemented with an interface (so they can be mocked).
Now as a developer who wants to follow the logic of the program, you have to constantly "go to definition" on function calls on interfaces, then "go to implementation" to find the behavior. This breaks your train of thought / flow state very quickly.
Now let's amp it up to another level of suck: replace the interface with a microservice API (gRPC). Now you have to tab between multiple completely different repos to follow the logic of the program. And when opening a new repo, which has its own architectural layers, you have to browse around just to find the implementation of the function you're looking for.
These aren't strawmen either... I've seen these patterns in place at multiple companies, and at this point I yearn for a 1000 line function with all of the behavior in 1 place.
Bonus architecture points if those functions behind an interface are never mocked in tests.
> Turns out all the big ones with strict architectural (n=3) pattern usage, although “clean”, the code is waaaay to complex and unnecessarily slow in tasks that at first glance should had been simple.
My last job had a Python codebase just like this. Lots of patterns, implemented by people who wanted to do things "right," and it was a big slow mess. You can't get away with nearly as much in Python (pre-JIT, anyway) as you can in a natively compiled language or a JVM language. Every layer of indirection gets executed in the interpreter every single time.
What bothers me about this book and other books that are prescriptive about application architecture is that it pushes people towards baking in all the complexity right at the start, regardless of requirements, instead of adding complexity in response to real demands. You end up implementing both the complexity you need now and the complexity you don't need. You implement the complexity you'll need in two years if the product grows, and you place that complexity on the backs of the small team you have now, at the cost of functionality you need to make the product successful.
To me, that's architectural malpractice. Even worse, it affects how the programmers on your team think. They start thinking that it's always a good idea to make code more abstract. Your code gets bloated with ghosts of dreamed-of future functionality, layers that could hypothetically support future needs if those needs emerged. A culture of "more is better" can really take off with junior programmers who are eager to do good work, and they start implementing general frameworks on top of everything they do, making the codebase progressively more complex and harder to work in. And when a need they anticipated emerges in reality, the code they wrote to prepare for it usually turns out to be a liability.
Looking back on the large codebases I've worked with, they all have had areas where demands were simple and very little complexity was needed. The ones where the developers accepted their good luck and left those parts of the codebase simple were the ones that were relatively trouble-free and could evolve to meet new demands. The ones where the developers did things "right" and made every part of the codebase equally complex were overengineered messes that struggled under their own weight.
My preferred definition of architecture is the subset of design decisions that will be costly to change in the future. It follows that a goal of good design is minimizing architecture, avoiding choices that are costly to walk back. In software, the decision to ignore a problem you don't have is very rarely an expensive decision to undo. When a problem arises, it is almost always cheaper and easier to start from scratch than to adapt a solution that was created when the problem existed only in your head. The rare exceptions to this are extremely important, and from the point of view of optics, it always looks smarter and more responsible to have solved a problem incorrectly than not to have solved it at all, but we shouldn't make the mistake of identifying our worth and responsibility solely with those exceptions.
> What bothers me about this book and other books that are prescriptive about application architecture is that it pushes people towards baking in all the complexity right at the start, regardless of requirements, instead of adding complexity in response to real demands.
The trouble is if you strictly wait until it's time then basically everything requires some level of refactoring before you can implement it.
The dream is that new features are just new code, rather than refactoring and modifying existing code. Many people are already used to this idea. If you add a new "view" in a web app, you don't have to touch any other view, nor do you have to touch the URL routing logic. I just think more people are comfortable depending on frameworks for this kind of stuff rather than implementing it themselves.
The trouble is a framework can't know about your business. If you need pluggable validation layers or something you might have to implement it yourself.
The downside, of course, is we're not always great at seeing ahead of time where the application will need to be flexible and grow. So you could build this into everything, leading to unnecessarily complicated code, or nothing, leading to constant refactors which will get worse and worse as the codebase grows.
Your approach can work if developers actually spot what's happening early and actually do what's necessary when it actually is. Unfortunately in my experience people follow by example and the frog can boil for a long time before people start to realise that their time is spent mostly doing large refactors because the code just doesn't support the kind of flexibility and extensibility they need.
> The dream is that new features are just new code, rather than refactoring and modifying existing code
I don't just mean new features. I mean new cross-cutting capabilities. I mean emitting metrics from an application that has never emitted metrics. I also mean adding new dimensions to existing capabilities, like adding support for a second storage backend to an application that has only ever supported one database.
These are changes that I was always taught were important to anticipate. If you don't plan ahead, it'll be near impossible to add later, right? After a couple of decades of working on real-life codebases, seeing the work that people pour into anticipating future needs, making things pluggable, all that stuff, seeing exactly how helpful that kind of up-front speculative work turns out to be in practice when a real need arises, and comparing it to the work required to add something to a codebase that was never prepared for it, I have become a staunch advocate for skipping almost all of it.
> Unfortunately in my experience people follow by example and the frog can boil for a long time before people start to realise that their time is spent mostly doing large refactors because the code just doesn't support the kind of flexibility and extensibility they need
If the engineers are doing large refactors, what in the world could they be doing besides adding the "kind of flexibility and extensibility they need?"
One thing to keep in mind when you compare two options is that unless the options involve different hiring strategies, the people executing them will be the same. If you have developers doing repeated large refactors without being able to make the codebase serve the current needs staring them in the face, what do you think will happen if you ask them to prepare a codebase for uncertain future needs? It's a strictly harder problem, so they will do a worse job, or at least no better.
+100.
Patterns and abstractions have a HUGE cost in Python. They can be zero-cost in C++ thanks to the compiler, or very low cost on the JVM thanks to the JIT, but in Python the cost is very significant, especially once you start adding I/O ops or network calls.
Some parts of this book are extremely useful, especially when it's talking about concepts that are more general than Python or any other specific language -- such as event-driven architecture, commands, CQRS etc.
That being said, I have a number of issues with other parts of it, and I have seen how dangerous it can be when inexperienced developers take it as gospel and try to implement everything at once (which is a common problem with any collection of design patterns like this).
For example, repository is a helpful pattern in general; but in many cases, including the examples in the book itself, it is a huge overkill that adds complexity with very little benefit. Even more so as they're using SQLAlchemy, which is a "repository" in its own right (or, more precisely, a relational database abstraction layer with an ORM added on top).
Similarly, service layers and unit of work are useful when you have complex applications that cover multiple complex use cases; but a system consisting of small services with narrow responsibilities quickly becomes bloated with these patterns. And don't even get me started on dependency injection in Python.
The essential thing about design patterns is that they're tools like any other, and the developers should understand when to use them, and even more importantly when not to use them. This book has some advice in that direction, but in my opinion it should be more prominent and placed upfront rather at the end of each chapter.
Could you explain how repository pattern is a "huge overkill that adds complexity with very little benefit"? I find it a very light-weight pattern and would recommend to always use it when database access is needed, to clearly separate concerns.
In the end, it's just making sure that all database access for a specific entity all goes through one point (the repository for that entity). Inside the repository, you can do whatever you want (run queries yourself, use ORM, etc).
A lot of the stuff written in the article under the section Repository pattern has very little to do with the pattern, and much more to do with all sorts of Python, Django, and SQLAlchemy details.
In theory it's a nice abstraction, and the benefit is clear. In practice, your repository likely ends up forwarding its arguments one-for-one to SQLAlchemy's select() or session.query().
That's aside from their particular example of SQLAlchemy sessions, which is extra weird because a Session is already a repository, more or less.
I mean, sure, there's a difference between your repository for your things and types you might consider foreign, in theory, but how theoretical are we going to get? For what actual gain? How big of an app are we talking?
You could alias Repository = Session, or define a simple protocol with stubs for some of Session's methods, just for typing, and you'd get the same amount of theoretical decoupling with no extra layer. If you want to test without a database, don't bind your models to a session. If you want to use a session anyway but still not touch the database, replace your Session's scopefunc and your tested code will never know the difference.
It's not a convincing example.
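For concreteness, the protocol version might look like this (method subset invented, just for typing):

```python
# A sketch of "a simple protocol with stubs for some of Session's methods":
# code depends on this small structural interface, not on Session itself.
from typing import Any, Optional, Protocol

class Repository(Protocol):
    def add(self, instance: Any) -> None: ...
    def get(self, entity: Any, ident: Any) -> Optional[Any]: ...

def register_user(repo: Repository, user: Any) -> None:
    repo.add(user)  # a SQLAlchemy Session has a compatible add()/get()
```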
Building your repository layer over theirs, admittedly you stop the Query type from leaking out. But then you implement essentially the Query interface in little bits for use in different layers, just probably worse, and lacking twenty years of testing.
Thanks, that makes a lot of sense. I don't have a whole bunch of experience with SQLAlchemy itself. In general, I prefer not to use ORMs but just write queries and map the results into value objects. That work I would put into a Repository.
Also in my opinion it's important to decouple the database structure from the domain model in the code. One might have a Person type which is constructed by getting data from 3 tables. A Repository class could do that nicely: maybe run a join query and a separate query, combine the results together, and return the Person object. ORMs usually tightly couple with the DB schema, which might create the risk of coupling the rest of the application as well (again, I don't know how flexible SQLAlchemy is in this).
There could be some value in hiding SQLAlchemy, in case one would ever like to replace it with a better alternative. I don't have enough experience with Python to understand if that ever will be the case though.
All in all, trade-offs are always important to consider. A tiny microservice consisting of a few functions: just do whatever. A growing modulith with various evolving domains which have not been fully settled yet: put some effort into decoupling and separating concerns.
I've used SQLAlchemy in a biggish project. We had many problems; the worst ones were around session scoping and the DB hitting session limits, but we had issues around the models too.
The argument for hiding SQLAlchemy has nothing to do with "what if we change the DB"; that happens approximately never, and even if it does, you have some work to do, so do it at the time. YAGNI.
The argument is that SA models are funky things with lazy loading. IIRC, that's the library where the metaclasses have metaclasses! It's possible to accidentally call the DB just by accessing a property.
It can be a debugging nightmare. You can have data races. I remember shouting at the code, "I've refreshed the session you stupid @#£*"
The responsible thing to do is flatten them to, say, a pydantic DTO. Then you can chuck them about willy-nilly. Your type checker will highlight a DTO problem that an SA model would have slipped underneath your nose.
The difficulty that follows is that, when you have nested models, you need to know in advance which fields you want so you don't overfetch. I guess you're thinking "duh, I handcraft my queries" and my goodness I see the value of that approach now. However, SA still offers benefits even if you're doing this more tightly-circumscribed fetch-then-translate approach.
This is partly how I got from the eager junior code golf attitude to my current view, which is, DO repeat yourself, copy-paste a million fields if you need, don't sweat brevity, just make a bunch of very boring data classes.
Just a heads-up if you haven't seen it: Overriding lazy-loading options at query time can help with overfetching.
Anything that gets its Authors with fetch_authors will get instances that raise instead of doing a SELECT for the books. You can throw that in a smoke test and see if there's anything sneaking a query. Or if you know you never want to lazy-load, relationship(..., lazy='raise') will stop it at the source. https://docs.sqlalchemy.org/en/20/orm/queryguide/relationshi...
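A sketch of the query-time override (the Author/Book models and `session` are assumed):

```python
# Using SQLAlchemy's raiseload option: lazy loads on the relationship
# raise instead of silently emitting a SELECT.
from sqlalchemy import select
from sqlalchemy.orm import raiseload

stmt = select(Author).options(raiseload(Author.books))
for author in session.execute(stmt).scalars():
    author.books  # raises InvalidRequestError instead of querying
```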
Based on that, do you find SQLModel[0] to be an elegant integration of these ideas, or a horrid ball of spaghetti?
[0] https://sqlmodel.tiangolo.com/
SQLModel is supposed to be the best of both Pydantic and SQLAlchemy, but by design an SQLModel entity backed by a database table doesn't validate its fields on creation, which is the point of Pydantic.
https://github.com/fastapi/sqlmodel/issues/52#issuecomment-1...
I can't take a position without looking under the hood, but what concerns me is "SQLModel is both a Pydantic model and an SA model", which makes me think it may still have the dynamic unintended-query characteristics that I'm warning about.
I seem to recall using SQLModel in a pet project and having difficulty expressing many-to-many relationships, but that's buried in some branch somewhere. I recall liking the syntax more than plain SA. I suspect the benefits of SQLModel are syntactical rather than systemic?
"Spaghetti" is an unrelated problem. My problem codebase was spaghetti, and that likely increased the problem surface, but sensible code doesn't eliminate the danger.
I mean that from the point of view of YAGNI for a small app. For a big one, absolutely, you will find the places where the theoretical distinctions suddenly turn real. Decoupling your data model from your storage is a real concern and Session on its own won't give you that advantage of a real repository layer.
SQLAlchemy is flexible, though. You can map a Person from three tables if you need to. It's a data mapper, then a separate query builder on top, then a separate ORM on top of that, and then Declarative which ties them all together with an ActiveRecord-ish approach.
> I prefer not to use ORMs but just write queries and map the results into value objects. That work I would put into a Repository.
Yep, I hear ya. Maybe if they'd built on top of something lower-level like stdlib sqlite3, it wouldn't be so tempting to dismiss as YAGNI. I think my comment sounded more dismissive than I really meant.
SQLAlchemy Session is actually a unit of work (UoW), which they also build on top. By the end of the book they are using their UoW to collect and dispatch events emitted by the services. How would they have done that if they just used SQLAlchemy directly?
You might argue that they should have waited until they wanted their own UoW behaviour before actually implementing it, but that means by the time they need it they need to go and modify potentially hundreds of bits of calling code to swap out SQLAlchemy for their own wrapper. Why not just build it first? The worst that happens is it sits there being mostly redundant. There have been far worse things.
The tricks you mention for the tests might work for SQLAlchemy, but what if we're not using SQLAlchemy? The repository pattern works for everything. That's what makes it a pattern.
I understand not everyone agrees on what "repository" means. The session is a UoW (at two or three levels) and also a repository (in the sense of object-scoped persistence) and also like four other things.
I'm sort of tolerant of bits of Session leaking into things. I'd argue that its leaking pieces are the application-level things you'd implement, not versions of them from the lower layers that you need to wrap.
When users filter data and their filters go from POST submissions to some high-level Filter thing I'd pass to a repository query, what does that construct look like? Pretty much Query.filter(). When I pick how many things I want from the repository, it's Query.first() or Query.one(), or Query.filter().filter().filter().all().
Yes, it's tied to SQL, but only in a literal sense. The API would look like that no matter what, even if it wasn't. When the benefit outweighs the cost, I choose to treat it like it is the thing I should have written.
It isn't ideal or ideally correct, but it's fine, and it's simple.
You seem to have stopped reading my comment after the first sentence. I asked some specific questions about how you would do what they did if you just use SQLAlchemy as your repository/UoW.
You do that in your services.
In my experience, both SQL and real-world database schema are each complex enough beasts that to ensure everything is fetched reasonably optimally, you either need tons of entity-specific (i.e. not easily interface-able) methods for every little use case, or you need to expose some sort of builder, at which point why not just use the query builder you're almost certainly already calling underneath?
Repository patterns are fine for CRUD but don't really stretch to those endpoints where you really need the query with the two CTEs and the four joins onto a query selecting from another query based on the output of a window function.
Repository pattern is useful if you really feel like you're going to need to switch out your database layer for something else at some point in the future, but I've literally never seen this happen in my career ever. Otherwise, it's just duplicate code you have to write.
I’ve seen it, but of course there was no strict enforcement of the pattern so it was a nightmare of leakage and the change got stuck half implemented, with two databases in use.
What is the alternative that you use, how do you provide data access in a clean, separated, maintainable way?
I have seen it a lot in my career, and have used it a lot. I've never used it in any situation to switch out a database layer for something else. It seems like we have very different careers.
I also don't really see how it duplicates code. At the basic level, it's practically nothing more than putting database access code in one place rather than all over the place.
OK, let's first define some things.
What we are talking about is a "transformation" or "mapper" layer isolating your domain entities from the persistence. If this is what we call "Repository" then yes, I absolutely agree with you -- this is the right approach to this problem. But if the "Repository pattern" means a complex structure of abstract and concrete classes and inheritance trees -- as I have usually seen it implemented -- then it is usually an overkill and rarely a good idea.
Thanks. In my mind, anything about complex structures of (abstract) classes and/or inheritance trees has nothing to do with a Repository pattern.
As I understand it, Repository pattern is basically a generalization of the Data Access Object (DAO) pattern, and sometimes treated synonymously.
The way I mean it and implement it is basically: for each entity, have a separate class that provides the database access. E.g. you have a Person (not complex at all, simply a value object) and a PersonRepository to get, update, and delete Person objects.
Then, based on the complexity and scope of the project, Person either maps 1-to-1 to e.g. a database table or a stored object/document, or it is a somewhat more complex object in the business domain and the repository does a little more work to fetch and construct it (e.g. perhaps a join or more than one query for some data).
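In its simplest form, something like this (sqlite and the schema are just for illustration):

```python
# A minimal sketch of the PersonRepository idea; table and columns invented.
import sqlite3
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Person:
    id: int
    name: str

class PersonRepository:
    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def get(self, person_id: int) -> Optional[Person]:
        row = self._conn.execute(
            "SELECT id, name FROM person WHERE id = ?", (person_id,)
        ).fetchone()
        return Person(*row) if row else None

    def delete(self, person_id: int) -> None:
        self._conn.execute("DELETE FROM person WHERE id = ?", (person_id,))
```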
> for each entity have a separate class to provide the database access
Let me correct you: for each entity that needs database access. This is why I'm talking about layers here: sometimes entities are never persisted directly, but only as "parts" or "relations" of other entities; in other cases you might have a very complex persistence implementation (e.g. some entities are stored in a RDB, while others in a filesystem) and there is no clear mapping.
I recommend approaching this from the perspective of each domain entity individually; "persistability" is essentially just another property which might or might not apply in each case.
Naturally, Repository is a pattern for data(base) access, so it should have nothing to do with objects that are not persisted. I used "entity" as meaning a persisted object. That was not very clear, sorry.
Well, again, that is not completely straightforward - what exactly is a "persisted object"? We have two things here that are usually called entities:
1. The domain entities, which are normally represented as native objects in our codebase. They have no idea whether they need to be persisted and how.
2. The database entities, which are - in RDBs at least - represented by tables.
It is not uncommon that our entities of the first type can easily be mapped 1:1 to our entities of the second type - but that is far from guaranteed. Even if this is the case, the entities will be different because of the differences between the two "worlds": for example, Python's integer type doesn't have a direct equivalent in, say, PostgreSQL (it has to be converted into smallint, integer, bigint or numeric).
In my "correction" above I was talking about the domain entities, and my phrasing that they "need database access" is not fully correct; it should have been "need to be persisted", to be pedantic.
I find the repository pattern useful for testing ... it's a lot easier to mock a repository than to mock stuff that SQLAlchemy might need to do.
I rarely mock a repository. Mocking the database is nice for unit testing, and it's also a lot faster than using a real DB, but the DB and the DB-application interface are some of the hottest spots for bugs: using a real DB (same engine as prod) gives me a whole lot more confidence that my code actually works. It's probably the thing I'm least likely to mock out, despite it making tests more difficult to write and quite a bit slower.
TL;DR, YAGNI
I had a former boss who strongly pushed my team to use the repository pattern for a microservice. The team wanted to try it out since it was new to us and, like the other commenters are saying, it worked but we never actually needed it. So it just sat there as another layer of abstraction, more code, more tests, and nothing benefited from it.
Anecdotally, the project was stopped after nine months because it took too long. The decision to use the repository pattern wasn't the straw that broke the camel's back, but I think using patterns more complicated than the use case required was at the heart of it.
Could you give me some insights what the possible alternative was that you would have rather seen?
I am either now learning that the Repository pattern is something different than what I understand it to be, or there is misunderstanding here.
I cannot understand how (basically) tucking away database access code in a repository can lead to complicated code, long development times, and the entire project failing.
Your understanding of the repository pattern is correct. It's the other people in this thread that seem to have misunderstood it and/or implemented it incorrectly. I use the repository pattern in virtually every service (when appropriate) and it's incredibly simple, easy to test and document, and easy to teach to coworkers. Because most of our services use the repository pattern, we can jump into any project we're not familiar with and immediately have the lay of the land, knowing where to go to find business logic or make modifications.
One thing to note -- you stated in another comment that the repository pattern is just for database access, but this isn't really true. You can use the repository pattern for any type of service that requires fetching data from some other location or multiple locations -- whether that's a database, another HTTP API, a plain old file system, a gRPC server, an ftp server, a message queue, an email service... whatever.
This has been hugely helpful for me as one of the things my company does is aggregate data from a lot of other APIs (whois records, stuff of that nature). Multiple times we've had to switch providers due to contract issues or because we found something better/cheaper. Being able to swap out implementations was incredibly helpful because the business logic layer and its unit tests didn't need to be touched at all.
Before I started my current role, we had been using kafka for message queues. There was a huge initiative to switch over to rabbit and it was extremely painful ripping out all the kafka stuff and replacing it with rabbit stuff and it took forever and we still have issues with how the switch was executed to this day, years later. If we'd been using the repository pattern, the switch would've been a piece of cake.
Thanks. I was starting to get pretty insecure about it. I don't actually know why in my brain it was tightly linked to only database access. It makes perfect sense to apply it to other types of data retrieval too. Thanks for the insights!
It doesn't. Use it; it's easy.
> And don't even get me started with dependency injection in Python.
Could I get you started? Or could you point me to a place to get myself started? I primarily code in Python and I've found dependency injection, by which I mean giving a function all the inputs it needs to calculate via parameters, is a principle worth designing projects around.
Here’s the 1% that gives 50% of the result: in your function signatures, replace the concrete dependency class with a SupportsFoo Protocol. That’s it.
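Something like this (FooService/SupportsFoo are placeholder names):

```python
# A sketch of the swap; FooService and SupportsFoo are placeholders.
from typing import Protocol

class SupportsFoo(Protocol):
    def foo(self) -> str: ...

# Before: def run(dep: FooService) -> str:
def run(dep: SupportsFoo) -> str:
    return dep.foo()  # any object with a matching foo() now type-checks,
                      # including hand-rolled fakes in tests
```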
> I have seen how dangerous it can be when inexperienced developers take it as a gospel and try to implement everything at once
This book explicitly tells you not to do this.
> Similarly, service layers and unit of work are useful when you have complex applications that cover multiple complex use cases; but a system consisting of small services with narrow responsibilities quickly becomes bloated with these patterns. And don't even get me started on dependency injection in Python.
I have found service layers and DI really helpful for writing functional programs. I have some complex image-processing scripts in Python that I can use as plug-ins with a distributed image processing service in Celery. Service layer and DI just takes code from:
```python
dependency.do_thing(params)
```
To:
```python
do_thing(dependency, params)
```
Which ends up being a lot more testable. I can run image processing tasks in a live deployment with all of their I/O mocked, or I can run real image processing tasks on a mocked version of Celery. This lets me test all my different functions end-to-end before I ever do a full deploy. Also using the Result type with service layer has helped me propagate relevant error information back to the web client without crashing the program, since the failure modes are all handled in their specific service layer function.
may I ask how you're mocking Celery to test?
the two main methods I've seen are to run tasks eagerly, or to test the underlying function and avoid testing Celery's .delay/etc at all
I just rolled my own mocked Celery objects. I have mocked Groups, Chords, Chains, and Signatures, mocked Celery backend, and mocked dispatch of tasks. Everything runs eagerly because it's all just running locally in the same thread, but the workflow still runs properly-- the output of one task is fed into the next task, the tasks are updated, etc.
I actually pass Celery and the functions like `signature`, `chain`, etc as a tuple into my service layer functions.
It's mostly just to test that the piping of the workflow is set up correctly so I don't find out that my args are swapped later during integration tests.
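A toy version of the eager idea (invented, and much simpler than Celery's real canvas objects):

```python
# An eager, in-process stand-in for a chain: run tasks in the current
# thread, feeding each task's output into the next.
def fake_chain(*tasks):
    def run(initial):
        value = initial
        for task in tasks:
            value = task(value)
        return value
    return run

double_then_inc = fake_chain(lambda x: x * 2, lambda x: x + 1)
assert double_then_inc(3) == 7
```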
This was my takeaway too. It’s interesting to see the patterns. It would be helpful to have some guidance upfront about the situations in which they are most useful to implement. If a pattern is a tool, then steering me towards when it’s best used or best avoided would be helpful. I do appreciate that the pros and cons sections get to this point, so perhaps it’s just ordering and emphasis.
That said, having built a small web app to enable a new business, and learning python along the way to get there, this provided me with some ideas for patterns I could implement to simplify things (but others I think I’ll avoid).
> That being said, I have a number of issues with other parts of it, and I have seen how dangerous it can be when inexperienced developers take it as gospel and try to implement everything at once (which is a common problem with any collection of design patterns like this).
Robert Martin is one of those examples; he did billions in damages by brainwashing inexperienced developers with gaslighting garbage like "Clean Code".
Software engineering is not a hard science, so there is almost never a silver bullet; everything is trade-offs, so people who claim to know the one true way are subcriminal psychopaths or noobs.
Clean code has lots of useful tips and techniques.
When people criticize it, they pick a concept from one or two pages out of the hundreds and use it to dismiss the whole book. This is a worse mistake than introducing concepts that may be footguns in some situations.
Becoming an experienced engineer is learning how, when and where to apply tools from your toolkit.
I’m a Typescript dev but this book is one of my favorite architecture books, I reference it all the time. My favorite pattern is the fake unit of work/service patterns for testing, I use this religiously in all my projects for faking (not mocking!!) third party services. It also helped me with dilemmas around naming, eg it recommends naming events in a very domain specific way rather than infrastructure or pattern specific way (eg CART_ITEM_BECAME_UNAVAILABLE is better than USER_NOTIFICATION). Some of these things are obvious but tedious to explain to teammates, so the fact that cosmic python is fully online makes it really easy to link to. Overall, a fantastic and formative resource for me!
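Sketched as event classes (toy example):

```python
# Domain-specific event naming: the class says what happened in the
# business, not which infrastructure will react to it.
from dataclasses import dataclass

@dataclass(frozen=True)
class CartItemBecameUnavailable:
    sku: str

# vs. a generic infrastructure-flavoured UserNotification event, which
# tells a reader nothing about the business fact being recorded.
```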
I haven't seen this book before, but I noticed that one of the authors, Harry J.W. Percival, is the author of the TDD "goat" book.
https://www.obeythetestinggoat.com/pages/book.html
That book is in a similar place in my heart; I barely used Python in my professional life, yet it's a book I often come back to even if I'm using a different language. It's also great that the book is available both online and in paper form.
I'll definitely give this book a chance!
I saw that a new, updated version of that book will be released this year.
> faking (not mocking!!)
You might like this: https://martinfowler.com/bliki/TestDouble.html
Fakes over mocks every time
I see Python as a nice glue language.
I grew tired from the forced OOP mindset, where you have to enforce encapsulation and inheritance on everything, where you only have private fields which are set through methods.
I grew tired of SOLID, clean coding, clean architecture, GoF patterns and Uncle Bob.
I grew tired of the Kingdom of Nouns and of FizzBuzz Enterprise Editions.
I now follow imperative or functional flows with as little OOP as possible.
In the rare cases I use Python (not because I don't want to, but because I mainly use .NET at work) I want the experience to be free of objects and patterns.
I am not trying to say that this book doesn't have value. It does. It's useful to learn some patterns. But don't try to fit everything into them in real-life programming. Don't make everything about patterns, objects and SOLID.
my favourite model is to write as many pure functions as possible, and then as many functions of 1-4 parameters that interact with the outside world, and only then create domain objects to wrap those - it keeps the unrelated complexity out of the domain and then I can also reuse those functions without having to create the entire object that I don't always need.
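something like this, in sketch form (invented names and schema):

```python
# 1) Pure function: trivially testable, no outside world.
def apply_discount(price: float, pct: float) -> float:
    return round(price * (1 - pct / 100), 2)

# 2) Small function (1-4 parameters) that touches the outside world.
import sqlite3

def save_price(conn: sqlite3.Connection, sku: str, price: float) -> None:
    conn.execute("UPDATE product SET price = ? WHERE sku = ?", (price, sku))

# 3) Domain object wrapping the functions, created only when needed;
#    the functions stay reusable without it.
class Pricing:
    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def discount(self, sku: str, current: float, pct: float) -> None:
        save_price(self._conn, sku, apply_discount(current, pct))
```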
I am not convinced that domain-driven design works. Objects don't model the real world well. Why should we think DDD models the real world, or a business, well? And why do we even need to model something?
Computers are different from humans.
I think we should be pragmatic and come up with the best solution in terms of money/time/complexity, not try to mimic human thought using computers.
After all, a truck isn't mimicking a horse and carriage. A plane isn't mimicking a bird.
> I am not convinced that domain-driven design works. Objects don't model the real world well.
DDD does not necessitate OOP. You can do DDD in functional languages. I think there's a whole F# book about it. So I think you can conclude that OOP doesn't model the world well, which may or may not be true, but I think it's not valid to extend the conclusion to DDD. Is there a DDD-inherent reason why you think DDD does not work?
> And why do we even need to model something?
Well I suppose it depends on your definition of "model", but aren't you by necessity modeling something when you write software? Somewhere in your software there is a user thingy/record/object/type/dictionary/struct which is an imperfect and incomplete representation of a real life person. I.e., a model.
The difference is that modelling it through software gives you feedback, as you can run the software. You can't run the DDD documents.
Forgive my ignorance on DDD, but what documents are you referring to?
DDD isn't about objects. It's just about modelling the domain (real world) using the tools available to you. Some things are best modelled by objects, some are best modelled by functions or other constructs.
The real point is to establish a common language to talk about the domain. This is enormously powerful. Have you ever worked with people who don't speak your language? Everything takes 3x as long, as ideas aren't communicated properly and things get lost in translation.
It's the same with code. If your code is all written in the language of the computer then you'll be translating business language into computer language. This takes longer and it's more error prone. The business people can't check your work because they don't know how to code and the coders can't check because they don't know the business.
The point of building abstractions is it empowers you to write code in a language that is much closer to the language of the domain experts. DDD is one take but the basic idea goes all the way back to things like SICP.
Good coders learn enough about the business to check the code. Domain experts can look at the running software.
With DDD you just get a third model that is neither the domain, nor the software, and both the domain experts and the programmers will have to work extra to maintain and understand it.
Worse, people often try to build this model up front, which means it will be wrong, hard to implement and probably get thrown away if you actually want to ship anything
I think people are getting triggered by the word domain, and conflating it with a particular cargo cult called DDD. It's the same with agile - the one that's implemented is usually the cargo cult version that charges you the cost of the "official" process without the benefits.
I meant domain modelling in the simplest sense: I created three objects, the market, the trading strategy and a simulated version of the market. that's it. no paperwork, no forms filled in triplicate.
IME mention of DDD has been a pretty solid predictor of "this project will produce nothing but documents for months and if it gets to the implementation stage complexity will spiral out of control. It probably won't ship."
It's possible that it works for some people, but I've seen the above scenario play out too many times to see DDD as anything but a red flag
At its core, objects are just containers for properties, and exploiting that fact leads to more easily understood systems than the alternative.
For example, at work I'm currently refactoring a test for parsing the CSV output of a system; as it stands it depends on hardcoded array indexes, which makes the thing a mess. Defining a few dataclasses here and there to model each entry of the CSV file, and then writing the test with said objects has made the test much more pleasant and easily understood.
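In sketch form, something like this (field names invented):

```python
# A dataclass per CSV row replaces magic array indexes with named fields.
import csv
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    timestamp: str
    status: str
    value: float

def parse(lines):
    return [Entry(row[0], row[1], float(row[2])) for row in csv.reader(lines)]

entries = parse(["2024-01-01,ok,1.5"])
assert entries[0].status == "ok"  # instead of row[1]
```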
This is how I treat it as well by now. I'm not even certain whether it's object-oriented, or hybrid, or whatever.
But in a small project at work that's mostly about orchestrating infrastructure components around a big central configuration it's just that. You define some dataclasses, or in our case Pydantic models to parse APIs or JSON configurations because then you can use type-aheads based off of types and it's more readable.
And then, if you notice you end up having a lot of "compute_some_thing(some_object)", you just make it an @property on that object, so you can use "some_object.some_thing". If you often move attributes of some foo-object to create some bar-object, why not introduce an @classmethod Foo.from_bar or @property Bar.as_foo? This also makes the relationship between Foo and Bar more explicit.
OOP doesn't have to be the horrible case some legacy Java Enterprise Projects make it to be. It can also be a bunch of procedural code with a few quaint objects in between, encapsulating closely related ideas of the domain.
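For example (Foo and Bar as placeholders):

```python
# A sketch of those two moves: compute_some_thing(obj) becomes a @property,
# and the Foo/Bar conversion becomes an explicit classmethod.
from dataclasses import dataclass

@dataclass
class Bar:
    w: float
    h: float

@dataclass
class Foo:
    width: float
    height: float

    @property
    def area(self) -> float:
        # compute_area(foo) becomes foo.area
        return self.width * self.height

    @classmethod
    def from_bar(cls, bar: Bar) -> "Foo":
        # makes the Foo/Bar relationship explicit at the definition site
        return cls(bar.w, bar.h)
```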
the model is for me. I wrote a crude btc trading bot, and I had three major objects: Market(), TradingStrategy and SimulatedMarket which used historical data.
I did not model individual transactions, fees, etc. as objects. The idea was that I encapsulated all the stuff that's relevant to a strategy in one object, and the market object would give me things like get_price(), buy(), sell(), which would also be available in the simulated version.
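The shape described, as a rough sketch (the actual bot isn't shown; the method names follow the comment, everything else is invented):

    from typing import Protocol

    class MarketLike(Protocol):
        # The only surface a strategy ever sees.
        def get_price(self, symbol: str) -> float: ...
        def buy(self, symbol: str, amount: float) -> None: ...
        def sell(self, symbol: str, amount: float) -> None: ...

    class SimulatedMarket:
        """Replays historical prices behind the same interface."""

        def __init__(self, prices: list[float]) -> None:
            self._prices = prices
            self._tick = 0
            self.position = 0.0

        def get_price(self, symbol: str) -> float:
            price = self._prices[self._tick]
            self._tick = min(self._tick + 1, len(self._prices) - 1)
            return price

        def buy(self, symbol: str, amount: float) -> None:
            self.position += amount

        def sell(self, symbol: str, amount: float) -> None:
            self.position -= amount

    def run_strategy(market: MarketLike) -> None:
        # Backtesting is just passing SimulatedMarket instead of Market.
        if market.get_price("BTC") < 30_000:
            market.buy("BTC", 0.01)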
And if you can encapsulate the three different domains well, then if you switch brokers you shouldn't have any external dependencies to untangle. Make your functions/domains deep.
Pure functions are also way more testable
Hmm, let's see, we're going to:
- Reimplement SQLAlchemy models (we'll call it a "repository")
- Reimplement SQLAlchemy sessions ("unit of work")
- Add a "service layer" that doesn't even use the models -- we unroll all the model attributes into separate function parameters because that's less coupled somehow
- Scatter everything across a message bus to remove any hope of debugging it
- AND THIS IS JUST FOR WRITES!
- For reads, we have a separate fucking denormalized table that we query using raw SQL. (Seriously, see Chapter 12)
Hey, let's see how much traffic MADE.com serves. 500k total visits from desktop + mobile last month works out to... 12 views per MINUTE.
Gee, I wish my job was cushy enough that I could spend all day writing about "DDD" with my thumb up my ass.
I've made it through about 75% of the book and have never gotten the sense that they think everything discussed in the book is something you should always do. Each pattern discussed has a summary of pros and cons. While those summaries may be a bit lacking, they clearly articulate that you should be thinking about whether or not the pattern matches the application's needs.
I don't think there are many applications that will require everything in the book, but there are certainly many applications that could apply one or more of the patterns discussed.
OK so show us how to write software for a complex business properly. Oh, I see, it's a throwaway account. This is just drive-by negativity with zero value.
Yikes
I started writing Python professionally a few years ago. Coming from Kotlin and TypeScript, I found the language approachable, but I was struggling to build things in an idiomatic fashion that achieved the loose coupling and testability I was used to. I bought this book after a colleague recommended it and read it cover to cover. It really helped me get my head around ways to manage complexity in non-trivial Python codebases. I don’t follow every pattern it recommends, but it opened my eyes to what’s possible and how to apply my experience in other paradigms to Python without it becoming “Java guy does Python”.
I cannot recommend it enough. Worth every penny.
Truly one of the great Python programming books. The one thing I found missing was static typing in the code, but that was a deliberate decision by the authors.
Haven’t read the book, so I don’t know exactly what position they’re taking there, but type checking has done more to improve my Python than any amount of architectural advice. How hard it is to type hint your code is a very good gauge of how hard it will be to understand it later.
My experience is that once people have static typing to lean on they focus much less on the things that in my view are more crucial to building clean, readable code: good, consistent naming and small chunks.
Just the visual clutter of adding type annotations can make the code flow less immediately clear and then due to broken windows syndrome people naturally care less and less about visual clarity.
So far off from what actually happens. The type annotations provide easy scaffolding for understanding what the code does in detail when reading, making code flow and logic less ambiguous. Reading Python functions in isolation, you might not even know what data/structure you’re getting as input… if there’s something that muddles up immediate clarity it’s ambiguity about what data code is operating on.
>So far off from what actually happens
I disagree strongly, based on 20 years of using Python without annotations and ~5 years of seeing people ask questions about how to do advanced things with types. And based on reading Python code, and comparing that to how I feel when reading code in any manifest-typed language.
>Reading Python functions in isolation, you might not even know what data/structure you’re getting as input
I'm concerned with what capabilities the input offers, not the name given to one particular implementation of that set of capabilities. If I have to think about it in any more detail than "`ducks` is an iterable of Ducklike" (n.b.: a code definition for an ABC need not actually exist; it would be dead code that just complicates method resolution) I'm trying to do too much in that function. If I have to care about whether the iterable is a list or a string (given that length-1 strings satisfy the ABC), I'm either trying to do the wrong thing or using the wrong language.
> if there’s something that muddles up immediate clarity it’s ambiguity about what data code is operating on.
There is no ambiguity. There is just disregard for things that don't actually matter, and designing to make sure that they indeed don't matter.
>I'm concerned with what capabilities the input offers, not the name given to one particular implementation of that set of capabilities. If I have to think about it in any more detail than "`ducks` is an iterable of Ducklike" (n.b.: a code definition for an ABC need not actually exist; it would be dead code that just complicates method resolution) I'm trying to do too much in that function. If I have to care about whether the iterable is a list or a string (given that length-1 strings satisfy the ABC), I'm either trying to do the wrong thing or using the wrong language.
You can specify exactly that and no more, using the type system:
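(The snippet didn't survive here; presumably something along these lines, with `quack` standing in for whatever the actual capability is.)

    from collections.abc import Iterable
    from typing import Protocol

    class Ducklike(Protocol):
        # Structural typing: anything with a matching quack() qualifies;
        # no inheritance or registration needed.
        def quack(self) -> str: ...

    def roll_call(ducks: Iterable[Ducklike]) -> list[str]:
        return [duck.quack() for duck in ducks]

    class Mallard:
        def quack(self) -> str:
            return "quack"

    # A list, a tuple, a generator -- any iterable of quackers checks.
    print(roll_call([Mallard(), Mallard()]))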
If you are typing it as list[Duck] you're doing it wrong.
I understand that. The point is that I gain no information from it, and would need more complex typing to gain information.
I keep seeing people trying to wrap their heads around various tricky covariance-vs-contravariance things (I personally can never remember which is which), or trying to make the types check for things that just seem blatantly unreasonable to me. And it takes up a lot of discussion space in my circles, because two or more people will try to figure it out together.
>The point is that I gain no information from it
No, you do gain information from it: that the function takes an Iterable[Ducklike].
Moreover, now you can tell this just from the signature, rather than needing to discover it yourself by reading the function body (and maybe the bodies of the functions it calls, and so on ...). Being able to reason about a function without reading its implementation is a straightforward win.
>No, you do gain information from it: that the function takes an Iterable[Ducklike].
I already had that information. I understand my own coding style.
>Being able to reason about a function without reading its implementation is a straightforward win.
My function bodies are generally only a few lines, but my reasoning here is based on the choice of identifier name.
Yes, it takes discipline, but it's the same kind of discipline as adding type annotations. And I find it much less obnoxious to input and read.
>I already had that information. I understand my own coding style.
Good for you, but you're not the only person working on the codebase, surely.
>My function bodies are generally only a few lines, but my reasoning here is based on the choice of identifier name.
Your short functions still call other functions which call other functions which call other functions. The type will not always be obvious from looking at the current function body; often all a function does with an argument is forward it along untouched to another function. You often still need to jump through many layers of the call graph to figure out how something actually gets used.
An identifier name can't be as expressive as a type without sacrificing concision, and can't be checked mechanically. Why not be precise, why not offload some mental work onto the computer?
>Yes, it takes discipline, but it's the same kind of discipline as adding type annotations.
No, see, this is an absolutely crucial point of disagreement:
Adding type annotations is not "discipline"!
Or at least, not the same kind of discipline as remembering the types myself and running the type checker in my head. The type checker is good because it relieves me of the necessity of discipline, at least wrt types.
Discipline consumes scarce mental effort. It doesn't scale as project complexity grows, as organizations grow, and as time passes. I would rather spend my limited mental effort on higher level things; making sure types match is rote clerical work, entirely suitable to a machine.
The language of "discipline" paints any mistake as a personal/moral failure of an individual. It's the language of a blame-culture.
> Good for you, but you're not the only person working on the codebase, surely.
I actually am. But I've also read plenty of non-annotated Python code from strangers without issue. Including the standard library, random GitHub projects I gave a PR to fix some unidiomatic expression (defense in depth by avoiding `eval` for example), etc. When the code of others is type-annotated, I often find it just as distracting as all the "# noqa: whatever" noise not designed to be read by humans.
And long functions are vastly more mentally taxing.
> often all a function does with an argument is forward it along untouched to another function. You often still need to jump through many layers of the call graph to figure out how something actually gets used.
Yes, and I find from many years of personal experience that this doesn't cause a problem. I don't need to "figure out how something actually gets used" in order to understand the code. That's the point of organizing it this way. This is also one of the core lessons of SICP as I understood it. The dynamic typing of LISP is not an accident.
> An identifier name can't be as expressive as a type without sacrificing concision
On the contrary: it is not restricted to referring to abstractions that were explicitly defined elsewhere.
> Why not be precise, why not offload some mental work onto the computer?
When I have tried to do it, I have found that the mental work increased.
> No, see, this is an absolutely crucial point of disagreement
It is.
In my experience, arguing co versus contra is a sign you are working with dubious design decisions in the first place. Mostly this is where I punch a hole in the type system and use Any. That way I can find all the bad architectural decisions in the codebase using grep.
> using the wrong language
IMO this is the source of much of the demand for type hints in Python. People don't want to write idiomatic Python, they want to write Java - but they're stuck using Python because of library availability or an existing Python codebase.
So, they write Java-style code in Python. Most of the time this means heavy use of type hints and an overuse of class hierarchies (e.g. introducing abstract classes just to satisfy the type checker) - which in my experience leads to code that's twice as long as it should be. But recently I heard more extreme advice - someone recommended "write every function as a member of a class" and "put every class in its own file".
I’d say I use type hints to write Python that looks more like Ocaml. Class hierarchies shallow to nonexistent. Abundant use of sum types. Whenever possible using Sequence, Mapping, and Set rather than list, dict, or set. (As these interfaces don’t include mutation, even if the collection itself is mutable.) Honestly if you’re heavily invested in object oriented modeling in Python, you’re doing it wrong. What a headache.
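A sketch of that style (names invented):

    from collections.abc import Mapping, Sequence

    def order_total(prices: Mapping[str, int], order: Sequence[str]) -> int:
        # Read-only views: the caller can pass a plain dict and list,
        # but this function can't mutate them, and the checker enforces it.
        return sum(prices[item] for item in order)

    print(order_total({"tea": 3, "scone": 4}, ["tea", "tea", "scone"]))  # 10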
This is totally not how I used typed Python. I eschew classes almost entirely, save for immutable dataclasses. I don't use inheritance at all. Most of the code is freestanding pure functions.
I can remember in the mid-00s especially, Python gurus were really fond of saying "Python is not Java". But `unittest` was "inspired by" JUnit and `logging` looks an awful lot like my mental image of Log4J of the time.
> But recently I heard more extreme advice - someone recommended "write every function as a member of a class" and "put every class in its own file".
Good heavens.
Not coincidentally, these are two of my least favorite parts of the standard library. Logging especially makes me grumpy, with its hidden global state and weird action at a distance. It’s far too easy to use logging wrong. And unittest just feels like every other unit testing framework from that era, which is to say, vastly overcomplicated for what it does.
Exactly my experience. I call Python a surprise-typed language. You might write a function assuming its input is a list, but then somebody passes it a string. You can iterate over a string, so the function returns something, just not what you would have expected, and things get deeply weird somewhere else in your codebase as a result. Surprise!
Type checking on the other hand makes duck typing awesome. All the flexibility, none of the surprises.
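The classic version of the surprise, and what a hint buys you (`dedupe` is made up for illustration):

    def dedupe(items: list[str]) -> set[str]:
        return set(items)

    dedupe(["spam", "spam", "eggs"])  # {'spam', 'eggs'} -- as intended
    dedupe("spam")  # runs fine and returns {'s', 'p', 'a', 'm'}. Surprise!
                    # With the annotation, a type checker rejects this
                    # call before it ever runs.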
This is because of Python's special handling of iteration and subscripting for strings (so as to avoid having a separate character type), not because of the duck typing. In ordinary circumstances (e.g. unless you need to be careful about a base case for recursion - but that would cause a local fault and not "deep weirdness at a distance"), the result is completely intuitive (e.g. you ask it to add each element of a sequence to some other container, and it does exactly that), and I've written code that used these properties very intentionally.
If you passed a string expecting it to be treated as an atomic value rather than as a sequence (i.e. you made a mistake and want a type checker to catch it for you), there are many other things you can do to avoid creating that expectation in the first place.
Type annotations are just like documentation though. Even if the annotation says int, the function can still return a list.
Annotations can and should be checked. If I change a parameter type, other code using the function will now show errors. That won't happen with just documentation.
> Annotations can and should be checked
Unfortunately Python’s type system is unsound. It’s possible to pass all the checks and yet still have a function annotated `int` that returns a `list`.
True and irrelevant. Type annotations catch whole swathes of errors before they cause trouble and they nudge me into writing clearer code. I know they’re not watertight. Sometimes the type checker just can’t deal with a type and I have to use Any or a cast. Still better than not using them.
Do you mean that you're allowed to only use types where you want to, which means maybe the type checker can't check in cases where you haven't hinted enough, or is there some problem with the type system itself?
The type system itself is unsound. For example, this code passes `mypy --strict`, but prints `<class 'list'>` even though `bar` is annotated to return an `int`:
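(The snippet itself wasn't preserved; a sketch of the kind of program that demonstrates it, built on the `global` trick mentioned downthread; the union syntax needs Python 3.10+.)

    x: int | list[int] = 1

    def clobber() -> None:
        global x
        x = [1, 2, 3]

    def bar() -> int:
        if isinstance(x, int):
            clobber()   # mutates the global behind the checker's back;
                        # mypy does not invalidate the isinstance narrowing
            return x    # checked as int, but at runtime it's a list
        return 0

    print(type(bar()))  # <class 'list'>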
So don't do this then? The type system does not have to be sound to be useful; Typescript proves this.
> So don't do this then?
Don't do what?
- Don't write unsound code? There's no way to know until you run the program and find out your `int` is actually a `list`.
- Don't assume type annotations are correct? Then what's the point of all the extra code to appease the type checker if it doesn't provide any guarantees?
Don't do this stupid party trick with `global`.
You may as well argue that unit tests are pointless because you could cheat by making the implementations return just the hardcoded values from the test cases.
This isn't a "party trick" with `global`, it's a fundamental hole in the type system:
This seems to work in TypeScript too:
It seems to be a problem with "naked" type unions in general. It's unfortunate.
In some cases, don't you need to actually execute the code to know what the type actually is? How does the type checker know then?
It doesn't. There are cases where the type-checker can't know the type (e.g. json.load has to return Any), but there are tools in the language to reduce how much that happens. If you commit to a fully strictly-typed codebase, it doesn't happen often.
You can actually annotate the return type of json.load better than that:
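(That snippet didn't survive either; one common approach is a recursive alias plus a thin typed wrapper. The `JSON` alias name here is my own.)

    import json
    from typing import cast

    # Everything json.load can actually produce; mypy supports
    # recursive aliases like this.
    JSON = dict[str, "JSON"] | list["JSON"] | str | int | float | bool | None

    def load_json(path: str) -> JSON:
        with open(path) as f:
            # The stub says Any; the cast pins it down to JSON.
            return cast(JSON, json.load(f))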
Yeah, people from statically typed languages sometimes can't understand how dynamically typed languages can even work. How can I do anything if I don't know what type to pass?! Because we write functions like "factorial(number)" instead of "int fac(int n)".
I wish that's how Python functions were written. What I usually see is `draw(**kwargs)`.
Yup! I'm also hopeful that the upcoming type-checker from Astral [0] will be an improvement over Mypy. I've found that Mypy's error messages are sometimes hard to reason about.
[0]: https://x.com/charliermarsh/status/1884651482009477368
> The one thing I found missing was static typing in the code
It has type hints, such as here: https://www.cosmicpython.com/book/chapter_08_events_and_mess...
Do you mean it's not strict enough? There are some parts of the book without them.
Some examples use dataclasses, which force type annotations.
Python does not support static typing. Tooling based on type annotations doesn't affect the compilation process (unless you use metaprogramming, like dataclasses do) and cannot force Python to reject the code; it only offers diagnostics.
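Concretely, a trivial illustration:

    def double(x: int) -> int:
        return x * 2

    # CPython runs this happily; the annotation is metadata, not a check.
    # A type checker would flag the call, but the interpreter doesn't care.
    print(double("ha"))  # haha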
I have this on my shelf. It's a small volume, similar to K&R, and like that book mine is showing visible signs of wear as I've thumbed through it a lot.
Oh neat, I read the paperback of this book maybe two and a half or three years ago. I enjoyed it quite a bit. They do a good job of keeping tests a first-class topic and consistently updating them with each addition. Some older architecture books don't treat testing as high a priority. I've found that having tests ready, easy to write, and easy to update makes the development process more enjoyable for me, since there's less manual running of the code to check for issues. A tighter feedback loop, I guess.
I will say that some of the event oriented parts of this book were very interesting, but didn't seem as practical to implement in my current work.
This was a great read and summary! About three years ago I worked in a C#/.NET DDD environment, and revisiting these concepts in Python now really distills the essential parts. As I said, great read; highly recommend it if you're also into this kind of stuff.
I don’t understand the need for most of the patterns described in this book. Why abstract away SQLAlchemy with a Repository when it is already an abstraction over the database? What’s the purpose of the Unit of Work? To me, hand-rolling SQL is much more maintainable than this abstraction soup.
Even though most people might think of web architectures when it comes to this book, we used this to design an architecture for an AI that optimises energy efficiency in a manufacturing factory.
Great book!
Is it easy to transpose to other types of architectures, or does it lean heavily toward web development? Thank you for sharing; your project sounds really interesting, by the way! :)
One of the key points of the book (and DDD in general) is that the web stuff is just a detail at the edge of the application. You should be able to replace the web bit (for which they use Flask) with any other entry point. In fact, they do this by having an event-subscriber entry point and, IIRC, a CLI entry point. The whole point is that they all use the same core code implementing the domain logic.
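Schematically (a sketch of the idea, not the book's actual code):

    import sys

    # Core service function: knows nothing about HTTP, argv, or queues.
    def allocate(order_id: str, sku: str, qty: int) -> str:
        # (domain logic would live here)
        return f"allocated {qty} x {sku} to order {order_id}"

    # Web entry point (the book uses Flask; shown schematically):
    def http_handler(payload: dict[str, str]) -> str:
        return allocate(payload["order_id"], payload["sku"], int(payload["qty"]))

    # CLI entry point -- a different edge over the same core:
    def main() -> None:
        order_id, sku, qty = sys.argv[1:4]
        print(allocate(order_id, sku, int(qty)))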
I would have expected the book to mention the concept of DTOs at some point. What could be the reason it doesn't?
Excellent sequel to the GOAT book (TDD with Python), which got me to deploy my first real web application.
TDD is dysfunctional crap pushed by the lying scammer Robert Martin on inexperienced devs
No argument on that from me in general, but the book in question is practical.
What's dysfunctional about it?
No mention of https://polylith.gitbook.io/polylith? Is it related at all?
“If you’re reading this book, we probably don’t need to convince you that Python is great”
I actually do. It’s slow, buggy and not type safe.
Everything good about Python is actually C, namely the good packages. They’re not written in Python, because Python is shit.
Great source for understanding how DDD works in a larger context. I love how concrete it is
Great stuff, thank you for sharing
Unfortunately https://www.cosmicpython.com/book/ does give a 404 - this is a very bad architectural choice for web applications. I hope their other tips are better.
https://www.cosmicpython.com/book/preface.html