noirscape 3 days ago

Because the actual process of upgrading Postgres is terrible.

I say this as someone who absolutely loves using it, but upgrading Postgres takes significant downtime and is error-prone, and you're often better off just dumping all the databases and reimporting them into a new data directory. (A good idea in general, since a dump and reload also vacuums and compacts indexes, combining a few maintenance routines into your upgrade path.)

It requires having the previous version of Postgres installed, which can clash with a number of distro policies (not to mention Docker, which is the most popular way to deploy the software that typically relies on Postgres), and unlike most software with that issue, Postgres is something you want managed by your distro.

Therefore, most people only upgrade by necessity - when their distro forces the upgrade or the version they're using reaches EOL.

  • kardianos 3 days ago

    There are two primary issues with PostgreSQL:

    * Upgrades

    * Database Protocol

    MS SQL Server does better on both of these fronts. Upgrades, when running MS SQL Server in a container, amount to restarting the database with the new engine version: the DBMS sees the old files, upgrades them in place, and starts the DB completely seamlessly 100% of the time.

    The MS SQL Server protocol, TDS, supports native named parameters and executes the T-SQL procedural language inline with SQL. TDS also supports inline query cancellation because the protocol is framed, and both T-SQL and TDS support returning multiple result sets from a single query.

    But yes, whenever I see the PG docs say that major versions require an export and reload, I shudder.

    • Tostino 3 days ago

      As someone who has a decade of professional postgres experience (and built my company on it), and 6 years of SqlServer experience, I couldn't agree more with your comment. I really, really like the TDS wire protocol and what it enables.

      I also really miss their query optimizer. It was just so damn smart.

      • chasil 3 days ago

        I've only done it once, but doing an "rpm -Uvh" on the SQL Server binaries in Linux, then starting up the database did an immediate upgrade of the database files.

        Oracle's DBUA is so much more painful.

      • whartung 3 days ago

        I'm sorry, not to quibble about the quality of the TDS protocol, but what does it have to do with the upgradability of SqlServer?

        • Tostino 3 days ago

          Nothing, they're unrelated.

          However, I agree with OP that SQL Server does have a solid upgrade story and it's generally pain free.

          They are just two aspects I really wish Postgres could do better on, and I miss working with those parts of SQL Server.

  • beeboobaa3 3 days ago

    This was solved by logical replication a few versions ago.

    1. Leave your old database running

    2. Start your new database with new version

    3. Run logical replication and let it catch up

    4. Decide you want to switch over.

    5. Using a script: Stop your apps writing to the database (or switch to read only mode), let logical replication catch up one final time, stop logical replication, run a procedure to fix your sequences, and switch your apps over to the new database.

    You are now done. Sure, it requires a one-time investment to work out the exact steps needed for your situation, but it's easy to do with minimal downtime.
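
    At its core the replication part is just a few statements (a rough sketch; the connection string, table, and sequence names here are made up, and you still need to copy the schema over first):

            -- on the old cluster: publish everything you want to move
            CREATE PUBLICATION upgrade_pub FOR ALL TABLES;

            -- on the new cluster: load the schema first (pg_dump --schema-only), then subscribe;
            -- the initial sync copies the data, after which changes stream in continuously
            CREATE SUBSCRIPTION upgrade_sub
                CONNECTION 'host=old-db dbname=app user=replicator password=secret'
                PUBLICATION upgrade_pub;

            -- at switchover, once writes are stopped and the subscriber has caught up
            DROP SUBSCRIPTION upgrade_sub;
            -- sequences are not replicated, so bump each one past the copied data
            SELECT setval('orders_id_seq', (SELECT max(id) FROM orders));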

    Of course if you don't care about downtime just shut down your database server, make a backup(!), and then do pg_upgrade.
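
    For reference, the core pg_upgrade invocation during that downtime window looks roughly like this (a sketch; the RHEL-style paths and version numbers are only examples, and distro wrappers such as Debian's pg_upgradecluster handle much of this for you):

            # both old and new binaries must be installed; stop the old server first
            /usr/pgsql-17/bin/pg_upgrade \
                --old-bindir=/usr/pgsql-16/bin \
                --new-bindir=/usr/pgsql-17/bin \
                --old-datadir=/var/lib/pgsql/16/data \
                --new-datadir=/var/lib/pgsql/17/data \
                --link   # hard-link instead of copy: much faster, but the old cluster can't be started afterwards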

    • ellisv 3 days ago

      > Of course if you don't care about downtime just shut down your database server, make a backup(!), and then do pg_upgrade.

      I think this is the right answer for 99% of companies. Yes, everyone wants zero downtime, but I see incredibly few cases where it is actually worth the cost and effort.

    • RegnisGnaw 3 days ago

      This has not been solved by logical replication. There are features in PostgreSQL not supported by logical replication. One of the issues I'm dealing with is that our product switched from Oracle to PostgreSQL, the ORM stuff loves LOBs, and LOBs can't be logically replicated.

  • CrimsonRain 3 days ago

    I fully agree. That's why I use this: https://github.com/pgautoupgrade/docker-pgautoupgrade

    But obviously this is not suitable for all use cases. I don't know why pg doesn't invest in this sector.

    • elcritch 3 days ago

      Financial incentives, likely, because PG is developed by companies that make money by providing support. It doesn't have to be an insidious plot; it's just hard to prioritize fixing a pain point you know well and that makes you money. MSSQL wants people to pay to upgrade, so it behooves Microsoft to make it seamless.

    • DavidPP 3 days ago

      Thanks for the link, wasn't aware of this solution.

  • forinti 3 days ago

    I have always compiled from source so that I can have more than one version in place. The installation takes less than 1 GB, it's quite easy and fast to compile, and I've never had any issues with upgrades. If you forget to compile an extension, you can do it later, copy the .so to the right folder, and you don't even have to restart the cluster.

    If you don't want to use pg_upgrade, you can dump from one cluster and pipe directly into another, with no need for a temporary file. It couldn't be easier.
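
    For example, with the old and new clusters listening on different ports, the whole transfer is one pipe (sketch; the ports are whatever your two clusters actually use):

            # dump everything from the old cluster straight into the new one, no temp file
            pg_dumpall --port=5432 | psql --port=5433 --dbname=postgres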

    Can't say the same for Oracle. Even applying a patch can ruin several weekends.

    • literalAardvark 3 days ago

      > It couldn't be easier

      If you have a trivial database, yes.

      If you have a large, complicated database that takes hours/days to restore and has an HA SLA, very hell no.

      • forinti 3 days ago

        That's not going to be easy with any database.

        • danudey 3 days ago

          MySQL, for all its faults, actually handles this quite well. Generally speaking, assuming there aren't any major backwards-incompatibilities (which there haven't often been in my experience), you can start up the new server in place of the old server and you're good to go. Some table-rebuilding can be done in-place if you need to do things like recreate indices, but _generally_ you're good to go from there.

          For more complex deployments, cases of backwards-incompatibilities in the data structure, or where I had reason to rebuild many large tables, you set up a replication secondary, do pre-upgrade migrations (e.g. in the case of deprecated functionality), upgrade the secondary, do post-upgrade migrations (e.g. in the case of new functionality), and wait for replication to catch up 100%. Then, after sending traffic to it for a bit to ensure it works correctly, you swap the primary and secondary and do it again.

        • Too 3 days ago

          With mongodb in a HA setup you can perform a staggered upgrade of individual nodes in the cluster with no downtime. Very smooth in my experience.

          • water9 2 days ago

            Yeah, but what if your dataset is partitioned horizontally? It’s very smooth until it’s not. For example, if you are using the PostGIS extension and something changes with the new version, that could be a showstopper.

        • magicalhippo 2 days ago

          We have customers with databases that aren't quite at the TB level, but are at least several hundred GB.

          Upgrading the database server is as simple as shutting down the service, installing new executables and restarting the service. Downtime is measured in minutes.

          When upgrading major versions, one might not get full functionality of the new features unless one upgrades the on-disk format (unload-reload), but you can run the new server version with older on-disk format just fine.

    • icedchai 3 days ago

      I worked at a place that was running an 8-year-old install of Oracle 8i because upgrading was just too painful. Oracle 10g was out by that point. It's been over 15 years since... I wonder if they ever upgraded?

  • ktosobcy 3 days ago

    This! I already mentioned this a while back and was basically berated (by someone who seemed like a dev/close to a dev) that the current setup is just fine and that because of Postgres's complexity (extensions) it has to be done that way... and while I like Postgres a lot, it's quite annoying that the upgrade is such a terrible experience... :|

    • lazide 2 days ago

      I suspect it’s also a bit of Stockholm syndrome on the part of admins haha

      • ktosobcy 2 days ago

        Or better yet - "guarding the job", i.e. making their job very important ;)

        • lazide 2 days ago

          Very true. At a prior company, we rolled out a simpler and easier to use admin console for our enterprise product and all the customer admins hated it.

          Because now it was easy to use hah.

          We ended up rolling it back because the hate was so consistent.

          (And no, this wasn’t us guessing or making this up - a number of them were quite explicit that we were threatening their roles, and to stop it.)

  • anymouse123456 3 days ago

    Also a fan of Postgresql, but compiling and installing this tool is also a complete nightmare.

    Every bit of the experience before getting it working, feels like it was designed by an MBA who owns a Postgres hosting service.

    Other than that, it's amazing.

  • thomaslord 3 days ago

    I'll confess - I have a project that uses Heroku's managed Postgres and my preferred upgrade method is to set the maintenance window to the middle of the night, create a backup, and be awake at 1am to make sure that nothing is broken after they force the upgrade. Their auto-upgrade process hasn't failed me so far, but there's no way to manually trigger it.

  • crabbone 3 days ago

    > Postgres is software you want to be managed by your distro.

    Why?

    For production systems I generally try to avoid using anything not built internally. When it comes to PostgreSQL, for example, if you want to submit benchmarks or report bugs, an important or sometimes mandatory part is providing the compilation flags. Also, seeing how every kind of storage can act in a different way, tuning of higher-level storage programs seems inevitable. Even though PostgreSQL has a lot of settings that can be changed on the compiled program, some (e.g. block size) are compile-time options that can only be changed by recompiling.
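
    For instance, the block size mentioned above is a configure-time option (a sketch; the prefix path and the 32 kB value are just examples):

            # build with 32 kB pages instead of the default 8 kB
            ./configure --prefix=/opt/pgsql-17 --with-blocksize=32
            make -j"$(nproc)" && make install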

    • digitallis42 3 days ago

      Depends on your scale. If you're a startup or even just a small side service, performance isn't going to be a bottleneck and you/the org wants the thing to be fire and forget, including bug fixes and especially security patches. A distro takes care of all of those generally and makes sure the dependencies have the same care taken.

      • throw0101c 3 days ago

        > A distro takes care of all of those generally and makes sure the dependencies have the same care taken.

        So does using the vendor repos:

        * https://apt.postgresql.org/ / https://yum.postgresql.org

        * https://dev.mysql.com/downloads/repo/

        This way when you upgrade your OS you don't have to worry about suddenly getting a new version (which, in the case of MySQL, may take a long time to convert its database files to the new version format).
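
        On Debian/Ubuntu that looks roughly like this (a sketch from memory; the codename and major version are examples, and apt.postgresql.org documents the current repository key setup):

                # add the PGDG repo for your release, then install the major version you want
                echo "deb https://apt.postgresql.org/pub/repos/apt bookworm-pgdg main" \
                    | sudo tee /etc/apt/sources.list.d/pgdg.list
                sudo apt update && sudo apt install postgresql-17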

        • necovek 3 days ago

          > This way when you upgrade your OS you don't have to worry about suddenly getting a new version (which, in the case of MySQL, may take a long time to convert its database files to the new version format).

          At least on Ubuntu (and likely Debian), your existing Postgres version is never dropped until you are ready to manually upgrade yourself.

        • RealStickman_ 3 days ago

          Using third party repos makes upgrading the OS in general more difficult though and is more dangerous than simply using what your distro provides.

        • jamespo 3 days ago

          How does that work for python/perl/ruby libs etc?

          • danudey 3 days ago

            It's very rare that the client protocol changes dramatically and in an incompatible way, so typically nothing changes and no one notices.

            I've seen a few rare cases where an update to the client library is required for improvements to the protocol, e.g. when MySQL switched to the new password/authentication format, but that was also backwards-compatible: the old format was used until an account's password was changed, password changes could be told to use the legacy format instead, and you could set the legacy format by default. The only issue that occurred (and it did occur) was old client libraries + new server password hashes, but, again, in this case it didn't affect existing installations/users/passwords.

        • crabbone 2 days ago

          > This way when you upgrade your OS

          Don't forget to do this on Friday afternoon! :D

          Upgrading the OS in a production environment... this reminds me of a joke about a police academy intake test:

          The participants were offered a board with a round and a square hole and two pegs of matching shapes. After the test the new recruits were sorted into two groups: very smart and very strong.

          So, yeah, you ought to be either very smart or very strong to upgrade the OS in a production environment, and then also discover changes to your database as you go along.

          That's not to say that upgrades don't happen at all... but you'd be upgrading in a testing environment ten times before you try that "for real", and at that point you would have probably solved the question of whether you need to (re)install the database and how :)

    • fossdd 3 days ago

      > if you want to submit benchmarks, or you want to report bugs, an important or sometimes a mandatory component is to provide compilation flags

      On all open-source distros, you can look into your distro's source and check their compile flags.

      • crabbone 2 days ago

        The idea here is more that you need to adjust these flags for the benchmark to make sense.

        Take for example the aforementioned block size. Say you have storage with 4x the block size of the one used by PostgreSQL. If you run a benchmark over such storage, you'll have insane write amplification. Nobody will consider the results of such a "benchmark" because that's simply a misconfiguration of the program you are trying to measure.

        More generally, Linux distros will tend to compile distributed binaries with "safe" defaults that run on most hardware users can have, which means optimizing for the lowest common denominator. Looping back to PostgreSQL, the default block size was 4k for a long time, and iirc today it's 8k. This is fine if we are talking about a plain SSD / HDD, but with enterprise SDS these are "rookie numbers"; even the maximum supported by PostgreSQL (32k) is still a "rookie number", but it's still four times better than the default!

  • tgv 3 days ago

    Add to that: if it works, don't fix it.

    • chefandy 3 days ago

      And considering how many people use some sort of abstraction layer like an ORM, have databases installed as part of some deployed application, or just do basic CRUD stuff, it probably would have worked on a postgres install from the 90s.

      I keep an eye out for vulnerabilities, and that's about it.

  • MichaelZuo 3 days ago

    So the real question is, why is the upgrade process so incompetently designed, and why has no one fixed this?

    • phil21 3 days ago

      My opinion is Postgres was designed by software developers for software developers. In my career, the “which relational database to use” question has almost always split perfectly between SWEs vehemently demanding pgsql for the feature set, and the sysadmins who have to support maintenance and production availability preferring MySQL.

      One of the few things I’ve enjoyed with the move into devops and companies forcing previously “pure” developers into operational roles was their discovery that Postgres was utterly horrible to administer at a systems level. Apparently us lowly sysadmins may have had a point after all.

      This is a bit tongue in cheek but really not far from my lived reality. When the focus is on features and “correctness” at the near total expense of sane systems tooling, folks can develop some myopia on the subject. So many arguments with devs on my teams over this subject who were utterly horrified to find we were running MySQL for a given service.

      Open source projects tend to fix the pain points their contributors experience, and I assume there were not too many contributors wanting to deal with the boring work of making administration an easy task - it’s thankless “sideways” work that won’t result in many accolades or personal satisfaction for most SWEs.

      The end users are almost always developers, most of whose experiences in production entail either the equivalent of a docker container level scale system, or are simply given a connection string and the rest is a black box to them. Under those contexts I’d personally prefer Postgres as well and it wouldn’t even be close. When you get into backups, clustering, upgrades, and high availability under extreme load? IMO the story falls apart real fast.

      • jeltz 3 days ago

        That has not been my experience at all. The sysadmins I have worked with have strongly preferred PostgreSQL over running MySQL, while the developers have cared, but less so. The reason is that when something goes wrong, PostgreSQL is much easier to diagnose. So while some tasks like upgrades take less manual effort with MySQL, PostgreSQL is more stable and gives better error messages when you get paged in the middle of the night.

        PostgreSQL has prioritized correctness and stability, which has made certain features more clunky to use or taken longer to implement, but when you get paged in the middle of the night you get the time you spent back. PostgreSQL has also spent a lot of time on improving the DBA experience, so mixed DBA/sysadmins also usually prefer PG.

        • samlambert 2 days ago

          this simply can’t be true. there are endless stories of people moving from postgres to mysql because of reliability and operational issues. postgres has only had a mature replication solution for 7 years. mysql had it in 1999.

      • sgarland 3 days ago

        > The split on “which relational database to use” in my career has almost always been perfectly split between SWE vehemently demanding pgsql for the feature set

        I’ve seen this as well, but when pressed, none of them could articulate what part of its feature set they actually needed to use.

        > One of the few things I’ve enjoyed with the move into devops and companies forcing previously “pure” developers into operational roles was their discovery that Postgres was utterly horrible to administer at a systems level.

        Are you (or your devs, rather) actually running your own DBs? If so, respect. My experience has been that they spin up either a hideously under or over-provisioned RDS or Aurora instance, and then never touch it until it breaks, at which point they might ask for help, or they might just make it bigger.

        • ttfkam 3 days ago

          > none of them could articulate what part of its feature set they actually needed to use.

          Transactional DDL: migration errors never leave the database in an intermediate/inconsistent state.

          Range types + exclusion constraint: just no way to do this in MySQL without introducing a race condition.

          Writeable CTEs: creating insert/update/delete pipelines over multiple tables deterministically. Seriously though, the RETURNING clause is something I use all the time both in and out of CTEs.

          Filtered aggregates and grouping sets: cleanly get multiple data points for a dashboard in one shot.

          Unnest: converting arrays into a set of rows. Inverse of array_agg(...).

          Types: arrays, booleans, IP/subnets, UUIDs (without binary(16) hacks), etc.

          Materialized views: seriously, how does MySQL not have this yet?

          Statement-level triggers: another option from per-row.

          Row-level security: setting data visibility based on configurable policies.

          I can cite specific use cases I've deployed to production for each of these and more.
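
          To make the range/exclusion point concrete, this is the kind of constraint that's a one-liner in Postgres and a race condition without it (a minimal sketch; the table and columns are invented):

                  CREATE EXTENSION IF NOT EXISTS btree_gist;

                  CREATE TABLE room_booking (
                      room_id int,
                      booked  tstzrange,
                      -- no two bookings for the same room may overlap in time
                      EXCLUDE USING gist (room_id WITH =, booked WITH &&)
                  );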

          • sgarland 3 days ago

            That is a well-thought out list, and you’re clearly aware of and take advantage of the DB’s capabilities. Seriously, congrats. Especially RETURNING – it’s always baffling to me why more people don’t use it (or its sad cousin in MySQL that lets you get the last inserted rowid if using an auto-increment).

            Most devs I’ve worked with don’t know about aggregations beyond COUNT and GROUP BY, and do everything in the app. I’ve pointed these out before, and am always told, “we don’t want to have additional logic in the DB.” So you want a dumb bit box then, got it – why are you using an RDBMS?

            > Transactional DDL

            I know this is a thing, and I’ve used it in Postgres, but I’ve also never found myself in MySQL being upset that I didn’t have it. Everything should be thoroughly tested in staging before prod.

            > RLS

            Yes, amazing feature if you use it correctly.

            I will give MySQL a couple of nods in its favor: well, three.

            1. Clustering index. If you design your schema around this fact, range queries can be WAY faster. Dropbox does this, IIRC.

            2. Generated Virtual Columns. Not sure why Postgres still doesn’t support this.

            3. ON UPDATE CURRENT_TIMESTAMP. It’s so nice to have the DB automatically track update times for you, IMO.

            • ttfkam 3 days ago

              I use transactional DDL all the time, even during development. It's nice to not have to fully reset your schema every time you test a migration file locally. With transactional DDL, you run the whole list, and if any fails, it rolls back to where you started. You look at the error, edit your migration, and try again. It really is a time saver. There is a peace of mind always knowing your schema is in a consistent state not unlike the peace that comes from a good set of unit tests.
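
              A tiny illustration of what that buys (the table and column here are made up): if any statement fails, everything above it rolls back and the schema is untouched.

                      BEGIN;
                      ALTER TABLE account ADD COLUMN last_login timestamptz;
                      CREATE INDEX account_last_login_idx ON account (last_login);
                      -- an error anywhere above aborts the transaction: no half-applied migration
                      COMMIT;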

              1. Yep, I definitely miss clustering indexes in Postgres sometimes. I can sometimes fake it with covering indexes when all I want are an extra column or two along with the primary key or similar without seeking to the main table, but you're right about that MySQL/MariaDB win here.

              2. The dynamic computed column is an easy workaround with immutable functions that take the record as a param.

                      CREATE TABLE foo ( a int, b int, c int );
                      CREATE FUNCTION d(entry foo) RETURNS int LANGUAGE sql IMMUTABLE AS $$
                        SELECT entry.a + entry.b + entry.c;
                      $$;
              
                      SELECT a, b, c, d(foo) FROM foo;
              
              It's not part of the table schema when doing a SELECT *, but it is just as efficient as a computed column in MySQL/MariaDB and only slightly more verbose.

              3. ON UPDATE CURRENT_TIMESTAMP works in Postgres with a trigger function, which you can reuse if all your tables use the same name for your "last_modified" column (probably a good idea anyway). Not as convenient as the declarative syntax, but it's a fairly trivial workaround.

                      CREATE OR REPLACE FUNCTION update_last_modified() RETURNS TRIGGER AS $$
                        BEGIN
                          NEW.last_modified = now();
                          RETURN NEW;   
                        END;
                      $$ language 'plpgsql';
              
                      CREATE TRIGGER foo_last_modified BEFORE UPDATE ON foo
                      FOR EACH ROW EXECUTE PROCEDURE update_last_modified();
              
                      CREATE TRIGGER bar_last_modified BEFORE UPDATE ON bar
                      FOR EACH ROW EXECUTE PROCEDURE update_last_modified();
              
              One function, many triggers. You also get to choose between "when transaction started" (now() or CURRENT_TIMESTAMP), "when statement started" (statement_timestamp()), or "right now" (clock_timestamp()).

              I don't mind workarounds so much as functionality that simply cannot be replicated. For example, I miss real temporal table support in Postgres like what you can find in MariaDB or MS SQL Server. The painful kludges for missing PIVOT support (which MS SQL Server has) are another one.

              You never know how much you need deferred foreign key constraints until you don't have them anymore. Or a materialized view.

              • evanelias 3 days ago

                > It's nice to not have to fully reset your schema every time you test a migration file locally

                In terms of dev flow, this is only a problem with imperative migration systems. Declarative schema management tools solve it by being able to transition any live database state into the desired state, which is expressed by a repo of CREATE statements.

                If something fails, you fix the bad CREATE and run the tool again, and it effectively picks up where it left off. And well-designed declarative tools catch many problems pre-flight anyway by running a suite of linters, running the emitted SQL statements in a sandbox first, etc.

                If the tool's diff returns clean, you know your schema is in the right state on the DB.

                Ironically, lack of transactional DDL actually makes declarative schema management more straightforward in MySQL/MariaDB: you can't mix DDL and DML there anyway, so it's more natural to handle schema changes vs data migrations using different tools/pipelines.

          • evanelias 3 days ago

            That's a good list [1]. A handful of these are already doable in modern MySQL and/or MariaDB though.

            JSON can often be used in place of arrays, and JSON_TABLE in both MySQL and MariaDB converts JSON into tabular data. MySQL supports multi-valued indexes over JSON, where each row can have multiple index entries (or no entries, e.g. partial index).

            MariaDB has built-in convenience types for ipv4, ipv6, and uuid. Or in MySQL you can just use virtual columns to add human-readable conversions of binary columns, although that is admittedly slightly annoying.

            MariaDB supports RETURNING.

            [1] Edit to add: I do mean that honestly, it's an accurate and insightful list of nice Postgres features, most of which aren't in MySQL or MariaDB. Honestly baffled as to why I'm being downvoted.

            • ttfkam 3 days ago

              > JSON can often be used in place of arrays

              This is like storing UUIDs as text. You lose type information and validation. It's like storing your array as a comma-delimited string. It can work in a pinch, but it takes up more storage space and is far more error prone.

              > convenience types for ipv4, ipv6, and uuid.

              That's nice to see. A shame you have to decide ahead of time whether you're storing v6 or v4, and I don't see support for network ranges, but a definite improvement.

              > MariaDB supports RETURNING.

              That's honestly wonderful to see. Can these be used inside of CTEs as well for correlated INSERTs?

              • evanelias 3 days ago

                Regarding using JSON for arrays, MySQL and MariaDB both support validation using JSON Schema. For example, you can enforce that a JSON column only stores an array of numbers by calling JSON_SCHEMA_VALID in a CHECK constraint.
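
                Something along these lines, assuming MySQL 8.0.17+ (the table and constraint names are made up):

                        CREATE TABLE measurements (
                            id      BIGINT PRIMARY KEY,
                            samples JSON,
                            -- reject anything that isn't a JSON array of numbers
                            CONSTRAINT samples_are_numbers CHECK (
                                JSON_SCHEMA_VALID(
                                    '{"type": "array", "items": {"type": "number"}}',
                                    samples
                                )
                            )
                        );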

                Granted, using validated JSON is more hoops than having an array type directly. But in a pinch it's totally doable.

                MySQL also stores JSON values using a binary representation, it's not a comma-separated string.

                Alternatively, in some cases it may also be fine to pack an array of multi-byte ints into a VARBINARY. Or for an array of floats, MySQL 9 now has a VECTOR type.

                Regarding ipv6 addresses: MariaDB's inet6 type can also store ipv4 values as well, although it can be inefficient in terms of storage. (inet6 values take up a fixed 16 bytes, regardless of whether the value is an ipv4 or ipv6 address.)

                As for using RETURNING inside a writable CTE in MariaDB: not sure, I'd assume probably not. I must admit I'm not familiar with the multi-table pipeline write pattern that you're describing.

                • ttfkam 3 days ago

                  > the multi-table pipeline write pattern

                          WITH new_order AS (
                              INSERT INTO "order" (po_number, bill_to, ship_to)
                                   VALUES ('ABCD1234', 42, 64)
                                RETURNING order_id
                          )
                          INSERT INTO order_item (order_id, product_id, quantity)
                               SELECT new_order.order_id, vals.product_id, vals.quantity
                                 FROM (VALUES (10, 1), (11, 5), (12, 3)) AS vals(product_id, quantity)
                                CROSS JOIN new_order
                          ;
                  
                  Not super pretty, but it illustrates the point. A single statement that creates an order, gets its autogenerated id (bigint, uuid, whatever), and applies that id to the order items that follow. No network round trip necessary to get the order id before you add the items, which translates into a shorter duration for the transaction to remain open.
                  • evanelias 3 days ago

                    Thanks, that makes sense.

                    In this specific situation, the most common MySQL/MariaDB pattern would be to use LAST_INSERT_ID() in the second INSERT, assuming the order IDs are auto-increments. Or with UUIDs, simply generating the ID prior to the first INSERT, either on the application side or in a database-side session variable.
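
                    i.e. roughly this (a sketch; it assumes orders.order_id is AUTO_INCREMENT, and the table names are invented):

                            INSERT INTO orders (po_number, bill_to, ship_to)
                            VALUES ('ABCD1234', 42, 64);

                            -- LAST_INSERT_ID() is per-connection, so there is no race with other sessions
                            INSERT INTO order_items (order_id, product_id, quantity)
                            VALUES (LAST_INSERT_ID(), 10, 1),
                                   (LAST_INSERT_ID(), 11, 5),
                                   (LAST_INSERT_ID(), 12, 3);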

                    To avoid extra network calls, this could be wrapped in a stored proc, although a fair complaint is that MySQL doesn't support a ton of different programming languages for procs/funcs like Postgres does.

            • ComputerGuru 2 days ago

              You still can’t use uuid as proper foreign keys with validation on mariaDB/MySQL though, right? It wasn’t possible with blobs at any rate.

              • evanelias 2 days ago

                This has always been possible, for example using the BINARY(16) column type if you want to be efficient. Or in MariaDB 10.7+ you can now use the dedicated UUID column type, which is equivalent to BINARY(16) under the hood, but provides a human-readable hex value when queried.

                UUIDs are fixed-length. Blobs are not the appropriate type for that.
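
                For example (a sketch assuming MariaDB 10.7+ with the default InnoDB engine; the table names are invented, and on MySQL you'd declare the columns as BINARY(16) instead):

                        CREATE TABLE accounts (
                            id UUID NOT NULL PRIMARY KEY
                        );

                        CREATE TABLE invoices (
                            id         UUID NOT NULL PRIMARY KEY,
                            account_id UUID NOT NULL,
                            -- a perfectly ordinary validated foreign key on a UUID column
                            CONSTRAINT fk_invoices_account
                                FOREIGN KEY (account_id) REFERENCES accounts (id)
                        );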

        • dalyons 3 days ago

          > My experience has been that they spin up either a hideously under or over-provisioned RDS or Aurora instance, and then never touch it until it breaks, at which point they might ask for help, or they might just make it bigger.

          Yep that’s exactly what I’ve seen too :). I still overall prefer this distributed database model - yes you spend more and people make mistakes (and learn). But if you can afford it, you get higher velocity and more incentive-aligned ownership than the old central gatekeeping DBA team model.

          • ttfkam 3 days ago

            > the old central gate keeping DBA team model

            I have mixed feelings about this. On the one hand I agree that ownership should be shared. On the other, app developers really don't consider their data structures as carefully in SQL as they do in-memory. It's odd. The right data structure matters more than a good algorithm since algorithms are easier to change. Once you settle on a list vs a set vs a queue, you're stuck once code is built around it.

            The same is doubly true for the database schema. Lack of planning and knowledge of expected access patterns can turn an otherwise fast database to mud in no time flat. Once your data is in there, changing the schema is exponentially harder.

            "I’m a huge proponent of designing your code around the data, rather than the other way around, and I think it’s one of the reasons git has been fairly successful… I will, in fact, claim that the difference between a bad programmer and a good one is whether he considers his code or his data structures more important. Bad programmers worry about the code. Good programmers worry about data structures and their relationships." – Linus Torvalds (2006)

            What is your database but a bunch of data structures and relationships? I get why the gatekeeping occurred. I don't agree with it, but I understand it. Far too many folks consider expertise in data stores to be optional as developers.

            • sgarland 2 days ago

              100% on all points. I’m a fan of gatekeeping things that are hard to get right, and hard to undo. If you can prove that you know what you’re doing, by all means, own your stuff. But until then, yes, I’d like to watch over your shoulder, and hopefully teach you how to do it right.

              The fact that DBs are data structures but are ignored has always irritated me. If I pushed a PR using lists for everything regardless of their practicality, I’d be rightly told to try again. But push a PR with a schema using similar suboptimal choices, and no one blinks an eye.

            • dalyons 2 days ago

              I agree but also I think I could be clearer about the key advantage of the distributed many dbs model… it’s that I don’t have to care if people are good at anything that you said :). If teams want to do dumb things with their schema, fine, that’s on them and they’ll have to deal with the consequences. If it matters, they’ll learn and get better. If they want to buy their way out of it with their latency and infra budget instead, that’s honestly fine too.

              With many smaller databases owned separately, the blast radius of bad db decisions is small/local.

          • sgarland 3 days ago

            I don’t mind the model IFF the team has an interest in learning how to do it correctly. My biggest complaint as both an SRE and now DBRE has been that dev-managed infrastructure inevitably means that during an incident I had nothing to do with, I’ll be paged to fix it anyway. Actually, that’s not the problem; the problem is later, when I explain precisely how and why it broke and how to avoid it in the future, there’s rarely any interest in doing so.

            “You have an unpartitioned table with a few billion rows and a UUIDv4 PK. I’m amazed it’s been working as long as it has. You need to change your schema.”

            “Or we can just buy a bigger instance.”

            “…”

            Rinse and repeat six months later. I’m aware this is an organizational problem, but from what I’ve seen, it’s endemic.

            Re: incentives, yes, also a problem. Dev teams are generally incentivized by Product, who doesn’t care at all whether or not something is optimal, only that new features are being shipped. I despise this mentality, but it’s not usually the devs fault.

            • necovek 3 days ago

              UUID (version does not matter for storage, only for generation and distribution) is basically a 128-bit unsigned int, so a double "word" on 64-bit platforms, and it's natively supported by Postgres since at least 8.3 (earliest version with docs up).

              While most versions ensure it's random, there are plenty of indexing algorithms that make searching through that quick and close to O(1), so that should not be the schema problem.

              Unless you used a string field, but there is a quick workaround for that with Postgres too (make an index on `UUID(field)`, and look it up by `UUID(value) = UUID(field)`).

              That's why both "devops" (DBAs?) and "devs" prefer Postgres over many other databases: you can easily handle some small mistakes in schemas too while you do the full migration in parallel.

              • sgarland 3 days ago

                Who said this was Postgres? MySQL (with the default InnoDB engine) and MSSQL both use clustered indexes; they store tuples around the PK. For a UUIDv4 PK – or anything else non-k-sortable, for that matter – this results in a massive amount of B+tree bloat from the random inserts.

                But sure, let’s talk about Postgres. After all, it stores tuples in a heap, and so is immune to this behavior.

                Except that its MVCC implementation means that it has to maintain a Visibility Map [0] to keep track of which pages contain only tuples which are visible to every active transaction. This is nominally used for vacuum decisions, but is also cleverly exploited for use with index-only scans. This poses a problem when referencing non-k-sortable entries, because while the tuples are in a heap, the indices are generally a B+tree. So now we’re back to the problem of massive IO amplification. Even if the VM and index are entirely cached, reads are reads, and they add up.

                Then there’s the issue of WAL bloat due to full page writes. tl;dr Postgres pages are nominally 8 KiB, and storage device pages are nominally 4 KiB. To guarantee an atomic write, Postgres writes the entire page for a given tuple for the first write after checkpoint, regardless of how many bytes were altered. Again, non-k-sortable: if your tuples are spread randomly across pages, you now have far more pages to write. This can and does matter for either write-heavy workloads, instances with limited network bandwidth (oh hi small RDS / Aurora), or the worst, both combined.

                Re: search complexity, I’m pretty sure B+trees (and B-trees) have O(log n) time complexity, not O(1). Whether or not that’s “close” depends on n, I suppose, but in this scenario I specifically said “billions of rows.”

                > That's why both "devops" (DBAs?) and "devs" prefer Postgres over many other databases

                I’m a DBRE, and like both MySQL and Postgres. They both have strengths and weaknesses, but you need to deeply understand those – and properly design your schema and query patterns around them – in order to make an informed decision.

                [0]: https://www.postgresql.org/docs/current/storage-vm.html

                • ttfkam 3 days ago

                  > For a UUIDv4 PK

                  To be fair, this blows out any db that supports clustered indexes as well. Non-k-sortable primary keys are just a bad idea all around.

                  With UUIDv7, the WAL write amplification problem goes away just as the clustered index issues do.

                  • sgarland 2 days ago

                    Agreed. My preferred PK in descending order is: natural key if it makes sense and would speed up queries, integer of an appropriate size, UUIDv7. I only rank it below integers because at best, they’re 16 bytes, and even a BIGINT is only 8 bytes.

            • mschuster91 3 days ago

              > Rinse and repeat six months later. I’m aware this is an organizational problem, but from what I’ve seen, it’s endemic.

              Easy enough: almost no one writes SQL queries by hand these days, not for querying the database nor for doing schema upgrades. It's all done by tools - Doctrine in the PHP world for example. And pretty much no one but actual CS graduates knows anything deeper about databases.

              Result is, devs are happy enough if they found something that works, and don't want to risk being the one who broke prod because they applied some schema change suggested by their DBA who doesn't know some random thing about the application.

              • Tostino 3 days ago

                You would be very surprised to see the workflow of DB heavy development teams. Some oracle devs have entire backends coded in pl/sql.

                My last company had an absolute ton of pl/pgsql written to support hundreds of ELT pipelines, migrations were all hand written and managed with liquibase.

                There are more of them than you'd think out there. Just generally supporting some boring b2b or backend software.

              • ttfkam 3 days ago

                > Easy enough: almost no one writes SQL queries by hand these days, not for querying the database nor for doing schema upgrades. It's all done by tools

                Your experience does not match mine. Tools like ORMs make horrible schemas in my opinion that cater to the lowest common denominator of SQL engine functionality. This means leaving a lot of performance and scalability on the floor. In order to make the ORMs generate decent schema definitions, you need to know the underlying engine and therefore SQL. At that point, you might as well use SQL.

                Ever try changing a column's data type from a table with hundreds of millions of rows with an ORM definition file? Hope you like downtime.

        • ttfkam 3 days ago

          > My experience has been that they spin up either a hideously under or over-provisioned RDS or Aurora instance, and then never touch it until it breaks

          That's a true shame considering how easy it is to make a read replica of any size and then fail over to it as the new primary. Definite skill issues.

          • sgarland 3 days ago

            It’s more like a “not knowing how fast something should be” in the case of under-provisioning, and “not knowing or caring to look at metrics” for over-provisioning.

            I once was examining some queries being generated via Prisma, and found it was using LIMIT/OFFSET for pagination. I pointed this out to the devs, who replied that the query times were acceptable for their SLOs. I guess if you don’t know that a simple SELECT can and should often be in the sub-msec range, you might not be concerned that it’s taking 100 msec.

            The other is just the normalization of cloud expenditure. Every large org has some kind of pricing agreement with their provider, and so are rarely incentivized to cut costs, since they have a minimum spend.

            • ttfkam 3 days ago

              I hear you.

              When all you know is an ORM, you tend to treat SQL databases like dumb bit bucket add-ons to your app server. It's amazing how much potential performance and scalability are left on the floor because app developers can't shift their mindset when needed. Objects/structs cannot be assumed to map 1:1 with relations. What a world we'd live in if devs spent even 1/10 the effort examining their relational schema design that they spend arguing over whether a set, a list, or a queue is better for a given situation. It's like thoughts on Big-O stop cold at the database driver interface.

      • danudey 3 days ago

        MySQL feels more "modern" (in the context of the early 2000s when I started using it), in that it seemed to know it existed in an environment with other tools; for example, authentication was handled internally and was unrelated to the system user (other than the clients usually using your system username as the default database username if you didn't specify one).

        Compare that with Postgres, which seemed very "old school", going so far as to assume it was the only thing a given server was doing. Connecting to postgres authenticated as your own user; creating a user was done with the `createuser` command (or similar, I don't remember what it was actually called), and not some namespaced `pg_createuser` command that would make it clear what it did.

        I also remember setting up MySQL replication with almost no effort whatsoever, and then in the same year trying to set up Postgres replication - which it didn't have. I was told by other postgres admins to "just set up a script to rsync the database over and over to the other server; then if your first server dies just start the second server up and it'll recover". This seemed like a wildly cavalier attitude towards uptime and reliability, not to mention generating a ridiculous amount of I/O and network traffic for minimal benefit.

      • mxey 3 days ago

        As someone who operates both I much prefer to run pg_upgrade every few years rather than dealing with mysqldump, replication issues and babysitting Orchestrator.

        • literalAardvark 3 days ago

          just as long as you don't forget to reindex after

          • mxey 3 days ago

            > All failure, rebuild, and reindex cases will be reported by pg_upgrade if they affect your installation; post-upgrade scripts to rebuild tables and indexes will be generated automatically.

            • literalAardvark 3 days ago

              Not true. Those scripts only cover problems caused by PostgreSQL itself, not, for example, a change in collations that will silently break your indexes, such as the one in Debian 12.
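
              On PostgreSQL 15+ you can at least detect the mismatch and rebuild the affected indexes yourself (a sketch; the database name is an example, and you run it while connected to that database):

                      -- compare the collation version recorded at creation with what glibc/ICU provides now
                      SELECT datname, datcollversion,
                             pg_database_collation_actual_version(oid) AS actual
                        FROM pg_database;

                      -- if they differ: rebuild the indexes, then record the new version
                      REINDEX DATABASE app;
                      ALTER DATABASE app REFRESH COLLATION VERSION;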

          • djbusby 3 days ago

            It's a step of your script or runbook. Right?

            • literalAardvark 3 days ago

              It is, but it also takes a very long time during which production data integrity can be severely affected.

      • eru 3 days ago

        Of course, sqlite is even easier on the sys-admins, (but not necessarily the right tool for the job.)

        • gramakri2 3 days ago

          How does one back up a SQLite database without stopping the app or disrupting service? I couldn't find a simple answer to this.

          • jlokier 3 days ago

            There is a generic way to do it that works with SQLite and other databases, including Postgres. On Linux, take a filesystem or block device coherent snapshot, take your backup of all the SQLite files from the snapshot, then delete the snapshot.

              The app or service continues to run without disruption, and the backup is a coherent database image. Perhaps you have other state in addition to the database, or multiple databases, for example cache files. This covers them too, if they are all in the same snapshot.

            There are many ways to take a snapshot: ZFS and btrfs offer a filesystem command, and any Linux filesystem, such as ext4, can be snapshotted with LVM or LVM-thin. Well known cloud providers like AWS, GCP, Azure also provide ways to snapshot block devices, through their APIs. However, to ensure a coherent image, it may be necessary to use the Linux `fsfreeze` command around API calls.

            The database backup files can have incomplete transactions, but if the files are restored it will be as if the OS was abruptly stopped at the moment of the snapshot. SQLite and other good databases are designed to recover well from this sort of abrupt stop, without corrupting the database. They clean up incomplete transactions on recovery.
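
              As a concrete sketch with classic LVM (the volume group, mount points, and paths are all invented; ZFS/btrfs or cloud snapshots follow the same shape):

                      # freeze briefly so the snapshot is a cleanly coherent filesystem image
                      fsfreeze --freeze /srv/app
                      lvcreate --snapshot --name app-snap --size 5G /dev/vg0/app
                      fsfreeze --unfreeze /srv/app

                      # mount the snapshot read-only and copy the SQLite files out of it
                      mount -o ro /dev/vg0/app-snap /mnt/app-snap
                      rsync -a /mnt/app-snap/data/ /backups/app-$(date +%F)/
                      umount /mnt/app-snap
                      lvremove -y /dev/vg0/app-snap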

          • tommy_axle 3 days ago

              With sqlite3's .backup command there's an online backup API.
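
              From the shell that's (a sketch; the file paths are examples):

                      # uses SQLite's online backup API under the hood; safe while the app keeps writing
                      sqlite3 /srv/app/app.db ".backup '/backups/app-copy.db'"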

    • jeltz 3 days ago

      To avoid having to slow down development of new PostgreSQL features. Improving upgrades in a way where PostgreSQL does not need to either maintain multiple different versions of parts of the code or lock down internal interfaces (which can currently change freely every major version, and would then no longer be refactorable) is not a trivial task, maybe even impossible. Even just the existence of pg_upgrade has to some degree limited what can be done to improve PostgreSQL. Obviously pg_upgrade is worth it, but hindering development even further might not be popular.

      The PostgreSQL team simply does not have the resources to do this. At least not without significantly slowing down development of everything else which there is no political will for. Maybe someone will come up with a genius idea which solves this but I am doubtful. Usually there is no free lunch.

      Maybe some core dev will correct me, I am quite familiar with the PostgreSQL project but not one of the core devs. :)

      • paulryanrogers 3 days ago

        What is this holding back? A redo based alternative to MVCC?

        • jeltz 3 days ago

          Nope, that is totally unrelated. To support upgrade in place without an old version of PostgreSQL:

          1. The new version of PostgreSQL would need to be able to read all old catalog table formats and migrate them.

          2. The new version of PostgreSQL would need to support all old versions of the parse tree to migrate views.

          3. Likely a bunch of more things that I do not know of. I for example doubt it is trivial to just read an old catalog without having a fully up and running cluster which supports almost everything in that old cluster. The catalog has TOAST tables and indexes for example.

          Right now 1 and 2 are implemented in pg_dump plus by having pg_dump call functions in a running old version of PostgreSQL.

          • ttfkam 3 days ago

            It is a PITA, but I've written scripts that pg_dump just the schema, load the schema into the new db with the new version, set up logical replication between the two, wait for them to sync, reset all the sequences, and rebuild indexes before doing the handover.

            It works with basically no downtime but I agree this kind of thing should definitely be easier, even turnkey.

            • literalAardvark 3 days ago

              I feel that really should be included in core, yes.

              I get why it wasn't, but logical replication has been production ready for a while now, so it really should have at least a little sugar.

              • nijave 2 days ago

                >should have at least a little sugar.

                I think that sums up PG pretty well. It seems there are a lot of things that lean into the Unix "just use another tool" philosophy, which ends up making management more difficult.

          • paulryanrogers 3 days ago

            Does pg_upgrade not do all that?

            Or do you mean the new Pg server should transparently do the upgrade automatically? And while online?

            • jeltz 2 days ago

              Yes, and it does it by starting an instance of the old version and running pg_dump against it. And that was one thing the original poster complained about.

              • paulryanrogers 19 hours ago

                Are pg_upgrade's docs inaccurate?

                It says it works "without the data dump/restore" and...

                > Major PostgreSQL releases regularly add new features that often change the layout of the system tables, but the internal data storage format rarely changes. pg_upgrade uses this fact to perform rapid upgrades by creating new system tables and simply reusing the old user data files

                Regardless, I suppose it is that reliance on the unchanging of the internal data format which is limiting what refactors can do.

    • whizzter 3 days ago

      I think one really BIG factor is that built-in logical replication wasn't introduced until PostgreSQL 10 in 2017. Before that you only had physical replication for master-slave setups, which iirc didn't work between major versions, so doing a "hot" upgrade was more or less impossible without third-party tools.

      So even if it's available these days, the amount of people still subjected to upgrades from older version still leaves an impression that it's really bad.

      • RegnisGnaw 3 days ago

        There is still major issues with logical replication, mainly limited feature support.

        • whizzter 2 days ago

          What's lacking? Specific types or schema changes?

          • RegnisGnaw 11 hours ago

            I don't care much about schema changes as I can setup logical replication just for the major version upgrade. The main issue is the lack of support for large objects, which the code bases I'm dealing with uses heavily.

      • creshal 3 days ago

        Yeah, everywhere I worked there's usually two clusters of postgres databases

        - whatever made it past the postgres 10 hump is on the most recent version and keeps getting migrated

        - whatever is still on 9.x is scheduled to remain that way until the product using it will be EOL'd

    • WJW 3 days ago

      The answer to both is probably "because maintainer time is scarce and nobody is interested enough".

    • throwaway19972 3 days ago

      There are many answers to this question and few are as satisfying as the question implies.

elric 3 days ago

Lots of dogmatism in this discussion, it seems. A couple of things:

1. Most psql deployments are not exposed to the interwebz, they are typically only accessible to the applications that need them by virtue of network setup (firewalls etc). This limits the attack vector to whatever the application does. Good.

2. Distro vendors (RHEL et al) often stick to a major psql release for the lifecycle of the OS version. If the OS lives longer than the psql major version, they take on the responsibility of backporting critical security issues.

3. While upgrades aren't hard, they're not easy either.

4. Psql is pretty much feature complete for many workloads, and pretty stable in general. For many people, there is little need to chase the latest major version.

  • bravetraveler 3 days ago

    > 4. Psql is pretty much feature complete for many workloads, and pretty stable in general. For many people, there is little need to chase the latest major version.

    To drive this a little further, "latest and greatest" doesn't always apply. I've chosen software - even databases - for greenfield deployments one or two releases behind for their known characteristics.

    Stability doesn't imply perfection, but rather, predictability.

  • mhd 3 days ago

    > 3. While upgrades aren't hard, they're not easy either.

    And in my experience, apart from ease of use there's also a major trust issue here. If you're upgrading your app server framework/language, it's easy enough to do a rollback. With databases, people are worried that they might not notice errors right away and then you have to merge the data accumulated since the upgrade with the last backup in case of a rollback.

    Not saying that this is entirely rational...

    Also, new features on the SQL level are hard to sell if all you're doing is lowest common denominator ORM ("New window functions and faster lateral joins? But we're doing all that in our code!").

  • KaiserPro 3 days ago

    Upgrading a datastore is a massive, massive risk.

    It requires a lot of work, planned downtime, or some way to smear updates across the estate.

    The cost of any failure is very high. The benefit of any major upgrade is also vanishingly small. Unless you need a specific feature, its just not worth it.

    • ttfkam 3 days ago

      Or when the version is EOL, not getting security patches, and/or only compatible with an OS that isn't getting security updates.

      …or you're on AWS RDS, which will automatically bump your db cluster if it goes EOL and you ignore the notices for more than a year.

  • atoav 3 days ago

    Also:

    5. If your IT department is spread thin already and that old version is running fine, the incentive to potentially create more work for yourself is not gigantic.

    • Dalewyn 3 days ago

      One of the first laws of the universe that a good engineer learns is: Do not fix what is not broken.

      And no, being old is not broken.

      • WJW 3 days ago

        The entire field of maintenance engineering would like a word. Over longer periods of time it's vastly cheaper to regularly fix things even before they break, and software is no exception.

        Amongst other reasons:

        - Performing regular upgrades is an easy and cheap way to maintain a healthy knowledge base about the system. It's always easier to fix a system that is well understood than a black box which nobody has touched in 20 years. Upgrading regularly also leads to people being more comfortable with the upgrade process and it is likely to become a "regular thing" with checklists and other process improvements that make maintenance safer and more reliable.

        - Doing preemptive maintenance means YOU get to choose when the system is unavailable and then you can plan accordingly, perhaps by planning it during a period with low usage or even arranging for a redundant system to take up the load. The alternative is leaving it up to chance when a system will fail, and that's frequently at high-load moments when you can least afford it.

        - As a corollary to the previous point: a team which is in control of its systems can have more predictable output of feature work, since there will be fewer "unexpected" events.

        Not maintaining your systems is only cheaper in the long run if you don't count the engineering time required to fix things when they break or need to be replaced. Ounce of prevention vs pound of cure and all that.

        • elric 3 days ago

          > The entire field of maintenance engineering would like a word. Over longer periods of time it's vastly cheaper to regularly fix things even before they break, and software is no exception.

          and

          > Not maintaining your systems is only cheaper in the long run if you don't count the engineering time required to fix things when they break or need to be replaced. Ounce of prevention vs pound of cure and all that.

          Those are pithy assertions, but I don't think they're universally factually correct. This applies at some scales, but not at others. One size does not fit all, nor does one maintenance strategy fit all. I do believe I specifically called out the dogmatism in this discussion, and yet here we go piling on more of the same. I don't think that's terribly helpful. I assume that everyone is well aware of all the points you raised, as I suspect that most people running PostgreSQL are not in fact incompetent.

          Not everyone has a team which "is in control of its systems", nor can everyone afford such a team. It may well be cheaper for some people to run something old until it burns only to have The Database Consultant come in one day to raise it from the dead.

        • LaGrange 3 days ago

          > The entire field of maintenance engineering would like a word. Over longer periods of time it's vastly cheaper to regularly fix things even before they break, and software is no exception.

          I mean, I think it's because maintenance is so unglamorous. When it happens and everything doesn't collapse, nobody remembers it and people start asking why we even do it (and then sysadmins and middle management suddenly aren't a thing, and companies and IT become exactly as fragile as anyone with _actual_ understanding of management would expect).

          Meanwhile when regular maintenance fails in progress, it often ends up in the news with a body count attached.

          One of my favourite podcasts has a running joke that you should never do maintenance (because so many industrial disasters happen during it). Of course the bias is the point of the joke - and usually the reason things went bad is because either the maintenance was neglected until that point, or the engineering picked Postgresql, I mean, didn't consider the necessity of maintenance.

        • caeril 3 days ago

          Not to mention:

          - If you're scared to perform upgrades, this is a good indicator that you lack confidence in your backup/recovery/reversion policies and procedures, probably for good reason, so maybe fix that.

        • diffeomorphism 3 days ago

          Maintenance engineering is in full agreement with that. You are maintaining the old version, backporting fixes etc. instead of tearing down the whole thing and replacing it with new shiny all the time.

      • LaGrange 3 days ago

        One of the first laws of the universe that an experienced engineer learns is that "do not fix what is not broken" never actually applies, and is only brought up by people invulnerable to consequences.

        That doesn't mean "upgrade recklessly," but it does mean you should know _why_ you're either upgrading or _NOT_ upgrading. That's your job, much more than the act of upgrading itself.

        Unpublished vulnerabilities in old software are not a hypothetical. And very old packages are usually broken, just coped with at the cost of significant lost opportunity - or because the failure mode is so rare and so impactful that once it happens, everyone is out of a job anyway.

        Seriously, I've yet to encounter a sysadmin who used that old, silly adage at me and didn't later have to admit I was right.

        Edit: so no, you don't stay on an ancient version of the database because "it's not broken." You're staying on it because _the upgrade process itself_ is so broken you're terrified of it.

        • kayodelycaon 3 days ago

          I generally follow: if it’s not broken, fixes need to be carefully planned. I can’t tell you how many times I thought I’d quickly do an upgrade and things would go wrong, like all of my home automation stopping right before bed.

          • LaGrange 3 days ago

            I mean, _yeah_, for anything important you should move as carefully as possible. Just, "not upgrading" ain't that.

      • vbezhenar 3 days ago

        It leads to a lot of old software which is not going to be upgraded ever. Then the entire project dies and gets rewritten from scratch, because nobody wants to work with Windows 2003 server running Delphi 7, Java 1.4 and Oracle 9i in 2020 (personal experience).

        Old software is not necessarily broken, but it is always tech debt. And you can't live in debt forever; IT does not work that way.

      • sunnybeetroot 3 days ago

        I do agree, however I think it’s often easier to upgrade iteratively and deal with smaller issues that arise as opposed to upgrading a huge version diff and struggling to understand and fix all the failing parts.

        • elric 3 days ago

          I think there's a balance to be struck there. On large databases, upgrades can be very slow. Is it worth upgrading iteratively at great expense if there are no benefits to be gained (e.g. no bugfixes that affect you, no critical security issues)? Maybe, maybe not.

      • comprev 3 days ago

        Preventative maintenance is a thing in both software and hardware world.

        It's often lower risk to apply frequent smaller changes than wait years for one enormous "bundled" upgrade.

      • yxhuvud 3 days ago

        Being old is not being broken, but being old can make it so that something that is broken cannot be fixed.

      • atoav 3 days ago

        This is solid advice, however I would caveat that you can't know whether it is broken if you aren't checking. E.g. if your old-as-hell database has a known vulnerability that can be exploited, then unless you can rule out that it can be used in your setup, it is broken by all definitions of the word.

        • Dalewyn 3 days ago

          It's not a caveat since checking is a prerequisite to the law that shall not be violated.

          If you are "fixing" a Schroedinger's Cat, my dude the Supreme Court of the Universe will not be kind.

  • newaccount74 3 days ago

    > 3. While upgrades aren't hard, they're not easy either

    I guess it depends on scale? I was surprised how easy it was on Ubuntu. There was an automatic migration script, and it worked. Took less than 5 minutes to upgrade.

    Sure, there was downtime, but I think most applications out there can live with scheduled downtime of a few minutes.

    If you can't have downtime, then nothing is easy.

    • ForHackernews 3 days ago

      Lots of companies pretend like they can't afford 5 minutes of scheduled downtime. Then they end up with 12 hours of unscheduled downtime when shtf.

      • bravetraveler 3 days ago

        For SRE (site reliability engineering) the term of art is 'error budget'

        To others reading, be mindful: database upgrade times depend greatly on the data stored within them/where/who is using them. Your development environment doesn't represent production. If the distinction even exists!

        A five minute upgrade can become indefinite with a single lock [row/table, depending on the storage engine/etc]

  • JeremyNT 3 days ago

    I think this is a really good take. It all boils down to "if it ain't broke don't fix it."

    No matter how easy it is, it takes more effort and thought to do an upgrade than it does to not do an upgrade at all, and for most users the upside is basically invisible if it exists at all.

  • xvinci 3 days ago

    "What the application does" may not be what you think of, as it is dependent on how secure the application or the layers beneath it are. This is how people get everything pwned step by step. The database server may then reveal credentials to other apps etc.

    • elric 3 days ago

      Sure. Defense in depth is important. But I hope that your application is only able to talk TCP/5432 to psql. No amount of psql upgrading will protect you against SQL injections in shitty application code.

    • Dylan16807 3 days ago

      If the database server has significant "other apps", which it probably doesn't.

      • xvinci 3 days ago

        Sure, but then chances are it's hosted on a NAS with other data which you don't want ransomware'd, has access to other parts of the network, etc. - it's easy to underestimate the potential impact

jgb1984 3 days ago

I've used postgresql in most of my previous jobs, but I'm using mariadb in my current one, and I must say it's a joy to administer.

The replication and high availability that mariadb offers is rock solid, and much more advanced than what postgresql has to offer. It works out of the box, no plugins or third party shenanigans needed, and there is "one obvious way" to do it. Not a dozen options like in the postgresql ecosystem, each with its own quirks and drawbacks.

Also, upgrades are a dream. I did 4 major long term stable release upgrades so far and everything was fully automatic and smooth sailing all around.

All of that with about 10 seconds downtime only for each upgrade, despite being a chunky 6TB database, thanks to the user friendly replication process.

I respect postgresql a lot, but mariadb allows me to sleep much better at night thanks to replication, smooth upgrades and no VACUUM to worry about.

  • kstrauser 3 days ago

    To be fair to PostgreSQL, it has multiple replication setups because they have different inherent properties. For instance, it comes with built in streaming and logical replication setups. Which should you use? It’s impossible to say without knowing your own exact use case.

    By analogy: should a database cluster optimize consistency or availability? Answer: there’s no way its author can guess which is more important to how you want to use it.
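
    For reference, the logical flavor boils down to one statement on each side; a minimal sketch, with made-up publication, subscription, and connection names (the target's schema has to exist already, since logical replication doesn't carry DDL):

      -- on the old/source cluster
      CREATE PUBLICATION upgrade_pub FOR ALL TABLES;

      -- on the new/target cluster
      CREATE SUBSCRIPTION upgrade_sub
          CONNECTION 'host=old-primary dbname=app user=replicator'
          PUBLICATION upgrade_pub;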

    • homebrewer 3 days ago

      MySQL/MariaDB have binary and logical replication too. They also support much more advanced topologies like multi-master out of the box. It's just that what you need for 99% of situations relies on a well trodden path and requires no configuration at all.

      I have suspected for a long time that most people who criticize MySQL have never actually worked with it, or have done so a couple of decades ago. It's often the default choice if you don't need some of the PostgreSQL features (like PostGIS) and can work with either one.

      • mxey 3 days ago

        MySQL has only logical replication; there is no equivalent to PostgreSQL's WAL/physical replication.

      • ttfkam 3 days ago

        Since v16, Postgres supports bidirectional logical replication out of the box. In other words, multiple writers.
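
        If memory serves, the piece that makes this workable is the subscription-level origin filter, which keeps changes from looping back between the nodes. A rough sketch with invented names:

          -- run on each node: publish local tables, subscribe to the peer
          CREATE PUBLICATION node_pub FOR ALL TABLES;
          CREATE SUBSCRIPTION node_sub
              CONNECTION 'host=peer-node dbname=app user=replicator'
              PUBLICATION node_pub
              WITH (origin = none, copy_data = false);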

    • jeltz 3 days ago

      The same is true for MariaDB so I have no idea what he is talking about. Both databases have multiple options.

  • mxey 3 days ago

    > The replication and high availability that mariadb offers is rock solid, and much more advanced than what postgresql has to offer. It works out of the box, no plugins or third party shenanigans needed, and there is "one obvious way" to do it.

    Looking at https://mariadb.com/kb/en/standard-replication/, is the one obvious way binlog replication or Galera?

  • Gud 3 days ago

    This is highly interesting to me because I was on the PostgreSQL bandwagon way before it was cool (mostly because I’m a BSD/MIT license fanatic).

  • jeremycarter 3 days ago

    I've maintained a few large postgres databases and completely agree.

bityard 3 days ago

You might as well ask: why does anyone run an older version of anything? The reasons will be largely the same.

Most of the software on my machines is "old" because it is part of a Linux distribution that (aside from security issues) was frozen in time a year or two ago so that it could be tested, released, and maintained. I am quite happy to have a system that I know is not going to break (either itself, or my workflow) when I apply security updates.

People who MUST HAVE the latest version of everything I feel either have some deeper FOMO issues to work out, suffer from boredom, or look at their computers as hobbies themselves rather than tools. (Which is fine, just be honest about what it is.)

That said, much of my career has been spent working at companies that got so busy shipping features that upgrading infrastructure never makes it above the fold. You can tell the managers that working around old software adds costs that scale with the age of the infrastructure, but they don't always listen. I currently work at a company that still has loads of CentOS 7 hosts in production, and only fairly recently began upgrading them to RHEL 8. (Not 9!)

  • efields 3 days ago

    These are the companies you want to be at IMHO. Provided the compensation is adequate, slow and stable > fast and pivot-y.

    • necheffa 3 days ago

      > These are the companies you want to be at IMHO. Provided the compensation is adequate, slow and stable > fast and pivot-y.

      Absolutely...not.

      Slow does not mean stable. Slow means the floor is rotting out from under you constantly.

      Being prudent about when and where to upgrade is a very active, intentional process that the typical company simply doesn't have the stomach or skill for.

      • chasil 3 days ago

        We are still running OS2200 EXEC-8.

        That platform got SMP in 1964.

      • throwaway894345 3 days ago

        Yeah, eventually you will have to upgrade and deal with all of the accumulated debt. You don’t have to be on the bleeding edge but you should still be updating regularly.

    • stackskipton 3 days ago

      SRE here, not at all.

      Not chasing shiny is important, but when tech debt builds up this high, life is generally hell in terms of outages, inability to accomplish basic tasks, and dealing with a bunch of people who have NIH syndrome.

      • Spivak 3 days ago

        Which is why you build on a platform like Alma/Redhat that gives you 10 years of support. You can call it outdated I guess but I prefer "supported." Make everyone else work out the new bugs before upgrading-- it used to be the rule not to update to a .0 release but being a hip modern developer means moving the 0 to the other side and yoloing it.

        • ziml77 3 days ago

          The problem is that software ends up not getting touched for all those years, but eventually needs an upgrade when it's at end of support. And at that point you end up having to make changes to a system where no one remembers how it works or how to deploy it. Keeping software up to date to me is similar to how you practice disaster recovery scenarios. You do it regularly so no one forgets the processes and so any issues can be dealt with while you're not under a short time limit.

    • willsmith72 3 days ago

      older versions can also mean deprecated packages everyone's too scared to touch, failure to invest in maintenance and tech debt reduction, or use of old technologies which stopped receiving security updates

  • throwaway894345 3 days ago

    I don’t necessarily need to be on the latest version, but I prefer to take many small upgrades rather than one big upgrade at least when it comes to databases. Frequent upgrades also forces an organization to get good at upgrading and managing the risk.

paulryanrogers 4 days ago

Upgrades are hard. There was no replication in the before times. The original block-level replication didn't work among different major versions. Slony was a painful workaround based on triggers that amplified writes.

Newer PostgreSQL versions are better. Yet still not quite as robust or easy as MySQL.

At a certain scale even MySQL upgrades can be painful. At least when you cannot spare more than a few minutes of downtime.

  • darth_avocado 3 days ago

    > At least when you cannot spare more than a few minutes of downtime.

    I think it boils down to this. We used to constantly be running an obsolete pg version until it became an emergency, mostly because upgrading with the tooling available at the time was very painful. Today, however, we stay relatively up to date. Once you figure out the data replication, you can almost do blue-green deployments on databases, with nearly no downtime.

  • pella 3 days ago

    MySQL vs. PostgreSQL - Jepsen Analyses

    https://jepsen.io/analyses/mysql-8.0.34 ( MySQL 8.0.34 2023-12-19 )

    HN: https://news.ycombinator.com/item?id=38695750

    "MySQL is a popular relational database. We revisit Kleppmann’s 2014 Hermitage and confirm that MySQL’s Repeatable Read still allows G2-item, G-single, and lost update. Using our transaction consistency checker Elle, we show that MySQL Repeatable Read also violates internal consistency. Furthermore, it violates Monotonic Atomic View: transactions can observe some of another transaction’s effects, then later fail to observe other effects of that same transaction. We demonstrate violations of ANSI SQL’s requirements for Repeatable Read. We believe MySQL Repeatable Read is somewhat stronger than Read Committed. As a lagniappe, we show that AWS RDS MySQL clusters routinely violate Serializability."

    ---------------

    https://jepsen.io/analyses/postgresql-12.3 ( PostgreSQL 12.3 2020-06-12 )

    HN: https://news.ycombinator.com/item?id=23498781

    "PostgreSQL is a widely-known relational database system. We evaluated PostgreSQL using Jepsen’s new transactional isolation checker Elle, and found that transactions executed with serializable isolation on a single PostgreSQL instance were not, in fact, serializable. Under normal operation, transactions could occasionally exhibit G2-item: an anomaly involving a set of transactions which (roughly speaking) mutually fail to observe each other’s writes. In addition, we found frequent instances of G2-item under PostgreSQL “repeatable read”, which is explicitly proscribed by commonly-cited formalizations of repeatable read. As previously reported by Martin Kleppmann, this is due to the fact that PostgreSQL “repeatable read” is actually snapshot isolation. This behavior is allowable due to long-discussed ambiguities in the ANSI SQL standard, but could be surprising for users familiar with the literature. A patch for the bug we found in serializability is scheduled for the next minor release, on August 13th, and the presence of G2-item under repeatable read could be readily addressed through documentation."

  • slotrans 3 days ago

    "Not as robust as MySQL"? Surely you're joking.

    • sgarland 3 days ago

      They’re not wrong. If you’ve ever spent meaningful time administering both, you’ll know that Postgres takes far more hands-on work to keep it going.

      To be clear, I like both. Postgres has a lot more features, and is far more extensible. But there’s no getting around the fact that its MVCC implementation means that at scale, you have to worry about things that simply do not exist for MySQL: vacuuming, txid wraparound, etc.

      • lelanthran 3 days ago

        My experience of both is that MySQL is easier for developers, PostgreSQL is easier for sysads.

        That was true in 2012; dunno if it still applies though.

        • sofixa 3 days ago

          I doubt it was true in 2012, because sysadmins would be the ones trying to make it run reliably, including things like replication, upgrades, etc.

          Pretty sure that even in 2012 MySQL had very easy-to-use replication, which Postgres didn't have well into the late 2010s (does it today? It's been a while since I've run any databases).

          • lelanthran 3 days ago

            > I doubt it was true in 2012, because sysadmins would be the ones trying to make it run reliably, including things like replication, upgrades, etc.

            Possibly I got it wrong and switched around which was easier on the devs and which was easier on the sysads?

            In my defence, ISTR, when talking to sysads about MySQL vs PostgreSQL, they preferred the latter due to having less to worry about once deployed (MySQL would apparently magically lose data sometimes).

            • karamanolev 3 days ago

              MyISAM in the olden days could/would magically lose data. InnoDB has been the de facto standard for a while and I haven't seen data loss attributed to it.

          • yxhuvud 3 days ago

            In 2012 MySQL had several flavors of replication, each with its own very serious pitfalls that could introduce corruption or loss of data. I saw enough MySQL replication issues in those days that I wouldn't want to use it.

            But sure, it was easy to get a proof of concept working. But when you tried to break it by turning off the network and/or machines, then shit broke down in ways that were not recoverable. I'm guessing most people that set up MySQL replication didn't actually verify that it worked well when SHTF.

            • sofixa 3 days ago

              Maybe that was true in 2012 (maybe it was related to MyISAM) but by ~2015 with InnoDB MySQL replication was rock solid.

              • yxhuvud 3 days ago

                It was not related to MyISAM.

                How did you verify that it was rock solid? And which of the variants did you use?

            • evanelias 3 days ago

              Many of the largest US tech companies were successfully using MySQL replication in 2012 without frequent major issues.

              source: direct personal experience.

            • est 3 days ago

              > pitfalls that could introduce corruption or loss of data

              sometimes, repairing broken data is easier than, say, upgrading a god damn hot DB.

              MVCC is overrated. Not every row in a busy MySQL table is your transactional wallet balance. But to upgrade a DB you have to deal with every field, every row, every table, and the data keeps changing, which is a real headache.

              Fixing a range of broken data, however, can be done by a junior developer. If you rely on rdbms for a single source of truth you are probably fucked anyway.

              btw I do hate DDL changes in MySQL.

        • GoblinSlayer 3 days ago

          >MySQL is easier for developers

          Except that search doesn't work, because all text is in latin1 encoding.

          • sgarland 3 days ago

            While obviously I prefer Unicode, latin-1 is perfectly acceptable for most use cases in Western cultures.

            What part of searching have you found to not work well using it?

            • GoblinSlayer a day ago

              The catch is that PHP backend passes strings in utf8.

          • homebrewer 3 days ago

            This was never true if you know what you're doing, and hasn't been true since 2018 even if you don't.

          • yxhuvud 3 days ago

            Either latin1 or that absolutely horrible max-three-byte utf8 encoding that used to be the only available option for utf8.

        • williamdclt 3 days ago

          Interestingly, someone else in another comment is arguing the exact opposite!

      • pritambarhate 3 days ago

        My experience has been exactly the opposite. The ability to run VACUUM is good. MySQL doesn’t free up space taken by deleted rows. The only option to free up the space is to mysqldump the db and load it again. Not practical in most situations.

        • sgarland 3 days ago

          VACUUM rarely reclaims space from the OS’ perspective, if that’s what you meant. It can in certain circumstances, but they’re rare. VACUUM FULL is the equivalent of OPTIMIZE TABLE – both lock the table to do a full rewrite, and optimally binpack it to the extent possible.

          EDIT: my mistake, OPTIMIZE TABLE is an online DDL. I’ve been burned in the past by foreign key constraint metadata locks essentially turning it into a blocking operation.
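
          For concreteness, the two table-rewrite commands being compared (table name is hypothetical):

            VACUUM FULL my_table;     -- Postgres: full rewrite under an exclusive lock
            OPTIMIZE TABLE my_table;  -- MySQL/InnoDB: rebuild, performed as (mostly) online DDL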

        • iamjkt 3 days ago

          Running 'optimize table <table>' reclaims the space on MySQL/MariaDB.

        • benoitg 3 days ago

          Not really, the innodb_file_per_table variable has been set to 1 for a long time. Running OPTIMIZE TABLE frees up the disk space in this case.

          • sudhirj 3 days ago

            Is this process materially different from a vacuum? Does it manage to optimise without a write lock?

            • homebrewer 3 days ago
              • sudhirj 2 days ago

                That helps a lot thanks. Will summarize it quickly for those who come later: MySQL (InnoDB really) and Postgres both use MVCC, so they write a new row on update. InnoDB however also additionally writes a record marking the old row for deletion.

                To do a cleanup, InnoDB uses the records it kept to delete old data, while Postgres must do a scan. So InnoDB pays a record-keeping price as part of the update that makes it easier to clear data, while Postgres decides to pay this price of occasional scanning.

      • wbl 3 days ago

        Yeah, but you don't need to worry about whether your data still exists. MySQL has been known to silently fail at the one job of a DB.

        • maxk42 3 days ago

          Not in around 15 years. You're thinking of when MyISAM was the default storage engine for MySQL. It has been InnoDB for over a decade. InnoDB is very reliable - I've never had a single data loss incident in all that time, and I've managed some very large (PB-scale) and active databases.

          Postgres is definitely more difficult to administer.

          • yxhuvud 3 days ago

            MySQL used to have horrible and very unsafe defaults for new installations that persisted well after the introduction of InnoDB. Those went unfixed for a very long time.

          • Symbiote 3 days ago

            People are criticising decade-old releases of PostgreSQL, so it seems fair to compare them to similarly aged MySQL releases.

        • SavageBeast 3 days ago

          I recall this being the case A LOOOONG time ago but I haven't heard of, read about, been warned to look out for or personally seen such a thing in forever. Have you?

          * I'm running a lot of MySQL stuff and such a topic might be of interest to me

          • elisbce 3 days ago

            Yes, it is messy when you want your MySQL databases to be mission-critical in production, e.g. handling a large amount of customer data. Historically, MySQL's High Availability architecture has had a lot of design and implementation issues because it was an afterthought. Dealing with large amounts of critical data means you need it to be performant, reliable and available at the same time, which is hard and requires you to deal with caching, sharding, replication, network issues, zone/resource planning, failovers, leader elections and semi-sync bugs, corrupted logs, manually fixing bad queries that killed the database, data migration, version upgrades, etc. There is a reason why big corps like Google/Meta have dedicated teams of experts (like people who actually wrote the HA features) to maintain their mission-critical MySQL deployments.

          • o11c 3 days ago

            From what I can tell, MySQL is supposed to be safe since 2018 if you have no data from before 2010.

            The fact that you still can't use DDL in transactions makes life exceedingly painful, but it's technically safe if you write your migration code carefully enough.
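
            A minimal sketch of the difference, with a hypothetical table; in Postgres the block below is atomic, while MySQL would implicitly commit at the ALTER, so the ROLLBACK could not undo it:

              BEGIN;
              ALTER TABLE users ADD COLUMN last_login TIMESTAMP;
              -- Postgres: rolls back the ALTER along with everything else
              ROLLBACK;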

            • p_l 3 days ago

              Some places still have columns declared as utf8 instead of utf8mb4, and there's a special place in hell for the authors of the MySQL general clusterfuck regarding encodings - it was all nice and great if you didn't care about anything other than latin1 or ASCII, but go outside that before the utf8 option existed and it was a horror that even experienced operators managed to fuck up (I have a badge from a Google conference in 2017 with a nicely visible effect of "we have mixed up one of the three separate encoding settings in MySQL and now you have mojibake in your badge").

              And then there's UTF8 not actually being UTF8, which can result in total lockup of a table if someone inputs a character that does not fit in UCS-2 and now you need to recover the database from backup and preferably convert all instances of utf8 to utf8mb4, because fuck you that's why.

              • lol768 3 days ago

                Yeah, the whole charset/collation design is a complete dumpster fire.

                • sgarland 3 days ago

                  In fairness, reasoning about collations is like peering into the abyss. I get why they’re required to have so many levels of detail, and the Unicode Consortium has done a fantastic job, but to say they’re complicated is putting it mildly.

            • The_Colonel 3 days ago

              Lack of transactional DDL is certainly painful, but not unique for MySQL. Oracle doesn't support it either.

              • tpmoney 2 days ago

                Oracle also didn't support Boolean data types for a long time, and had a 20-some-odd-year public thread arguing that no one needed a Boolean data type (https://asktom.oracle.com/ords/f?p=100:11:0::::P11_QUESTION_...). They finally added it in Oracle 23, which is nice, but I wouldn't consider lacking something Oracle also lacks to be good company to be in.

                • The_Colonel 2 days ago

                  Not having a boolean data type is IMHO just an annoyance, not comparable to the lack of transactional DDL.

                  But to the point, people often use this point to claim that MySQL is a toy database, not usable for real world production use. I use Oracle as a counterpoint, which also has a lot of warts but is pretty much an archetype of an enterprise-grade DB engine.

              • justinclift 3 days ago

                Both databases are owned by the same company. Wonder if it's something inherent to the company which has kept that limitation going?

    • erik_seaberg 3 days ago

      Early MySQL versions made egregious design choices like quietly ignoring missing foreign keys and enum typos, truncating long strings, and randomly choosing rows from groups.

      https://web.archive.org/web/20230922210124/https://grimoire....

      • sgarland 3 days ago

        Yeah, it was bad. What kills me is SQLite has its own absurd set of gotchas [0] yet is seen as amazing and wonderful by devs. PKs can have NULLs? Sure! Strings can have \0 in the middle of them? Why not? FKs aren’t enforced by default? Yeah, who needs referential integrity, anyway?

        My only conclusion is that the majority of devs don’t actually read documentation, and rely purely on the last blog post they read to influence their infrastructure decisions.

        [0]: https://www.sqlite.org/quirks.html
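
        Two of those gotchas in a quick sketch (table name invented); note the foreign key pragma is per-connection:

          PRAGMA foreign_keys = ON;    -- off by default: without it, FKs are not enforced
          CREATE TABLE t (pk TEXT PRIMARY KEY);
          INSERT INTO t VALUES (NULL); -- accepted: non-INTEGER PKs allow NULL unless declared NOT NULL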

        • hu3 3 days ago

          Understanding and memorizing shortcomings and quirks takes time and effort.

          Most devs just go with whatever the influencer du jour says is good.

          "Nobody ever got fired for choosing insert_currently_hyped_tech_here"

    • Propelloni 3 days ago

      It is hard to live down a reputation ;)

      MySQL was immortalized as the database in every LAMP stack. And just like PHP it improved considerably since then.

      • p_l 3 days ago

        Also, for licensing reasons, there was for a long time a surviving contingent of MySQL 3.23 in LAMP hosting.

        • johannes1234321 3 days ago

          That change from LGPL to GPL affected only the client library (the server was always GPL(+commercial)), and the MySQL company reacted relatively quickly with a FOSS exception to the GPL and by providing a reimplementation of the client library under the PHP license (mysqlnd) to serve that market.

          (I joined MySQL shortly after that mess, before the Sun acquisition)

          • p_l 3 days ago

            Random hosting providers, which were the major venue for taking your baby steps on the LAMP stack, didn't necessarily grok licensing much.

            • johannes1234321 3 days ago

              They also didn't like updating software - too likely that an update to PHP or MySQL or something would break some bad script by a customer, who'd complain to the host.

  • cenamus 3 days ago

    Is Slony some sort of language joke? Slon is Czech (probably Slavic in general) for elephant.

    (which may actually be derived from the Turkish aslan, for lion, but somehow the animal got mixed up)

  • Symbiote 3 days ago

    I think you're describing versions before 9.2, which was released 12 years ago.

  • api 4 days ago

    I've always wondered why Postgres is so insanely popular. I mean it has some nice things like very powerful support for a very comprehensive subset of SQL functionality, but most apps don't need all that.

    It really feels like early 1990s vintage Unix software. It's clunky and arcane and it's hard to feel confident doing anything complex with it.

    • tpmoney 3 days ago

      > I've always wondered why Postgres is so insanely popular.

      In no particular order, my preference for postgres is driven by:

        * Date / time functions that don't suck
        * UTF-8 is really UTF-8
        * 99% of a backup can be done live with nothing more than rsyncing the data directory and the WAL files
        * Really comprehensive documentation
        * LTREE and fuzzy string match extensions
        * Familiarity from using it for years
      
      MySQL/Maria I'm sure is fine, but it's one of those things where it's just different enough, and I haven't encountered a compelling use case for changing my preference.
      • fhdsgbbcaA 3 days ago

        UTF-8 is what made me switch. It’s insane that MySQL has something called UTF-8 that isn't really UTF-8, but does have a type UTF8MB4 that actually is correct. This means if you use UTF-8 in MySQL, you can’t use emoji, for example.

        • bastawhiz 3 days ago

          And the fact that adding real utf-8 support limited (limits?) the length of strings that can be indexed

          • evanelias 3 days ago

            Postgres limits btree keys to 2704 bytes, which is actually slightly smaller than MySQL's limit of 3072 bytes, assuming the default InnoDB storage engine.

            That said, when using utf8mb4 in an index key, MySQL uses the "worst case" of each character being 4 bytes. So it effectively limits the max key size to 3072/4 = 768 characters, when a column is using the utf8mb4 character set.

            For practical purposes, this doesn't cause much pain, as it's generally inadvisable to use complete long-ish strings as a key. And there are various workarounds, like using prefixes or hashes as the key, or using binary strings as keys to get the full 3072 bytes (if you don't need collation behaviors).
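
            A sketch of those workarounds in MySQL terms (table and column names invented):

              -- prefix index: only the first 64 characters participate in the key
              CREATE INDEX idx_url_prefix ON pages (url(64));

              -- hash key: index a fixed-width digest instead of the long string
              ALTER TABLE pages
                ADD COLUMN url_hash BINARY(32)
                  GENERATED ALWAYS AS (UNHEX(SHA2(url, 256))) STORED,
                ADD INDEX idx_url_hash (url_hash);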

            • bastawhiz 3 days ago

              > So it effectively limits the max key size to 3072/4 = 768 characters, when a column is using the utf8mb4 character set.

              This is exactly what I mean. 768 characters for an index is woefully bad. And for no obviously great reason: you can just index the encoded UTF-8 text.

              This was literally the reason why a former company (who will remain nameless) refused to add Unicode support. It's not even an imagined problem.

              • sgarland 3 days ago

                You should not be indexing 768 characters in any circumstance I can imagine. Go ahead and try it. Spin up two tables, fill them with a few million rows, and slap an index on them. Give one a reasonable prefix limit, and let the other go wild. Make sure you ANALYZE each, then run queries in a loop and check the times.

                Spoiler: I literally did this a couple of days ago. The index size bloat means that any possible savings you might have gained from collisions are obliterated from page fetches. I tested with a measly 128 characters vs. a prefix of 16, and that was enough for the average query time to be equal, with the smaller index winning for the minimum.

              • evanelias 3 days ago

                Why did you need to index fairly long strings in their entirety in a way that preserves collation behaviors?

                And why is a 768 character limit woefully bad, but a 2704 character limit is totally fine?

                • bastawhiz 3 days ago

                  A URL, for instance, can't be safely stored in 768 characters, but it can be stored safely in 2704. If you then wanted to sort those URLs so that all URLs for each domain and path within that domain are adjacent, you need an index. Especially if you want to paginate over them with a cursor. Doing that without an index on the raw value is a royal pain in the ass.

                  Hell, even just being able to sort user-submitted strings up to a kilobyte. Why up to a kilobyte? Some users have strings that are kind of long. If I have to define a second column that's the truncated prefix, that's just a silly waste of space because MySQL decided to use utf-32 under the hood.

                  • The_Colonel 3 days ago

                    > it can be stored safely in 2704

                    No, it can't. URL doesn't have any length limit, regardless of the fact that different software will impose different limits.

                    • bastawhiz 3 days ago

                      Browser address bars have a limit of 2048, so if that's your use case, yes it's safe.

                      • The_Colonel 2 days ago

                        Safari has 80 000, Firefox 65K.

                        There are plenty of needs to store URLs which will never go through a browser.

                        You can only claim that "some URL use cases" can be stored in 2048 characters.

                  • evanelias 3 days ago

                    > A URL, for instance

                    VARBINARY is typically ok for that I'd think? Then you can utilize the full 3072 byte limit for the key, since there's no character set applied.

                    > even just being able to sort user-submitted strings up to a kilobyte

                    As a software engineer, I completely agree. But as a DBA, I am obligated to make a "tsk" sound and scowl disapprovingly!

                  • crazygringo 3 days ago

                    To be honest, indexes aren't designed for that. They're meant for fast lookup of short identifiers. Things like people's names and product ID's. Not long URL's. It's not performant.

                    If you need to keep a million long URL's in a defined sort order, my first recommendation would be, don't -- see if there's another way to achieve your end result. But if you absolutely have to, then create a new integer column to be your sort key, and use a little bit of extra code to give it values that produce the same sort order.

                    Creating short numerical primary keys for long strings is a common database technique.

                    • bastawhiz 3 days ago

                      > indexes aren't designed for that. They're meant for fast lookup of short identifiers. Things like people's names and product ID's. Not long URL's. It's not performant.

                      This is objectively false. If this was true, indexes wouldn't serve range queries. You couldn't index on dates. You couldn't sort numbers.

                      > But if you absolutely have to, then create a new integer column to be your sort key, and use a little bit of extra code to give it values that produce the same sort order.

                      This fails when you need to insert new values into the table. Then you not only need to figure out the new integer value (how, if you can't efficiently compare sorted string values???), you need to update all the integers to make room.

                      • crazygringo 3 days ago

                        Sorry, I was considering short things like dates and numbers as identifiers. I realize that's not quite right -- what I should have said was that indexes are designed for short things period (short identifiers being one of those things). Thanks.

                        > This fails when you need to insert new values into the table.

                        Yes, that's part of the extra code you need to keep the values accurately sorted. There are a lot of different particular code solutions that might work -- whether allowing for collisions and re-ordering every night with a cron job, or putting large gaps between numbers, or using floats.

                        But my main point stands, which is that standard relational databases are not designed to be able to maintain a sorted index of long URL's out of the box. Indexes aren't meant for that and they won't work, and this is by design. You're going to have to roll your own code for that.

                        Fortunately I've never come across a case in the wild where maintaining a globally sorted list of long items was required (though I'm not saying they never exist). E.g. if you're building a spider that needs to match against URL's, you'd index a short hash of the URL as a non-unique index. Or if you wanted to display sorted URL's for a site, you'd index by domain name only, and then sort the remainder of the URL at query time.

                        • bastawhiz 3 days ago

                          > But my main point stands, which is that standard relational databases are not designed to be able to maintain a sorted index of long URL's out of the box.

                          You keep saying that, but Postgres does a great job with no issues without any extra work. MySQL is alone in being suboptimal. "It's not designed for that" isn't a good answer, if it works great. Show me how the underlying data structures fail or perform poorly if it's really not something you should do.

                          • evanelias 3 days ago

                            > MySQL is alone in being suboptimal.

                            It's only suboptimal if you choose the wrong column type for the task at hand. For storing URLs, you almost certainly don't want collation behaviors, such as accent insensitivity or case insensitivity. So VARBINARY is a better choice here anyway.

                            And as several other commenters have mentioned, at large scale, indexing a bunch of long URLs in b-trees is indeed a bad practice performance-wise in any relational database. You won't be able to fit many entries per page, so read performance will be slow, especially for range scans.

                            In that situation it's almost always better to use a non-unique index over a prefix (if you need sorting and range scans) or a hash (if you don't), and disambiguate collisions by having the full value in an unindexed column. And/or split the URL up between the domain name and path in separate columns. If needed, normalize the domain names into a separate table so that the URL table can refer to them by numeric ID. etc. All depends on the specific use-case.

                          • crazygringo 3 days ago

                            No, Postgres doesn't. 2730 bytes is not long enough to hold all URL's encountered in the wild. But also, your performance will suffer if you use that whole length. You generally don't want to be doing that.

                            The difference between MySQL and Postgres here is negligible. It doesn't matter exactly where you define the limit of a short field, except it should probably be able to hold a maximum length filename which is 255 characters, plus some room to spare. Both MySQL and Postgres do this fine.

                • fweimer 3 days ago

                  You might just load someone else's data, and the index is desirable in general for speeding up analytic queries. It's possible to work around that, of course. But depending on what you do, it can make writing efficient queries against the data more difficult. That's just a distraction because most of the time, those long columns won't matter anyway.

        • homebrewer 3 days ago

          I won't defend that utf8 brain damage, but the defaults have been sane since 2018 — you don't need to set the encoding, it's set to proper utf8 out of the box. MySQL 8 cleaned up a lot of this legacy stuff.

          • fhdsgbbcaA 3 days ago

            Good to hear they saw the light but after I switched to Postgres I never had a single regret.

            In a competitive market where people make very long term engineering decisions based on stability and reliability you can’t fuck up this badly and survive.

        • sgarland 3 days ago

          > This means if you use UTF-8 in MySQL, you can’t use emoji, for example.

          I for one have always viewed this as a perk.

          • fhdsgbbcaA 3 days ago

            A database that doesn’t give you back what you put into it is never a perk. It literally can’t handle storing and retrieving the data.

            • sgarland 3 days ago

              I don’t want to see emoji in my database. The customer is only right in matters of taste, not engineering.

              • fhdsgbbcaA 3 days ago

                Ok so if you are doing sentiment analysis of user product reviews you want to silently truncate emoji because you don’t like them? That’s a good idea how?

              • ttfkam 3 days ago

                Uhh… not wanting to see emojis is a matter of taste, not engineering.

      • sgarland 3 days ago

        MySQL does have ON UPDATE for its DATETIME, though; something that Postgres inexplicably still lacks.

        • fanf2 3 days ago

          Isn’t ON UPDATE related to foreign keys and independent of the data type? https://www.postgresql.org/docs/current/ddl-constraints.html...

          • paulryanrogers 3 days ago

            Maybe they're thinking of TIMESTAMP in MySQL, which IIRC would auto-update its value on any update to the row. Which was useful for updated_at-like columns. Though I think they later limited it to only the first TIMESTAMP column in a table.

            • sgarland 3 days ago

              No, it works for both [0] types. The first TIMESTAMP thing you’re referring to is that if a specific variable isn’t set, the first TIMESTAMP column automatically gets auto-updates applied on creation and update, unless you explicitly define it not to. This was the default behavior in 5.7, but has since been changed.

              [0]: https://dev.mysql.com/doc/refman/8.0/en/timestamp-initializa...
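
              For reference, the MySQL spelling (table and column names are placeholders); Postgres needs a BEFORE UPDATE trigger to get the same behavior:

                CREATE TABLE widgets (
                  id INT PRIMARY KEY,
                  updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
                                       ON UPDATE CURRENT_TIMESTAMP
                );
                -- the same DEFAULT / ON UPDATE clauses also work on DATETIME columns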

      • ttfkam 3 days ago

        Transactional DDL!

    • fzeindl 3 days ago

      * transactional DDL

      * comprehensive transaction model using different modes

      * PostGIS and lots of other great extensions

      * supports most of the current SQL standard and is clear on interpretation of edge-cases in the documentation

      * support for writing stored procedures in any major programming language

      * many useful functions regarding dates, sets, ranges, json, xml, ...

      * custom datatypes

      * extremely thought-out and standardized approach to development: if a feature is included it generally works well in interaction with everything else

      * syntax, semantics and performance are all very predictable

      * great documentation

      Regarding MySQL / MariaDB: MySQL optimized for performance first. Until 2010 the standard storage engine was MyISAM, which didn't even support transactions.

      PostgreSQL always focused on correctness and stability and then made sure everything performed.

      • arkh 3 days ago

        > * custom datatypes

        Good in theory. But last time I checked the main libs for connecting to pgsql, everything you get back from the database is strings. So you need something in your app to convert those strings to the equivalent data structures.

        • ttfkam 3 days ago

          You're thinking only in terms of application. Types in the db save storage space, allow for better validation than plain strings, can be correlated cleanly with other columns with the same type, etc.

          Yes, more drivers and libraries should support the more expansive data type list, but even just within the database itself there are multiple advantages.

    • stickfigure 4 days ago

      What's the alternative? MySQL? No transactional DDL, immediate fail.

      • cosmotic 3 days ago

        It's not just DDL that isn't transactional; there's a whole bunch of other things that aren't. And they break the transactionality silently. It's like an obstacle course where bumping into something might be fatal.

        • evanelias 3 days ago

          What specific non-DDL things are you referring to here?

          Aside from DDL, the only other major ones are manipulating users/grants, manipulating replication, a small number of other administrative commands, and LOCK TABLES.

          This is all documented very clearly on https://dev.mysql.com/doc/refman/8.4/en/implicit-commit.html. Hardly an "obstacle course".

          • stickfigure 3 days ago

            "Aside from missing his head, the patient appears to be in fine shape."

            • evanelias 3 days ago

              That hardly seems equivalent. Why do you need to e.g. reconfigure replication inside of a transaction in the first place?

              The lack of transactional DDL is a totally valid complaint, but the non-DDL stuff is just a total head-scratcher to me. Aside from DDL, implicit commits have literally never impacted me in my 21 years of using MySQL.

              • stickfigure 2 days ago

                Sorry - I was trying to make light of the discussion. DDL is so important that it's silly to talk about the other stuff.

      • jes5199 4 days ago

        I worked for a company that migrated from mysql to postgres, but then got big enough they wanted to hire fulltime database experts and ended up migrating back to mysql because it was easier to find talent

        • bastawhiz 3 days ago

          Dunno if that says much about Postgres, but it says a lot about the company

        • icedchai 3 days ago

          Ugh. I worked with MySQL earlier in my career (until about 10 years ago.) All the companies since have been Postgres. All my personal projects are Postgres. I can't imagine going back.

    • justin_oaks 4 days ago

      > It really feels like early 1990s vintage Unix software. It's clunky and arcane and it's hard to feel confident doing anything complex with it.

      How software "feels" is subjective. Can you be more specific?

      • dalyons 3 days ago

        It requires a ton of somewhat arcane maintenance at scale: vacuum shenanigans, index fragmentation requiring manual reindexing, txid wraparounds. I like Postgres but it’s definitely way more work to maintain a large instance than MySQL. MySQL just kinda works.
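
        To make the arcane bits concrete, a couple of the routine checks involved (a sketch only; the index name is made up and alerting thresholds vary by site):

          -- how close each database is to transaction ID wraparound
          SELECT datname, age(datfrozenxid) AS xid_age
          FROM pg_database ORDER BY xid_age DESC;

          -- rebuild a bloated index without blocking writes (v12+)
          REINDEX INDEX CONCURRENTLY some_bloated_index;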

      • arkh 3 days ago

        Having to tinker with pg_hba.conf files on the server to manage how users can connect.

        • paulryanrogers 3 days ago

          I'd agree that is annoying, yet it's usually just a one-off task, unless you really want different IP allowlists per user.

          • Tostino 3 days ago

            In complex environments it is not just a one-off task. I dealt with it by automating my infrastructure with ansible, but without some tooling it sucks.
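
            For anyone who hasn't dealt with it, pg_hba.conf is a whitespace-separated, first-match-wins rule file; a hypothetical pair of rules:

              # TYPE  DATABASE  USER      ADDRESS       METHOD
              host    app_db    app_user  10.0.1.0/24   scram-sha-256
              host    all       all       0.0.0.0/0     reject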

      • threeseed 4 days ago

        The command line experience is old-school style, e.g. to show tables:

          \c database
          \dt
        
        Versus:

          use database
          show tables
        • georgyo 3 days ago

          I started with MySQL in 2006 for my personal projects, but what first won me over to psql was those commands.

          Today I use CLIs like usql to interact with MySQL and SQLite so I can continue to use those commands.

          At first glance they may be less obvious, but they are significantly more discoverable. \? just shows you all of them. In MySQL it always feels like I need to Google it.

          • stephenr 3 days ago

            > At first glance they may be less obvious, but they are significantly more discoverable. \? Just shows you all of them. In MySQL it always feels like I need to Google it.

            In MySQL either `?` or `help` or `\?` will show you the help...

        • rootusrootus 3 days ago

          I assume this is really what it comes down to. If psql added those verbose-but-descriptive commands a whole bunch of people comfortable with mysql would be a lot happier using postgres.

        • fhdsgbbcaA 3 days ago

          It’s also faster to type.

          • eYrKEC2 3 days ago

            Not after you have to google, "What's the equivalent of `show tables` in postgres?", because the psql command names are completely arbitrary.

            • ahoka 3 days ago

              They kinda make sense if you consider that Postgres was not an SQL database in the beginning. Quirky though.

            • Symbiote 3 days ago

              They are clearly abbreviations.

              \c is for connect.

              \dt is for describe tables.

            • mxey 3 days ago

              \? shows the help

            • fhdsgbbcaA 3 days ago

              Which you need to do exactly once.

              • kalleboo 3 days ago

                I need to manually admin my database server maybe once every 2 years or so. Definitely not remembering them 2 years later.

    • Scramblejams 3 days ago

      > I've always wondered why Postgres is so insanely popular.

      Just another anecdote: MySQL lost data for me (2004). I spent some time evaluating the projects and Postgres’ development process seemed much more mature — methodical, careful, and focused on correctness. Boring, which I loved.

      I didn’t need whatever perf advantage MySQL had so I switched to Postgres and never looked back. And then the Oracle drama and Monty’s behavior around it — not saying he was wrong or right, but it was the opposite of boring — just reinforced my decision.

      I like to play with new tech in various spots of the stack, but for filesystems and databases I go boring all the way.

    • paulryanrogers 3 days ago

      I've never lost data with PostgreSQL. MySQL had enough data loss bugs and foot guns that I ran into a few of them.

    • moogly 3 days ago

      > I've always wondered why Postgres is so insanely popular

      Real answer: no licensing cost

    • vbezhenar 3 days ago

      For me Postgres is 100% predictable and reliable. It's neither clunky nor arcane in my experience. I don't need to think about it, I just SQL it and that's about it. It quietly works in the background. At some scale there might be some issues, but there is always known path to solve things.

    • DonHopkins 3 days ago

      Because it's not tainted and cursed by Oracle, like MySQL (and Oracle).

      • immibis 3 days ago

        That's what MariaDB is for, right? I'm surprised to hear people recommend the Oracle fork of MySQL (still called MySQL because they own the trademark) rather than the original project (now called MariaDB)

  • aorloff 3 days ago

    A shudder went through my body hearing that, I had forgotten all about Slony.

Havoc 4 days ago

The risk/reward ratio of fucking with something that works perfectly fine as is is not great.

So for fresh installs, yes, but for existing ones, not so much.

  • cedws 3 days ago

    That and capitalism doesn’t reward slow and steady, keeping things stable and well-oiled. It rewards delivering shoddy features at breakneck speed, even if they need to be recalled after two weeks. That’s what Wall Street and the shareholders want. Hence why senior management rarely cares about addressing tech debt.

    “Show me the incentive, I’ll show you the outcome”

justin_oaks 4 days ago

My upgrade policy for everything:

Significant security vulnerability? Upgrade

Feature you need? Upgrade

All other reasons: Don't upgrade.

Upgrading takes effort and it is risky. The benefits must be worth the risks.

  • occz 3 days ago

    Upgrading when multiple versions behind is significantly more risky than doing it when the update is relatively fresh.

    Additionally, actions done frequently are less risky than actions done rarely, since you develop skills in performing that action as an organization - see high deployment frequency as a strategy of managing deployment risk.

    This adds up to continuous upgrading being the least risky option in aggregate.

    • kortilla 3 days ago

      Not if software regressions are the main concern.

    • ttfkam 3 days ago

      Upgrading from v11 to v16 in Postgres is not materially different from upgrading v14 to v16. Same tools. Same strategies.

      • enraged_camel 3 days ago

        We are planning to upgrade from 11 to 17 soon. Even thinking about it is giving me ulcers. Our infra provider said we actually need to upgrade to 13 first, and then to 17. They did not provide a reason.

        • Tostino 3 days ago

          I went through a postgres 10 > 16 upgrade recently. What made it easier was just doing a test run of the upgrade process.

          Did a restore to a stage environment, worked on my upgrade scripts until I was happy (deployed to VMs with ansible, so manual work to write the upgrade process for me), restored again and ran the upgrade process fresh, and then tested my application, backup scripts, restores, etc. Had everything working entirely smoothly multiple times before pulling the trigger in production.

          No stress at all when we did it in prod.

          • ttfkam 3 days ago

            Yep, that was our strategy as well: just keep iterating until the script ran cleanly from start to finish without errors.

        • fillest a day ago

          A personal warning about 17.0 if you use streaming replication: secondary replica leaks memory quite actively. 16.4 is OK.

  • hinkley 3 days ago

    Once your version doesn’t receive security fixes you’re one CERT advisory away from having your whole week pre-empted by an emergency upgrade.

    I’ve been there with products that were still internal at the time. I can only imagine how much fun that is with a public product. But then I do have a pretty vivid imagination. We changed to periodic upgrades after that to avoid the obvious problem staring us in the face.

  • natmaka 3 days ago

    Suggestion: add "End of life (no more maintenance for this version)? Upgrade"

    • Gormo 3 days ago

      Why? If the implemented featureset meets your needs, and there are no unresolved bugs or security vulnerabilities relevant to your use cases, what further "maintenance" do you need?

      • abraham 3 days ago

        When a critical security patch comes out, you don't want to have to do a major version upgrade to get it.

      • FearNotDaniel 3 days ago

        Because when the maintainers have stopped patching that version against all known security vulnerabilities, that doesn't stop the bad guys from looking for more vulnerabilities. When they find one, it will get exploited. So you either wake up to an email from Have I Been Pwned to say all your customer data has been exfiltrated [0], or (if you're lucky) you have a mad scramble to do that update before they get you.

        [0] Probably including those passwords you didn't hash, and those credit card numbers you shouldn't be storing in the first place because, what the heck, it meets your needs.

  • throwaway918299 3 days ago

    Here’s another reason to upgrade: your version is end of life and your cloud provider forced it.

    Thank you Amazon!

    • mkesper 3 days ago

      Yes, this is actually a good thing and comes with warnings beforehand.

      • throwaway918299 3 days ago

        I agree. It helped me completely bypass any discussion from management about “not high enough priority”. Amazon definitely did me a favour in many ways.

  • Gigachad 3 days ago

    Eventually you get forced to update it when the other stuff you use starts having minimum version requirements.

buro9 3 days ago

Upgrading isn't automatic.

Let me check what I'm on... brb... Postgres 14.

Because it's not automatic I leave it, I leave it until it's so unsupported that I must upgrade the whole system, then I build a new system with a new Postgres and I migrate the old to the new.

I want, so badly, for Postgres to just automatically update itself: a new binary should just work with the data directory of an old version, and if required it should upgrade those data files in place (i.e. if it detects the data files were last written by an older major version, upgrade them transparently to the admin).

My databases are all backed up each night, and these are single server Postgres with no replication or other trickery, an automatic upgrade for a single-server Postgres should be possible.

As it's not done... I assume (incorrectly?) that there be dragons and risks, and I mitigate that by never upgrading, just waiting and migrating. Migrating puts all of the risk on me, human error, and I am definitely fallible, so I can best handle this risk by just not doing it until I must.

Last migration I performed was from Postgres 7 > Postgres 14 in October 2021... I guess I have quite a few years of Postgres 14 ahead of me still. I would take downtime, my systems aren't zero downtime critical, just give me a super simple one-command no-questions upgrade as I really dislike migrations.

  • ttfkam 3 days ago

    Unless you want support for the MERGE command (v15), bidirectional replication (v16), and some behind the scenes optimizations, you're not really missing anything.

    Whether you go from v14 to v17 or v14 to v20, I doubt it'll make a difference in migration strategy. You've still got a fair amount of time before EOL. No need to stress or FOMO.

  • justinclift 3 days ago

    If you use Docker, then the pgautoupgrade project might be your kind of thing:

    https://github.com/pgautoupgrade/docker-pgautoupgrade
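
    A rough sketch of how it gets used (the image name is from the repo; the tag and paths here are illustrative, so check the project's README for current ones): you point it at the same data directory the official postgres image was using, e.g.

        docker stop old-postgres
        docker run -d --name pg \
          -v /srv/pgdata:/var/lib/postgresql/data \
          pgautoupgrade/pgautoupgrade:17-bookworm

    The idea is that on first start it runs pg_upgrade against the old data files and then behaves like the normal postgres image.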

    • icedchai 3 days ago

      Thank you for this! I was looking for an "easy" way to upgrade a dev container DB.

      • justinclift 3 days ago

        Thanks. There's a small group of us (PG DevOps type people) who have been working on it for over a year now, and it's come together pretty well.

        It doesn't yet automatically upgrade people's PG extensions, but that's on the ToDo list and has initial code in a PR. So that'll likely start happening in a few weeks too. :)

  • pelagicAustral 3 days ago

    I feel like I've been on 14 for an eternity... everything is running smooth... I reckon I should upgrade a few of the servers... oh well..

  • Tostino 3 days ago

    I used to do the old pg_dump for backups, but after using pgbackrest at work for years I see no reason not to use it for every cluster I want to back up.

    I like pointing it at an S3 bucket so I have another full backup repo (with its own retention strategy) away from my homelab, as well as to my local NAS.

  • Symbiote 3 days ago

    If you're using Debian or Ubuntu, investigate 'pg_upgradecluster'.

    • teddyh 3 days ago

      A full example procedure is documented in /usr/share/doc/postgresql-common/README.Debian.gz, under the heading “Default clusters and upgrading”.
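
      For reference, the short version of that Debian flow usually looks something like this (assuming an existing 14/main cluster and that the newer server packages are available in your repos):

        pg_lsclusters                       # list clusters and their status
        sudo apt install postgresql-17      # install the new server version side by side
        sudo pg_upgradecluster 14 main      # create and migrate a 17/main cluster
        # once you've verified the new cluster:
        sudo pg_dropcluster 14 main --stop

      The old cluster is kept around (stopped, on a different port) until you explicitly drop it, which is a nice safety net.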

chasil 4 days ago

In Oracle, ALTER TABLE MOVE in 8i was a godsend, finally enabling a table reorganization without export/import.

My timid management forbade an upgrade from Oracle 7.3.4 until 2013. It was agony to remain on that museum piece for as long as we did.

I am upgrade-minded, but my management is not. I always lose.

I am retiring in two years. I will not miss their problems, not at all.

Edit: Oracle 10g was the last release that (for us) brought must-have features. Sure, upgrading to 19 or 23 would be great, but it doesn't bring anything that I really want.

  • icedchai 3 days ago

    Impressive! What platform were you running that on?

    • chasil 3 days ago

      HP-UX 10.20, which went out of support in 2002.

      • icedchai 3 days ago

        Sorry to hear! I used to work with HP-UX and it was not my favorite.

  • cafard 3 days ago

    Seriously, Oracle 7.3.4 in 2013? I am impressed.

    • chasil 3 days ago

      I still have one of the HP-UX systems that ran it in the data center; I could start it up if I wanted.

zahlman 2 days ago

>Postgres 17.0 has been out for a bit and it’s awesome, but here’s the reality: most Postgres users won’t upgrade right away. Most probably aren’t even on 16.4 or 16.anything —they’re probably still using Postgres 15 or an even older version.

So, by "ancient", this author means possibly not even two years old, as Postgres 15 is barely that old (https://www.postgresql.org/about/news/postgresql-15-released...).

Meanwhile, I've had to make posts this year to tell people no, seriously, stop using Python 2.7.

And meanwhile in my package repo:

    $ apt show postgresql
    Package: postgresql
    Version: 14+238
    Priority: optional
    Section: database
    Source: postgresql-common (238)
    Origin: Ubuntu
    Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
This is Mint 21.3 btw.

The author's expectations are incredibly unrealistic.

olavgg 3 days ago

The PostgreSQL team ships an incompatible on-disk data format every year. Do they really need to break the data format every year? In my opinion, the release model for PostgreSQL should be rethought. Breaking stuff is no fun for users; please stop doing that if you are a software developer. And if you have to, make the migration automatic for the user.

  • ttfkam 3 days ago

    If you want performance to steadily increase rather than steadily decrease, the answer to your question is yes. That doesn't mean tools couldn't be better, but there are clear technical advantages to their on-disk format updates.

yen223 3 days ago

Databases tend to be "stickier" than other parts of any large software system. Largely because database migrations are costly. You can't just tear down an old database and rebuild a new one, you have to figure out how to move all that data across too.

The consequence is that things in database-land tend to move more slowly than other types of software. This, I think, is the major reason why we still use SQL.

  • jart 3 days ago

    Have postgres updates actually been requiring users do migrations? Or is this just a fear that something will go wrong?

    • magicalhippo 3 days ago

      From what I can gather, yes[1]:

      > A dump/reload of the database or use of the pg_upgrade application is required for major upgrades.

      Now, it seems one can mostly use pg_upgrade, which only rewrites the system tables[2] and so is fairly quick. But if the on-disk format has changed, it seems you're forced to dump and reload.

      At work we've mainly been using SQLAnywhere, which would just disable new functionality for databases using old on-disk format. So upgrading major versions has usually been fairly painless and quick.

      [1]: https://www.postgresql.org/support/versioning/

      [2]: https://www.postgresql.org/docs/current/pgupgrade.html
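
      For the curious, the pg_upgrade invocation is roughly the following (paths are illustrative, using a PGDG RHEL-style layout; --link avoids copying the data files, at the cost of not being able to start the old cluster again afterwards):

        pg_upgrade --check \
          --old-bindir=/usr/pgsql-16/bin \
          --new-bindir=/usr/pgsql-17/bin \
          --old-datadir=/var/lib/pgsql/16/data \
          --new-datadir=/var/lib/pgsql/17/data
        # if the check passes, rerun without --check, then re-analyze,
        # e.g. vacuumdb --all --analyze-in-stages

      Both major versions have to be installed side by side for this to work, which is where much of the distro-packaging friction mentioned elsewhere in the thread comes from.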

    • bc_programming 3 days ago

      Well if it's self-hosted you have to do it yourself. You can either back up your databases from the old version and restore them to the new version once it's installed, or you can use pg_upgrade to upgrade/copy an old version's data directory to the new version.

      I don't think this is done automatically when you simply install a new postgres version, but I'm not certain of that.

  • polishdude20 3 days ago

    What's the SQL alternative?

    • p10_user 3 days ago

      JSON - er, JSON-based document storage - documents with unique identifiers, and the ability to define and set schemas for the JSON, and ... we're back to a relational database.

      • ogogmad 3 days ago

        I think he was talking about the query language.

webprofusion 3 days ago

Lol, try upgrading old MongoDB stuff.

Database engines (every single one) are notorious for incompatibilities between major versions, upgrading mission critical stuff means updating and re-testing entire applications, which in some cases can be a multi-million dollar process, before going into production.

Even if you deeply know/think that there's no problem upgrading, if something does fail in production after an upgrade and it's mission critical..

  • x-_-x 3 days ago

    This. I was tasked with upgrading Postgresql from a very old version (I think 9?) to one that was still supported a couple of years ago. Backwards compatibility is paramount and from my experience upgrading MySQL/MariaDB I know that changes in versions can break it.

    For this reason, I chose to upgrade to version 11 because it was only a couple of versions apart and still had repositories available at the time.

    So the first thing I do is stop the VM and take a snapshot. Then I start it back up and go check for database corruption before I dump them... wait there's no utility to check for corruption...? Yep that's right! You basically have to YOLO the whole thing and hope it works. OK...

    So I dump the databases and back up the directory. I shut down the old version and then install the new version from the repo. I start to import the databases and notice in the scrolling logs that there's some incompatibility... Oh F*$&. I google the error and spend a good hour trying to figure it out. Apparently there are external plugins for postgres that were installed in the old version. I search for the plugins online and they are long discontinued. OK, so let's just copy them over from the old version. I stop postgres, copy them over, and start it back up. It starts up ok. I reimport the databases and no more errors. Yay! I start the applications and pray to the SQL gods. So far so good, everything seems to work.

    Thankfully the applications tested well and worked post upgrade.

    All of this was done for a critical multi-million dollar healthcare platform. There were no official guides. Nothing. I had to find some random admin's blog for guidance. MySQL on the other hand has full documentation on just about every aspect of everything. The whole process was super hackish and not having any way to check database integrity would have been a show stopper for me had I designed this configuration.

    • teddyh 3 days ago

      > there are external plugins for postgres that were installed in the old version. I search for the plugins online and they are long discontinued.

      You can hardly blame PostgreSQL for that.

Netcob 3 days ago

My personal reason: While I haven't had to deal with a Postgres update at work yet, I've been running a pretty large Postgres 12 database in my homelab for a few years now.

My homelab projects mostly center around a "everything is an MQTT message" idea. Zigbee sensors, Tasmota power readings, OwnTracks locations, surveillance camera events, motion sensors for light switches, currently active app on my PC, status of my 3D printer, whatever my vacuum robots are up to and so on. It all gets recorded into a Postgres db. From there I can use it for data mining experiments, but mostly as a source for Grafana. I tried counting the rows but that query didn't even complete while I was writing this comment.

I like trying out all kinds of dockerized oss services, and I keep them updated using watchtower. I run a gitlab instance which is usually the most annoying service to update because there's an upgrade path and post-start migrations. With my Postgres instance, which is isolated from the internet, I'll have to figure out what the fastest way is to move all that data around, not leave a huge gap in the record and so on. Sounds like at least a day of work - and since it's technically all for "fun", it'll have to wait until it actually is that.

  • Symbiote 3 days ago

    A good approach for this is to use pg_upgrade in-place, which should give you a downtime of a few minutes at most. (I have 800GB at work and would expect 1-2 minutes for this.)

    I recommend installing PG12 on a temporary VM, duplicating the existing database, and testing the upgrade in isolation.

    https://www.postgresql.org/docs/current/pgupgrade.html

    A more complicated approach uses replication, and upgrades the standby server before promoting it to the primary server.

    • Netcob 3 days ago

      Thank you! Looks like the best way to do this.

      And since I have backups, I might not even need the testing step, considering the low risk. Might do it anyway just out of curiosity at how long it would take to duplicate.

dwwoelfel 3 days ago

Here's how we did it at OneGraph (RIP), where we not only upgraded versions without downtime, but we also moved hosting providers from GCP to Aurora without downtime.

1. Set up logical replication to a new database server. We used https://github.com/2ndQuadrant/pglogical, but maybe you don't need that any more with newer versions of postgres?

2. Flip a feature flag that pauses all database queries and wait for the queue of queries to complete.

3. Wait for the query queue to drain and for replication to catch up.

4. Flip a feature flag that switches the connection from the old db to the new db.

5. Flip the flag to resume queries.

It helped that we were written in OCaml. We had to write our own connection pooling, which meant that we had full control over the query queue. Not sure how you would do it with e.g. Java's Hikari, where the query queue and the connection settings are complected.

We also had no long-running queries, with a default timeout of 30 seconds.

It helped to over-provision servers during the migration, because any requests that came in while the migration was ongoing would have to wait for the migration to complete.
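
On the Postgres side, the moving parts look roughly like this (a sketch with invented names; pglogical has its own setup functions, but built-in logical replication since PG10 covers the same ground):

    # on the old (source) server
    psql -c "CREATE PUBLICATION migration_pub FOR ALL TABLES;"

    # on the new (target) server, after copying the schema with pg_dump --schema-only
    psql -c "CREATE SUBSCRIPTION migration_sub
             CONNECTION 'host=old-db dbname=app user=replicator'
             PUBLICATION migration_pub;"

    # at cutover, once writes are paused and lag is zero, bump sequences past the
    # source's values, e.g.
    psql -c "SELECT setval('orders_id_seq', (SELECT max(id) FROM orders));"

The sequence step matters because sequence values are not carried over by logical replication.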

  • droobles 3 days ago

    That is awesome, I dream of being able to do zero downtime SQL migrations.

KronisLV 3 days ago

In many orgs out there, the version that's picked when a project is started will stick around for a while.

Suppose you join a project and see that the PostgreSQL version used is pretty old.

Do you:

  A) convince people to migrate it for some nebulous benefits (features that aren't currently used, performance improvements that will be difficult to measure in lieu of tooling and aggregated statistics) while also taking on the risks of breaking everything or even worse, data loss (since you won't always be able to provision multiple instances and backup restore might take a while and you could still mess that up)
  B) or just leave it as it is and focus on anything else
Many will prefer to not get burned even if there shouldn't be that many risks with upgrading your average PostgreSQL install, which is why you'll get minor/patch releases at best, alongside whatever upgrades the runtime environment itself is getting.

Containers and bind mounts make all of this way easier, but then again, many places don't use containers.

  • gonzo41 3 days ago

    So I've got a small but important app that I run on PG14 via a container. I literally just put the pgdata dir outside the container, so I can keep the host OS totally fine and have an isolated db environment. It's a very nice way to not worry about the db.
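
    For anyone wanting to copy that setup, it's essentially just the following (names and paths being whatever fits your case):

        docker run -d --name appdb \
          -v /srv/appdb/pgdata:/var/lib/postgresql/data \
          -e POSTGRES_PASSWORD=changeme \
          -p 127.0.0.1:5432:5432 \
          postgres:14

    The data lives on the host, so the container itself is disposable; a major upgrade still needs pg_upgrade or dump/restore, but at least the OS and the database no longer hold each other hostage.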

    • KronisLV 3 days ago

      Some might disagree (believing that the system package manager should be responsible for the PostgreSQL install), but I agree that the approach you’re using is one of the better ones.

      That way you can update the database version when you want, while still installing the base system updates quite frequently. Plus, I think it’s really nice to separate the runtime from the persistent data, which such setups make obvious.

Narkov 3 days ago

> Postgres 17.0 has been out for a bit

From the same blog:

> Sep 26, 2024 - Postgres 17 is Now Available

3 weeks....for a new major release...and we are asking ourselves why people haven't updated?

  • jeltz 3 days ago

    Yeah, upgrading to PostgreSQL 17 now would be weird unless you have some very specific feature you need in it and spent resources testing your application on the betas and rcs.

    • majewsky 3 days ago

      My team has upgraded several dozen databases from 16.x to 17.3. Went entirely smoothly. The thing is that we're running on a process of upgrading all dependencies every Friday, and then promoting to prod on Monday unless there are specific issues, so our definition of "would be weird" is the reverse from what you say.

      (Granted, we have rather small DBs and simple applications where FOR UPDATE SKIP LOCKED is about the most fancy feature we use.)

      • fillest a day ago

        Be careful if you use streaming replication - secondary replica leaks memory on 17.0 (16.4 is fine)

      • Narkov 3 days ago

        17.3? That doesn't exist (yet).

        • majewsky 3 days ago

          Oh sorry, I went from memory and mixed up the numbers in "16.3 -> 17.0".

imbradn 2 days ago

Often lost in these discussions is how much more difficult upgrading is at scale. The article talks about the challenges with upgrading a 4TB database. In my world - that’s a small database.

Trying to set up a logical replica of a much larger, high-write-volume database is an entirely different story with its own set of challenges. In some cases it's not even possible, even with tricks like dropping and restoring indexes.

Logical still struggles to keep up with high write loads. When something like a vacuum freeze kicks off, it's not uncommon to see logical replication lag for significant periods.

Then there are things like lack of DDL replication. While this can be mostly worked around - it adds complexity. And remember DB user management is DDL - so if you change a DB password on the primary it won’t replicate to the logical replica.

When you have CDC systems using logical decoding from the systems you are upgrading you have to deal with resync conditions related to the fact that the logical replication slot will not be replicated and the new replica will lose its place in the replication stream.

Most non-trivial instances have multiple streaming replicas for read offloads which need to be coordinated at cutover. While not difficult it increases the complexity.

In addition - there are no guarantees of application compatibility. While this is rarely an issue in my experience - PG 14 (IIRC) changed the function signatures of a bunch of array functions, which was particularly painful.

That said - Postgres is improving the story here. PG 17’s ability to convert a streaming replica to a logical one and then upgrade it will be a game changer for standing up the logical replica. If they can get DDL replication into logical it will improve things even more.

That said - it’s still way behind how modern databases like CockroachDB and Yugabyte handle things here.

xpasky 4 days ago

Related...

  postgres    1958  0.0  0.0 247616 26040 ?        S    Jul21   3:03 /usr/lib/postgresql/11/bin/postgres
  postgres 1085195  0.0  0.0 249804 24740 ?        Ss   Aug19   2:01 /usr/lib/postgresql/13/bin/postgres
  postgres 1085196  0.0  0.0 223240 27900 ?        Ss   Aug19   1:59 /usr/lib/postgresql/15/bin/postgres
Postgres is the only thing on my Debian that doesn't seamlessly automatically upgrade across dist-upgrades, but instead leaves old versions around for me to deal with manually... which I seem to never get around to.

  • heavyset_go 4 days ago

    That's because you install versioned packages like postgresql-15: https://tracker.debian.org/pkg/postgresql-15

    That way you can have multiple versions of the same package.

    • xpasky 3 days ago

      You mean there is some other way to install postgresql in Debian?

      • heavyset_go 2 days ago

        No, what I meant is that the install path for Postgres on Debian involves installing versioned packages. It's the only approved way of installing Postgres from the Debian repos that I'm aware of.

Apreche 4 days ago

Because upgrading is a lot of work, and is higher risk than upgrading other software.

  • kevin_thibedeau 4 days ago

    Seems like a massive design fail if they can't maintain backwards compatibility and provide a safe, low-friction upgrade process.

    • ggregoire 3 days ago

      I think it's more about avoiding downtime (I just upgraded a pg with 1TB of data from v11 to v16 and I didn't notice any breaking changes). In an ideal world, every client of the DB should be able to handle the case where the DB is down and patiently wait for the DB to come back to keep doing its job. But from my experience, it's rarely the case, there is always at least 1 micro service running somewhere in the cloud that everybody forgot about that will just crash if the DB is down, which could mean losing data.

    • aseipp 3 days ago

      Related, but Sandstorm is an app-hosting platform/solution that's very different in design from other solutions like Docker, and one of the reasons is that it's actually designed from the ground up for easy and effective use on homelab-style setups, because among other things (not limited to this, and 100% from memory):

      1. Sandstorm apps need to always come back cleanly from a raw SIGKILL at any moment.

      2. Sandstorm apps must be able to always upgrade ANY previous version of stored data, with no intervention, automatically, when newer versions are deployed.

      These are tough technical constraints and have huge implications on the whole design. For example, the client/protocol layer needs to possibly be multi-version aware for clean rollouts, client connection failover needs to be built in, etc. But the reality is if you do not have these two constraints, your software will never really work in a turnkey, forget-about-it way where someone just runs an instance and then actually forgets about it, and it can be safe and secure. This kind of stuff is imperative to every actual computer user who isn't a programming nerd.

      This is why slapping Docker on a lot of existing pieces of software like databases doesn't really work in the grand scheme. Yes, it gets you started faster, which people value to a high degree. But it often lacks or has the same problems as other solutions on day 2, day 3, ... etc. So, you need to solve all the same problems anyway just in a different form (or externalize them onto some other solution.)

      Realistically, the solutions have to be designed in from the ground up. That's very difficult to do but necessary.

      Someone else mentioned in here that SQL Server always 100% works when upgraded in place from an old version. That's also my (limited, homelab-ish) experience and a good reason to like it.

      • kentonv 3 days ago

        Huh, I never really thought of that as a unique property of Sandstorm, but you're absolutely right, this is essential to any software intended to be operated by a non-technical end user, and Sandstorm is aggressive about it (e.g. apps are always shut down by abrupt SIGKILL and whenever someone requested that we add some sort of clean shutdown notification, I refused).

        Ironically, Sandstorm itself is still stuck running on Mongo version 2.6 (a decade old!) because Mongo can't necessarily be updated in an unattended way. Of course, we could never ask every Sandstorm user to do some manual process to upgrade it.

        Meanwhile, newer versions of the Mongo Node.js client library don't support such old Mongo, which means Sandstorm cannot actually update that package nor any of the packages that depend on it anymore. And this is why Sandstorm is now stuck in time and no longer receiving updates. :(

        It was clearly a big mistake for Sandstorm to use Mongo. We chose it because we built the UI on Meteor, which was really only designed to work with Mongo. In retrospect I wish we'd used SQLite, whose backwards-compatibility guarantee has now reached twenty years.

    • voidfunc 3 days ago

      95% of projects are design fails in this regard... few projects think about Day 2 Ops

  • forinti 3 days ago

    I find that upgrading Postgresql is really easy.

    Testing all the apps that use it, not so much.

dboreham 3 days ago

Quick note that the article is talking about why folks are on versions of PG that are not ancient (14, 15).

  • craigds 3 days ago

    right? we're on 13 and only now starting to consider upgrading to 16, and I don't think we're very abnormal and I don't consider 13 ancient. We have customers still using 9.3. (That last one does actually count as ancient.)

    • briffle 3 days ago

      Same exact boat. Logical replication is nicer than our 'in place' upgrades we did before, but still leaves lots of other issues.

      We are constantly making changes to our schemas (adding tables, columns, etc). It's never an issue on physical standbys, it just gets created, but with logical replication we have to manually run the changes on the subscriber.

      We have lots of instances where we create a new table for a feature coming, and alter another to add a column.

      If those get missed on the logical subscriber, you don't even know until someone tries to write data to that new table or new column.

      I know logical is supposed to be flexible, but I wish there were a setting to have a full replica via logical, so I can handle upgrades more easily.
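
      The usual workaround is to apply the DDL on the subscriber as part of the same migration, and to refresh the subscription when new tables were added to the publication; roughly (host and object names here are illustrative):

        # run the same migration on both sides
        psql -h publisher  -c "ALTER TABLE orders ADD COLUMN note text;"
        psql -h subscriber -c "ALTER TABLE orders ADD COLUMN note text;"

        # only needed when new tables were added to the publication
        psql -h subscriber -c "ALTER SUBSCRIPTION my_sub REFRESH PUBLICATION;"

      It doesn't remove the "someone forgot" failure mode, but it is at least scriptable as part of the migration tooling.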

nomercy400 3 days ago

For me it is a data vs code thing.

If I run my application/code v1 right now, I generate data. I expect that if I move to application/code v2, I can leave my data in place and it will automatically apply changes to my data.

I do not get that with postgres. If I am on postgres 16, and I want to upgrade to postgres 17, I want to leave my data folder untouched. When I then start postgres 17, it should just work (tm).

It should also work with code that assumes postgres 16, so I can upgrade my database separate from my application. I can not wait 10 days for a large database to be migrated from 16 to 17 without being able to run it. However, I can wait 10 days before updating my code to support features in 17.

The current upgrade process does not give me such confidence in restoring data and uptime. So I don't upgrade until I really have to.

vivzkestrel 3 days ago

I'll tell you why from my end. I installed Postgres 14 via homebrew many years ago on my Apple M1 Mac mini. I searched a lot on how to "upgrade" this installation but found nothing. I have a few databases running with data on them which I can't afford to lose if something goes down for more than 1 hour. I wish someone would guide me on how to actually install a newer postgres such as v17 without breaking an existing v14 install or losing data, since I am not an expert by any means.

  • sofixa 3 days ago

    And this, kids, is a good example of why homebrew is not a real package manager, and why macOS is not the best at running production services.

    I hope you have a backup somewhere else, not on the Mac Mini. One option would be to restore it on another machine on Postgres 14, follow the upgrade path to 17 (idk if you can jump directly but a few Google searches will cover this), verify the data, back up the new db, somehow upgrade Postgres on your Mac, and restore the new backup. Done.

    • mxey 3 days ago

      I don’t see how this is homebrew’s fault? Homebrew lets you install multiple versions of PostgreSQL at once, which you need for pg_upgrade, just like you can do with APT:

      brew install postgresql@17
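
      From memory (so treat this as a sketch and double-check the paths on your machine), the rest of the dance on an Apple Silicon Homebrew install looks roughly like:

        brew services stop postgresql@14
        # the new cluster must already exist (brew normally runs initdb on install)
        /opt/homebrew/opt/postgresql@17/bin/pg_upgrade \
          -b /opt/homebrew/opt/postgresql@14/bin \
          -B /opt/homebrew/opt/postgresql@17/bin \
          -d /opt/homebrew/var/postgresql@14 \
          -D /opt/homebrew/var/postgresql@17
        brew services start postgresql@17

      Taking a copy of the old data directory first costs nothing and makes the whole thing far less scary.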

  • thrdbndndn 3 days ago

    Forget about something as major as Postgres, I have trouble updating packages (that aren't installed via pip/npm/cargo) on Linux all the time as a newbie. The experience is worse than Windows for some reason.

    Hell, I have a hard time telling the version of some built-in system binaries.

    A few months ago, I had trouble unzipping a file which turned out to be AES-encrypted. Some answers on SO [1] said I should update my `unzip` to a newer version, but I can't find any updates for my distro, and I have no idea (still don't, so feel free to teach me) how to update it manually to make my `unzip` support AES. And all the versions, the good and the bad, say they're "version 6.0.0" despite behaving obviously differently.

    [1] https://stackoverflow.com/questions/60674080/how-to-open-win...

    • sofixa 3 days ago

      > I have trouble updating packages (that aren't install via pip/npm/cargo) on Linux all the time as a newbie. The experience is worse than Windows for some reason

      If you haven't installed them via your programming language's package manager, you either installed them manually or via the OS package manager. The first one you'd know how to upgrade, and for the second you can ask it what version it is and what version is available to upgrade to (for compatibility reasons it might not be the latest, or latest major, unless you use the software vendor's own package manager repositories).

      It's actually much easier than in Windows, because you have a piece of software (the package manager) that is your one-stop shop for knowing what is installed, updating it, checking available versions, etc., unless you've manually installed stuff.

      In Windows you... google and download random .exes? Cool. As good as the worst possible option on Linux.

      • thrdbndndn 3 days ago

        Not exactly random. It's not hard to tell which website is the official 7-zip website. Also choco and scoop exist on Windows.

        > As good as the worst possible option on Linux.

        I understand this is not a fair comparison, but in practice, they're not as easy. When using Windows, I usually use it with a proper GUI interface, so popping up a browser and downloading the newest installer for a piece of software from its official website would take me less than 1 min.

        Doing similar for my Linux VPS with only a terminal is much more complicated.

        • KronisLV 3 days ago

          > Doing similar for my Linux VPS with only a terminal is much more complicated.

          Debian/Ubuntu:

            sudo apt update
            sudo apt upgrade
          
          Fedora/RHEL:

            sudo dnf update
          
          Arch:

            sudo pacman -Syu
          
          Alpine Linux:

            apk update
            apk add --upgrade apk-tools
            apk upgrade --available
          
          Of course, if the package you need isn't available in the standard repos, then you'll need to look elsewhere (e.g. PPAs or third party repos). There's also options like Flatpak and AppImage if you want something that's a bit closer to how you'd choose to install new releases on Windows.

          If I wanted to update all of the installed software I have on my Windows install, there'd basically be no way for me to do this, outside of shady update manager software.

          At the same time, I get the appeal of being able to just download a new release and install it, both AppImage on Linux and the way you install software on macOS (just drag the file into Applications) seem similarly pleasant to me in that regard.

          To expand on the latter (the response got deleted), you can very much do something like https://peazip.github.io/peazip-linux.html (I wish 7-Zip was available on Linux natively, but as far as GUI software goes, PeaZip is pretty nice) however that's not the most common approach. You should generally prefer using the package manager when you can.

          • thrdbndndn 3 days ago

            Sorry I wasn't very clear, it's totally on me.

            On average, the experience of upgrading/managing packages is obviously much better than Windows.

            I meant to say in certain cases (like the `unzip` example I mentioned above), when the system's built-in package manager fails, I seem to not be able to find alternatives like what I did on Windows (just find the piece of binary I want and manually install it). To this day I still can't find a way to update `unzip` to a version that supports AES on my Debian VPS.

            • MaKey 3 days ago

              > I to this day still can't find a way to update `unzip` to a version that supports AES on my Debian VPS.

              Maybe because there is none? I quickly googled and found this bug: https://bugs.launchpad.net/ubuntu/+source/unzip/+bug/220654

              For archives encrypted with AES-256, p7zip-full can be used.

              This is not a Linux only issue though, the native Windows unzip tool also doesn't seem to support aes-256 (yet): https://answers.microsoft.com/en-us/windows/forum/all/how-do...

              • thrdbndndn 3 days ago

                https://stackoverflow.com/questions/60674080/how-to-open-win...

                The author in this answer clearly has a version of unzip that can detect "AES_WG". Unfortunately they only vaguely said (in one of the comments) "Since then the main Linux distros have added patches to fix various issues" and didn't specify which distro.

                • MaKey 3 days ago

                  He also says:

                  > Your best bet is to try 7z to uncompress the zip file with AES encrypted entries.

                  So why not just do that and call it a day?

                  • thrdbndndn 2 days ago

                    Because this does not only happen to unzip and I want to find a solution in general.

                    • MaKey 2 days ago

                      What do you mean exactly with "this"? The upstream unzip doesn't support AES_WG. How should a general solution look like?

                      • thrdbndndn a day ago

                        Find a working binary (or source code, then compile one) and replace it manually. Is it not possible?

        • MaKey 3 days ago

          > Doing similar for my Linux VPS with only a terminal is much more complicated.

          sudo apt-get install p7zip-full

          • thrdbndndn 3 days ago

            I'm replying to "the worst possible option on Linux", i.e. when the said software is not available in the package manager. 7-zip is just a (bad) example, since you can install 7-zip using `choco install 7zip.install` on Windows too.

            I meant to say when you can't find the software you want in the package manager, it's easier to download it manually and install it on Windows than on (again, unfair comparison) a terminal-only Linux server.

            • MaKey 3 days ago

              > I meant to say when you can't find the software you want in package manager, it's easier to download it manually and install it on Windows than (again, unfair comparison) a terminal-only Linux server.

              In that case you would just copy the download link and paste it into your terminal session. It's rarely needed though as most software is available through your distribution's software repositories.

        • reshlo 3 days ago

          > choco and scoop

          And winget.

  • Klonoar 3 days ago

    This kind of sounds like a Homebrew issue…?

    I’ve run multiple variants of PostgreSQL on my MacBook under MacPorts. Can’t even remember the last time I thought about it, to be honest.

    • vbezhenar 3 days ago

          % brew search postgres
          ==> Formulae
          check_postgres      postgresql@12       postgresql@15       postgrest
          postgresql@10       postgresql@13       postgresql@16       qt-postgresql
          postgresql@11       postgresql@14       postgresql@17       postgis
      
      Looks like "user issue" to me.

ellisv 3 days ago

I enjoyed this thread (and the article) as I'm preparing to upgrade several Postgres databases from 14 to 16, which should take about 25 minutes or less.

My experience has been that most people (including devs) hardly think about their database at all. Everyone just takes the database for granted until a dev runs a bad query that locks a table. Devs let their ORM manage everything for them and don't take the time to think for themselves.

Also I rarely see teams with a culture that prioritizes maintenance, even for their own software. It's common for teams to become wary of upgrading because they do it infrequently. Like most things, the more often you perform an upgrade the easier it is. Smaller changes are easier to revert. The more often you do something, the better at it you get. Delaying maintenance often just increases the work later.

pronik 3 days ago

I'm still on 9.x in some systems and it's running great. I find this kind of sentiment a bit weird anyway: PostgreSQL 17 has been out for a couple of weeks, I'm certainly not in a rush to upgrade anything unless I need to. Never touch a running system is as valid as ever, and on top of that I'm not a full-time DBA itching to upgrade as soon as possible. With containerization it's also more common to have multiple and outright many DB instances, and I won't be going through all of them until someone requests it. Security updates are a completely different matter, but major versions? I don't get the rush, the developers will come around sooner or later.

cpburns2009 3 days ago

Why? Because it's risky when a lot of business critical data and processes rely on it. I'm just happy I was able to convince my boss to let me upgrade our 9.1 instance to 14 two years ago. Now to upgrade those two pesky 9.4 instances.

arend321 3 days ago

I guess I'm lucky and 10 minutes of downtime is not a problem for my customers during off-peak hours, every year or so. Upgrading has always been a painless experience with just a pg_dump and pg_restore on the upgraded db server.
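
For that style of upgrade the whole thing can stay a handful of lines (database and file names are illustrative; -Fc produces a custom-format dump that pg_restore can load in parallel with -j):

    pg_dump -Fc -d app -f app.dump      # against the old server
    pg_restore -d app -j 4 app.dump     # into a freshly created DB on the new server
    # or, to carry roles and all databases at once:
    pg_dumpall > everything.sql && psql -d postgres -f everything.sql

Plus you get the freshly-built indexes and lack of bloat as a side effect, as mentioned elsewhere in the thread.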

c0balt 3 days ago

Ancient version like 15 or 16? I have recently encountered a PostgreSQL 12 instance.

Once the database runs you are inclined to let it keep on running until you must do an upgrade (bugfixes, EOS). Upgrading for new features/improved performance is nice but can be a pain and is prone to cause downtime. PostgreSQL's upgrade tooling story is also a bit painful, but it has proven to be an awesome way to semi-regularly test my backup infrastructure.

phendrenad2 3 days ago

Why don't people upgrade? Why don't we ask why upgrading is necessary? I understand that Postgres is free software, and if you choose to use it, you're signing up for whatever upgrade cadence the developers give you. But really, I wish the developers would spend a bit more time patching "old" versions. 6 years is really too short a time to have to do an upgrade, especially if you have many database servers.

liampulles 3 days ago

Our team uses AWS managed postgres, and the ease of operations and upgrades really alleviates stress for us. Managed DBs are worth the cost.

water9 2 days ago

Because if it isn't broken, don't fix it, and nobody wants to be responsible for the production database going down for any significant period of time, or worse, loss of data.

DonnyV 3 days ago

One of the many things I love about MongoDB is the upgrade process. Shut down the server, delete the old exes, copy the new ones and start the server. Done.

Any needed updates to databases are done by the new exe, automatically. Obviously backup everything before you do the upgrade. I've never had one fail.

mebcitto 3 days ago

If there is anyone from Neon watching this thread, is there a way to suggest updates to the pgversions website? It currently pins Xata to 15.5, which is true for the shared cluster environment depending on the region, but one can start dedicated clusters up to 16.4 at the moment.

yobert 3 days ago

I have a large production deployment that is still on 9.6 because the software depends on table inheritance. (Oh man!)

  • mnahkies 3 days ago

    Unless I'm mistaken, table inheritance is still a thing in current PG versions; in terms of partitioning at least, it's just less commonly used in favour of declarative partitioning since that's easier to manage.

    It's been a long time since I worked with v9.x in anger, so I could well be forgetting things though

    https://www.postgresql.org/docs/current/ddl-partitioning.htm...

    • yobert 3 days ago

      Our problem isn't the inheritance feature itself-- it's the removal of the config knob `sql_inheritance` which lets you change whether inherited tables are included in queries by default or not (behavior controlled manually by the * notation). It's a goofy behavior nobody liked, but we have a mountain of SQL that depends on that knob. It's fixable... Just a pain :)

  • forinti 3 days ago

    For a while I had a separate version 9 cluster because of one application.

    But it wasn't really a nuisance. Postgresql doesn't give a DBA much to worry about.

redslazer 3 days ago

Funnily enough, Neon does not offer an easy way to upgrade databases compared to other managed database hosts.

Ozzie_osman 3 days ago

If you're on something like RDS, major version upgrades are pretty easy with Blue Green deployments. You can do it with just seconds of downtime and pretty low risk because it uses replication under the hood and handles the cutover for you.
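
For anyone who wants to script it, the CLI shape is roughly the following (flag names from memory, so verify against the current RDS documentation before relying on them):

    aws rds create-blue-green-deployment \
      --blue-green-deployment-name pg-major-upgrade \
      --source arn:aws:rds:us-east-1:123456789012:db:mydb \
      --target-engine-version 16.4

    # once the green environment has caught up and been tested:
    aws rds switchover-blue-green-deployment \
      --blue-green-deployment-identifier <id-from-the-create-call>

The switchover step is the only part with user-visible downtime.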

TomK32 3 days ago

Ha, I run arch on my dev machine and they just LOVE upgrading postgres. It has always been a pain until my projects all got their postgres in a docker container. I just don't like to do extra steps for the upgrade process between major versions.

lousken 3 days ago

pg15 is not ancient at all, and it has been part of the latest Debian, hence I don't see an issue. Same with Python or any other dependency. You rarely need the newest shiny these days, and being within the support window is fine.

smellybigbelly 3 days ago

I wouldn’t upgrade major versions until the x.2 version is out. New major versions come with new bugs and I’d like to wait two minor versions until I start using the image in production.

Hawxy 3 days ago

As a counterpoint a lot of managed postgres providers offer automatic upgrades. Upgrading Aurora Postgres is very easy and we just schedule it in once or twice a year.

wiz21c 3 days ago

Maybe they don't upgrade because currently installed PG instances are enough? Meaning that Oracle still holds the "big" deployments?

ZiiS 3 days ago

Because they already work great, you are unlikely to get forced to upgrade, and they are the part of your system doing the most critical work under the heaviest load.

mannyv 3 days ago

Any db upgrade is a crapshoot. You need to test -everything-, and even then you may not catch things that run occasionally.

otabdeveloper4 3 days ago

Because it's in a Docker container and nobody ever updates Docker containers.

pjmlp 3 days ago

For the same reason lots of companies are stuck on Java 8, .NET Framework, C++98 / C++11, C89,.....

p5v 3 days ago

It’s always a trade-off of how much you’ll get from an upgrade vs. how much time, effort, and pain you’ll have to invest to do the switch. Postgres is at a stage where a single version can easily outlive the duration of the software you’ve built with it. Let’s be honest here, a vast majority of software doesn’t need a fraction of the requirements we think it does, frankly, because no one uses it to the extent that latest and greatest features would make the experience more pleasant.

ldjkfkdsjnv 4 days ago

Honestly, aside from React and Java (8 -> 21 is big, but still not that big), there's very little software that I've updated and noticed a major step-change difference in the system. Once it works, it's fine.

  • ggregoire 4 days ago

    Postgres and MySQL usually have changes in each new version that are important enough to motivate an upgrade, whether it is new features or better performance or both. Although it really depends on whether you are using the features they are improving or not (e.g. if you don't use partitions, well of course that 30% perf improvement on write operations on partitions won't benefit you).

    You can check this article about Uber migrating its Mysql from v5 to v8 posted here 3 days ago [1]. Among other things, they observed a "~94% reduction in overall database lock time." The before/after graph is pretty impressive. It also gave them window functions and better JSON support, which are two very big features.

    [1] https://www.uber.com/en-JO/blog/upgrading-ubers-mysql-fleet

erik_seaberg 4 days ago

If PostgreSQL has replication, why are they talking about "minimal" downtime? Is there no quorum strategy that delivers high availability? I don't know as much as I should.

  • bastawhiz 3 days ago

    Writes happen on your primary. At some point, you need to stop accepting writes, wait for the replica to fully catch up, reverse the replication so the replica is the new primary, then direct writes to the new primary. That's hard to do without any downtime.

    There's no option where the nodes all accept writes.

    • erik_seaberg 3 days ago

      https://www.postgresql.org/docs/current/warm-standby.html#SY... mentions quorum-based synchronous replication, which sounds encouraging.

      • bastawhiz 3 days ago

        That still only has one primary: you can't just start pointing writes at the upgraded standbys. Synchronous replication (besides having its own downsides) just shortens the time needed for the replicas to catch up. You still need to perform a cutover.
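
        In practice that cutover is a short scripted window rather than true zero downtime, something like the following (sketch only, host names invented):

          # stop writes (or flip the app to read-only), then confirm the standby has replayed everything
          psql -h primary -c "SELECT application_name, replay_lsn = pg_current_wal_lsn() AS caught_up
                              FROM pg_stat_replication;"

          # promote the physical standby to be the new primary
          psql -h standby -c "SELECT pg_promote();"

          # then repoint the application / pgbouncer / DNS at the new primary

        For a major-version upgrade the "standby" would be a logical replica instead, where the equivalent final step is disabling the subscription and repointing the app, but the write-freeze-and-catch-up shape is the same.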

sublinear 3 days ago

corporate friction

  • linuxandrew 3 days ago

    Exactly this. At my old employer we (the sysadmins/operational DBAs) were pushing for newer, supported versions of Postgres. The pushback was always the regression testing cycle, the amount of desired change, and the willingness of the product owner to even engage. The testing cycle was so long that I tried to convince them to test on the Postgres beta, because it would well and truly be prod, with a few bugfix releases out, by the time testing was completed (alas, they went with the n-1 release of Postgres instead).

roenxi 4 days ago

I've always found it fascinating that there is a vocal contingent at HN that seems to legitimately hate advertising. But then an article like this turns up that is obvious advertising and is also a good article - we get a nice summary of what the major performance features over different postgres versions are, and some interesting case studies (I'd never even heard of the NOT VALID option although apparently it is nothing new).

  • esperent 4 days ago

    This is something I've heard called "permission marketing". The idea is that you show genuinely useful ads to only the few people who will benefit from them, rather than indiscriminately blasting millions of innocent bystanders. Then these few people will actually welcome your marketing efforts.

    The classic example is advertising a new improved fishing reel in a fishing magazine. People buy the magazine (well, 20 years ago they did) because they want to know about things like new improved fishing reels.

    It's a world away from the overwhelming avalanche of bullshit that is modern advertising/spam. There's nothing at all weird about hating advertising in general but being ok with permission marketing.

    If you follow this idea further you'll find that very few people, even the most vocal, genuinely hate advertising. We all want to know about useful products and services. We just don't want to see a million ads a day for Apple, Coke, Pepsi, Nike, erectile dysfunction, fake single women in your area, Nigerian princes...

    Because when it reaches a certain scale, and when too many psychological tricks are being played, and everything is always, BRIGHT, BIG, hyper-sexualized, when you can't walk down any street, watch anything, read anything, without seeing people richer, smarter, younger, sexier, happier than you, it goes far beyond just advertising. It's brainwashing. It has to stop because it's extremely unhealthy for our societies, our mental health, our children.

    • shiroiushi 3 days ago

      Well said. If I'm reading about fishing, for instance, an ad for a new piece of fishing gear would not be too annoying, as long as it isn't too intrusive (like popping up in the middle of my reading).

      But when I'm watching a YouTube video, having the video cut mid-sentence to some hyper-annoying and unrelated noisy ad simply angers me and makes me look for an ad-blocker.

  • OJFord 4 days ago

    I rarely see much objection to contentful 'advertising' like this. Anyway, the answer really is that it's fully handled by submission/voting/flagging mechanisms, doesn't matter what anyone might say.

  • jart 3 days ago

    Yes, but Neon is a funder of Postgres development. So I'm interested in hearing what they have to say. If they're advertising, then I think helping open source is the right way to go about it. To me it sounds like they just want to make sure people benefit from all the money they're spending.

0xbadcafebee 3 days ago

Because we don't really like it to begin with.

The RDBMS has not fundamentally changed much since the 80's. It's the same basic design inside and out, with a shitload of tweaks and optimizations. Don't get me wrong - you can get very far with tweaks and optimizations. But the foundations root you to a particular model, and some things in that model will always be painful.

The important question to me isn't why don't people upgrade. It's why do people run Postgres?

Is there no other kind of database? No, there are hundreds of different databases. Is it because Postgres does things fundamentally different and better than anything else? No, lots of things work somewhat like Postgres, with pros and cons. Is it because you can't do things without Postgres? No, there are always alternatives that you can make work. Is it because it's impossible for it to keep working without upgrading? No, any database with long-term support could continue working without upgrading.

So why use Postgres at all? Answer: it's the incumbency, stupid.

When a big fucking honking giant "thing" sucks up all the air in the room, provides for most of the use cases, and is accepted by all the users, then it's extremely hard to justify not using it. Incumbents usually win, even if they're old, disreputable, buggy, and annoying. Even if they're missing some obvious features other things have. Even if people loudly complain that they want change. It's just more annoying not to use them.

We're used to them. We've adapted. Trying to do something else is going to be annoying and hard. So we stick to what we know. But that doesn't mean we have to stroke the balls while working the shaft. Upgrading isn't exactly fun. It doesn't even buy us anything, other than the promise of "support", or "compatibility" with whatever else will eventually require it. So we upgrade, eventually, when we must.

But the constant mind-numbing march against entropy isn't a fantastic reason to keep using the thing. When you have to convince yourself to stick with it, it's likely you're in a toxic relationship. If you're honest with yourself, you'll agree that it's time for a change. You deserve better.

But switching is full of unknowns. Ominous. Time-consuming. Hard. You know you want to, maybe even need to. But there's too much to lose. So you stick to the familiar, to what you can handle. Maybe something better will come down the pike soon. You even tell yourself you're lucky to be here. But deep down you know you're just comforting yourself. You wish you had something else. Something that brings you joy. Something better. Something... right. But that something isn't here right now. So until Mr. Right gets here, you'll stick with Mr. Right Now.

Time to get ready for the next upgrade..... sigh...

  • ttfkam 3 days ago

    You think people choose Postgres because it's the incumbent? That's an… interesting take.

TacticalCoder 4 days ago

> Postgres 17.0 has been out for a bit and ...

No. It's been released in September 2024. That's not "quite a bit".

Now as to why people aren't all on 17 and not even on 16 yet, here's an acronym for you: LTS [1]

Debian 11 Bullseye is the current LTS. It came out in 2021.

[1] https://en.wikipedia.org/wiki/Long-term_support

nathanaldensr 4 days ago

Weird that the maybe-AI-generated image of a column says "Postgres 13" on it when the article talks about Postgres 17.

  • codetrotter 4 days ago

    Seems perfectly reasonable to me. The article is about people not upgrading from older versions. One could imagine that PostgreSQL 13.0 is the “pillar” of some company, that their whole system relies upon. The article then goes into detail on what they are missing out on by not upgrading PostgreSQL to a more recent major version, and why it might be that so many stay on ancient versions, and also how you can actually perform major version upgrades of PostgreSQL.

    • FearNotDaniel 3 days ago

      Yeah, I mean it's not hugely imaginative, but it kind of makes sense, if you want to describe pg 13 as "ancient" that you use some kind of Greek/Roman temple as a visual analogy for that. Not particularly well executed - a serif font might have helped to drive the "joke" home - but nobody's trying to win an art contest here.