I am using unattended-upgrades across multiple servers. I would like package updates to be rolled out gradually, either randomly or to a subset of test/staging machines first. Is there a way to do that for APT on Ubuntu?
An obvious option is to set some machines to update on Monday and the others on Wednesday, but that only gives me weekly updates…
The goal of course is to avoid a Crowdstrike-like situation on my Ubuntu machines.
Edit: for example, an updated openssh-server comes out. One fifth of the machines update that day, another fifth update the next day, and the rest update 3 days later.
No, I’m asking how to have unattended-upgrades do that.
Duder… c’mon: https://wiki.debian.org/UnattendedUpgrades
Is there anything about staggered upgrades and staging environments in there? Because obviously I had read it before posting…
https://wiki.debian.org/UnattendedUpgrades#Modifying_download_and_upgrade_schedules_.28on_systemd.29
Bottom of the page. It’s not about staging environments, but it’s about scheduling the updates in systemd.
I invite you to re-read the second paragraph of my post.
You’re just throwing things I already listed back at me. I mentioned a staging environment, I mentioned a schedule was a (bad) option.
You can literally schedule them by the minute, but okay buddy.
I’ll never not be stumped by people who are looking for answers shitting all over those answers.
Maybe I’m not being clear.
I want to stagger updates, giving time to make sure they work before they hit the whole fleet.
If a new SSH version comes out on Tuesday, I want it installed on 1/3 of the machines on Tuesday, another third on Wednesday, and the rest on Friday. Or similar.
Having machines update on a schedule means I have much less frequent updates and doesn’t even guarantee that they hit the staging environment first (what if they’re released just before the prod update time?)
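To make the "one third per step" idea concrete, here is a minimal sketch of how each machine could work out its own cohort from its hostname, so that no central list is needed. The plan (thirds with delays of 0, 1 and 3 days) and the function names are made up for illustration. This only decides how long a host should wait; something else still has to enforce that delay relative to the package's release.

```python
import hashlib

# Hypothetical rollout plan: (share of fleet, days to wait after release).
# Thirds on Tuesday / Wednesday / Friday correspond to delays of 0, 1 and 3 days.
ROLLOUT_PLAN = [(1 / 3, 0), (1 / 3, 1), (1 / 3, 3)]


def host_fraction(hostname: str) -> float:
    """Map a hostname to a stable pseudo-random number in [0, 1)."""
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def delay_days(hostname: str, plan=ROLLOUT_PLAN) -> int:
    """Return how many days this host should wait before installing an update."""
    position = host_fraction(hostname)
    cumulative = 0.0
    for share, days in plan:
        cumulative += share
        if position < cumulative:
            return days
    # Floating-point rounding can leave the summed shares a hair under 1.0.
    return plan[-1][1]
```

Because the hash is deterministic, a host always lands in the same cohort, which means the early cohort effectively becomes a permanent canary group.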
You could set your staging environment PCs to be checking for updates hourly and installing them daily.
You could set your other PCs to just be downloading the updates daily but only install them on certain days of the week.
That means your staging servers would be constantly updated, while your other servers only download the updates and wait until a certain day to install them.
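A rough sketch of what that split could look like, assuming the stock apt-daily.timer (download) and apt-daily-upgrade.timer (install) units and the usual systemd drop-in layout. The tier names and schedules are invented for the example, and the APT::Periodic intervals still apply on top of the timers, so an hourly timer alone may not be enough to get hourly checks.

```python
# Hypothetical tiers and schedules; the values use systemd calendar syntax.
TIER_SCHEDULES = {
    "staging": {
        "apt-daily.timer": "hourly",               # check/download often
        "apt-daily-upgrade.timer": "*-*-* 06:00",  # install every day
    },
    "production": {
        "apt-daily.timer": "*-*-* 05:00",              # download daily
        "apt-daily-upgrade.timer": "Wed *-*-* 06:00",  # install on Wednesdays only
    },
}


def timer_dropin(on_calendar: str, randomized_delay: str = "30m") -> str:
    """Render a drop-in that replaces a timer's schedule."""
    return (
        "[Timer]\n"
        "OnCalendar=\n"  # an empty assignment clears the packaged schedule
        f"OnCalendar={on_calendar}\n"
        f"RandomizedDelaySec={randomized_delay}\n"
    )


def dropins_for(tier: str) -> dict:
    """Return {drop-in path: file content} for one tier."""
    return {
        f"/etc/systemd/system/{unit}.d/override.conf": timer_dropin(schedule)
        for unit, schedule in TIER_SCHEDULES[tier].items()
    }
```

The generated files would be written out by whatever configuration tool you already use, followed by a systemctl daemon-reload on each machine.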
I’m not sure you can set the timer based on a specific package being updated without some bash scripting. You would have to check which packages are getting updated on your staging servers, then have the script update the unattended-upgrades control files on your second- and third-tier PCs to adjust when they’re supposed to install those updates.
I can’t currently find anything on prohibiting specific packages or only installing selected updates from the downloaded updates. Perhaps you could use a mix of systemd downloading the updates and a cronjob for installing them?
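For what it's worth, unattended-upgrades does have an Unattended-Upgrade::Package-Blacklist list in its configuration, which could serve as the "hold this back for now" switch. Below is a minimal sketch that builds such a fragment for packages that have not yet spent enough time on staging. Where the "first seen on staging" timestamps come from, the 48-hour soak period, and the function names are all assumptions. Note that the blacklist works per package name, not per version, so it delays every update of that package while the entry is present.

```python
import re
from datetime import datetime, timedelta, timezone


def packages_to_hold(first_seen_on_staging, now, soak=timedelta(hours=48)):
    """Return names of packages that have not yet soaked long enough on staging."""
    return sorted(
        name for name, seen in first_seen_on_staging.items() if now - seen < soak
    )


def blacklist_pattern(package: str) -> str:
    """Turn a package name into an exact-match pattern ('+' and '.' escaped)."""
    return re.sub(r"([+.])", r"\\\1", package) + "$"


def render_blacklist(packages) -> str:
    """Render an apt.conf fragment that blocks the given packages."""
    lines = ["Unattended-Upgrade::Package-Blacklist {"]
    lines += [f'    "{blacklist_pattern(name)}";' for name in packages]
    lines.append("};")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    seen = {"openssh-server": now - timedelta(hours=5)}  # made-up data
    print(render_blacklist(packages_to_hold(seen, now)))
```

Production machines would get this fragment dropped into /etc/apt/apt.conf.d/ (the exact file name is up to you) and regenerated as packages finish soaking.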
Further, Ubuntu/Debian is technically already doing this as well. They already have staggered rollouts in APT.
If you’ve ever updated via the command line and seen “The following packages have been kept back” or “The following upgrades have been deferred due to phasing”, that can be because they’re purposefully withholding those updates from you, to make sure they roll out safely to everyone. That way, if a handful of users who get a phased rollout have issues, the rollout can be undone before it goes out to everyone.
I found the page about “phased upgrades” (somehow missed it searching for “staggered”, “incremental”, “delayed”, etc). Thanks for the pointer!
Unfortunately it doesn’t seem configurable on my end, and it rolls out in about 54 hours so it can take out most of my machines before I have time to react (my first machine might update ~20h into the phased rollout, the rest will break within 24h). Bummer!
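The arithmetic behind that worry, as a small sketch. It assumes the phased percentage starts at 10% and goes up by 10 points every 6 hours, which is consistent with the ~54 hours above, and it models each machine as having a fixed threshold between 0 and 99. That threshold model is a simplification for illustration, not a description of how APT actually picks machines.

```python
def phased_percentage(hours_since_release, start=10, step=10, interval_hours=6):
    """Share of machines (in percent) offered the update after a given time.

    Assumed schedule: `start` percent at release, plus `step` points every
    `interval_hours`, capped at 100.
    """
    if hours_since_release < 0:
        return 0
    steps_done = int(hours_since_release // interval_hours)
    return min(100, start + step * steps_done)


def hours_until_included(machine_threshold, start=10, step=10, interval_hours=6):
    """Hours until a machine with the given threshold (0-99) gets the update.

    Simplified model: the machine is included once the phased percentage
    is strictly greater than its threshold.
    """
    steps_needed = max(0, (machine_threshold - start) // step + 1)
    return steps_needed * interval_hours
```

Under this model, a fleet whose earliest machine is included around the 18 to 24 hour mark has at most 30 to 36 hours before every other machine follows, with no gap reserved for checking the first one.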
To actually answer your question, you need some kind of job scheduling service that manages the whole operation, whether that’s SSM or Ansible or something else. With Ansible, you can set a batch size on the play (the serial keyword) so that only 3 or so hosts are updated at a time until they are all done. If one of those upgrades fails, it will abort the process. There’s a parameter to make it die if any host fails, but I don’t recall it right now.
I think I would want a bigger delay; a faulty upgrade might only break something after a few hours.
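Putting the two comments above together, here is a minimal sketch of a batch-at-a-time rollout with a soak period between batches. It is plain Python rather than Ansible; in Ansible the closest equivalents are, as far as I know, the serial keyword for the batch size and any_errors_fatal for aborting on the first failure. The upgrade and healthy callables are hypothetical placeholders for however you actually run and verify the upgrade on a host (SSH, SSM, and so on).

```python
import time
from typing import Callable, Iterable, List, Optional, Tuple


def rolling_upgrade(
    hosts: Iterable[str],
    upgrade: Callable[[str], bool],   # hypothetical: returns True on success
    healthy: Callable[[str], bool],   # hypothetical: post-upgrade health check
    batch_size: int = 3,
    soak_seconds: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[List[str], Optional[str]]:
    """Upgrade hosts in batches; stop at the first failure.

    Returns (hosts that passed upgrade and health check, first failing host or None).
    """
    hosts = list(hosts)
    completed: List[str] = []
    for start in range(0, len(hosts), batch_size):
        batch = hosts[start:start + batch_size]
        for host in batch:
            if not upgrade(host):
                return completed, host
        # Give a faulty upgrade time to show itself before moving on.
        sleep(soak_seconds)
        for host in batch:
            if not healthy(host):
                return completed, host
            completed.append(host)
    return completed, None
```

With soak_seconds set to something like 24 * 3600 this gives the bigger delay asked for, at the cost of needing a long-running controller or a job that can resume where it stopped.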