The server needed a PHP update. WordPress told me so with a severe-sounding notification adorned with red coloration, a security warning, boldface type, and a link explaining how to change the PHP version. I sighed. Security issues never end, and I have a recurring reminder in my todo list to patch the Virtual Private Server (VPS) boxes I shepherd.
But this PHP issue…hmm. This felt like a bigger deal, and many sites I support lean heavily on WordPress. Rather than wait for the next regular patching session, I decided to get on it. I did a process test on one server, a lower-profile machine where things going awry wouldn’t hurt too much. The goal was to move from PHP 7.2.insecure to PHP 7.4.secure. How hard could it be?
Most of the search engine hits for “upgrade PHP on WordPress” told me to go into cPanel or a similar tool a hosting provider might offer to abstract away what’s going on with the server itself. That’s not what I was looking for, because I manage my own hosts. I needed to know how to reconfigure the host itself. The OS packages to install. The conf files to tweak. The processes to restart. This was not obvious, as the magical CLI incantations required to complete the task varied depending on which Linux distro and HTTP server I was running on a given box.
This test host (more of a production guinea pig) was running Ubuntu 18.04 LTS with Apache 2. That’s a popular combination, but even so, it took reading several blogs and StackExchange threads to end up with this script to get the job done.
sudo add-apt-repository ppa:ondrej/php
sudo apt-get update
sudo apt install -y php7.4
sudo apt install -y php7.4-mysql php7.4-dom php7.4-simplexml php7.4-ssh2 php7.4-xml php7.4-xmlreader php7.4-curl php7.4-exif php7.4-ftp php7.4-gd php7.4-iconv php7.4-imagick php7.4-json php7.4-mbstring php7.4-posix php7.4-sockets php7.4-tokenizer
sudo apt install -y php7.4-mysqli php7.4-pdo php7.4-sqlite3 php7.4-ctype php7.4-fileinfo php7.4-zip php7.4-exif
sudo a2dismod php7.2
sudo a2enmod php7.4
sudo systemctl restart apache2
The script added a repository offering the latest PHP packages, refreshed the package index, installed core PHP 7.4 and several supporting modules, disabled PHP 7.2 and enabled PHP 7.4 in Apache 2, and then restarted Apache 2 to make the change effective. After that, I verified in WordPress Admin > Tools > Site Health > Info > Server that the PHP version was the shiny new 7.4.x.
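If you’d rather confirm from the shell as well, checks along these lines work on an Ubuntu/Apache box like this one. Keep in mind that php -v reports the CLI version, which can differ from the module Apache actually loads, so the module checks are the ones that matter for WordPress.

```bash
# Shell-side sanity checks after the upgrade.
php -v                        # CLI version (can differ from Apache's module)
sudo a2query -m php7.4        # should report the php7.4 module as enabled
apache2ctl -M | grep -i php   # confirm which PHP module Apache has loaded
```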
That worked.
Good job, everyone. On to the next server!
Without thinking about it too hard, I tried this same script on the next server needing a PHP update. The script failed. Couldn’t add the PHP repo. Couldn’t even do a plain old system update. Wait…wut? A simple “sudo apt-get update” should have worked. What was going on? This new box was running Ubuntu 19.04, not 18.04. That’s what was going on. In the Ubuntu world, only some releases are supported long term, and those are designated LTS. When I built this server in (you guessed it) 2019, I didn’t correctly understand how Ubuntu LTS releases work.
I thought any Ubuntu YY.04 release was an LTS release, and thus went with the latest-at-the-time Ubuntu 19.04. But Ubuntu 19.04 was not an LTS release. By May 2021, Ubuntu 19.04 was dead to the community and no longer maintained. The repository servers itemized in /etc/apt/sources.list were no longer valid for 19.04, which is why the box couldn’t even do a simple system update. (Can you tell I hadn’t been patching this one nearly as often as I should have been? Sigh.)
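If you want to check what a box is actually running, and where apt thinks its packages live, before assuming anything about LTS status, standard Ubuntu tooling like this does the job. Nothing here is specific to my setup.

```bash
# What release is this, really? LTS releases say so right in the description.
lsb_release -a
grep -E '^(VERSION|PRETTY_NAME)=' /etc/os-release
# And which mirrors is apt still pointed at?
grep -v '^#' /etc/apt/sources.list
```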
I assumed I’d just migrate the box from 19.04 to 19.10, and then to 20.04 LTS. That didn’t work out, either. If I’d fiddled and forced things with enough determination, I might have been able to make it work. Maybe. Even then, I wasn’t confident I’d end up with a robust Ubuntu 20.04. Instead, I opted to rebuild the server from scratch.
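For the record, the fiddling and forcing would have looked roughly like the sketch below. EOL Ubuntu releases move to the old-releases.ubuntu.com archive, so apt has to be repointed there before do-release-upgrade will run at all, and the whole dance repeats for each intermediate release. This is a sketch of the general technique under the assumption of a stock sources.list, not the path I took.

```bash
# Repoint apt at the archive that hosts EOL Ubuntu releases. Back up first.
sudo cp /etc/apt/sources.list /etc/apt/sources.list.bak
sudo sed -i -E 's/([a-z]+\.)?(archive|security)\.ubuntu\.com/old-releases.ubuntu.com/g' /etc/apt/sources.list
sudo apt-get update && sudo apt-get dist-upgrade -y
# do-release-upgrade honors the Prompt= setting in /etc/update-manager/release-upgrades;
# it needs "normal" (not "lts") to hop from 19.04 to 19.10, then repeat for 20.04.
sudo do-release-upgrade
```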
Rebuilding it better.
Rebuilding the server was the right choice here. Admittedly, I’m from the old school that tends to think of servers as precious pets, raised by hand and fussed over with love and care. Kill one and build another? How rude! Only, it really isn’t rude.
I was wasting my time, spinning my wheels looking for a way to upgrade a retired OS. Wasting time when I had better things to do was the rude bit. Besides, it’s not as if standing up a WordPress server is hard these days. My hosting provider offers it at the press of a button, and so, after letting go of the idea that I could save my pet, that’s what I did:
- I knocked the DNS TTL down on the A and AAAA records for the old server from 15 minutes to 5.
- In just a few minutes, I stood up a new server with WordPress pre-installed, courtesy of my VPS provider’s automation.
- I patched the new server and added PHP 7.4.
- I finished configuring the basic WordPress installation.
- I installed UpdraftPlus, my WordPress backup tool of choice, onto the new server.
- I sent a fresh UpdraftPlus WP backup from the old server to the new server using one of UpdraftPlus’s well-documented migration processes.
- After testing that WordPress looked right on the new server following the backup restoration, I cut over the DNS A and AAAA records to point to the new server (see the dig sketch after this list).
- I installed a proper SSL cert on the server using Let’s Encrypt via Certbot (see the Certbot sketch after this list).
- I tested to my satisfaction that the new server was golden, and then powered down the old server’s VPS instance.
- I did some more testing. I fussed with UpdraftPlus backup schedules and retention, and validated that the WordPress cron system was working.
- The next morning when all was still working fine, I destroyed the old VPS so that I wouldn’t have to pay for it anymore.
- I cranked the DNS TTLs up to 1 hour, which is still conservative while allowing the global DNS caching system to reduce recursive query load. (If we all did this where we could, the DNS world would be a better place.)
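About that DNS cutover step: a quick dig from the shell shows exactly what resolvers and the authoritative name servers are handing out, TTLs included. The domain and name server below are placeholders, not the real site.

```bash
# Check the A/AAAA records and TTLs the world sees after the cutover.
dig +noall +answer A example.com
dig +noall +answer AAAA example.com
dig +short NS example.com                                     # find the authoritative name servers
dig +noall +answer A example.com @ns1.example-registrar.net   # ask one directly, bypassing caches
```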
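And for the certificate step, the Certbot Apache plugin does most of the heavy lifting. The domain names here are placeholders, and this assumes Certbot is already installed on the Ubuntu box.

```bash
# Issue and install a Let's Encrypt cert for the Apache vhost.
sudo certbot --apache -d example.com -d www.example.com
# Certbot sets up automatic renewal; this confirms the renewal machinery works.
sudo certbot renew --dry-run
```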
Job done. I went about my day working on other stuff.
It was DNS.
I woke up the next morning to a DM that the new server was not responding. Not responding? It had been fine. I had trusted it. (It’s not my fault!) I fired up a browser and got a very prompt message from Firefox that the server was not responding. I SSH’ed into the new server. Up. All services were running. I ran “curl localhost” and got the home page back. I tried “sudo systemctl status” and saw no obvious problems. I combed through /var/log/syslog. No smoke detected. I jumped to another VPS and tried to curl the new server remotely, albeit from within the same data center…nothing came back. Grr.
I gave away the ending in the section header. After a bunch more testing, I discovered that it was, in fact, DNS. The server’s domain was registered with a well-known third-party registrar that offers its own name servers as a default option. That solution is good enough in many cases, and it’s what I was using. When I revisited the registrar’s UI to review the DNS records, the A and AAAA records had reverted to the old server’s IP addresses! Stranger still, the new 1-hour TTL I’d set had remained. Such dark wizardry sounds unlikely, and it is.
What happened here? A cached page in my browser repopulating the old records when I made the final TTL change? The registrar’s presumably highly distributed name server system suffering a database synchronization failure? Something else I did in the DNS UI after testing that I just missed? I asked registrar support, hoping for an explanation, but they didn’t know. I will never know for sure, especially if it never happens again. But no matter how it happened, it was DNS.
This little adventure in IT ops reminded me, once again, to take nothing for granted. Had I updated the DNS, tested thoroughly from multiple platforms, and even shut down the old server to know beyond any doubt that the new server was up and reachable via the global DNS? Yes to all of that. So was it safe to assume that DNS was still correct the following morning, when I received the report that the server wasn’t responding? No, it wasn’t.
I did what so many engineers do. I made a bad assumption: that what was true yesterday would still be true today. I began troubleshooting the HTTP server before checking the first principle, namely that the global DNS was pointing at the right host. If I’d started there, I would have cut my MTTR by about 30 minutes.
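Starting there looks something like the following: compare what a public resolver and the authoritative name servers return against the address the server should have, and hit the web server directly by IP so DNS is out of the loop entirely. The domain, name server, and IP are placeholders.

```bash
# First-principles check: is the world resolving this site to the right box?
dig +short A example.com @1.1.1.1                      # what a public resolver sees
dig +short A example.com @ns1.example-registrar.net    # what the authoritative server says
# Test the web server itself with DNS taken out of the equation.
curl -sv --resolve example.com:443:203.0.113.10 https://example.com/ -o /dev/null
```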
Another way to validate first principles would be automated testing that verifies the A and AAAA records are pointing where they should and alerts when they change. My ops needs are simple and my infrastructure churn is low, so I haven’t invested much in tooling like this. I never have the time, you see. Maybe I need to make the time.
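If I do make the time, the check doesn’t have to be fancy. A minimal sketch of what I have in mind: a cron-able script that compares what a public resolver returns against the expected addresses and hollers when they drift. The domain, expected IPs, resolver, and alert mechanism are all placeholders.

```bash
#!/usr/bin/env bash
# Minimal DNS drift check, meant to run from cron every few minutes.
set -euo pipefail

DOMAIN="example.com"
EXPECTED_A="203.0.113.10"
EXPECTED_AAAA="2001:db8::10"
RESOLVER="1.1.1.1"

actual_a=$(dig +short A "$DOMAIN" @"$RESOLVER" | sort | head -n1)
actual_aaaa=$(dig +short AAAA "$DOMAIN" @"$RESOLVER" | sort | head -n1)

if [[ "$actual_a" != "$EXPECTED_A" || "$actual_aaaa" != "$EXPECTED_AAAA" ]]; then
  # Swap this echo for email, a webhook, or whatever alerting you already run.
  echo "DNS drift for $DOMAIN: A=$actual_a AAAA=$actual_aaaa" >&2
  exit 1
fi
```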
Share your own “it was DNS” stories with me on Twitter @ecbanks or in the Packet Pushers Slack channel. Maybe we’ll collect a bunch of these war stories and make a podcast out of it.