The Many Realms Of Zulip

2024-12-17 21:34:55 PST

Bart Massey 2024

Thought I'd do a quick post-mortem (damn near, anyway) on my big adventure of the last few hours. It involved reconfiguring a Zulip server I run, and was supposed to be a quick thing. But Zulip is never quick.

Zulip has an interesting configuration option for allowing multiple Zulip chat servers on a single host server. They call this "realms" for some reason. By default you only get the one default realm on your server, so that's what I got when I very quickly set mine up a couple of years ago.

I now wanted to reconfigure to allow multiple realms: https://site1.zulip.example.com (for example), and https://site2.zulip.example.com instead of just https://zulip.example.com on my cloud server.

Thus the fun began…

DNS

So the first thing was to get the domain names set up. I run the DNS for example.com. I am serving it with bind9 or bind or named — all different names for the same piece of software in use on my home Linux server, depending on context. It turns out that systemctl restart named and systemctl restart bind9 are just aliases of each other. Which is weird.

I've spent a lot of time in /etc/bind configuring this thing, so I wasn't anticipating any big deal. I slapped site1.zulip.example.com as an A record in the zone table and… nope.

A half-hour of flailing later I called a friend who is both generous with his time and a genius. He too was confused. The thing we both thought should work, and the internets thought should work, didn't work.

Skipping a bunch more flailing, the desired result was achieved by adding a new zone for zulip.example.com in the zone file for example.com (as zone master, backed up to my friend as zone, er, alternate). With the NS and CNAME records filled in just right, it all just worked.
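
For checking whether the new names resolve at all, a few lines of Python will do. This is only a sketch: check_names is a made-up helper, not part of bind or Zulip. It asks whatever resolver the machine is configured with, so it says nothing about which records in the zone made the lookup succeed.

```python
import socket


def check_names(names, resolve=socket.gethostbyname):
    """Map each hostname to its resolved IPv4 address, or None on failure."""
    results = {}
    for name in names:
        try:
            results[name] = resolve(name)
        except OSError:
            # socket.gaierror (lookup failure) is a subclass of OSError.
            results[name] = None
    return results


# Example use (hostnames are the placeholders from this post):
#   for name, addr in check_names(["site1.zulip.example.com",
#                                  "site2.zulip.example.com"]).items():
#       print(name, "->", addr or "does not resolve")
```

The resolve parameter is there only so the function can be exercised without touching the network.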

Upgrading Zulip

Before I tried to do anything with Zulip, I figured I should upgrade first, because it was time anyway and I'd be working from a stable base. Sadly, Zulip is not packaged for Debian as far as I can tell, so I had to download a big tarball and have some script from the existing Zulip installation run the upgrade.

The Zulip install script refused, because "unsupported Debian version". Much digging around later, it turns out my cloud server provider, who had graciously installed Debian for me, had done something that altered both /etc/debian_version and /etc/os-release to say I was running trixie/sid. Some careful hand-editing of these files got me back to where the Zulip script was willing to admit that I had an OS they supported and install the software.
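
To see what a box claims to be before an installer complains, something like the following Python sketch will dump the interesting fields of /etc/os-release. parse_os_release is a hypothetical helper, and the choice of fields to print is a guess at what an installer might care about, not a statement of what the Zulip script actually reads.

```python
import shlex


def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        # Values use shell-style quoting, which shlex understands.
        parts = shlex.split(raw)
        fields[key] = parts[0] if parts else ""
    return fields


def summarize(fields):
    """Return the identification fields, with None for any that are absent."""
    wanted = ("ID", "VERSION_ID", "VERSION_CODENAME", "PRETTY_NAME")
    return {key: fields.get(key) for key in wanted}


# Example use:
#   with open("/etc/os-release") as f:
#       print(summarize(parse_os_release(f.read())))
```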

There was one other quirk: the installer wanted libvips, but Debian had only libvips42. Huh. So I broke down the upgrade tarball, hand-edited the dependency, and then rebuilt the tarball and gave it to the installer again. Success.

Move The Existing Zulip

I then wanted to move the existing Zulip from zulip.example.com to site1.zulip.example.com. I used the Zulip backup script (wouldn't work earlier because of the version thing) to back the existing Zulip up, then just used another Zulip script to move the thing. Just worked, which surprised me.

Deal With Nginx and Certificates

Of course, everything has to be TLS now. So I ran another Zulip script which ran certbot to get a new TLS certificate for site1.zulip.example.com. (Given the number of Zulip instances I ever expect to run, getting a wildcard cert seemed like excessive effort.)

I then confronted a couple of sad realities: nothing was working, and nginx configuration was the problem. I have been using Apache since it came out, and I am just not that comfortable with nginx. However, it was on this server because reasons and seemed hard to replace, so I buckled down and started to patch up the config.

One issue was another service running on my cloud box, "Punchy". Punchy had its nginx config installed in /etc/nginx/conf.d and really wanted to be in charge of the TLS for everybody. I finally dpkg-diverted it to sites-available where it should have been in the first place.

The key finding of this phase was that every server block needed to have a server_name set. Any block that didn't have one just kind of took over everything else. Finally sorted that all out.
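
Hunting for the offending blocks by eye gets old. A rough Python sketch like this one can flag server blocks that lack a server_name. It is not a real nginx parser: it matches braces naively, does not follow include directives, and will be confused by braces inside comments or quoted strings. The function names are made up for illustration.

```python
import re


def server_blocks(config_text):
    """Yield the body of each `server { ... }` block found in the text."""
    for match in re.finditer(r"\bserver\s*\{", config_text):
        depth = 1
        pos = match.end()
        # Walk forward until the brace that opened this block is closed.
        while pos < len(config_text) and depth > 0:
            ch = config_text[pos]
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
            pos += 1
        yield config_text[match.end():pos - 1]


def blocks_without_server_name(config_text):
    """Return the bodies of server blocks that have no server_name directive."""
    return [
        body
        for body in server_blocks(config_text)
        if not re.search(r"\bserver_name\s", body)
    ]


# Example use (the path is a placeholder):
#   with open("/etc/nginx/sites-available/some-site") as f:
#       for body in blocks_without_server_name(f.read()):
#           print("server block with no server_name:", body.strip()[:60])
```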

One Last Zulip Config

At this point, I had my Zulip desktop client talking successfully to site1.zulip.example.com. Hooray.

Unfortunately, browser access not so much. The browser took a login, but then just hung spinning, with a message that said "if this doesn't come back in a few seconds try reloading the page". Needless to say, a reload solved nothing.

Many adventures later, I got out the browser developer tools, which reported that Zulip was still trying (and failing) to talk to zulip.example.com. I then discovered /etc/zulip/settings.py, which had zulip.example.com set as primary, and no entry in the alternate hostname for site1.zulip.example.com. I added the latter, and then altered the nginx configuration to allow the former.
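
The Zulip settings file is itself Python, so the change looks roughly like the excerpt below. This is a sketch under assumptions: it takes the "primary" setting to be EXTERNAL_HOST and the "alternate hostname" setting to be ALLOWED_HOSTS, both of which are real Zulip settings, but the values are reconstructed from the description above rather than copied from the actual file.

```python
# Hypothetical excerpt of the Zulip settings file after the edit.

# The primary hostname, left as it was.
EXTERNAL_HOST = "zulip.example.com"

# Additional hostnames the server should accept requests for.
# The new realm's hostname is the entry that was missing.
ALLOWED_HOSTS = ["site1.zulip.example.com"]
```

The Zulip server has to be restarted before a settings change like this takes effect.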

Conclusions and Future Work

Hooray. I'm back to where I started. Except now I'm running Zulip the way I wanted to, and also now I've fixed the Punchy config and also have figured out how to do a static site for my cloud server using nginx. Way too many hours, but a moderate success.

In digging through Zulip stuff I noticed that it may support GitHub and Google for auth now. I need to look into this: it's way more convenient.

Now if Zulip would fix alerts on mobile it might become actually usable for people. Hooray.