The Many Realms Of Zulip
2024-12-17 21:34:55 PST
Bart Massey 2024
Thought I'd do a quick post-mortem (damn near, anyway) on my big adventure of the last few hours. It involved reconfiguring a Zulip server I run, and was supposed to be a quick thing. But Zulip is never quick.
Zulip has an interesting configuration option for allowing multiple Zulip chat servers on a single host server. They call this "realms" for some reason. By default you only get the one default realm on your server, so that's what I got when I very quickly set mine up a couple of years ago.
I now wanted to reconfigure to allow multiple realms:
https://site1.zulip.example.com
(for example), and
https://site2.zulip.example.com
instead of just
https://zulip.example.com
on my cloud server.
Thus the fun began…
DNS
So the first thing was to get the domain names set up. I run the DNS for example.com. I am serving it with bind9 or bind or named, all different names for the same piece of software in use on my home Linux server, depending on context. It turns out that systemctl restart named and systemctl restart bind9 are just aliases of each other. Which is weird.
I've spent a lot of time in /etc/bind configuring this thing, so I wasn't anticipating any big deal. I slapped site1.zulip.example.com as an A record in the zone table and… nope.
A half-hour of flailing later I called a friend who is both generous with his time and a genius. He too was confused. The thing we both thought should work, and the internets thought should work, didn't work.
Skipping a bunch more flailing, the desired result was achieved by adding a new zone for zulip.example.com in the zone file for example.com (as zone master, backed up to my friend as zone, er, alternate). With the NS and CNAME records filled in just right, it all just worked.
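For anyone repeating this, a quick way to confirm that the new names resolve is to ask the system resolver directly. This is a minimal sketch, assuming Python 3 is handy. The helper name check_hosts is made up for illustration, and the resolve parameter exists only so the function can be exercised without touching the network.

```python
import socket

def check_hosts(hostnames, resolve=socket.gethostbyname):
    """Map each hostname to its resolved IPv4 address, or None if lookup fails."""
    results = {}
    for name in hostnames:
        try:
            results[name] = resolve(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            results[name] = None
    return results
```

Calling check_hosts(["site1.zulip.example.com", "site2.zulip.example.com"]) returns a dict keyed by hostname. A None value means the name still isn't resolving from wherever the check was run, which may just reflect a cached negative answer rather than a broken zone.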
Upgrading Zulip
Before I tried to do anything with Zulip, I figured I should upgrade first, because it was time anyway and I'd be working from a stable base. Sadly, Zulip is not packaged for Debian as far as I can tell, so I had to download a big tarball and have some script from the existing Zulip installation run the upgrade.
The Zulip install script refused, because "unsupported Debian version". Much digging around later, it turns out my cloud server provider, who had graciously installed Debian for me, had done something that altered both /etc/debian_version and /etc/os-release to say I was running trixie/sid. Some careful hand-editing of these files got me back to where the Zulip script was willing to admit that I had an OS they supported and install the software.
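Before hand-editing anything, it helps to see exactly what the OS is claiming to be. Here is a small sketch that parses the KEY=value format used by /etc/os-release. The function names are made up for illustration, and which fields the Zulip script actually checks is not something this post establishes, so this only shows what the file says.

```python
def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that isn't an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in matching single or double quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        fields[key] = value
    return fields

def read_os_release(path="/etc/os-release"):
    """Read and parse the os-release file at `path`."""
    with open(path, encoding="utf-8") as f:
        return parse_os_release(f.read())
```

Printing read_os_release() before and after the edit gives a quick sanity check that the change landed and that nothing else in the file got mangled along the way.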
There was one other quirk: the installer wanted libvips, but Debian had only libvips42. Huh. So I broke down the upgrade tarball, hand-edited the dependency, and then rebuilt the tarball and gave it to the installer again. Success.
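The unpack, edit, repack dance can also be scripted. The sketch below is a generic version using Python's tarfile module, not the exact steps taken here. The member path and the byte strings in the usage note are hypothetical, since where the upgrade tarball lists its dependencies isn't recorded in this post.

```python
import io
import tarfile

def rewrite_member(src_path, dst_path, member_name, old, new):
    """Copy a tarball, replacing bytes `old` with `new` in one member file."""
    with tarfile.open(src_path, "r:*") as src, tarfile.open(dst_path, "w:gz") as dst:
        for info in src.getmembers():
            if not info.isfile():
                dst.addfile(info)  # directories, symlinks, and so on carry no data
                continue
            data = src.extractfile(info).read()
            if info.name == member_name:
                data = data.replace(old, new)
                info.size = len(data)  # header size must match the new contents
            dst.addfile(info, io.BytesIO(data))
```

A call would look something like rewrite_member("zulip-server.tar.gz", "zulip-server-patched.tar.gz", "some/dependency/list", b"libvips", b"libvips42"), with all three names being placeholders. Note that a bare replace will also hit any longer string that happens to contain the old one, so it's worth diffing the result before feeding it to the installer.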
Move The Existing Zulip
I then wanted to move the existing Zulip from zulip.example.com to site1.zulip.example.com. I used the Zulip backup script (wouldn't work earlier because of the version thing) to back the existing Zulip up, then just used another Zulip script to move the thing. Just worked, which surprised me.
Deal With Nginx and Certificates
Of course, everything has to be TLS now. So I ran another Zulip script which ran certbot to get a new TLS certificate for site1.zulip.example.com. (Given the number of Zulip instances I ever expect to run, getting a wildcard cert seemed like excessive effort.)
I then confronted a couple of sad realities: nothing was working, and nginx configuration was the problem. I have been using Apache since it came out, and I am just not that comfortable with nginx. However, it was on this server because reasons and seemed hard to replace, so I buckled down and started to patch up the config.
One issue was another service running on my cloud box, "Punchy". Punchy had its nginx config installed in /etc/nginx/conf.d and really wanted to be in charge of the TLS for everybody. I finally dpkg-diverted it to sites-available where it should have been in the first place.
The key finding of this phase was that every server section needed to have a server_name set. Anything that didn't just kind of took over everything else. Finally sorted that all out.
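Hunting for the unnamed server sections by eye gets old fast. The sketch below scans nginx config text and reports the line number of every server block that has no server_name directive of its own. It is deliberately naive: it assumes one directive per line, doesn't follow include lines, and treats any # as the start of a comment. The function name is made up for illustration.

```python
import re

def servers_missing_name(config_text):
    """Return 1-based line numbers of `server {` blocks lacking server_name."""
    missing = []
    depth = 0        # current brace nesting depth
    current = None   # [start_line, inner_depth, has_name] for the open server block
    for lineno, raw in enumerate(config_text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments (naive)
        if current is None and re.match(r"server\s*\{", line):
            current = [lineno, depth + 1, False]
        elif (current is not None and depth == current[1]
              and re.match(r"server_name\s", line)):
            current[2] = True
        depth += line.count("{") - line.count("}")
        if current is not None and depth < current[1]:
            # The server block just closed; record it if it never got a name.
            if not current[2]:
                missing.append(current[0])
            current = None
    return missing
```

Running this over each file in sites-available and conf.d (read with open(path).read()) gives a short list of places to look. An unnamed block is not always a mistake, since nginx needs some default server for each listening port, but it should be a deliberate choice rather than an accident.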
One Last Zulip Config
At this point, I had my Zulip desktop client talking successfully to site1.zulip.example.com. Hooray.
Unfortunately, browser access not so much. The browser took a login, but then just hung spinning, with a message that said "if this doesn't come back in a few seconds try reloading the page". Needless to say, a reload solved nothing.
Many adventures later, I got out the browser developer tools, which reported that Zulip was still trying (and failing) to talk to zulip.example.com. I then discovered /etc/zulip/config.py, which had zulip.example.com set as primary, and no entry in the alternate hostname for site1.zulip.example.com. I added the latter, and then altered the nginx configuration to allow the former.
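The exact setting names aren't recorded above, so treat the following as an assumption rather than a transcript: in Zulip's documented server settings the primary hostname is EXTERNAL_HOST, and additional accepted hostnames go in ALLOWED_HOSTS. The settings file is itself Python, so the change described would look roughly like this, using the example hostnames from this post.

```python
# Assumed setting names (EXTERNAL_HOST, ALLOWED_HOSTS); the values are the
# example hostnames used throughout this post, not real ones.
EXTERNAL_HOST = "zulip.example.com"          # still set as the primary hostname
ALLOWED_HOSTS = ["site1.zulip.example.com"]  # the newly added alternate hostname
```

Zulip needs a restart to pick up changes like this, and the nginx side still has to accept requests for both names, which is the second half of the fix described above.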
Conclusions and Future Work
Hooray. I'm back to where I started. Except now I'm running Zulip the way I wanted to, and also now I've fixed the Punchy config and also have figured out how to do a static site for my cloud server using nginx. Way too many hours, but a moderate success.
In digging through Zulip stuff I noticed that it may support GitHub and Google for auth now. I need to look into this: it's way more convenient.
Now if Zulip would fix alerts on mobile it might become actually usable for people. Hooray.