The Many Realms Of Zulip
2024-12-17 21:34:55 PST
Bart Massey 2024
Thought I'd do a quick post-mortem (damn near, anyway) on my big adventure of the last few hours. It involved reconfiguring a Zulip server I run, and was supposed to be a quick thing. But Zulip is never quick.
Zulip has an interesting configuration option for allowing multiple Zulip chat servers on a single host server. They call this "realms" for some reason. By default you only get the one default realm on your server, so that's what I got when I very quickly set mine up a couple of years ago.
I now wanted to reconfigure to allow multiple realms: https://site1.zulip.example.com (for example) and https://site2.zulip.example.com, instead of just https://zulip.example.com, on my cloud server.
Thus the fun began…
DNS
So the first thing was to get the domain names set up. I run the DNS for example.com. I am serving it with bind9 or bind or named: all different names for the same piece of software in use on my home Linux server, depending on context. It turns out that systemctl restart named and systemctl restart bind9 are just aliases of each other. Which is weird.
I've spent a lot of time in /etc/bind configuring this thing, so I wasn't anticipating any big deal. I slapped site1.zulip.example.com as an A record in the zone table and… nope.
A half-hour of flailing later I called a friend who is both generous with his time and a genius. He too was confused. The thing we both thought should work, and the internets thought should work, didn't work.
Skipping a bunch more flailing, the desired result was achieved by adding a new zone for zulip.example.com in the zone file for example.com (as zone master, backed up to my friend as zone, er, alternate). With the NS and CNAME records filled in just right, it all just worked.
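For anyone repeating this, a quick way to confirm the new names actually resolve is a few lines of Python. This is only a sketch: check_hosts and system_resolver are names I made up for illustration, and the lookup goes through whatever resolver the local machine uses, so it says nothing about how the zone itself is laid out.

```python
import socket
from typing import Callable, Dict, Iterable, List


def system_resolver(hostname: str) -> List[str]:
    """Return the sorted, de-duplicated addresses the system resolver reports."""
    infos = socket.getaddrinfo(hostname, None)
    return sorted({info[4][0] for info in infos})


def check_hosts(
    hostnames: Iterable[str],
    resolve: Callable[[str], List[str]] = system_resolver,
) -> Dict[str, List[str]]:
    """Map each hostname to its addresses, or to an empty list if lookup fails."""
    results = {}
    for name in hostnames:
        try:
            results[name] = resolve(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            results[name] = []
    return results
```

Calling check_hosts(["site1.zulip.example.com", "site2.zulip.example.com"]) returns a dict in which an empty list marks a name that still does not resolve.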
Upgrading Zulip
Before I tried to do anything with Zulip, I figured I should upgrade first, because it was time anyway and I'd be working from a stable base. Sadly, Zulip is not packaged for Debian as far as I can tell, so I had to download a big tarball and have some script from the existing Zulip installation run the upgrade.
The Zulip install script refused, because "unsupported Debian version". Much digging around later, it turns out my cloud server provider, who had graciously installed Debian for me, had done something that altered both /etc/debian_version and /etc/os-release to say I was running trixie/sid. Some careful hand-editing of these files got me back to where the Zulip script was willing to admit that I had an OS they supported and install the software.
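To see what a version check might be looking at, here is a small sketch that parses os-release style text. I don't know which fields the Zulip script actually reads, so the choice of ID and VERSION_CODENAME is an assumption, and parse_os_release and describe are illustrative names, not anything from Zulip.

```python
import shlex
from typing import Dict


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse KEY=value lines in the os-release format into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        # Values may be shell-quoted; shlex.split strips the quoting.
        parts = shlex.split(raw)
        fields[key] = parts[0] if parts else ""
    return fields


def describe(text: str) -> str:
    """Summarize the distribution ID and codename (assumed fields of interest)."""
    info = parse_os_release(text)
    return "{} / {}".format(info.get("ID", "?"), info.get("VERSION_CODENAME", "?"))
```

Feeding it the contents of /etc/os-release, for example describe(open("/etc/os-release").read()), shows at a glance which release the file claims to be.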
There was one other quirk: the installer wanted libvips, but Debian had only libvips42. Huh. So I broke down the upgrade tarball, hand-edited the dependency, and then rebuilt the tarball and gave it to the installer again. Success.
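I did the unpack, edit, repack dance by hand, but the same thing can be sketched in Python with the standard tarfile module. This is not the procedure I followed, and it assumes you already know which file inside the tarball names the dependency: member_name below is a hypothetical parameter, and rewrite_member is a made-up helper.

```python
import io
import tarfile


def rewrite_member(src_path, dst_path, member_name, old, new):
    """Copy a tarball, replacing bytes `old` with `new` in one member.

    Returns the number of replacements made in that member.
    """
    count = 0
    with tarfile.open(src_path, "r:*") as src, tarfile.open(dst_path, "w:gz") as dst:
        for member in src:
            if not member.isfile():
                # Directories, symlinks, and the like carry no data to copy.
                dst.addfile(member)
                continue
            data = src.extractfile(member).read()
            if member.name == member_name:
                count = data.count(old)
                data = data.replace(old, new)
                member.size = len(data)  # header must match the new length
            dst.addfile(member, io.BytesIO(data))
    return count
```

Pass old and new as bytes, and make old specific enough (say, including the trailing newline) that it cannot match inside a name that is already correct.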
Move The Existing Zulip
I then wanted to move the existing Zulip from zulip.example.com to site1.zulip.example.com. I used the Zulip backup script (wouldn't work earlier because of the version thing) to back the existing Zulip up, then just used another Zulip script to move the thing. Just worked, which surprised me.
Deal With Nginx and Certificates
Of course, everything has to be TLS now. So I ran another Zulip script which ran certbot to get a new TLS certificate for site1.zulip.example.com. (Given the number of Zulip instances I ever expect to run, getting a wildcard cert seemed like excessive effort.)
I then confronted a couple of sad realities: nothing was working, and nginx configuration was the problem. I have been using Apache since it came out, and I am just not that comfortable with nginx. However, it was on this server because reasons and seemed hard to replace, so I buckled down and started to patch up the config.
One issue was another service running on my cloud box, "Punchy". Punchy had its nginx config installed in /etc/nginx/conf.d and really wanted to be in charge of the TLS for everybody. I finally dpkg-diverted it to sites-available where it should have been in the first place.
The key finding of this phase was that every server section needed to have a server_name set. Anything that didn't just kind of took over everything else. Finally sorted that all out.
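Hunting for the nameless server sections by eye got old fast. A rough checker along these lines would have helped. It is a deliberately naive sketch: servers_without_name is my own invention, it does not follow include directives, and it treats every # as the start of a comment even inside a quoted string.

```python
import re
from typing import List


def servers_without_name(config_text: str) -> List[int]:
    """Return 1-based line numbers of `server {` blocks with no server_name."""
    # Drop comments but keep newlines so line numbers stay accurate.
    text = re.sub(r"#[^\n]*", "", config_text)
    missing = []
    for match in re.finditer(r"(?<![\w.-])server\s*\{", text):
        # Walk forward to the brace that closes this server block.
        depth, pos = 1, match.end()
        while pos < len(text) and depth > 0:
            if text[pos] == "{":
                depth += 1
            elif text[pos] == "}":
                depth -= 1
            pos += 1
        body = text[match.end():pos]
        if not re.search(r"(?<![\w.-])server_name\s", body):
            missing.append(text.count("\n", 0, match.start()) + 1)
    return missing
```

Run it over the text of each file under /etc/nginx that defines servers; any line number it returns points at a server section worth a second look.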
One Last Zulip Config
At this point, I had my Zulip desktop client talking successfully to site1.zulip.example.com. Hooray.
Unfortunately, browser access not so much. The browser took a login, but then just hung spinning, with a message that said "if this doesn't come back in a few seconds try reloading the page". Needless to say, a reload solved nothing.
Many adventures later, I got out the browser developer tools, which reported that Zulip was still trying (and failing) to talk to zulip.example.com. I then discovered /etc/zulip/config.py, which had zulip.example.com set as primary, and no entry in the alternate hostname for site1.zulip.example.com. I added the latter, and then altered the nginx configuration to allow the former.
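In Python terms, the end state looked roughly like the fragment below. The setting names are an assumption on my part: I am taking "primary" to mean Zulip's EXTERNAL_HOST and "alternate hostname" to mean ALLOWED_HOSTS. The host_is_accepted function is a toy model for illustration only, not how Zulip or Django actually performs the check.

```python
# Assumed names: EXTERNAL_HOST and ALLOWED_HOSTS stand in for the "primary"
# and "alternate hostname" settings described above.
EXTERNAL_HOST = "zulip.example.com"
ALLOWED_HOSTS = ["site1.zulip.example.com"]


def host_is_accepted(host_header: str) -> bool:
    """Toy check: is this Host header one of the configured names?"""
    host = host_header.split(":", 1)[0].lower()  # drop any port
    return host == EXTERNAL_HOST or host in ALLOWED_HOSTS
```

With both names listed, requests for either hostname pass this toy check, which matches the state I ended up in: the browser could reach both the old and the new name.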
Conclusions and Future Work
Hooray. I'm back to where I started. Except now I'm running Zulip the way I wanted to, and also now I've fixed the Punchy config and also have figured out how to do a static site for my cloud server using nginx. Way too many hours, but a moderate success.
In digging through Zulip stuff I noticed that it may support GitHub and Google for auth now. I need to look into this: it's way more convenient.
Now if Zulip would fix alerts on mobile it might become actually usable for people. Hooray.