
# Runbook

So, mataroa is down. What do we do?

Firstly, panic. Run around in circles with your hands up in despair. It's important to do this; don't think this is a joke! OK, once that's done:

## 1. Check Caddy

Caddy is the first point of contact inside the server from the outside world.

First, SSH into the server:

```sh
ssh root@mataroa.blog
```

Caddy runs as a systemd service. Check its status with:

```sh
systemctl status caddy
```

Press `q` to exit. If the service is not running or has errored, restart it with:

```sh
systemctl restart caddy
```

If restarting does not help, check the logs:

```sh
journalctl -u caddy -r
```

`-r` is for reverse (newest entries first). Use `-f` to follow the logs in real time:

```sh
journalctl -u caddy -f
```
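The status, restart and logs sequence above can be bundled into one helper for the first minutes of an incident. This is a hypothetical sketch, not existing mataroa tooling. The function name `check_service` and the 50-line log tail are illustrative choices. It takes the unit name as an argument, so the same function also works for the `mataroa` (gunicorn) unit.

```sh
# Hypothetical helper (not part of the mataroa setup): triage one systemd unit.
# Usage: check_service caddy   or   check_service mataroa
check_service() {
    unit="$1"
    # is-active --quiet prints nothing and exits 0 only if the unit is active
    if systemctl is-active --quiet "$unit"; then
        echo "$unit is active"
        return 0
    fi

    echo "$unit is not active, restarting"
    systemctl restart "$unit"

    if systemctl is-active --quiet "$unit"; then
        echo "$unit is active after restart"
        return 0
    fi

    # Still down: print the 50 most recent log lines without opening a pager
    echo "$unit failed to start, recent logs:"
    journalctl -u "$unit" -n 50 --no-pager
    return 1
}
```

Paste it into the root shell, then run `check_service caddy`. A non-zero exit status means the unit is still down and the printed log lines are the place to start.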

To search within the logs in the pager, type a slash followed by the keyword, e.g. `/keyword-here`, then press Enter.
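When the pager search is awkward, the same search can be run non-interactively with `grep`. This is a sketch that assumes you only care about today's entries by default. The helper name `search_logs` and the `today` default are illustrative and not part of the existing setup.

```sh
# Hypothetical helper: case-insensitive search of one unit's logs.
# Usage: search_logs UNIT PATTERN [SINCE]   (SINCE defaults to "today")
search_logs() {
    unit="$1"
    pattern="$2"
    since="${3:-today}"
    # --no-pager streams the entries so grep can filter them;
    # the exit status is grep's: 0 if anything matched, 1 if not
    journalctl -u "$unit" --since "$since" --no-pager | grep -i -- "$pattern"
}
```

For example, `search_logs mataroa "error" yesterday` lists gunicorn log lines mentioning "error" since the start of yesterday.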

The config for Caddy is:

```sh
cat /etc/caddy/Caddyfile
```

It has two entries. The first serves any host matching `*.mataroa.blog`. The second serves anything outside that domain, which means exclusively the blogs' custom domains.
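If you edit the Caddyfile during an incident, validate it before restarting. Caddy will not start on an invalid config, so a typo would take down both entries at once. This is a hedged sketch that assumes the Caddy v2 `caddy` binary is on root's PATH. The function name is hypothetical.

```sh
# Hypothetical helper: restart Caddy only if the Caddyfile validates.
# Usage: restart_caddy_if_valid [PATH_TO_CADDYFILE]
restart_caddy_if_valid() {
    config="${1:-/etc/caddy/Caddyfile}"
    # "caddy validate" loads and provisions the config without serving it
    if caddy validate --config "$config" --adapter caddyfile; then
        systemctl restart caddy
    else
        echo "Caddyfile is invalid, not restarting caddy" >&2
        return 1
    fi
}
```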

The systemd config for Caddy is:

```sh
cat /etc/systemd/system/multi-user.target.wants/caddy.service
```

## 2. Check gunicorn

After Caddy receives a request, it forwards it to gunicorn. Gunicorn is what runs the mataroa Django instances, so its service is named `mataroa`. Like Caddy, it runs as a systemd service.

To see its status:

```sh
systemctl status mataroa
```

To restart:

```sh
systemctl restart mataroa
```

To see logs:

```sh
journalctl -u mataroa -r
```

and to follow them:

```sh
journalctl -u mataroa -f
```

The systemd config for mataroa/gunicorn is:

```sh
cat /etc/systemd/system/multi-user.target.wants/mataroa.service
```

Note that the environment variables for production live inside the systemd service file.

## 3. How to hotfix code

Here's where the code lives and how to access it:

```sh
sudo -i -u deploy
cd /var/www/mataroa/
source .envrc  # load env variables for manual runs
source .venv/bin/activate  # activate venv
python manage.py  # with no arguments, lists the available management commands
```

If you change any source code files (inside `/var/www/mataroa`), you need to restart the service for the changes to take effect:

```sh
systemctl restart mataroa
```

# Server provisioning

## Step 1: Ansible

We use Ansible to provision a Debian 12 Linux server.

(1a) First, set up the configuration files:

```sh
cd ansible/
# Make a copy of the example file
cp .envrc.example .envrc

# Edit parameters as required
vim .envrc

# Load variables into environment
source .envrc
```

(1b) Then, provision:

```sh
ansible-playbook playbook.yaml -v
```

## Step 2: Wildcard certificates

We use acme.sh's Automatic DNS API integration with DNSimple.

Note: acme.sh's default certificate provider is ZeroSSL, which does not accept email addresses with plus-subaddressing. It does not error gracefully; it just fails with a cryptic message (tested with acme.sh v3.0.7).

```sh
curl https://get.acme.sh | sh -s email=person@example.com
# Note: Installation inserts a cronjob for auto-renewal

# Set up the DNSimple API token
echo 'export DNSimple_OAUTH_TOKEN="token-here"' >> /root/.acme.sh/acme.sh.env

# Issue cert (the wildcard is quoted so the shell does not glob-expand it)
acme.sh --issue --dns dns_dnsimple -d mataroa.blog -d '*.mataroa.blog'

# We "install" (copy) the cert because we should not use the cert from acme.sh's internal store.
# Both files are listed explicitly because the reload command may run under a shell without brace expansion.
acme.sh --install-cert -d mataroa.blog -d '*.mataroa.blog' \
  --key-file /etc/caddy/mataroa-blog-key.pem \
  --fullchain-file /etc/caddy/mataroa-blog-cert.pem \
  --reloadcmd "chown caddy:www-data /etc/caddy/mataroa-blog-cert.pem /etc/caddy/mataroa-blog-key.pem && systemctl restart caddy"
```
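Installing acme.sh adds a cron job for renewal. When investigating TLS errors, it is still worth confirming that the installed certificate has not expired and is not about to. This is a minimal sketch using `openssl x509 -checkend`. It assumes the certificate path from the `--install-cert` command above. The function name and the 7-day default are illustrative.

```sh
# Hypothetical helper: succeed only if the certificate stays valid
# for at least DAYS more days (default 7).
# Usage: cert_valid_for_days [CERT_FILE] [DAYS]
cert_valid_for_days() {
    cert="${1:-/etc/caddy/mataroa-blog-cert.pem}"
    days="${2:-7}"
    # -checkend N exits 0 if the cert will NOT expire within N seconds, 1 otherwise
    openssl x509 -in "$cert" -noout -checkend "$((days * 86400))"
}
```

To see the exact expiry date, run `openssl x509 -in /etc/caddy/mataroa-blog-cert.pem -noout -enddate`. It prints a `notAfter=` line.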