~comcloudway/ansible-srht

10cca97d2223d14eef644dedf67f201fea27ff23 — Jakob Meier 10 months ago 10a224c
Added additional troubleshooting steps
3 files changed, 38 insertions(+), 23 deletions(-)

M docs/TROUBLESHOOTING.md
M roles/builds.sr.ht/README.md
M roles/builds.sr.ht/tasks/worker.yml
M docs/TROUBLESHOOTING.md => docs/TROUBLESHOOTING.md +34 -20
@@ 1,8 1,5 @@
# Troubleshooting
## lxc troubleshooting
The following errors are likely to occur,
when running sourcehut inside of lxc.
### python missing
## python missing
It appears that the alpine image is missing python by default,
which would lead to ansible crashing.
If this happens to you (or you would like to prevent it in the first place),


@@ 12,24 9,41 @@ attach to the lxc container and install python.
lxc-attach -n srht -- apk add python3
```

### unable to start docker
If your ansible playbook fails with the error message:
> Error connecting: Error while fetching server API version
If you are not using ansible,
connect to the server as you normally would 
and execute the `apk add python3` command.

on one of the docker related tasks, and you are running inside of lxc,
you have to modify a couple of files.

First of all add `--exec-driver=lxc` 
to the `DOCKER_OPTS` in `/etc/conf.d/docker` (of your lxc container).
## Guest won't settle
Example log output:
```
[#9] 2023/11/04 17:37:18 Booting image alpine/edge (default) on port 22108
[#9] 2023/11/04 17:37:18 Waiting for guest to settle
[#9] 2023/11/04 17:39:18 Error: Settle timed out after 3 attempts
qemu-system-x86_64: terminating on signal 15 from pid 1261 (/bin/sh)
```

On the host system, edit `/var/lib/lxc/<container name>/config`
and add the following lines:
This might be caused by kernel modules not being loaded on the host.
For me, running `modprobe ext4` solved this issue.
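
Before loading anything, it can help to check whether the kernel already offers the filesystem. The following is only a sketch: `fs_supported` is a hypothetical helper name, and it assumes that `/proc/filesystems` lists every filesystem driver that is built in or currently loaded. The optional second argument exists purely so the check can be tried against a sample file.

```shell
#!/bin/sh
# Hypothetical helper: succeed if filesystem type $1 is listed in
# /proc/filesystems (or in the file given as $2, handy for testing).
fs_supported() {
    # The filesystem name is the last whitespace-separated field of each line.
    awk -v fs="$1" '$NF == fs { found = 1 } END { exit !found }' \
        "${2:-/proc/filesystems}"
}

# Possible usage (loading a module requires root):
#   fs_supported ext4 || modprobe ext4
```

If the check succeeds and the guest still does not settle, the cause is probably something other than a missing `ext4` module.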

``` text
# For docker
lxc.apparmor.profile = unconfined
lxc.cgroup.devices.allow = a
lxc.cap.drop =
### ssh builds@runner or git@git asks for a password
This implies that the playbook was unable to reset the password for the given account.
Connect to the container running your sourcehut services,
and execute one of the following commands,
depending on which password you want to set:
```
# change builds password
passwd builds
# change git password
passwd git
```
When asked for a password, hit enter twice to remove password protection.
Sourcehut's git dispatcher will handle keys and authentication for you.
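
To see which state an account is in before or after running `passwd`, the password field in `/etc/shadow` can be inspected. This is a rough sketch, not something the playbook ships: `password_state` is a hypothetical helper, and it assumes the usual convention that a second field starting with `!` or `*` marks an account without a usable password, while an empty field means no password is required.

```shell
#!/bin/sh
# Hypothetical helper: classify the password field of user $1 in a
# shadow-format file ($2, default /etc/shadow; reading it requires root).
password_state() {
    awk -F: -v u="$1" '
        $1 == u {
            seen = 1
            if ($2 == "")          print "empty"
            else if ($2 ~ /^[!*]/) print "locked"
            else                   print "set"
        }
        END { if (!seen) print "missing" }
    ' "${2:-/etc/shadow}"
}

# Possible usage inside the container:
#   password_state builds
#   password_state git
```

Both the BusyBox and the shadow `passwd` also accept `-d` to delete a password without prompting, for example `passwd -d builds`, which should have the same effect as hitting enter twice.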

Once you're done, restart the container and rerun the playbook.
### Runner says: This account is not available
This implies that the playbook was unable to change the login shell for the builds user.
Connect to the container running your sourcehut services,
and edit `/etc/passwd` with your favorite text editor.
Find the line that says: 
`builds:x:105:65533:builds:/home/builds:/sbin/nologin`
and replace `/sbin/nologin` with `/bin/sh`:
`builds:x:105:65533:builds:/home/builds:/bin/sh`
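
To confirm the change without opening the file again, the seventh field of the matching `/etc/passwd` entry can be printed. A minimal sketch, where `login_shell` is a hypothetical helper name and the optional second argument only exists for trying it on a copy of the file:

```shell
#!/bin/sh
# Hypothetical helper: print the login shell (7th field) of user $1 from a
# passwd-format file ($2, default /etc/passwd). Prints nothing if the user
# does not exist.
login_shell() {
    awk -F: -v u="$1" '$1 == u { print $7 }' "${2:-/etc/passwd}"
}

# Possible usage inside the container:
#   login_shell builds
```

If this still prints `/sbin/nologin`, the edit did not take effect and the runner will keep reporting that the account is not available.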

M roles/builds.sr.ht/README.md => roles/builds.sr.ht/README.md +2 -2
@@ 59,9 59,9 @@ and your mirrors/repositories should be setup.
Afterwards open `/etc/fstab` using a text editor (e.g. nano or vi)
and add the following line:
``` text
host0   /mnt    9p      trans=virtio,version=9p2000.L   0 0
host0   /home    9p      trans=virtio,version=9p2000.L   0 0
```
Close the file, run `mount -a` and navigate into `/mnt`.
Close the file, run `mount -a` and navigate into `/home`.

If you type `ls` you should see that the files from the host system are visible.
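
Besides looking at the directory listing, the mount table can be checked to see whether the share really is a 9p mount. The snippet below is a sketch under assumptions: `mount_fstype` is a hypothetical helper, and it relies on `/proc/mounts` using the usual `device mountpoint fstype options ...` layout. Pass it whichever mount point you used in the `fstab` line.

```shell
#!/bin/sh
# Hypothetical helper: print the filesystem type mounted at $1, read from
# /proc/mounts (or from the file given as $2, handy for testing).
mount_fstype() {
    # If several entries share a mount point, the last one is the one on top.
    awk -v mp="$1" '$2 == mp { t = $3 } END { if (t != "") print t }' \
        "${2:-/proc/mounts}"
}

# Possible usage inside the guest:
#   [ "$(mount_fstype /home)" = "9p" ] && echo "9p share is mounted"
```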


M roles/builds.sr.ht/tasks/worker.yml => roles/builds.sr.ht/tasks/worker.yml +2 -1
@@ 65,8 65,9 @@

- name: Make sure the runner user login shell is set correctly
  ansible.builtin.user:
    name: builds
    name: "builds"
    shell: "/bin/sh"  # must not be set to /sbin/nologin
    password: ""

- name: Make sure runner log dir exists
  ansible.builtin.file: