Utilizing Nvidia GPUs in a MicroK8s cluster

Not that I have any serious processing to do, but a few days ago I decided to check how it could be done if/when the need arises. It can’t hurt to have the ability, I thought. Since I already have a Kubernetes cluster, it would be silly to write custom code for distributing the jobs, and the GPUs are more suitable than the CPUs for the imagined heavy lifting. A Kubernetes operator sounds like the way to go for the actual interaction with the hardware driver, and sure enough, Google lets us know that Nvidia has a GPU operator hosted on GitHub. The idea was too obvious for me to be the first one to think along those lines…

Since I am running Ubuntu’s Kubernetes distribution, MicroK8s, I also had a look at what they offer, and they provide an addon which attempts to bundle the operator and pre-configure it to fit MicroK8s out of the box. Sounds like the way to go; a simple “microk8s enable gpu” is suggested. Unfortunately that did not work for me despite a number of attempts with various parameters. Maybe it works for others, but in my situation, where I already have the driver installed on the nodes that have GPUs and want to use that host driver, I had no luck even when specifying the latest driver version and forcing the host driver. So, back to square one, and I decided to try my luck with Nvidia’s GPU operator “directly”. The MicroK8s addon installs into the namespace “gpu-operator-resources” by default, so a simple “microk8s disable gpu” and deletion of all resources in that namespace (“microk8s kubectl delete namespace gpu-operator-resources”), to avoid conflicts, put us back to a reasonable starting position.

In the Nvidia documentation there is a section about the containerd settings to use with MicroK8s, so that the paths match what MicroK8s expects. By also specifying “driver.enabled=false”, to skip the containerized Nvidia driver and use the pre-installed host driver instead, we have a winner:

microk8s helm install gpu-operator -n gpu-operator --create-namespace \
nvidia/gpu-operator --set driver.enabled=false \
--set toolkit.env[0].name=CONTAINERD_CONFIG \
--set toolkit.env[0].value=/var/snap/microk8s/current/args/containerd-template.toml \
--set toolkit.env[1].name=CONTAINERD_SOCKET \
--set toolkit.env[1].value=/var/snap/microk8s/common/run/containerd.sock \
--set toolkit.env[2].name=CONTAINERD_RUNTIME_CLASS \
--set toolkit.env[2].value=nvidia \
--set toolkit.env[3].name=CONTAINERD_SET_AS_DEFAULT \
--set-string toolkit.env[3].value=true

At least with that, the resources in the gpu-operator namespace are healthy, the installation passes the validation test (“microk8s kubectl logs -n gpu-operator-resources -lapp=nvidia-operator-validator -c nvidia-operator-validator”) and it can run the CUDA sample application “cuda-vector-add”.
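For a quick end-to-end check beyond the validator logs, a throwaway pod that requests a GPU can be applied; a minimal sketch, where the sample image and tag are assumptions that may need adjusting:

microk8s kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      image: k8s.gcr.io/cuda-vector-add:v0.1   # assumed sample image
      resources:
        limits:
          nvidia.com/gpu: 1
EOF
# Once the pod has completed, the vector addition result shows up in its log
microk8s kubectl logs cuda-vector-add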

Now I just have to figure out what to do with it… Re-encoding movies, forecasting the local weather based on measurements on the balcony and open weather data, beating the gambling firms or the hedge funds. The opportunities are endless for the naïve developer. 🙂


Building and hosting multi-architecture OCI images for your local Kubernetes cluster

I have been running my MicroK8s cluster for a few years now with a few applications without many issues. It hosts my web-exposed photo galleries and runs the distributed backup solution, for example.

Until now it has been sufficient to use public images from Docker Hub, and whatever tweaks or additional packages were needed have been applied without much effort after pod initialization. Now I have containerized a solution I developed for exposing an arbitrary YouTube channel via RSS, and noticed that installing the right packages (for example FFmpeg, which brings a bunch of mandatory dependencies that one cannot opt out of) took some time, and I did not want to add this overhead to every cron job execution (yes, I run these updates and conversions as Kubernetes CronJobs).

My cluster is running on 6 Raspberry Pi 4s stacked in a tower, but the master node is running on a Ryzen server (and I have some other nodes on Ryzen servers which are labelled accordingly so that they can pick up heavier load if needed), so the images I use should be available for both the arm64 and amd64 architectures.

I was choosing between Docker and Podman for building, but since I had Docker installed on the machine I was building on, I went with Docker, and more specifically docker buildx. To host the images locally I use the built-in MicroK8s registry addon, which can be enabled easily (here with 20 GB allocated) with: microk8s enable registry:size=20G

My image requirements are fairly simple: Debian Stable has what I need except Python 3 and some media packages (and yes, procps is there for that nice process signaling utility pkill):

FROM debian:stable
RUN <<EOF
apt-get update && apt-get install -y ffmpeg mediainfo wget ca-certificates python3 procps
EOF

Building with Docker is typically as easy as “docker build .”, but due to the cross-platform image needs I used the following to build for amd64/arm64, tag the image, export it as an OCI image and push it to my local registry on localhost (I did the build on the same server where the MicroK8s master node runs and use the default registry port 32000). In order to be able to reach the local registry I had to create a custom builder mybuilder with relaxed security constraints (“docker buildx create --use --name mybuilder --driver-opt network=host --buildkitd-flags '--allow-insecure-entitlement network.host'”) and then build using that builder:

docker buildx build -f podcast-stable-image -t localhost:32000/mydebianstable:registry --platform linux/amd64,linux/arm64 . --push --builder mybuilder
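To verify that both architectures actually ended up behind that tag in the registry, the manifest can be inspected (reusing the tag from the build above):

docker buildx imagetools inspect localhost:32000/mydebianstable:registry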

In the CronJob description it is then possible to refer to the image with:
image: localhost:32000/mydebianstable:registry
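For context, a minimal CronJob skeleton around that image reference could look roughly like this; the name, schedule and command are made-up placeholders, not my actual job:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: youtube-rss-update
spec:
  schedule: "15 */6 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: update
              image: localhost:32000/mydebianstable:registry
              command: ["/bin/bash", "-c", "/opt/scripts/update-feeds.sh"]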

The loading is then a matter of seconds instead of minutes.

In order not to upset Google, it makes sense to use a sidecar container with a VPN connection so that you use different IP addresses when accessing their servers.


Upgrading the NVMe storage

One of my servers has been running low on disk space, and after a series of half measures I finally did something about it. A few months ago I added a second NVMe drive to another server, and that was easy peasy since it had a second M.2 slot for NVMe drives. The server I upgraded yesterday only had one M.2 slot for drives; the other slot was meant for WiFi cards and has a different connector.

Yesterday I got a 4 TB NVMe drive which I bought cheaply because it uses the older, but still quick enough for me, PCIe 3.0 standard. I was not keen on reinstalling the OS and simply wanted the new 4 TB drive to replace the current 1 TB drive. The server runs a few VMs and I also wanted to minimize the downtime. Some quick research showed that Clonezilla still (initially released in 2007) gets the job done, so I installed it on a UEFI-bootable USB drive. The new 4 TB drive I put temporarily in one of those NVMe-to-USB 3.0 cases, plugged both into the server’s USB 3.0 ports and rebooted. Clonezilla is intuitive for someone used to TUIs, and after about 20 minutes the 1 TB drive was cloned to the 4 TB drive, partition table and all.

Since the partition table was cloned as-is, the root filesystem partition had to be enlarged to utilize all the new free space, but that was easily done in GParted. After replacing the 1 TB drive with the 4 TB drive in the M.2 slot I was prepared to have to tell the BIOS where to boot from, but it booted straight away from the new drive and everything worked exactly as before, just with 25% of the disk space used instead of nearly 100%. Success on the first attempt; that is not always the case, so a positive surprise is nice for once.
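If you prefer the command line over GParted, the same grow step can be done with growpart and resize2fs; a sketch assuming an ext4 root filesystem on the second partition (the device names are examples):

sudo growpart /dev/nvme0n1 2      # grow partition 2 to fill the disk
sudo resize2fs /dev/nvme0n1p2     # grow the ext4 filesystem to the new partition size
df -h /                           # confirm the new size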

I have seen that there are dedicated devices for this specific purpose, but paying for one seems unnecessary if you only do some cloning occasionally. For professionals with a walk-in “I’ll quickly clone your NVMe drive” shop it probably makes sense though…

What to do with the leftover 1 TB drive? For now I leave it in the NVMe-to-USB 3.0 case and have it as a 1 TB USB 3.0 stick (which at the time of writing goes for about 100 bucks).


The Chipolo that got a second chance

To replace or to recharge, that is the question when it comes to battery powered devices. I have switched most of my small devices to rechargeable batteries (the only exception is devices where Felix is suspected to kill/drop the battery powered toy), and in general I strongly dislike devices with built-in irreplaceable batteries, where you are supposed to throw away/recycle perfectly working electronics just because the battery has discharged or can’t be recharged anymore.

One such example is a Chipolo Card (1st gen) which has an odd battery (CP113130, which can be bought from China in minimum quantities of 100 batteries…). It is a 3 V battery, so I replaced it with a USB to 3 V converter (a few euros on Amazon or AliExpress). That works fine, but since USB battery packs typically disconnect the load if it is regarded as “too low” (typically somewhere below 100 mA), one needs to either add an extra load, get a pack where such “intelligence” can be disabled (check Voltaic), or put an adapter between the power source and the consumer that keeps the battery pack online with a quick “ping load”.

I went for the latter option since the Voltaic packs were hard to find with delivery to my place. For those who want to solder themselves, there is a comprehensive guide with an option to buy the parts you need. Maybe I do that later if I need one more. There are many DIY projects on Tindie and I got one of them. After putting that “pulse generator” between the battery pack and the USB 5 V to 3 V converter, the Chipolo Card works as expected. After a few days the battery pack is down from 100% to 75%. The USB battery pack can still be charged and used as a normal battery pack, since the pulse generator does not block the other USB ports.

Not the most beautiful installation I have done in my life but still better than throwing away a perfectly working Chipolo. 🙂

First test without pulse generator. The battery pack turned off after some time.
Battery pack with pulse generator (and electric tape…)

Raspberry Pi 4 with OS on big (>2 TB) drives

I recently switched to Ubuntu 22.04 on my Raspberry Pi 4 and faced an issue when trying to use the full space on my RAID cabinet (Icy Box IB-RD3640SU3E2). The MBR partition table type is no good for such cases, and that is what one gets after writing the installer to the USB drive with rpi-imager. What to do?

Fortunately there is a convenient tool called mbr2gpt (don’t bother with GParted, it failed me at least), which does the conversion in place without data loss (at least the two times I have used it…).

The process for a fresh Ubuntu 22.04 installation on an external USB-connected drive (for example a RAID cabinet as outlined above) looks like this:

1. Use rpi-imager to write the Ubuntu 22.04 preinstalled image to the USB drive and to an SD card
2. Boot with the USB drive plugged in and go through the Ubuntu installation guide
3. Boot with the SD card plugged in and without the USB drive and go through the Ubuntu installation guide.
4. Boot with the SD card but without the USB drive, then plug in the USB drive once Ubuntu has started. Unmount the USB drive if it got mounted automatically.
5. Use the mbr2gpt utility to convert MBR to GPT and expand the root partition (sudo mbr2gpt /dev/sda). Choose to expand the root filesystem and not to boot from the SD card.
6. Reboot without the SD card.

If you also have to restore a previous backup (for example a simple tar archive created with tar cvf /backup.tar --exclude=/backup.tar --exclude=/dev --exclude=/mnt --exclude=/proc --exclude=/sys --exclude=/tmp --exclude=/media --exclude=/lost+found /), like in my case, these are the additional steps:

7. Start with the SD card and without the USB drive.
8. Mount the USB drive’s filesystem on /media/new and create /media/backup
9. Mount the filesystem with the backup file: mount -t nfs serve:/media/backupdir /media/backup
10. Make a copy of /media/new/etc/fstab and /media/new/boot/firmware/cmdline.txt
11. Extract the backup onto the USB drive’s filesystem: tar xvf /media/backup/backup-file.tar -C /media/new
12. If you refer to PARTUUIDs in fstab or cmdline.txt, restore the copies from step 10 so that they match the new drive’s partitions (see the sketch after this list)
13. Reboot without the SD card and hopefully the system boots fine (like it did for me…)
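A sketch of the PARTUUID check mentioned in step 12, with example device names:

sudo blkid -s PARTUUID /dev/sda1 /dev/sda2                  # what the new drive actually has
grep PARTUUID /media/new/etc/fstab                          # what the restored fstab refers to
grep -o 'root=[^ ]*' /media/new/boot/firmware/cmdline.txt   # what the kernel will be told to boot from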


Scripting inspiration

Yesterday, too late in the evening, I stumbled upon a need: to save an album available for streaming for offline use, so that I can stream it conveniently via my Sonos system (yes, there are streaming services to stream directly from as well, but that is not as future proof, let’s say). If you live in a jurisdiction where it is allowed to make a private copy, this is fine from a regulatory viewpoint.

I asked Google what solutions/products other people had come up with. Nothing obvious, just a bunch of crap where scammers want you to pay for some shitty Windows applications. No thanks.

Next thought: how difficult could that be in a bash script, when you are able to combine a bunch of well-tested software components? This type of scripting is more like Lego than software development.

I think I had something working in less than 15 minutes, since I had already used parec to record sound output produced by PulseAudio. My thought is that there are probably a bunch of problems and cases where people would get a lot of use out of some basic scripting abilities, whether that is PowerShell on Windows or Bash on Ubuntu does not matter, in order to get things done without buying/installing software.

If you are already on board, good for you; otherwise a primer in scripting might be a well invested evening. First this one, and then the classic Advanced Bash-Scripting Guide.

#!/bin/bash
# Dependencies: script that moves sound output to a named sink (moveSinks.sh), mp3splt
NOW=$(date '+%Y-%m-%d')
mkdir ~/Music/$NOW
echo "Start playing the album"
sleep 2
DEFAULT_OUTPUT_NAME=$(pacmd list-sinks | grep -A1 "* index" | grep -oP "<\K[^ >]+")
# Create a combined sink so the album can be heard and recorded at the same time
pactl load-module module-combine-sink sink_name=record-n-play slaves=$DEFAULT_OUTPUT_NAME sink_properties=device.description="Record-and-Play"
${HOME}/scripts/moveSinks.sh record-n-play
# Record the monitor of the combined sink and encode to mp3 in the background
/usr/bin/parec -d record-n-play.monitor | /usr/bin/lame -r -V0 - "${HOME}/Music/${NOW}/${NOW}.mp3" &
RECORD_PID=$!
while true; do
  sleep 15
  number_sinks=$(pacmd list-sink-inputs | grep available. | cut -c1-1)
  echo "Found this no. of pulse sinks: $number_sinks"
  if [[ $number_sinks -le 1 ]]; then
    # Playback has stopped, stop recording
    kill $RECORD_PID
    break
  fi
  sleep 45
done
# Split into separate tracks
cd ${HOME}/Music/${NOW}/
mp3splt -s ${NOW}.mp3


Streaming and recording IP TV – follow up

I noticed that the recording jobs sometimes got interrupted, since the streams were not completely reliable. If ffmpeg does not get any data delivered, it gives up after some time, and I did not find a parameter to configure that timeout. (Please let me know if you know of one…)

I came up with a fairly crude and simple, but working, let’s call it pragmatic solution, which is generic enough to share in a post like this.

The script that registers the recording jobs with at does:

...
at -m ${FORMATTED_STARTTIME} <<!
bash /home/xxx/scripts/recordIptvRobust.sh ${SEC_LENGTH} ${CHANNEL_NO} ${START_TIME}
!
...

and the recordIptvRobust.sh bash script (bash 5.0 and later, since it uses the convenient EPOCHSECONDS variable) does:

...
START_EPOCH=$EPOCHSECONDS
STOP_EPOCH=$(( START_EPOCH + SEC_LENGTH ))
while (( EPOCHSECONDS < STOP_EPOCH )); do
  /usr/bin/ffmpeg -i "http://stream-url-here/${CHANNEL_NO}" -y -err_detect ignore_err -c:v copy -c:a copy -t $SEC_LENGTH /home/xxx/public_html/record/record_${STARTTIME//:}_${CHANNEL_NO}_${EPOCHSECONDS}.mkv
  # Give the stream some time to restore order
  sleep 10
done
# Merge the segments with mkvmerge
/usr/bin/mkvmerge -o /home/xxx/public_html/record/record_${STARTTIME//:}_${CHANNEL_NO}.mkv $(ls -1 /home/xxx/public_html/record/record_${STARTTIME//:}_${CHANNEL_NO}_*.mkv | paste -sd" " | sed 's/ / + /g')
...


Streaming and recording IP TV

Sport is better on a big screen, and in my case that means a projector screen in the winter garden. I have a Chromecast and a laptop connected to the projector, and previously I have been using the laptop with Kodi for movies and various web based streaming services. The Chromecast was basically only used by guests who wanted to stream something from their mobile phones via the guest wifi.

If you have an IPTV provider that provides an unencrypted and accessible stream, and there are many such, there is nothing stopping us from developing a solution focused on user friendliness, with a fairly low investment of time and hardware, which is better than what the various out-of-the-box solutions can provide. I will point out the key parts of my solution below so you can stitch together something for yourself quickly if you find yourself with similar needs at some point.

I use and like Home Assistant, and have created a view there where the sport channels can be watched, as well as the last recorded event. Nothing fancy, but the actions are available with a single finger tap, which was my first priority. I did not want to fiddle with a computer or start streaming from the phone when it is time to watch something. There are obviously some steps involved, so let’s take a look at what we need.

For a start we need a Home Assistant installation, and I run mine on an RPI4 which is sufficient for my needs (and it is doing a fair bit nowadays). Going into details on HA is beyond the scope of this blog post. (In theory we could maybe get away with letting the same RPI4 be the server that acts as streaming proxy and recorder as well, if we avoid the transcoding, but that is less robust, and I have other servers around at home, so there is no need to stretch the boundaries of the RPI4’s capacity.) I have another server, the stream server, running the scripts that expose the stream to the Chromecast and do the stream recording, and it also hosts the small web site used to schedule recordings.

In Home Assistant I have a script that invokes a shell_command, which executes a script on the “stream server” via non-interactive ssh. The script on the stream server uses VLC to point the Chromecast to the stream corresponding to the channel:

/usr/bin/cvlc "http://stream-url-here" --sout="#chromecast{ip=X.X.X.X}" --demux-filter=demux_chromecast --sout="#transcode{venc=x264{preset=ultrafast},vcodec=h264,threads=1}:chromecast{ip=X.X.X.X,conversion-quality=0}" --loop
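For reference, the Home Assistant side can be wired up along these lines; the host, key path, script name and channel argument are placeholders, not my exact configuration:

# configuration.yaml (sketch)
shell_command:
  cast_sport_channel_1: ssh -i /config/.ssh/id_rsa -o StrictHostKeyChecking=no user@streamserver '/home/user/scripts/castChannel.sh 1'

script:
  watch_sport_channel_1:
    alias: Watch sport channel 1
    sequence:
      - service: shell_command.cast_sport_channel_1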

Why transcode with VLC, we might ask ourselves; would it not work to just hand over the stream URL to the Chromecast with something like go-chromecast or pyChromecast? I did tests with all three, and VLC is clearly more fault tolerant due to the transcoding it does on the fly, while the other two rely on the stream being compatible and on the Chromecast handling it. VLC is able to handle a much broader range of formats, so your chances are much better that way. The stream from my provider did not work when the Chromecast tried to interpret it “natively”. Your mileage may vary… The “--loop” part is essential since it makes VLC retry in case the connection is interrupted.

That was the “live streaming” part, now over to the recording. I use the same mechanism (with VLC) when playing already recorded streams, and go-chromecast for controlling playback (play/pause/rewind/forward/seek). As an example, let’s say I want to record an NHL game at 2 am in my time zone; we need a way to specify the channel, time and duration. I made a rudimentary web UI for this purpose. The only interesting feature there is probably the possibility to look up matching events based on a team. Since I source all sports events from a sports TV site anyway, and have the future events represented by ical files on a reachable filesystem, I can simply rgrep among the ical files, find info about an upcoming event matching for example “Colorado”, and prepopulate the channel and time fields. The biggest hurdle was transforming datetime stamps between the formats used in ical and the HTML datetime-local component and something that “at” understands. Fortunately this is all pretty simple with date, which understands the datetime from the HTML component and can give us the format that at prefers: date -d "${STARTTIME}" +"%H:%M %Y-%m-%d"
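A sketch of that lookup and conversion, assuming the ical files live under ~/ical and STARTTIME comes from the datetime-local field (e.g. 2022-03-15T02:00):

EVENT_FILE=$(grep -Ril "colorado" ~/ical/ | head -n1)              # find an upcoming event for the team
grep -E 'DTSTART|SUMMARY' "$EVENT_FILE"                            # start time and title, used to prepopulate the form
FORMATTED_STARTTIME=$(date -d "${STARTTIME}" +"%H:%M %Y-%m-%d")    # e.g. "02:00 2022-03-15", the format at prefers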

When the form is submitted, a recording job is placed on the stream server with “at”, a convenient tool for scheduling ad-hoc jobs that should run at a specific time. The job simply lets ffmpeg record the stream URL matching the channel and save it to disk, followed by a command that lets ffmpeg switch container format (from mkv to mp4; the Chromecast was doing better with mp4) while avoiding re-encoding the video and audio streams (still h264). This cleans up any irregularities introduced while saving the stream, which would otherwise most likely prevent proper navigation in the file (relative and absolute positioning). Such a “container switch” is fast since the streams are left as-is. The last thing the scheduled job does is update the link to the “last recording” to point to the new one.

ffmpeg -i "stream-url-here" -y -err_detect ignore_err -c:v copy -c:a copy -t $SEC_LENGTH /home/username/public_html/record/record_${STARTTIME//:}_${CHANNELNO}.mkv
ffmpeg -i /home/username/public_html/record/record_${STARTTIME//:}_${CHANNELNO}.mkv -c:a copy -c:v copy /home/username/public_html/record/record_${STARTTIME//:}_${CHANNELNO}.mp4
ln -sf /home/username/public_html/record/record_${STARTTIME//:}_${CHANNELNO}.mp4 /home/username/public_html/record/latest.mp4
rm /home/username/public_html/record/record_${STARTTIME//:}_${CHANNELNO}.mkv

I should point out one thing which caused me a bit of a headache. The scheduling of the at job needs to be done differently when scripting compared to in an interactive shell. I first tried to pipe the job to at, but that did not work because the standard input in the script is different. To avoid that, simply use a “here document” (from the Advanced Bash-Scripting Guide: “A here document is a special-purpose code block. It uses a form of I/O redirection to feed a command list to an interactive program or a command, such as ftp, cat, or the ex text editor.”)

command <<EOF-MARKER
input
more input
EOF-MARKER
nextCommand

For me that meant:

at -m ${FORMATTED_STARTTIME} <<!
the commands to run in at job
multiple commands can be used, one per line
!

A typical scenario when it comes to actually watching recordings is to simply play the latest recording, for example an NHL game that took place during the night. One tap in Home Assistant is enough, since a symbolic link points to the latest recording, and we can use the same setup as outlined above but point the Chromecast to the URL of the recording on the stream server (the recording is saved to a directory in the user’s public_html, exposed via Apache2 with a2enmod userdir). I did some experiments with dbus-send to VLC, and while that works, using go-chromecast for the navigation turned out to be more convenient. With that utility you can simply do things like “/usr/bin/go-chromecast seek 60 -a X.X.X.X” to fast forward a minute (a commercial break…) or use seek-to to go to an absolute position.

You might have noticed that I have been referring to the Chromecast by IP address. Usually the tools make it possible to address it by its friendly name, but I figure the name is more likely to change (for example when resetting the device) than the IP address, thanks to static IP allocation based on the MAC address (that only needs a config change when the router changes, which is less frequent).


Have you also forgotten (or never knew) the IP address of a Linksys SRW2024 switch?!

I bought a 24 port managed switch; it is a cool thing to have in your private network, and they are basically given away for free nowadays when serious people move from 1 gigabit to 10 gigabit. I don’t foresee my network going 10G in the near future, so an enterprise grade 1G switch almost for free seems like a good deal.

I bought one on Swiss Tutti (“Blocket” for Swedish people, “Avito” for Russians) for 20 bucks and it arrived a few days later (I am still waiting to get deceived on Tutti or Ricardo, even though there are scammers on these platforms too). Since I did not get to know the correct IP address of the switch (the seller told me the wrong one, but I do not count this as deceit, he tried to help), I could not connect to the web management interface and used the managed switch as a dumb switch for a while. I even bought a serial-to-USB cable before simply doing the reasonable thing…

The reasonable thing, at least afaik, is to connect an ethernet cable between the switch and a computer with a NIC, start Wireshark or whatever similar tool you have at hand (tcpdump for you CLI aficionados), and take a look. When restarting the switch, one of the first packets clearly discloses the IP address of the device (I forgot to make a screenshot, but it is obvious when you inspect the capture). Then you can manually set your computer to an IP address in the same subnetwork, get to the web UI and change the IP address of the switch. If you actually have one of these ancient Linksys switches like myself, you better take advantage of the compatibility mode in IE, since they apparently put some trainee on coding the web UI and it does not load in a modern browser…
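For the tcpdump crowd, something like this while power-cycling the switch does the trick (the interface name is an example):

sudo tcpdump -i eth0 -nn -e 'arp or broadcast'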

Voilà, hopefully this saved some people from waiting for serial-to-USB cables from Guangdong via AliExpress et al.


Kubernetes powered backup solution over VPN

After ironing out the last bugs in my home-grown containerized, distributed, remote backup solution, I can’t say I would recommend it for the average user, but if you are comfortable with some hacking, need a backup solution and have a bunch of computers idling at your disposal, this might be for you…

Let’s start with the actual problem, like reasonable people would do. For obvious reasons I want to back up files that can’t easily be recreated. For equally obvious reasons I want them stored offsite; the backups should not be burned or stolen together with the original data.

I have done a few iterations of such backup solutions, all utilizing bash and tar in slightly different ways. When hard drives and internet connections got reasonably affordable and quick, I added offsite backup over something SSH-tunneled (rsync/scp). The location I am currently backing up to has its FTPS server only available via VPN, so that is part of the constraints in the solution below (otherwise I would say FTPS/SFTP/SCP would have been sufficient for my use case).

The solution I have used lately, running a simple bash backup script (full monthly backups and daily increments, which are tarred and then encrypted with 7zip) in a network namespace with an OpenVPN tunnel to the offsite location for FTPS transfer inside the tunnel, has been working more or less fine but had one drawback (lack of parallelization), and running it on multiple bare metal servers is tedious to set up and maintain.

The 7zip encryption is quite demanding and it would be great to scale out in order to take advantage of the available computing capacity in the LAN. Kubernetes to the rescue…

I have a Kubernetes cluster in my LAN running on two fairly powerful (as of 2021…) Ryzen servers (12 cores/32 GB + 8 cores/32 GB, with one of them running the master node) plus 6 Raspberry Pi 4B 4 GB. (The Ryzen servers are running other “nice” processes, so they should give up available capacity when needed, but at the moment I have actually configured the backup jobs to only run on the RPIs (tagged with the “rpi” label) so as not to bother the Ryzen servers.)

With version 1.21, Kubernetes promoted the CronJob workload resource to stable, and it basically does what you can imagine (if you have some basic *nix experience). That is quite handy for a backup task, since we want the container to run on a schedule and clean up after itself when finished.

Since I want to transfer my encrypted archive to an offsite location over OpenVPN (without the host’s networking being affected by this VPN connection), I have one container establishing the VPN connection and one container doing the actual backup task. Since the job’s containers run in the same pod, where they share networking etc., the backup container is able to transfer to the offsite location through the tunnel.

What about the initial problem, the lack of parallelization? I did not implement some sophisticated queue solution where some workers create the archives and put them on a queue, while other workers listen to the queue and encrypt, and yet other workers do the actual transfer. The problem itself is quite simple, and I want the solution to be simple enough to actually be maintained and to keep running every day for years to come.

My simple solution: one cron job for each bigger chunk (the source control repo, photos, family members’ non-cloud documents, MySQL databases, etc.), which are scheduled and run independently on the cluster in parallel. I start them during the night (when the internet connections on both ends are not used much anyway), and since the archives differ in size, the compression and encryption tasks don’t finish at the same time, which spreads out the OpenVPN/network usage. The transfer tasks share the same limited capacity (about 30 Mbit/s to the offsite location in the OpenVPN tunnel), but the OpenVPN server is configured to allow multiple concurrent connections from the same user, so that is not an issue.

After this introduction, let’s go through the actual implementation, including the actual configuration and scripts, so that this blog post is useful for someone who wants to implement something similar.

To start with, I run Ubuntu 20.04 LTS (EOL April 2030, so still many years left…) on both the Ryzen servers and the RPIs. The RPIs boot and run from reasonably fast USB 3 flash drives and are mounted in one of those RPI cluster cases with fans that you can buy cheaply from Amazon or AliExpress. A 7” monitor, power supply for all nodes and a gigabit switch are all attached to the case to form one “cluster unit” with only power and one network cable as the “physical interface”. (When running Ubuntu 20.04 on an RPI4, do consider the advice at https://jamesachambers.com/raspberry-pi-4-ubuntu-20-04-usb-mass-storage-boot-guide/.)

I am running Ubuntu’s Kubernetes distribution, MicroK8s 1.22.4. There are a lot of fancy add-ons, but in my experience it is easy to get the cluster into a state where one has to start over after enabling various add-ons. After a few attempts I now keep it as slimmed down as possible, no dashboard for example, and only have the add-on “ha-cluster” enabled.

Setting it up is basically as easy as running “microk8s add-node” on the master node and running the corresponding “microk8s join” command on the joining nodes. After that procedure you might admire your long node list with “kubectl get no -o wide --show-labels”, for example.
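In other words, roughly this (the join URL including the token is printed by add-node; the address here is an example):

# On the master node - prints a join command with a one-time token
microk8s add-node

# On each node that should join, run the printed command, e.g.
microk8s join 192.168.1.10:25000/<token-from-add-node>

# Back on the master
microk8s kubectl get no -o wide --show-labels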

Now, over to the “meat” of the solution, the yaml files… I would recommend storing your declarations of the desired state in your source repo, so that you can restore your solution on a new cluster with one simple command if needed.

My structure looks like this (I omit the multiple batchjobs and only show two in the file listing below):

-rw-rw-r-- 1 jonas jonas 2745 nov 25 11:02 backup-config.yaml
-rw-rw-r-- 1 jonas jonas 3719 nov 29 14:33 batchjob-backup-dokument.yaml
-rw-rw-r-- 1 jonas jonas 3719 nov 29 14:33 batchjob-backup-mysql.yaml


-rw-rw-r-- 1 jonas jonas 5408 nov 24 15:06 client.ovpn
-rw-rw-r-- 1 jonas jonas 242 aug 23 01:30 route-config.yaml

The “backup-config.yaml” (I have tried to indicate the places to update) contains the script doing the full or incremental backup (depending on the date), including encryption and transfer:

kind: ConfigMap
metadata:
  name: backup-script
apiVersion: v1
data:
  backup.sh: |-
    #!/bin/bash
    DIR_NAME=$(echo $DIRECTORY_TO_BACKUP | tr "/" "-")
    DIR_NAME_FORMATTED=${DIR_NAME::-1}
    BACKUPNAME="rpicluster${DIR_NAME_FORMATTED}"
    BACKUPDIR=/your/path/to/where/you/store/your/archives
    TIMEDIR=/your/path/to/where/you/store/your/time/stamp/files
    TAR="/bin/tar"
    ARCHIVEFILE=""
    echo "DIRECTORY_TO_BACKUP=$DIRECTORY_TO_BACKUP"
    echo "BACKUPNAME=$BACKUPNAME"
    echo "TIMEDIR=$TIMEDIR"
    echo "ARCHIVEFILE=$ARCHIVEFILE"
    export LANG="en_US.UTF-8"
    PATH=/usr/local/bin:/usr/bin:/bin
    DOW=$(date +%a)              # Day of the week e.g. Mon
    DOM=$(date +%d)              # Date of the month e.g. 27
    DM=$(date +%d%b)             # Date and month e.g. 27Sep
    MONTH=$(date -d "$D" '+%m')  # Number of the month
    NOW=$(date '+%Y-%m-%d')
    # First day in month (exception for photos in order to reduce the file sizes)
    if [[ $DOM = "01" && $DIR_NAME_FORMATTED != "Pictures" ]]; then
      ARCHIVEFILE="$BACKUPNAME-01.tar"
      echo "Full backup, no exclude list"
      NEWER=""
      echo $NOW > $TIMEDIR/$BACKUPNAME-full-date
      echo "Creating tar archive at $NOW for $DIRECTORY_TO_BACKUP"
      /usr/bin/nice $TAR $NEWER -c --exclude='/.opera' --exclude='/.google' -f $BACKUPDIR/$ARCHIVEFILE $DIRECTORY_TO_BACKUP
    else
      ARCHIVEFILE="$BACKUPNAME-$DOW.tar"
      echo "Make incremental backup - overwrite last weeks"
      NEWER="--newer $(date '+%Y-%m-01')"
      if [ ! -f $TIMEDIR/$BACKUPNAME-full-date ]; then
        echo "$(date '+%Y-%m-01')" > $TIMEDIR/$BACKUPNAME-full-date
      else
        NEWER="--newer $(cat $TIMEDIR/$BACKUPNAME-full-date)"
      fi
      echo "Creating tar archive at $NOW for $DIRECTORY_TO_BACKUP later than $NEWER"
      /usr/bin/nice $TAR $NEWER -c --exclude='/.opera' --exclude='/.google' -f $BACKUPDIR/$ARCHIVEFILE $DIRECTORY_TO_BACKUP
    fi

    echo "Encrypt with 7zip…"
    /usr/bin/nice /usr/bin/7z a -t7z -m0=lzma2 -mx=0 -mfb=64 -md=32m -ms=on -mhe=on -mmt -p'put-your-secret-phrase-here' $BACKUPDIR/$ARCHIVEFILE.7z $BACKUPDIR/$ARCHIVEFILE
    echo "Remove the unencrypted tar archive"
    /bin/rm -f $BACKUPDIR/$ARCHIVEFILE
    echo "Transfer with lftp"
    FILESIZE=$(stat -c%s "$BACKUPDIR/$ARCHIVEFILE.7z")
    echo "$(date -u): About to transfer $BACKUPDIR/$ARCHIVEFILE.7z ($FILESIZE bytes)" >> $BACKUPDIR/$ARCHIVEFILE.7z.scriptlog
    lftp -c "open -e \"set ssl:verify-certificate false;set ssl:check-hostname no;set log:file/xfer $BACKUPDIR/$ARCHIVEFILE.7z.log;set net:timeout 60;set net:max-retries 10;\" -u user,password ftp://address-of-your-ftp-server-via-vpn; put -O your-remote-path-here $BACKUPDIR/$ARCHIVEFILE.7z"
    echo "$(date -u): Finished transfer $BACKUPDIR/$ARCHIVEFILE.7z" >> $BACKUPDIR/$ARCHIVEFILE.7z.scriptlog

Alright, with that basic backup script in place, which will be re-used by all cron jobs, let’s take a look at one specific batch job, batchjob-backup-dokument.yaml, which does the backup of the documents directory (I kept my paths in order to show how the volumes are referenced):

apiVersion: batch/v1
kind: CronJob
metadata:
  name: backup-dokument
spec:
  schedule: "30 0 * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 5
  failedJobsHistoryLimit: 5
  startingDeadlineSeconds: 3600
  jobTemplate:
    spec:
      template:
        spec:
          shareProcessNamespace: true
          restartPolicy: OnFailure
          volumes:
            - name: scripts
              configMap:
                name: backup-script
            - name: backuptargetdir
              nfs:
                server: qnap
                path: /USBDisk3
            - name: jonas
              nfs:
                server: qnap
                path: /jonas
            - name: vpn-config
              secret:
                secretName: vpn-config
                items:
                  - key: client.ovpn
                    path: client.ovpn
            - name: vpn-auth
              secret:
                secretName: vpn-auth
                items:
                  - key: auth.txt
                    path: auth.txt
            - name: route-script
              configMap:
                name: route-script
                items:
                  - key: route-override.sh
                    path: route-override.sh
            - name: tmp
              emptyDir: {}
          initContainers:
            - name: vpn-route-init
              image: busybox:1.33
              command: ['/bin/sh', '-c', 'cp /vpn/route-override.sh /tmp/route/route-override.sh; chown root:root /tmp/route/route-override.sh; chmod o+x /tmp/route/route-override.sh;']
              volumeMounts:
                - name: tmp
                  mountPath: /tmp/route
                - name: route-script
                  mountPath: /vpn/route-override.sh
                  subPath: route-override.sh
          containers:
            - name: vpn
              image: dperson/openvpn-client
              command: ["/bin/sh","-c"]
              args: ["openvpn --config 'vpn/client.ovpn' --auth-user-pass 'vpn/auth.txt' --script-security 3 --route-up /tmp/route/route-override.sh;"]
              stdin: true
              tty: true
              securityContext:
                privileged: true
                capabilities:
                  add:
                    - NET_ADMIN
              env:
                - name: TZ
                  value: "Switzerland"
              volumeMounts:
                - name: vpn-config
                  mountPath: /vpn/client.ovpn
                  subPath: client.ovpn
                - name: vpn-auth
                  mountPath: /vpn/auth.txt
                  subPath: auth.txt
                - name: tmp
                  mountPath: /tmp/route
            - name: backup-dokument
              image: debian:stable-slim
              securityContext:
                privileged: true
              env:
                - name: SCRIPT
                  value: backup.sh
                - name: DIRECTORY_TO_BACKUP
                  value: /home/jonas/dokument/
              volumeMounts:
                - mountPath: /opt/scripts/
                  name: scripts
                - mountPath: /home/jonas
                  name: jonas
                - mountPath: /media/backup
                  name: backuptargetdir
              command:
                - /bin/bash
                - -c
                - |
                  apt-get update; apt-get install -y lftp p7zip-full procps
                  bash /opt/scripts/$SCRIPT
                  pkill -f -SIGINT openvpn
                  true
              stdin: true
              tty: true
          dnsConfig:
            nameservers:
              - 8.8.8.8
              - 8.8.4.4
          nodeSelector:
            rpi: "true"

As you might have seen in the cron job above, the VPN tunnel is created by a sidecar container (“vpn”), which gets killed after the backup script is done. The “pkill” step is essential for Kubernetes to know that the cron job has finished; otherwise it would be left unfinished and the next night’s job would not start (and SIGINT instead of the KILL signal is important, since the container would be restarted otherwise). Let’s now take a look at the last piece, the VPN tunnel. (The lack of container communication possibilities is hopefully something that gets addressed in an upcoming, not too distant, release; at least there have been ongoing discussions on that topic for a few years.)

The vpn container simply refers to the ovpn config (if it works for you standalone, it will work in this container) and the VPN credentials. Both are stored as secrets, so put your ovpn client config in a file called client.ovpn and create the secret:

kubectl create secret generic vpn-config --from-file=client.ovpn

Same thing with the credentials (I assume now that you will use username and password), create auth.txt with username and password on separate lines and create the secret:

kubectl create secret generic vpn-auth --from-file=auth.txt
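The route-override.sh lives in its own ConfigMap (route-config.yaml in the file listing above, mounted as “route-script”). As a minimal sketch, assuming the goal is to send only the offsite backup network through the tunnel while leaving the pod’s default route alone, it could look something like this (the subnet is a placeholder):

kind: ConfigMap
apiVersion: v1
metadata:
  name: route-script
data:
  route-override.sh: |-
    #!/bin/sh
    # Sketch only: drop the catch-all routes OpenVPN may have pushed and
    # route just the backup network via the tunnel ($route_vpn_gateway is
    # set by OpenVPN for route-up scripts). Adjust 192.168.200.0/24 to the
    # network where the FTPS server lives.
    ip route del 0.0.0.0/1 2>/dev/null || true
    ip route del 128.0.0.0/1 2>/dev/null || true
    ip route add 192.168.200.0/24 via "$route_vpn_gateway"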

That should be it. To test the job without waiting for 00:30 in the case above, kick it off as an ad-hoc job:

kubectl create job --from=cronjob/backup-dokument name-of-manual-dokument-job

You can then see which pod got created:

kubectl get po -o wide|grep name-of-manual-dokument-job

This pod was called name-of-manual-dokument-job--1-zpn56 and the container name was backup-dokument, so the live log could be checked with:

kubectl logs name-of-manual-dokument-job--1-zpn56 backup-dokument --follow
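The vpn sidecar in the same pod can be checked the same way, which helps when the tunnel itself is the problem:

kubectl logs name-of-manual-dokument-job--1-zpn56 vpn --follow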

Alright, that wraps it up. Hope it was useful for something. If not for backups, maybe for other use cases where you need to run something in an openvpn tunnel.
