In my last post, I talked about the importance of backing up, and how I do it. The upshot is that I use a cloud file provider, which automatically synchronises my data, keeps a file history, and allows deleted files to be restored. There are many options here – I settled on Sync.com because it is zero-trust out of the box, is reasonable value, enables file and folder sharing, and generally seems good.
In the post before last, I outlined my website setup, which is a Linux box running a collection of Docker containers. Unfortunately Sync.com doesn’t have a Linux client or API yet, so I can’t directly use the same approach. The backup also needs to cover the MySQL database.
There is also a much stronger requirement for incremental backups – you want a backup every day, and the data probably hasn’t changed very much since the previous day’s backup. However, you may also need to go back several days or weeks to a specific backup.
This is very much like source control for code, which in fact I also use as part of the backup strategy.
TL;DR
I use cron to run a script which:
Does a mysqldump of the databases to text files,
Uses Restic to make an incremental encrypted backup of these and the filesystem to my NAS and (via rclone) to Google Drive.
All the configuration, static HTML, CSS and so on is stored in a BitBucket.org repository.
Old school
Initially I had some very simple backup scripts, which did the following:
Create a dated backup folder on the NAS.
Dump the database into the folder.
Make a tarball of all the files.
Also mirror the files to a ‘live’ folder on the NAS.
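In rough outline, that amounted to something like this (a simplified sketch – paths and database names are illustrative, and it assumes the NAS share is mounted at /mnt/nas):
#!/bin/bash
# Sketch of the old-style backup - paths and names are illustrative
today=$(date +%Y-%m-%d)
backup_dir="/mnt/nas/backups/$today"
mkdir -p "$backup_dir"
# Dump the database into the dated folder
mysqldump --opt -h 127.0.0.1 eutony_net > "$backup_dir/eutony_net.sql"
# Tarball of all the site files
tar -czf "$backup_dir/www.tar.gz" /var/www
# Mirror the files to the 'live' folder on the NAS as well
rsync -a --delete /var/www/ /mnt/nas/live/www/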
This was ok as far as it went. It did need manual intervention to delete the old folders from time to time, and to copy to a USB stick occasionally. It’s pretty inefficient in terms of runtime and storage. Each backup takes up around 5GB (mainly due to all the photos on photo.eutony.net). It also doesn’t provide an offsite backup, so not really ideal.
Shiny and new
In my general overhaul of everything I decided I needed a new backup approach as well. It had to satisfy the 3-2-1 requirement, be fully automated, but also be elegant and efficient.
In my research I came across Restic, which ticks all the boxes. It is an encrypted, block-based incremental/differential backup system. So I can back up the entire filesystem every day, but only the changes since the previous backup will be stored. Furthermore, a full history of the filesystem going back to the first backup is retrievable. Restoring a particular snapshot will give you the entire filesystem as it was at the point of that snapshot.
In that regard, it is very much like a Git repository, just minus the branches.
The output from Restic looks like this:
using parent snapshot 6cc86ebd
Files: 0 new, 4 changed, 10136 unmodified
Dirs: 0 new, 11 changed, 723 unmodified
Added to the repository: 20.222 MiB (20.224 MiB stored)
processed 10140 files, 5.500 GiB in 0:46
snapshot 0b1c0bc4 saved
So you can see it’s processed 10,140 files across 734 directories in a 5.5 GB archive, and added around 20 MiB for the 4 changed files. And all in 46 seconds.
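Getting at the history is just as easy – for example, to list the snapshots and pull one back out (repository and password file as used in the backup script below; the snapshot ID and target path are illustrative):
restic -r sftp://someone@192.168.xxx.xxx:/backups/pi/restic snapshots \
    --password-file /home/backup/.secret/resticpw.txt
restic -r sftp://someone@192.168.xxx.xxx:/backups/pi/restic restore 0b1c0bc4 \
    --password-file /home/backup/.secret/resticpw.txt \
    --target /tmp/restore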
This is all good and well for the file-system, but what about the database?
Well, I use mysqldump to write a plain text file of SQL to a folder that is included in the Restic backup. Actually I’ve got 3 databases, so it’s 3 files. The plain text obviously makes the individual files bigger, but it makes it easier for Restic to chunk it up and only store the deltas, not the whole file.
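Restoring a database is then just a case of feeding the dump back into mysql – something like this (the user name is illustrative):
mysql -h 127.0.0.1 -u someuser -p eutony_net < /home/backup/db/eutony_net.sql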
Backup Storage
So Restic will roll up my backups into a nice snapshotted repository – but where does that live?
Well, in keeping with the 3-2-1 approach, I actually use two repositories. One is hosted on my NAS (Restic plays nicely with ssh, using sftp), and the other is on Google Drive.
“But wait”, I hear you say, “how do you access Google Drive from a Linux command shell – and anyway, didn’t you say you didn’t trust Google not to look at your data?”. Turns out both of these are simple to address, using Rclone to access Google Drive, and Restic’s built-in file encryption.
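The one-off setup is roughly this – create an rclone remote for Google Drive, then initialise the encrypted repository on it (the remote name is illustrative; it just has to match whatever the backup script uses):
# Interactive wizard to create a remote called "GoogleDrive"
rclone config
# Create the encrypted Restic repository on that remote
restic -r rclone:GoogleDrive:/backups/pi/restic init \
    --password-file /home/backup/.secret/resticpw.txt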
Setting up Restic and Rclone was pretty straightforward, and the docs are good. I’ve done a single test restore, which went without a hitch. And my backup script verifies the integrity of the repository every day, and pushes the log file to my phone via ntfy.
So, in all its glory, my backup script, which is run from crontab every night, looks like this. You will of course understand that I’ve removed credentials and network information.
#!/bin/bash
resticpw_file=/home/backup/.secret/resticpw.txt
log_file=/tmp/backup.txt
# Dump the MySql databases
mysqldump --opt --create-options --add-drop-table -h 127.0.0.1 \
  eutony_net --default-character-set=utf8 \
  > /home/backup/db/eutony_net.sql
mysqldump --opt --create-options --add-drop-table -h 127.0.0.1 \
  gallery3 --default-character-set=utf8 \
  > /home/backup/db/gallery3.sql
# Output the files to the log file, for validation
echo "**DB**" > $log_file
echo "" >> $log_file
ls -l /home/backup/db >> $log_file
echo "" >> $log_file
# Restic backup to the NAS
echo "**NAS**" >> $log_file
echo "" >> $log_file
restic -r sftp://someone@192.168.xxx.xxx:/backups/pi/restic backup \
--password-file $resticpw_file \
/home \
/var/www \
--exclude ".git" \
--exclude "logs" \
--exclude "wordpress" \
--exclude "!wordpress/wp-content/wp-uploads" \
--exclude "!wordpress/wp-config.php" \
--exclude "/home/backup/source" \
--exclude "/home/backup/.*" >> $log_file 2>&1
echo "-------" >> $log_file
# Restic check of the NAS repo
restic -r sftp://someone@192.168.xxx.xxx:/backups/pi/restic check \
--password-file $resticpw_file \
--read-data-subset=10% \
> /tmp/backup-check.txt 2>&1
tail -n 1 /tmp/backup-check.txt >> $log_file
echo "-------" >> $log_file 2>&1
# Restic backup to the Google using rclone
echo "" >> $log_file
echo "**Google**" >> $log_file
echo "" >> $log_file
restic -r rclone:GoogleDrive:/backups/pi/restic backup \
--password-file $resticpw_file \
/home \
/var/www \
--exclude ".git" \
--exclude "logs" \
--exclude "wordpress" \
--exclude "!wordpress/wp-content/wp-uploads" \
--exclude "!wordpress/wp-config.php" \
--exclude "/home/backup/source" \
--exclude "/home/backup/.*" >> $log_file 2>&1
echo "-------" >> $log_file
# Restic check of the Google drive repo
restic -r rclone:GoogleDrive:/backups/pi/restic check \
--password-file $resticpw_file \
> /tmp/backup-check2.txt 2>&1
tail -n 1 /tmp/backup-check2.txt >> $log_file 2>&1
echo "-------" >> $log_file
# Send a push notification of the backup and log file via ntfy.sh
curl -H "Title: Backup Pi to NAS" \
-H "Tags: pi,computer" \
-T $log_file \
https://ntfy.sh/my-secret-backup-topic > /tmp/ntfy.log 2>&1
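The crontab entry that kicks the whole thing off looks something like this (time and path illustrative):
# m h  dom mon dow  command
30 2 * * * /home/backup/backup.sh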
I’ve chosen to omit a few files and directories from the restic backup which don’t need to be backed up in this way, which makes the restic command look more complex than it really is.
The files are encrypted with a key stored in ~/.secret/resticpw.txt, which needs to be stored securely in multiple places, as without it you cannot access the backup!
My key looks a bit like a Bitwarden fingerprint phrase – but you’ll have to forgive me for not going into any more details than this.
Speaking of Bitwarden, watch this space for all things password, coming soon.
The world is facing a severe crisis of environmental degradation and climate change that affects our ability to sustain human civilisation in its present form,
The main cause of the crisis is human activity,
The crisis is inextricably linked to global injustices, inequality and extinction of many species,
The crisis indicates a failure of human beings to follow God’s mandate to care for the world and to seek justice among its peoples,
we, the leaders of St Mark’s Church, Harrogate, want to publicly recognise a Climate and Environmental Emergency, and commit ourselves to:
Examining our lives individually and corporately in relation to this crisis and seeking to live faithfully to God,
Bringing forward by the end of 2023 an action plan to minimise our negative corporate impact on the environment and climate and to help restoration where possible,
Encouraging our members to make relevant lifestyle changes appropriate to their circumstances,
Encouraging action on this emergency in our neighbourhoods, workplaces and other spheres of activity,
Using whatever influence we may have to bring about positive actions by local and national government, corporations and other organisations.
You don’t need to work with technology for long before you realise the importance of having a backup strategy.
The two main use cases are disaster recovery, and mitigation against accidental deletion or edit.
The first is generally more straightforward – you are simply looking to be able to restore all your data in the case of hardware failure, or catastrophic user error. The scenarios are losing or dropping your phone or laptop, hard drive failure, memory stick loss or corruption, cloud provider failure, malware, accidentally deleting an account or formatting a hard drive, and so on. Furthermore, you also need to think about robbery, and flood or fire.
And let’s be clear – with any storage media, but especially hard disks, failure is a when not an if.
The received wisdom on this is to have a 3-2-1 plan. Have 3 copies of your data, 2 on different devices, and 1 offsite. It is suggested that a further ‘offline’ copy is also taken, so that should malware or ransomware hit and all connected copies are affected, there is a copy which you can be sure is secure.
My take is what one of my lecturers told me when I was an undergraduate – If your data doesn’t exist in 3 different locations, it doesn’t exist at all! Locations here means virtual rather than physical ones, although the data does need to be in 2 physical locations.
TL;DR
I use sync.com (full disclosure – this is a referral link) to store all my documents and data (including e-mail backups).
In this way, I have at least 3 copies (which are kept in sync), spread across multiple devices, including offsite. Sync.com also offer file versioning and deleted file retrieval. Nice.
I do also have a NAS (Network Attached Storage) which has all my photos and videos, but this is also mirrored on Sync.com.
Backing up
My main backup strategy used to be a combination of a NAS drive and USB memory sticks. The NAS has two hard disks set up in a RAID-1 configuration, so the data is mirrored over both disks. If either disk fails, it can be replaced and will automatically re-mirror from the other one. This relies on the likelihood of both disks failing at the same time being low, which it is. The slight hesitation is that I bought both hard disks at the same time, so they are both likely to fail around the same time.
I had a script which mirrored the NAS onto one of my PC hard disks, and then periodically I would back it all up to USB memory sticks which I kept in a fireproof safe. The NAS is also useful in terms of documents which are shared (like utility bills, and photos).
This was fine as far as it went. The NAS took care of 2 of the copies on its own, but all the copies of the data were in the same physical location, and it relied on me being bothered to save to the USB sticks regularly, which I wasn’t great at. It was also limited in terms of recovery from accidental deletion.
So instead I now use the cloud for backup storage.
Google Drive, OneDrive, Dropbox and other storage providers have a very robust infrastructure, with multiple geographically distributed copies. I personally wouldn’t rely solely on these, but as most of them sync a copy to your hard drive, even if (say) Microsoft goes down you haven’t lost it all. Plenty of people do rely on them, and they are a whole lot better than no backup!!!
My issue with this is that Microsoft/Google/Dropbox can read your data. For some stuff I’m not too fussed about this, and they are an excellent way of sharing photos or distributing a newsletter. But I don’t really want Dropbox having access to my bank statements, say.
Sync.com
Instead of these I now use Sync.com. They are a zero-knowledge cloud storage provider, which means I am the only one who can access my data. It integrates with Windows and macOS like OneDrive, so that changes are automatically synced up to the cloud.
Their free account is pretty good – 5GB of storage, which you can extend to 10GB fairly easily by getting rewarded for various actions, like installing it on a phone. If you refer a friend, you also get an extra 1GB each time. They also provide file versioning, so you can restore an older or deleted file. My family members have free accounts, and Sync.com’s excellent sharing facilities allow me to share documents and let them use ‘my’ 2TB of storage from my paid plan.
I opted for a paid plan, which is $96 a year for 2TB of storage. This is more storage than I will need for some time (my NAS also has 2TB capacity, but I’m only using 500GB). All my local documents on Windows are automatically synced with Sync.com, which satisfies my 3-2-1. The stuff on the NAS still gets mirrored to the hard disk, but the active folders also get mirrored to my Sync.com space.
Sync.com isn’t perfect – the desktop client is a bit clunky, and there’s no API or Linux client (which is a nuisance). But in terms of value and zero-trust encryption it ticks my boxes, plus it’s great for sharing, and really good to get file versioning and delete recovery.
NAS backup scripts
All the copying is managed using Task Scheduler and (e.g.) Robocopy to mirror the NAS into my Sync.com directories. The scripts themselves are simple batch files, such as the one sketched below, which mirrors a shared folder from the NAS onto my local D: drive.
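Something along these lines (the share and folder names are illustrative):
@echo off
rem Mirror the NAS photo share onto the local D: drive
robocopy \\mynas\photos "D:\NAS\photos" /MIR /FFT /R:3 /W:10 /NP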
The upshot is, all my documents are stored on my local hard disk, and also on Sync.com.
All my photos and videos are stored on my NAS, but mirrored to my local hard disk, and also uploaded to Sync.com.
That just leaves e-mail.
E-mail backups
The final piece of the jigsaw is e-mail, which is primarily stored on my e-mail provider’s IMAP server, and is partially replicated on my hard disk by my e-mail client.
Rather than assume anything about my e-mail client’s (or provider’s!) retention strategy, I manually sync all my e-mail accounts to a folder, which is in turn synced with Sync.com. I don’t bother with my Microsoft or Google e-mail addresses (I figure they have probably got that covered), but the addresses at eutony.net I do back up.
This is a little more technical as it needs a tool to do the IMAP mirroring – I use imap-backup running in a Docker container (so I don’t need to fight to the death with Windows over installing ruby, and all the scripts!!)
The Dockerfile spins up an ubuntu image with imap-backup installed:
FROM ubuntu:latest AS build
ENV HOME /root
SHELL ["/bin/bash", "-c"]
RUN apt-get update && apt-get -y --no-install-recommends install ruby
RUN gem install imap-backup
RUN adduser --system --uid 1000 --home /home/imapuser --shell /bin/bash imapuser
RUN mkdir /imap-backup && chown imapuser /imap-backup
USER imapuser
ENV HOME /home/imapuser
WORKDIR /home/imapuser
RUN mkdir .imap-backup && chmod og-rwx .imap-backup
COPY --chown=imapuser config.json .imap-backup/config.json
RUN chmod 0600 .imap-backup/config.json
CMD ["imap-backup", "backup"]
The only piece of magic is copying config.json, which has the account configurations, and looks a bit like this, with sensitive information removed.
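In outline it’s something like this (addresses, paths and folder names invented for illustration – see the imap-backup docs for the full set of options):
{
  "accounts": [
    {
      "username": "someone@eutony.net",
      "password": "not-my-real-password",
      "server": "imap.example.com",
      "local_path": "/imap-backup/someone_eutony.net",
      "folders": [
        { "name": "INBOX" },
        { "name": "Sent" }
      ]
    }
  ]
}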
The docker-compose.yml then mounts a local Windows directory as /imap-backup, so the script can save the data locally. As this folder is under my Sync.com folder, the data gets automatically stored and versioned in the cloud.
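A minimal sketch of that compose file (the Windows path is illustrative):
services:
  imap-backup:
    build: .
    volumes:
      - "C:/Users/someone/Sync/imap-backup:/imap-backup"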
Lastly, we just need a Scheduled Task to run docker compose up periodically.
Restore
Of course, backups are only any use if you can restore them.
With a cloud provider based approach (such as Sync.com), the files are just ‘there’. Accessing them is via the web client or phone app, and restoring them on a new device is as simple as installing the desktop client and letting it sync.
Imap-backup backs up in a standard text-based format on the disk, but also supports ‘restoring’ to a new IMAP account.
Logging
The last thing to mention is the importance of logging and checking your backups. Scripts go wrong, get de-scheduled, etc. You don’t want to find this out after you’ve lost everything.
My approach is to have every backup script called by a wrapper script that handles logging what’s going on. This wrapper script is invoked by the Task Scheduler.
@echo off
echo [%date% - %time%] Backup start > "%~dp0\log.txt"
CALL "%~dp0\backupnascore-photos-d.bat" >> "%~dp0\log.txt" 2>&1
echo [%date% - %time%] backup finished >> "%~dp0\log.txt"
curl -H "Title: Backup Photos from NAS to D Drive" -H "Tags: win11,computer"-T "%~dp0\log.txt" https://ntfy.sh/my-ultra-secret-backup-topic
The sharp-eyed will notice a curl call to ntfy.sh. This is simply a way to ping the backup result to my phone, so I can scan the logs for errors, and hopefully notice if I haven’t received one for a while. I actually self-host my own ntfy instance, but I started off using ntfy.sh, and it works really well.
But wait, there’s more…
I don’t only have a Windows box, but also a Linux box which runs all my databases.
As I mentioned last time, the config and code is all in source control, so automatically backed-up. However the database and media files associated with the websites also need backing up, which is what I will cover next time…
Well – it has taken some time (and partially explains the lack of posts), but I think I’ve got my personal websites set up “just so” now.
The core engine is still WordPress, which is running headless to provide all the content to the front-end. I use WordPress admin to manage the site, write posts, update pages, and so on. The interaction is via WPGraphQL, which took a bit of effort to get working, but provides a nice standard API to access the details.
The front end (www.eutony.net) now runs off nextjs, running on a node server. So the majority of the site is rendered at build time, with the semi-dynamic pages (the front page, photos, and archive) using Incremental Static Regeneration (ISR). The truly dynamic pages (infinite scroll on the front page, and search results) are still built on demand. It does mean that if I want to change the style or certain aspects of the content I need to rebuild the whole site, but I think it’s worth it. Nextjs has a pretty steep learning curve. I started off trying to host it on Vercel, but unfortunately my hardware wasn’t able to keep up with the API demands that Vercel’s free tier puts on it, so it’s self-hosted.
The photos site (photo.eutony.net) is still served by a standard HTTP daemon. There are one or two other sites also running in this way.
The really exciting thing is that all of these (including the database) are running in Docker containers, behind an nginx server which handles the SSL offload and reverse proxying. Docker compose brings up the whole shebang. The best bit is that it’s essentially zero-configuration for the host machine, beyond installing Docker. As it happens, it’s all running on a Raspberry Pi, and the move to Docker was a result of having to reformat the SD card to upgrade the OS. The thought of having to install and configure Apache, WordPress, MySQL and LetsEncrypt again was too much.
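In heavily abbreviated form, the compose file is shaped something like this (service names and images are illustrative, and volumes, networks and secrets are omitted):
services:
  nginx:        # SSL offload and reverse proxy for everything below
    image: nginx:stable
    ports:
      - "80:80"
      - "443:443"
  wordpress:    # headless WordPress, exposing WPGraphQL
    image: wordpress:latest
  db:           # the MySQL database
    image: mysql:8
  web:          # the nextjs front end, running on node
    build: ./frontend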
In practice this means that my entire website is host agnostic – if I want to move it to a cloud provider such as Azure, it’s as simple as spinning up the containers there instead (plus a bit of DNS jiggery-pokery). All the code and configuration (except for secrets) is managed in source control. Docker is so good, in that it helps me keep the host environment clean and light, it means I can run the containers anywhere, and it means my exposed services are sandboxed, so if they do get compromised the attackers don’t gain access to my server. It’s also so lightweight that I can run the 7 or so containers alongside each other without overloading the very limited hardware.
Finally, the whole thing is sitting behind Cloudflare. I don’t get enough traffic to really need a CDN, but it lifts a bit of the load off the Pi, plus of course means that my IP address doesn’t get exposed, which is a good thing.
In terms of backups, I mainly rely on all the configuration and HTML/CSS/JS being in source control, so it’s just the database and any uploaded files which need backups. There’s a nightly script which takes care of that, but in the next exciting instalment I’ll be sharing my backup strategy!
I periodically record the instructions for silly games that I came across, usually in the setting of a youth group.
This game is called “Empire”. Each player starts off as the “Ruler” of their Empire (which consists of only them at the start), and tries to add other players to their Empire by correctly guessing their secret identities. But if another player guesses theirs, then they and their entire Empire get subsumed into that player’s Empire! The winner is the person who ends up with all the other players in their Empire.
It works best with between 8 and 20 people.
To start, every player will need a pen and a small piece of paper. They choose the name of a famous person (alive or dead, real or fictional) who they will “be” for the purposes of the game, and write it on the piece of paper. All the names then go into a hat (or other suitable container).
The organiser then reads out all the names which have been submitted – e.g. “Lady Gaga”, “Harrison Ford”, “Pooh Bear”, etc., and the game begins.
The first player – say Alice – points to another player – say Bob – and asks them if they are a specific person. For example, “Bob – are you Lady Gaga?”.
If Bob isn’t “Lady Gaga”, i.e. Alice guessed incorrectly, then play passes to Bob to make the next guess.
If, on the other hand, Alice guessed correctly (so Bob did write down “Lady Gaga”), then Bob moves over to sit with Alice. Alice’s “Empire” now consists of her and Bob, and she can make another guess. Bob plays no further active role in the game, except to advise Alice on her guesses, and to move with her.
If another player subsequently correctly guesses who Alice is, then both Alice and Bob move over to become part of that person’s Empire, and they can make no further guesses.
The physical movement is important, as it shows who is in which Empire, and how big they are. When you’re down to 2 or 4 active players, people usually have to rack their brains to work out which names haven’t been guessed yet.
The only other rules are:
The list of names is only read out once at the start of the game – the players have to remember all the names.
The names should be of people that all the players could reasonably be expected to have heard of and be able to remember.
If there are duplicate names, there are 3 options:
Start again, with everyone writing down the names again, but the people who had a duplicate have to choose a different name
Play with the duplicates – so “Harrison Ford” has to be guessed twice.
Treat the duplicates as one person, so as soon as “Harrison Ford” is correctly guessed, both “Harrison Ford”s join the guesser’s Empire
It doesn’t have to be people – it could be films, books, meals, places. Any subject where the players are likely to choose different options from one another.
With younger or less reliable players, they could also have to write their own name on the paper, so the host can resolve disputes.
The host doesn’t usually play, as they have the advantage of seeing the handwriting.
This is a game of memory and psychology – especially if you play more than one round! Subsequent rounds can start with the winner of the previous round.
What did the dad say when his kids asked him for money? “Money doesn’t grow on trees, you know.”
Why was the dad’s belt arrested? For holding up his pants!
Why don’t dads ever have any money? Because they always spend it on their kids!
How do you know when a dad is about to make a joke? When he starts to twinkle.
What do you call a dad who’s always on the phone? A chatterbox!
How does a dad make a coffee run? Very carefully.
What did the dad say when he heard his son was stealing? “I hope he takes after his mother.”
Why did the dad cross the playground? To get to the other slide!
How do you know when a dad is done mowing the lawn? He’s sweating profusely.
What did the dad say when his son asked him how to make a sandwich? “Put it together yourself, I’m not your sandwich maker.”
I think part of what is interesting about this is the way in which the jokes are wrong. There are ghosts or echoes of jokes in many of these, and they read a bit like bad translations.
With nextjs, “getInitialProps” will run server-side when the page first loads, but runs client-side on subsequent in-app navigations (e.g. if you navigate back to it).
This means my website breaks if you go “back” to a search results page, as the browser tries to hit the headless WordPress back-end directly – which it can’t reach. So it errors out, and you get this lovely message:
Application error: a client-side exception has occurred (see the browser console for more information).
Solution – put the WordPress call behind an API, and then either always hit that API from “getInitialProps”, or work out whether you’re running client-side or server-side, as sketched below.
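(Illustrative only – the endpoint URLs here are made up, but the typeof window check is the important bit.)
// pages/search.js - rough sketch, not the real page
export default function SearchResults({ results }) {
  return <pre>{JSON.stringify(results, null, 2)}</pre>;
}

SearchResults.getInitialProps = async (ctx) => {
  // Server-side we can talk to the WordPress container directly;
  // in the browser we have to go via the site's own /api route instead.
  // (Assumes fetch is available server-side - Node 18+, or a polyfill.)
  const isServer = typeof window === 'undefined';
  const base = isServer
    ? 'http://wordpress:8080/wp-json'   // internal container address (illustrative)
    : '/api';                           // nextjs API route proxying WordPress (illustrative)
  const res = await fetch(`${base}/search?q=${encodeURIComponent(ctx.query.q || '')}`);
  return { results: await res.json() };
};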
I saw a post on Mastodon this morning about the futility of making resolutions for an arbitrary 365 day period.
This didn’t sit quite right with me, so I pondered it for a while and realised what my objection is. I don’t really mind the notion that resolutions are futile (back in 2011 I moved away from the idea and language of resolutions), but I don’t agree that the time period is arbitrary.
Of course, the 1st January is semi-arbitrary – but today we return to the same relative position in the solar system that we were at 365.25 days ago, having travelled an astronomical 9×10¹¹ m in the meantime. In July 2022 we were 3×10¹¹ m away from where we are now.*
Having completed our annual pilgrimage aboard the good ship Earth, I think that it’s fair enough to stop and reflect on what has happened and changed, to celebrate making such a grand journey in time and space, and to look ahead to what might be this time around.
It made me appreciate again just how governed by “the stars” our timekeeping is. Days obviously correspond to the spinning of the earth. Months broadly correspond with the orbit of the moon. Seasons are entirely due to the orbit of the earth, as is the year. In fact the only artificial constructs are time periods less than a day (hours, minutes) – at least until you get to atomic vibrations and light wavelengths – and the grouping of days into a week, which is arguably a theological construct (and there have been interesting studies on the effects that different lengths of week have on humans).
The 1st January is only semi-arbitrary because it’s likely related to the winter solstice (and possibly the perihelion of the Earth’s orbit), as is Christmas Day. But it also made me wonder whether Australians are more laid back than us Brits because their New Year celebration is heading into the height of summer, rather than the depth of winter? I guess in theory starting the year with days getting longer should be nicer than days getting shorter, but I do slightly shudder at the thought of the cold, dark, and wet months ahead in the UK.
Anyway, my not-resolutions are to follow. The last few years could be summarised as “make it to next year”, but it would be nice to have a slightly higher bar this time around!!
Every blessing for 2023, and I pray that God’s light would illume whatever darkness you are facing.