It depends on what you’re optimising for. If you want a single (relatively small) download to be available on your HDD as fast as possible, your current setup might be better (optimising for lower latency). However, if you want to max out your internet connection at all times and speed up the HDD writes by making the copy sequential (optimising for throughput), the setup with the cache drive will be better. Keep in mind that an HDD’s sequential write performance is significantly higher than its random write performance, so copying a large file in one go will be faster than writing a whole bunch of chunks in random order (like torrents do). You can check the difference for yourself by running a disk benchmark and comparing your drive’s sequential and random write speeds.
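If you don't have a benchmark tool handy, the comparison can be roughly sketched in Python. This is only a minimal illustration, not a proper benchmark: the function names, the 64 MiB file size and the 16 KiB chunk size are all my own assumptions, and because the OS page cache can reorder and merge writes before they hit the disk, the gap it shows will probably be smaller than what a dedicated tool reports.

```python
import os
import random
import tempfile
import time


def time_writes(path, offsets, chunk):
    """Write `chunk` at each offset in the given order, fsync, return elapsed seconds."""
    start = time.perf_counter()
    with open(path, "r+b") as f:
        for offset in offsets:
            f.seek(offset)
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # make sure the data has really been pushed to the drive
    return time.perf_counter() - start


def compare_write_patterns(directory, total_bytes=64 * 1024 * 1024, chunk_size=16 * 1024):
    """Time the same set of chunk writes in sequential and in shuffled order.

    `directory` should be on the drive you want to test. The sizes are
    arbitrary illustrative defaults, not tuned values.
    """
    chunk = os.urandom(chunk_size)
    sequential = list(range(0, total_bytes, chunk_size))
    shuffled = sequential[:]
    random.shuffle(shuffled)  # mimics torrent pieces arriving in no particular order

    results = {}
    for name, order in (("sequential", sequential), ("random", shuffled)):
        fd, path = tempfile.mkstemp(dir=directory)
        try:
            with os.fdopen(fd, "wb") as f:
                f.truncate(total_bytes)  # reserve the full file size up front
            results[name] = time_writes(path, order, chunk)
        finally:
            os.remove(path)
    return results
```

Calling `compare_write_patterns("/path/on/your/hdd")` returns a dict with the elapsed seconds for each pattern. Use a test size well above your RAM cache if you want numbers that are closer to what the disk itself can do.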
I’ve had good experiences with whisper.cpp (should be in the AUR). I used the large model on my GPU (3060), and it filled 11.5 of the 12 GB of VRAM, so you might have to settle for a smaller model. The speed was pretty much real time on my GPU, so it might be quite a bit slower on your CPU, unless the smaller models are also a lot faster (I never tested them because I didn't need to).
The large model had pretty much perfect accuracy (only 5 or so mistakes in ~40 pages of transcriptions), and that was with Dutch audio recorded on a smartphone. If it can handle my pretty horrible conditions, your audio should (hopefully) be no problem to transcribe.
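For reference, a run looks roughly like the sketch below. Treat the details as assumptions on my part: the binary is called `whisper-cli` in recent whisper.cpp builds but `main` in older ones (and a packaged version may use yet another name), the model path is a placeholder for wherever your downloaded ggml model lives, and the helper function names are made up for this example. whisper.cpp expects 16 kHz WAV input, so the audio goes through ffmpeg first.

```python
import subprocess
from pathlib import Path


def ffmpeg_command(src, wav):
    """Build the ffmpeg call that converts audio to 16 kHz mono 16-bit PCM WAV."""
    return ["ffmpeg", "-y", "-i", str(src),
            "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", str(wav)]


def whisper_command(wav, model, language="nl", binary="whisper-cli"):
    """Build the whisper.cpp call: -m model, -f input file, -l language, -otxt text output.

    `binary` is an assumption; adjust it to whatever your install provides.
    """
    return [binary, "-m", str(model), "-f", str(wav), "-l", language, "-otxt"]


def transcribe(src, model, language="nl", binary="whisper-cli"):
    """Convert `src` to WAV, transcribe it, and return the expected transcript path."""
    src = Path(src)
    wav = src.with_name(src.stem + ".16k.wav")
    subprocess.run(ffmpeg_command(src, wav), check=True)
    subprocess.run(whisper_command(wav, model, language, binary), check=True)
    # With -otxt and no explicit output name, the transcript should land next to
    # the input as <input>.txt
    return Path(str(wav) + ".txt")
```

Something like `transcribe("recording.m4a", "models/ggml-large-v3.bin")` would then do the whole thing in one go. Swap in a smaller model file if the large one doesn't fit in memory, and change `language` from `"nl"` (Dutch, what I used) to whatever your audio is in.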