Building on my experiments with getting rclone working (I can’t believe it’s taken me this long to play with rclone…), I wanted an upload tool that files each uploaded file or folder under the date it was uploaded.
The uploaded paths look like
What this allows us to do is upload a folder like /genomics_files to R2 periodically, and then go to either the latest upload or any previous one.
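To make the idea concrete, here is a minimal Python sketch of how a dated destination could be built and handed to rclone. It is not the actual tool: the remote name (r2), the bucket name (lab-bucket), the folder-name/date layout, and both helper functions are assumptions made up for illustration. The only rclone feature it relies on is the standard `rclone copy SOURCE DEST` command.

```python
from datetime import date
from pathlib import PurePosixPath


def dated_destination(remote: str, bucket: str, source_dir: str, day: date) -> str:
    """Build an rclone destination that ends in the upload date.

    The layout <remote>:<bucket>/<folder name>/<YYYY-MM-DD> is a hypothetical
    choice for this sketch, not necessarily the layout the real tool uses.
    """
    folder_name = PurePosixPath(source_dir).name
    return f"{remote}:{bucket}/{folder_name}/{day.isoformat()}"


def rclone_copy_command(source_dir: str, destination: str) -> list[str]:
    """Return the argument list for `rclone copy SOURCE DEST`."""
    return ["rclone", "copy", source_dir, destination]


if __name__ == "__main__":
    # "r2" and "lab-bucket" are placeholder names for a configured remote and bucket.
    destination = dated_destination("r2", "lab-bucket", "/genomics_files", date.today())
    command = rclone_copy_command("/genomics_files", destination)
    # Print rather than run, so the sketch is safe to try without credentials.
    print(" ".join(command))
    # To actually upload: subprocess.run(command, check=True)
```

ISO dates (YYYY-MM-DD) sort alphabetically in chronological order, so with a layout like this the newest upload is always the last folder in a sorted listing.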
This tool is especially useful for uploading the databases used by analyses like BLAST. These are large (10+ GB) files that change over time as researchers add their genomics tags and annotations. They’re what a lot of genomics tools use to find “hits”, that is, to see whether someone else has already tagged and described the unknown DNA sequence our lab just sequenced.
We’ll be using this system to store the latest versions of databases for our fresh analyses, and also to keep historical versions so that we can reproduce earlier results. It’s nice that R2 doesn’t charge for egress, which means we can just pull these files down every time we do a run, instead of storing them all on our local machines.
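Choosing between "latest" and "the version we used last time" could look something like the sketch below. It assumes each upload lives in a folder named with an ISO date (a hypothetical layout) and that the folder names have already been listed, for example with `rclone lsf --dirs-only`, which prints one directory per line with a trailing slash. The function names are invented for this example.

```python
from datetime import date
from typing import Iterable, Optional


def parse_dated_folders(listing: Iterable[str]) -> list[date]:
    """Keep only entries that look like YYYY-MM-DD folders, oldest first."""
    found = []
    for entry in listing:
        name = entry.strip().rstrip("/")
        try:
            found.append(date.fromisoformat(name))
        except ValueError:
            continue  # ignore anything that is not a dated folder
    return sorted(found)


def choose_version(listing: Iterable[str], pinned: Optional[date] = None) -> date:
    """Return the pinned upload date if given, otherwise the most recent one."""
    available = parse_dated_folders(listing)
    if not available:
        raise LookupError("no dated uploads found")
    if pinned is None:
        return available[-1]
    if pinned not in available:
        raise LookupError(f"no upload found for {pinned.isoformat()}")
    return pinned


if __name__ == "__main__":
    # Made-up listing, shaped like the output of `rclone lsf --dirs-only`.
    example_listing = ["2023-11-30/", "2024-01-15/", "2024-03-02/", "notes/"]
    print(choose_version(example_listing))                     # fresh analysis
    print(choose_version(example_listing, date(2024, 1, 15)))  # reproduce an old run
```

Recording the chosen date alongside each run's results is what makes it possible to pull down the same database version again later.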