Backup and Restore

Description

For disaster recovery and, possibly, server replication, we provide a script that backs up any GOBii instance. The script takes a snapshot (a Docker commit) of the Docker container passed as a parameter and pushes that commit to Docker Hub in the given repository, making the whole process incremental. The tag used is the current timestamp, in the format YYYY-MM-DD_hh-mm-ss, e.g. 2017-05-26_10-49-57. Clients can, however, easily change the timestamp format by modifying that string in the script.
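For reference, the core of such a backup boils down to a commit-tag-push sequence. The sketch below is only an illustration, not the script itself; the container and repository names are placeholders taken from the example further down.

# Illustrative sketch only -- backup_Docker_to_hub.sh may differ in detail.
TIMESTAMP=$(date +%Y-%m-%d_%H-%M-%S)   # the tag format described above
DOCKER_NAME=GOBii_db_icrisat           # placeholder container name
HUB_USER=cegicrisat                    # placeholder Docker Hub account
HUB_REPO=GOBii_db_icrisat_prod         # placeholder repository name

# Snapshot the running container as a new image layer.
sudo docker commit "$DOCKER_NAME" "$HUB_USER/$HUB_REPO:$TIMESTAMP"

# Push only the layers Docker Hub does not already have -- this is what
# makes the backups incremental.
sudo docker login -u "$HUB_USER"
sudo docker push "$HUB_USER/$HUB_REPO:$TIMESTAMP"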

This script is intended to be used in cron jobs for incremental backups of GOBii instances.
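For instance, a crontab entry like the following (hypothetical paths; parameters as in the example below) would take a snapshot every 12 hours. Note that cron jobs must pass the password as a parameter rather than 'askme', since there is no terminal to prompt on.

# Run as the GOBii admin user: back up the DB node every 12 hours.
0 */12 * * * bash /home/gadm/backup_Docker_to_hub.sh GOBii_db_icrisat cegicrisat dummypassw123 GOBii_db_icrisat_prod >> /var/log/gobii_backup.log 2>&1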


Usage

Run this as user 'gadm' or your GOBii admin user account, which should be a sudoer, but do not run the script itself with sudo; it will prompt for a password as needed. Also, make sure you run the script with bash rather than plain sh.


Usage Template
bash backup_Docker_to_hub.sh <name_of_Docker> <Dockerhub_username> <Dockerhub_password | askme> <Dockerhub_repository_name>
# Set the Dockerhub_password parameter to 'askme' for the script to prompt for the password instead -- useful for single runs, but not for cron jobs
Example Usage
bash backup_Docker_to_hub.sh GOBii_db_icrisat cegicrisat dummypassw123 GOBii_db_icrisat_prod


The command above creates a snapshot backup of the DB Docker node of ICRISAT's production server, tagged with the timestamp of when it was run. The new snapshot then appears in ICRISAT's Docker Hub account under the GOBii_db_icrisat_prod repository, with the timestamp as its tag.
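Restoring (or replicating) from such a snapshot is the reverse operation: pull the image by its timestamp tag and start a container from it. A minimal sketch, assuming the repository and tag from the example above; the restored container name is a placeholder:

# Pull the snapshot by its timestamp tag and start a container from it.
sudo docker pull cegicrisat/GOBii_db_icrisat_prod:2017-05-26_10-49-57
sudo docker run -d --name GOBii_db_icrisat_restored \
    cegicrisat/GOBii_db_icrisat_prod:2017-05-26_10-49-57
# (plus whatever port/volume options your installation normally uses)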


The frequency of the backups is entirely up to the client. Docker Hub does not impose any limit on the size of a repository, and a free account can have as many public repositories as you want plus one private repository. Paid subscriptions are available at a reasonable price if you need more than one private repository. Note that only the database Docker needs to be private, as it contains the client's data (i.e. metadata such as markers, samples, projects, etc.), while the web and compute Dockers do not contain any sensitive information.


This backup will not include the GOBii data bundle (/data/GOBii_bundle), which contains your instance configuration, loaded files, scripts, IFL, MDEs, HDF5 genotype data, etc. Typically, this bundle lives on your file server (depending on your installation setup, which varies from CG to CG), and you are responsible for backing up that path regularly as well, e.g. via scheduled rsyncs or whatever works for your organization. This leads to the suggested disaster recovery strategy for a GOBii production server below.
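For example, a scheduled incremental copy of the bundle to a backup host could be as simple as the following (the destination host and path are placeholders):

# Mirror the whole bundle; -a preserves permissions, --delete mirrors removals.
rsync -az --delete /data/GOBii_bundle/ backuphost:/backups/GOBii_bundle/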


Suggested Strategy

  • Decide on the frequency of your incremental backups, e.g. every 12 hours, during off-peak hours.
  • Create a cron job for each of your nodes (db, web, compute) that runs the script, backup_Docker_to_hub.sh, with the correct parameters.
    • It is important that the three backups are in (almost) perfect sync. The script can run for each node in parallel.
  • The script does not require that the Dockers be stopped before backing up, so it can run without any downtime. There are, however, operations you may not want captured mid-run (e.g. in the middle of loading huge GBS data), so it is still best to do the backups during off-peak hours.
  • Set a cron job for backing up your GOBii bundle at the same time. You can use rsync for incremental backups of plain directories like this. Note that you need to back up the whole GOBii_bundle directory (typically /data/GOBii_bundle).
    • It is best to use the same timestamp for backing up the GOBii_bundle as for backing up the Dockers; see the wrapper sketch after this list. This way, the restore scripts will only need one tag/timestamp to restore (or replicate) a whole GOBii instance.
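A minimal wrapper sketch for the last two points, assuming hypothetical node names and backup paths: it captures one timestamp, kicks off the three node backups in parallel, and snapshots the bundle under that same timestamp. (Sharing the exact tag with the Docker commits would additionally require a small change to backup_Docker_to_hub.sh so it accepts the timestamp as a parameter instead of generating its own.)

#!/bin/bash
# Hypothetical wrapper -- adjust node names, credentials, and paths to your setup.
TS=$(date +%Y-%m-%d_%H-%M-%S)

# Back up the three nodes in parallel so the snapshots stay (almost) in sync.
for NODE in GOBii_db_icrisat GOBii_web_icrisat GOBii_compute_icrisat; do
    bash backup_Docker_to_hub.sh "$NODE" cegicrisat dummypassw123 "${NODE}_prod" &
done
wait

# Snapshot the bundle under the same timestamp; --link-dest hard-links files
# unchanged since the previous snapshot, keeping the backups incremental.
rsync -a --delete --link-dest=/backups/GOBii_bundle/latest \
      /data/GOBii_bundle/ "/backups/GOBii_bundle/$TS/"
ln -sfn "/backups/GOBii_bundle/$TS" /backups/GOBii_bundle/latest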


The script is Docker-agnostic and can be used on any Docker instance. It is also rather simple, and is available here: http://GOBiin1.bti.cornell.edu:6083/projects/GM/repos/GOBiideployment/browse/backup_Docker_to_hub.sh