Version 2.1

You can find the release notes for every release of GOBii on this page: System Requirements and Release Notes

 

Versioning

When using this document, make sure that you are deploying the correct GOBii version. The official build string for GOBii version 2.1 is below (a parameter string you need to run the shell scripts):

release-2.1

Operating System, bash & Docker Versions

The following are the versions used when developing and testing GOBii:

Operating Systems:

  • Ubuntu 16.04.5 LTS

  • CentOS Linux release 7.2.1511 (Core)

Bash Version:

  • Ubuntu: GNU bash, version 4.3.48(1)-release (x86_64-pc-linux-gnu)

  • CentOS: GNU bash, version 4.2.46(1)-release (x86_64-redhat-linux-gnu)

Docker Version:

  • Ubuntu: Docker version 18.06.1-ce, build e68fc7a

  • CentOS: Docker version 18.03.1-ce, build 9ee9f40

git Version

  • Ubuntu: git version 2.7.4

  • CentOS: git version 1.8.3.1

GDM Deployment Versions

The following are the Docker image tags used for the deployment of this release:

  • GDM: release-2.1

  • KDC: server_1.6.1-plugin_0.14.3-build_110

  • ownCloud: base

  • Portainer: latest

  • sherpa: latest

For any questions or clarifications, please contact Kevin Palis or Roy Petrie.



Introduction

This section provides the definition of terms, background, and a brief overview of GOBii.

Definition of Terms

  • Nodes = GOBii Nodes

    • The term "nodes" here will always refer to the GOBii nodes, which are Docker containers that can be deployed to different servers or virtual environments. Server machines, on the other hand, will be explicitly called "server nodes".

Background

GOBii is made up of multiple modules, categorized by function. A system diagram that shows these categories (by Docker container), the data flow, and the modules is available here.

Depending on your server topology, the instructions on this page may require some tweaking. For each section whose steps differ significantly depending on server topology, a "Note Box" like the one below is provided.


GOBii's deployment architecture is flexible and node-based. There are three main nodes: computation, database, and web. These nodes are now pre-baked into Docker images and can be deployed in their own server, VM, or in any combination of servers and virtual environments.

To give you an idea, here's an example topology and node-distribution:

Server 1 (Server Head): GOBII Test (all nodes)

Server Node 1: GOBII Prod Database Node

Server Node 2: GOBII Prod Web Node

Server Node 3: GOBII Prod Compute Node


You can put GOBii nodes of the same GOBii instance into one server, but we advise against mixing nodes of different GOBii instances into one server. Aside from competing for resources, there are potential conflict points that nodes from different instances may run into.

 



Initial Installation Prerequisites

 

1. The official repository for the deployment scripts is here. Make sure you clone or download the scripts from there. The branch you should get is release/<version> (ex. release/2.1). You can also get the master branch if you are deploying the latest version, but because our clients can have varying versions on different servers, all release branches are kept.

2. Finalize your topology and write it down. If you are deploying all 3 GOBii nodes to just one server, you run a different script than when you deploy GOBii one node per server or in any other arrangement (in which case you run 3 scripts).

3. The servers should have Docker engine version 17 or higher installed. Make sure the servers have access to Docker Hub.

-Ubuntu: https://docs.docker.com/engine/installation/linux/docker-ce/ubuntu/#upgrade-docker-ce-1

-CentOS: https://docs.docker.com/install/linux/docker-ce/centos/

4. A mount point or a shared drive that all the nodes can access; this will be a volume mounted to all 3 Docker containers.

5. The user that will run the scripts needs to be a sudoer and a member of the 'gobii' and 'docker' groups, so preferably the user 'gadm'. The username is arbitrary; it just needs to be consistent. You may find 'gadm sudoer' used in the rest of this document; just note that the name is flexible.



sudo usermod -aG docker gadm



6. (Optional) A directory where the Postgres data will reside. The default is the Postgres data directory in the DB Docker container (ex. /var/lib/pgsql/data), which is linked to Docker's default volume directory (ex. /usr/local/docker/volumes/postgreslibubuntu).

For a test GOBii instance, you can use the vanilla version of the dockers:
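A minimal sketch of pulling the vanilla images (the image names and tags below are assumptions based on the gadm01 account and the release-2.1 build string; verify the exact names in the deployment scripts):

docker pull gadm01/gobii_web:release-2.1

docker pull gadm01/gobii_db:release-2.1

docker pull gadm01/gobii_compute:release-2.1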









Non-Destructive Deployment [NDD]

This section describes the non-destructive deployment (NDD) architecture.



This architecture must be in place prior to any deployment after version 2.0.



NDD Diagram

This is a diagram that represents the current architecture of the directories and the symlink





In this architecture, the /data/gobii_bundle directory is destroyed during a new deployment and replaced with the latest version. Because the persistent data directories are symlinks, their contents are preserved under /storage/persistent_data.
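For illustration, a sketch of what the symlinked layout looks like (logs is one directory known to live under the bundle; other persistent directories follow the same pattern):

ls -l /data/gobii_bundle/logs

# /data/gobii_bundle/logs -> /storage/persistent_data/logs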











Backups

This section is for existing instances that already have data in storage. This will show the process used to back up the existing data.


Release 2.1+

As of release 2.1, the system supports non-destructive deployments, using symlinks to access data. While it is still very important to perform data backups, the restore process described later in this document has been deprecated.

 

  1. Back up the data files from the existing GOBii instance. You can do so using the backup script we provide; usage instructions are here: GOBII Add-on Scripts#onScripts-IncrementalDataBundleBackups. Ideally, you already have this set up to do incremental backups (as a cron job), so that prior to the upgrade you just need to run the script in incremental mode to make sure you capture all changes, then disable access to the system to make sure nobody loads new data while the upgrade is in progress. If that's not the case, you can run the script in full backup mode; just note that this may take several hours depending on the size of your data.

  2. Back up the database (postgres) from the existing GOBii instance. You have two options:

    1. Run the script we provide (GOBII Add-on Scripts#onScripts-PostgresqlRotatingIncrementalBackup) - this also is ideally set up as a cron job.

    2. Manually back up the database

      1. Go into the database node and run pg_dumpall

        $> docker exec -ti <gobii_db_node> bash

        $> su postgres

        $> pg_dumpall > /data/all_databases.bak



    3. The data in postgres actually persists as long as you don't delete the Docker volumes. However, we still recommend backing up the database to ensure redundancy.



  3. OPTIONAL: If you have a KDCompute Docker container running, back up its files by copying the directory /data/kdcompute_file_storage to a directory of your choice. This contains previous output and logs of QC jobs.







Deployment

This section details the scripts, parameters, and process used to deploy GOBii.

Deployment Scripts and Parameters

Copy the deployment scripts and files from the cloned repository (prerequisite #1) to the shared drive (prerequisite #4).

We update the param files from time to time (e.g., as new features are added), so please don't just copy-paste the sample param files below; they are shown on this page for reference only. Instead, pull from our deployment scripts git repository for the particular release you are deploying (ex. release/2.1).

The templates shown below were last updated for version 2.0



Edit the main parameter file. You can find a template in the repository (gobiideployment/params/template_main.parameter). It contains all the topology information and deployment credentials. The template is shown below, with each parameter explained above the corresponding line:



All the passwords and some usernames have been omitted from the parameter file templates on this page for security. Make sure you check Default Credentials [CONFIDENTIAL] to replace the parameters with the correct values. If you can't access the page with the default credentials, contact Kevin Palis or Roy Petrie.



Version 2.0+; dockerhub access

As of version 2.0, the container repos still exist under the user gadm01, but you cannot upload to them. This was done for security and maintenance purposes.

Please use gadmreader to pull images from the gadm01 account.
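For example (a sketch; the gadmreader password is on the Default Credentials page, and the image name is a placeholder):

docker login -u gadmreader

docker pull gadm01/<image>:release-2.1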



template_main.parameters

This template has been updated with the most recent parameters as of version 2.0. For ease of use, the template has been expanded with whitespace between parameters to allow for a more readable and more easily editable structure.

Additionally, parameters introduced in newer versions are appended at the bottom of this file, to allow for easy copy-and-paste into parameter files from earlier deployments.

As of version 2.0, any password in the *main.parameters file that is set to "askme" will be requested from the user during script deployment. The prompt input is hidden, to keep clear-text passwords to a minimum.

If a password is set explicitly, the script will continue without prompting for it.
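For illustration (the parameter name below is hypothetical; use the real names from the template):

# prompts the deployer for this password at run time

WEB_DB_PASSWORD=askme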







You can name this file however you want. The full file path is passed to the deployment script.

For anything not clear or if you're not sure of what to put on a parameter's value, please ask Kevin Palis.

If a seed context for your crop is not available and you would like to have one (ex. seed_crop3), please contact Roy Petrie or Kevin Palis.



Edit the 'install' parameter file. You can find a template in the repository (gobiideployment/params/template_install.parameter). It contains all of the GOBii instance's configuration (i.e., the runtime configuration via the gobii-web.xml content). The template is shown below, with each parameter explained above the corresponding line:



template_install.parameters

For ease of use, the template has been expanded with whitespace between parameters to allow for a more readable and more easily editable structure.



As of version 1.3, the *install.parameter file is also passed as a parameter to the main call to the gobii_ship scripts. Hence, it no longer needs to be set in the CONFIGURATOR_PARAM_FILE of the *main.parameter file.

For anything not clear or if you're not sure of what to put on a parameter's value, please ask Kevin Palis.





Running the Deployment Script



This script should not be run using sudo or as the root user. Some commands will automatically prompt you if they need elevated permissions.



If you are deploying GOBii onto just one machine, you run 'the_gobii_ship.sh' to pull, deploy, and configure all 3 Docker containers on one target server. To do so, run a command similar to the sketch below:
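A sketch of the call (the script is typically passed both parameter files, per the note in the previous section; check the script's usage output for the exact arguments):

bash the_gobii_ship.sh <path_to_main.parameters> <path_to_install.parameters>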



If you made a mistake and want to start over, or if there are other Dockers on the server you want to get rid of, do a cleanup by running docker stop, rm, and rmi.
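For example, to stop and remove all containers and images on the server (destructive; these are standard Docker commands):

docker stop $(docker ps -aq)

docker rm $(docker ps -aq)

docker rmi $(docker images -q)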



The GOBii Ship...

As of version 2.0 it is only required to run one script.





This deployment script will be updated if new containers need to be deployed alongside GDM. It can now either perform a full deployment, calling each script in the proper order, or deploy each container individually.

This was done to make sure the scripts do not need to be maintained in two places when configurations or updates are made to pre-existing containers.

On deployment, vim is now installed on the web, db, compute, and kdc nodes. This will eventually be built into the Docker Hub images in the repo, but for the time being the nodes get vim the traditional way.





LDAP Install Cert

After running the deployment scripts and doing verification step #1 below, turn on LDAP authentication if it wasn't on already (details in verification step #2). When turning on LDAP, make sure that the LDAP certificate is loaded into the JVM. You can do so with a command like the sketch below:
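A sketch of the certificate import (run inside the web node; the alias is arbitrary, and the cacerts path via $JAVA_HOME is an assumption; see the note below about paths):

keytool -import -trustcacerts -alias gobii_ldap -file /data/cacart_mgs1.der -keystore $JAVA_HOME/jre/lib/security/cacerts -storepass changeit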



You should see a confirmation message saying "certificates added to keystore". Finally, restart Tomcat, making sure it runs as user gadm:
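For example (a sketch; the container name is a placeholder, and the Tomcat path matches the one used elsewhere in this document):

docker exec -u gadm -ti <gobii_web_node> bash -c '/usr/local/tomcat/bin/shutdown.sh && /usr/local/tomcat/bin/startup.sh'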



As you can see, /data/cacart_mgs1.der is the certificate file. The command above looks for it in the web Docker home volume, which is /data. So, wherever that volume points to on the host server, make sure you put the DER file there first before running the command. Lastly, the paths to keytool and cacerts will most likely stay the same, as we only distribute Dockers based on Ubuntu; but if in the future we offer another Linux flavor, or the JVM changes, then those paths may change.



Make sure that the ports you assigned to the Dockers (typically 8081, 8083, 8084, 5433, and 2222) are open. Otherwise, the containers won't be able to communicate with each other and will fail with internal server error 500 (although in the future we may provide a more specific error message). The more specific error message will be shown in Tomcat's log (catalina.out). Opening a port differs from OS to OS.



Example: CentOS 6 and 7
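A sketch for opening one port (repeat per port; these are standard firewall commands):

# CentOS 7 (firewalld)

firewall-cmd --permanent --add-port=8081/tcp && firewall-cmd --reload

# CentOS 6 (iptables)

iptables -I INPUT -p tcp --dport 8081 -j ACCEPT && service iptables save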



Make sure that the /data symlink on the web node was created, as the LoaderUI needs it to work properly. If not, do the following:

If you are in the same terminal session where you ran the gobii_ship*.sh scripts, you can run this as is; the $BUNDLE_PARENT_PATH variable should still be set. If not, replace it with that parameter's value from your *_main.parameters file.
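A sketch of the symlink creation (assuming /data does not already exist on the host; see the note below if it does):

sudo ln -s $BUNDLE_PARENT_PATH /data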



If it is not possible to create the /data symlink because /data is already a directory or a drive mount point on the target host, create a symlink manually from /data/gobii_bundle to $BUNDLE_PARENT_PATH/gobii_bundle, effectively still making /data/gobii_bundle point to the correct location.











This script is called at the start of the deployment to verify whether the system is going to be WIPED of data: both the files associated with the DB and the database volumes are removed. It also makes sure you confirm multiple times!









Additional Scripts





This script has been built into the_gobii_ship.sh, and it is recommended to have it running; however, since its creation, the .jar used within the script has been moved from its original location.



This jar can be used and run manually until it has been placed back into the gobiideployment repo.



The kdc_passwd.sh script was built to help update the KDC admin password.
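A hypothetical invocation (the script's actual arguments may differ; check its usage header):

bash kdc_passwd.sh <path_to_main.parameters>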











The Non-Destructive Deployment architecture must be in place in order to run this script.

  • Additionally, this script requires the *main.parameters file to be passed to it, with the new parameter added to the parameters file



This script checks the directory locations and whether the relevant files exist. Depending on their existence and current link status, the script makes gobii-web.xml backups for later use if needed, and is careful not to remove any files or directories in persistent_data, to prevent data loss.

















Livelinks are links sent within the notification emails for loads and extracts; they point to the file's location within the ownCloud file browser.





The following line enables livelinks; it needs to be run from the /data/gobii_bundle/config directory.









[Deprecated] Restoring backup data



Due to the implementation of Non-Destructive Deployment, the restore process is no longer needed.



This section details the data restoration process used after a backup and deployment has completed.



Simply run the restore script with the correct parameters: GOBII Add-on Scripts#onScripts-RestoreDataBundlefromBackup



Go to the link above, as the syntax for running these scripts has changed slightly since version 1.4.



Verify that the data was restored by opening any crop's ExtractorUI. You should see previously loaded datasets.

  • OPTIONAL: If you had a KDCompute Docker container running before, restore its files from backup by simply copying the TestOutput_UserDirs subdirectory of the kdcompute_file_storage backup to /data/kdcompute_file_storage/TestOutput_UserDirs









[Deprecated] Configure Timescope

This section has been deprecated, as the process has been built into the deployment scripts. It will remain in this version's deployment documentation for future reference.

As of version 1.5, we are adding a new web application called "Timescope". This will allow users to browse and delete data permanently from the database.

With this, there are additional steps that need to be done, but only once (i.e., if you upgrade to any version >1.5 in the future, you won't need to do the following again).



[Deprecated] Creating Timescope User

This process should not be needed, as the 'timescoper' user is already built into the deployed DB. This section is being kept for future reference.


  1. Create the database user for timescope:

    1. SSH into the database node, then go into the database docker container via:
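For example (matching the backup section above):

        docker exec -ti <gobii_db_node> bash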



    2. Switch to the postgres user
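As in the backup steps:

        su postgres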





    3. Create the timescoper db user
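A sketch (the 'timescoper' username comes from this section; pick the password per the Default Credentials page):

        psql -c "CREATE USER timescoper WITH PASSWORD '<password>';"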







 

  2. Modify the Timescope config file (/usr/local/tomcat/webapps/timescope/WEB-INF/classes/config.properties), replacing the credentials with the ones you made in step 1c.

  3. Reload the webapp via Tomcat Manager:

    1. Open your browser and go to <web_node_url>:<web_node_port>, then click on Tomcat Manager.

    2. You should see timescope in the application list; click "reload".

Layered System Architecture

Analytics

This architecture stack is for batch operations. Metadata and genotype data can easily get too large for conventional data loading to handle. The main differences between this stack and the "general" architecture are the data access layer and the business layer. The digester serves as the business layer: it converts input files (raw files like hmp, csv, etc., plus instruction files from the presentation layer) into a format that the data access layer understands for loading (IFL). It is also responsible for giving the instructions on what information to extract and for passing them to the metadata extractor (MDE). The data access layer here is broken into two parts based on functionality: IFL batch-loads data into the different data stores, while MDE extracts data in batches and writes it to files. You can also think of the IFLs and MDEs as including the functions provided to load and extract the genotype matrix from HDF5/MonetDB. The whole communication line between the digesters and the data access layer is facilitated by cron jobs (indicated by the gear icons in the system diagram).



Timescope Verification

To verify that Timescope is properly deployed: Open your browser and navigate to <web_node_url>:<web_node_port>/timescope. Upon initial install, there will only be one superuser account in your Timescope database. The credentials are on this page: Default Credentials. When you first log in, please change this password using the Timescope UI for security. If you cannot access it, contact either Kevin Palis or Roy Petrie.

A few things to note regarding Timescope:

  1. You will need to create accounts (using the User tab) for everyone who needs to access Timescope

    1. You need to assign temporary passwords for each user and ask them to change it upon login. There is no mandatory password change feature (yet).

  2. Each crop database's user management for Timescope is separate, i.e., you can have one user added to maize but not to wheat; if a user needs to be on both, you'll have to add the user manually to both crops.

  3. You only need to provision accounts once. Future deployments will always preserve postgres data – as long as the Docker volumes don't get deleted.









GOBii Portal

This section shows the portal that links all products and features with GDM. 



Post initial deployment, it is recommended to back up the current launchers.xml file used within the web node and replace it after subsequent deployments.









The *_main.parameters will need new lines indicating the name of the new crops. These parameters can be anywhere within the *_main.parameters file.
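A purely hypothetical illustration (the parameter names below are made up; take the real names from the template in the repository):

CROP2=wheat

CROP3=rice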



Location: xml_config_parser.py



Location: launchers.xml



Deploying more than one crop, or adding links to the portal, uses /data/gobii_bundle/config/utils/xml_config_parser.py, which updates /usr/local/tomcat/webapps/gobii-portal/config/launchers.xml.

During deployment, the script configures the original crop (crop 1, as noted in the parameters file), but additional crops and links need to be added, either by adding the following to the script or by manually running the following commands.

The example below is the default in the GOBii web script template for adding Portainer to the deployment.



If the configurations need to be changed and the scripts are erroring, you can add the above configurations manually. The webpage will update dynamically.









ownCloud

This section shows the setup and configuration required post deployment. It assumes the container was deployed but that LDAP, storage, and shares have not been configured.

After ownCloud deployment, log in with the ownCloud default username and password. These credentials will have to be updated by the deploying system administrator, as the username and password are stored salted in the DB.



Once logged in, select the user name "Admin" > "Settings" > on the left panel, under Admin, select "User Authentication". The configuration on the "Server" tab will show the configurations made in the *_main.parameters file. If the configurations were correct at deployment, the bottom of the tab will show an OK status.

If an error status is shown instead, update the configurations within this tab until it shows OK for your authentication configuration.



LDAP Certificates

If using a certificate, the configuration will show "OK" once it's properly set up, but it will fail to return any users or groups. Within the "Login Attributes" tab, a username can be verified even without the certificate, but this is the extent of what works until the certificate is added to the container.

On deployment, the /data directory is mounted into the ownCloud container. Place the certificate anywhere within /data, then copy it to the /var/www/owncloud directory. The system should pick it up on the next attempt to authenticate.
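For example (the certificate file name is a placeholder; gobii-oc-node is the container name used elsewhere in this document):

docker exec gobii-oc-node cp /data/<ldap_cert>.der /var/www/owncloud/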



ownCloud Active Directory Configuration

ownCloud works well with LDAP but needs additional settings configured for systems using Active Directory. Within the "Expert" tab, the settings for Internal Username and UUID detection may need to be updated.



Configuring External Storage

  • Select "Enable external storage"

  • Enter the folder name for logs and crops

  • Select sftp

  • Select username and password

  • Under "Configuration"

    • <hostname or IP>

    • directory location on host [e.g., /data/gobii_bundle/logs]

    • gadm username

    • gadm password

  • Within "Available for": if no user or group is added, these mounts are available to all users.

    • GOBii's suggested configuration is to add only the local ownCloud "admin" user, locking down access to these mounts to the admin user only

Verify, under the gear icon, that "Enable Preview" and "Enable Sharing" are checked



Sharing External Storage with Users

  • Find the directories in admin home > select ellipsis > select "Details"

  • Select "Sharing" > Under User and Groups enter the "GOBii" group and select

  • Select the down arrow > uncheck

    • can share

    • can edit

    • create change delete

This will allow the GOBii group to see and use the files and directories shared but will be unable to edit or change them.


Enabling File Scan

Add the following line to the root crontab within the ownCloud container. Files shared at setup time will not be updated afterwards unless this line is added to perform an ownCloud file scan that picks up new files.



docker exec gobii-oc-node bash -c 'occ user:sync "OCA\User_LDAP\User_Proxy" -m disable -r'
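The line above syncs LDAP users; if a periodic file scan is also needed, ownCloud's standard occ command for that is files:scan. A sketch of such a cron entry (the schedule is an example; adjust the invocation to your container setup):

*/15 * * * * docker exec gobii-oc-node bash -c 'occ files:scan --all'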



Disable File Locking

Make sure to follow this process in both config.php and overwrite.config.php

Add or update the following files with the configurations below. This makes sure files are not locked while being accessed or scanned; otherwise, overlapping cron-driven file scans over the full system can cause issues. This is highly recommended for systems that use large files.



config/config.php
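A minimal sketch of the relevant setting (ownCloud's standard file-locking switch; merge it into the existing config array):

'filelocking.enabled' => false,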



config/overwrite.config.php
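Assuming the same setting is mirrored here:

'filelocking.enabled' => false,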



Enable local file mounts

Enable this in the configuration to allow local system mounting of files. This allows the files and volumes mounted into the container to be accessed and mounted for file sharing.

A common error: the ownCloud instance is unable to raise permissions when attempting to share a locally mounted file or directory. This can be fixed by increasing the permissions of the file or directory.



It is recommended to use the following command against the shared directories:
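For example (a deliberately permissive mode for illustration; tighten it afterwards, as noted below):

chmod -R 777 /data/gobii_bundle/logs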



These permissions can be dropped back to acceptable levels once the locations have been shared within ownCloud.



Within the configuration, update and add the following line:



config/config.php
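A sketch (this is ownCloud's option for allowing new local external-storage mounts; verify the name against your ownCloud version's config documentation):

'files_external_allow_create_new_local' => true,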









Portainer

Portainer is a container that sits on a system and monitors all Docker/container information. It can monitor multiple endpoints by deploying the sherpa container, which opens a specific port; this allows the Portainer container to access and monitor all containers on a remote system.



Portainer Initial Login

On the initial login, Portainer will ask the admin to set up a password.

Portainer holds its configurations under the /data directory. If the system is removed and redeployed, the same configurations will remain, as the Portainer files within /data are not removed.





Select "Local" > Select "Connect"

This will allow for local monitoring and for adding remote endpoints to be monitored post deployment.



Adding Sherpa Agent Node

Select "Endpoints" in left panel > Select "Add endpoint"



Add the configurations for the sherpa node under "Environment Details":

  • Name

  • Endpoint URL

  • Public IP



During testing of Portainer, the latest version had problems adding endpoints and would fail with a very nondescriptive error. This error only occurred when attempting to connect Ubuntu 16.04 server VMs together, with the latest Portainer and the latest sherpa on both. The error was not seen between:

  • CentOS to CentOS

  • Ubuntu to CentOS

  • CentOS to Ubuntu





Deploying Sherpa Agent Container



Sherpa opens the container port for external access, but access is limited to the networks specified in the parameters:

The Portainer container will be unable to monitor the remote host unless communication to the specified port is allowed.



Deploying Sherpa via GOBii scripts

  • Verify all parameters are updated for the sherpa agent

  • To deploy, run the_gobii_ship.sh and select the sherpa agent

Deploying Sherpa manually

  • Using the configurations specific to your environment, run the command shown after this list to deploy the sherpa remote agent container [settings are defaulted for local access]:

    • Network rules syntax: 10.0.0.0/24

    • Port: This can be any port. Portainer defaults to 2375 and GOBii normally uses 4550
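A hypothetical invocation (the image name, environment variable, and port mapping are all assumptions; take the real values from the deployment scripts):

docker run -d --name sherpa -v /var/run/docker.sock:/var/run/docker.sock -e ALLOWED_NETWORKS=10.0.0.0/24 -p 4550:4550 <sherpa_image>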







Post Deployment Verification [Smoke Testing]

This section is large enough that it warrants its own document. Please follow the link below for this version's deployment Smoke Testing documentation.

DevOps Smoke Testing Process