Comment: Changed name from Pedigree Verification Tool to CAST

...

The Consensus And Splitting Tool (CAST) lets you select a dataset from GOBii, apply some basic filtering, carry out consensus calling of replicate samples, split the file by sample metadata, and create a project file for analysis in Flapjack. Details of these steps are provided below. Note: the pipeline currently supports 2-letter nucleotide data types only.

Table of Contents

Login

First, select the URL of the environment that you would like to pull data from. The URL will look something like http://api.gobii.org:8081/gobii-dev/

...

Log in to the GOBii-GDM system using your GOBii-GDM credentials.

...

Get Data

This page uses BrAPI calls to select a dataset from the GOBii-GDM database.

...

Table 1. Example metadata used in GOBii CAST

Germplasm_name  dnarun_name           dnasample_num  germplasm_par1  germplasm_par2  germplasm_pedigree  dnasample_group
cross1_p1/p2    cross1_p1/p2-sample1  1              p1              p2              p1/p2               cross1
cross1_p1/p2    cross1_p1/p2-sample2  2              p1              p2              p1/p2               cross1
cross1_p1/p2    cross1_p1/p2-sample3  3              p1              p2              p1/p2               cross1
cross2_p3/p4    cross2_p3/p4-sample1  4              p3              p4              p3/p4               cross2
cross2_p3/p4    cross2_p3/p4-sample2  5              p3              p4              p3/p4               cross2
cross2_p3/p4    cross2_p3/p4-sample3  6              p3              p4              p3/p4               cross2
p1              p1-sample1            7
p2              p2-sample2            8
p3              p3-sample3            9
p4              p4-sample1            10
p4              p4-sample2            11

Filter Data

Use this page to filter your data by the percentage of data present for each marker and each sample.

  • For ‘Marker Percent’: enter a percent value. Only markers with more than this percentage of data present are kept

  • For ‘Sample Percent’: enter a percent value. Only samples with more than this percentage of data present are kept

  • Note: the percent values are based on the original, unfiltered data matrix and are not recalculated after markers or samples are removed

  • Select ‘APPLY’. The numbers of markers and samples remaining after filtering are summarized at the top right of the page. A preview of the filtered data is also shown on the page, and you can click through the preview pages to see more of the file. The filtered file can be downloaded if desired
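The filtering rule above can be sketched in a few lines of Python. This is an illustration only, not the tool's own code: the function names are hypothetical, the matrix layout (sample name mapped to a list of calls) is assumed, and ‘NN’ is assumed to be the missing-call code. The point it shows is that both percentages are computed on the original matrix before anything is removed.

```python
MISSING = "NN"  # assumed missing-call code for 2-letter nucleotide data


def percent_present(calls):
    """Percentage of calls in a list that are not missing."""
    if not calls:
        return 0.0
    return 100.0 * sum(c != MISSING for c in calls) / len(calls)


def filter_matrix(matrix, marker_percent, sample_percent):
    """Filter a {sample: [call per marker]} matrix (hypothetical layout).

    Both percentages are computed on the original matrix, so dropping a
    marker does not change which samples pass, and vice versa.
    """
    samples = list(matrix)
    n_markers = len(next(iter(matrix.values()))) if matrix else 0
    keep_markers = [
        j for j in range(n_markers)
        if percent_present([matrix[s][j] for s in samples]) > marker_percent
    ]
    keep_samples = [
        s for s in samples if percent_present(matrix[s]) > sample_percent
    ]
    return {s: [matrix[s][j] for j in keep_markers] for s in keep_samples}


example = {
    "s1": ["AA", "AT", "NN", "TT"],
    "s2": ["AA", "NN", "NN", "TT"],
    "s3": ["NN", "NN", "NN", "TT"],
}
filtered = filter_matrix(example, marker_percent=50, sample_percent=50)
```

In this made-up example, sample s2 has exactly 50% of its calls present and is therefore removed, because the rule keeps only values greater than the entered percent.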

...

Consensus Call Data

  • Consensus calling is currently only available using the algorithm ‘majority genotype (favoring homozygotes)’. At present, only parent replicate samples are consensus called. The parent samples are identified by the fields germplasm_par1 and germplasm_par2, which reference the germplasm_name of the parents

  • The term ‘favoring homozygotes’ means that if homozygous and heterozygous genotype calls occur at equal frequencies, the homozygous genotype is called. For example, if the replicate samples are AA, AA, AT and AT, the consensus call will be AA. However, if two different homozygous calls occur at equal frequencies, e.g. replicate samples AA, AA, TT and TT, the consensus call will be NN (missing), as the two homozygous calls are tied. A tie between homozygous calls takes precedence over the homozygote-favoring rule, so if the replicate samples are AA, AT and TT, the consensus call will be NN

  • A consensus threshold can optionally be applied if more stringency in the consensus calling is needed, e.g. if the user wants one call to make up at least 50% of the calls observed. For example, if the replicate samples are AA, AA, AT, AT and TT, and a 50% consensus threshold is applied, the consensus call will be NN, as less than 50% of the calls are AA. However, AA, AA, AA, AT and TT will return a consensus call of AA, as more than 50% of the calls are now AA

  • Select ‘APPLY’. A preview of the consensus calling is shown on the screen. Each consensus-called parent can be selected from the drop-down menu to see the contributing replicate sample calls. The consensus calls can be edited if the user does not agree with them

  • To see all the consensus calls, click on ‘Download’ to view the consensus calling in Excel
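The consensus rules described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool's implementation: the function names are hypothetical, ‘NN’ is assumed to be the missing code, missing calls are assumed to be excluded from the counts, heterozygous calls are assumed to be written in one consistent order (AT, never TA), and a tie between two heterozygous calls (a case the text does not cover) is assumed to return NN.

```python
from collections import Counter

MISSING = "NN"  # assumed missing-call code


def is_homozygous(call):
    """True for 2-letter calls with identical alleles, e.g. AA or TT."""
    return len(call) == 2 and call[0] == call[1]


def consensus_call(calls, threshold_percent=None):
    """Majority genotype, favoring homozygotes, with an optional threshold."""
    observed = [c for c in calls if c != MISSING]
    if not observed:
        return MISSING
    counts = Counter(observed)
    top = max(counts.values())
    tied = [c for c, n in counts.items() if n == top]
    tied_homozygous = [c for c in tied if is_homozygous(c)]
    if len(tied) == 1:
        winner = tied[0]  # a clear majority genotype
    elif len(tied_homozygous) == 1:
        winner = tied_homozygous[0]  # tie broken in favor of the homozygote
    else:
        return MISSING  # tie between homozygotes (or only heterozygotes)
    # Optional stringency: the winning call must reach the threshold.
    if threshold_percent is not None:
        if 100.0 * top / len(observed) < threshold_percent:
            return MISSING
    return winner
```

Applied to the examples in the text, `consensus_call(["AA", "AA", "AT", "AT"])` gives AA, `consensus_call(["AA", "AA", "TT", "TT"])` gives NN, and `consensus_call(["AA", "AA", "AT", "AT", "TT"], threshold_percent=50)` gives NN because AA makes up only 40% of the calls.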

...

Split Data

The dataset can be split into multiple datasets for downstream analysis using any of the Available Split Categories.

  • Drag and drop the category that you want to split the data by from ‘Available Split Category’ to ‘Selected Split Columns’

  • You may want to select more than one split category, for example if your data needs to be split by the combination of parents identified in the germplasm_par1 and germplasm_par2 fields

  • Select ‘Apply’ to split your data. The message ‘Successful’ is shown when the data has been split

  • You will see a summary of the number of split datasets that have 2 parents and can be analyzed in Flapjack using pedigree verification. Datasets that do NOT have 2 parents will not be included in the project file

  • Note: the parents of each split dataset are automatically pulled into the first two rows of that dataset, according to the parentage defined in the germplasm_par1 and germplasm_par2 fields
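As a rough illustration of the splitting step, the sketch below groups progeny rows by the selected columns and places the two parents first. It is not the tool's code. The function names are hypothetical, rows with empty parent fields are assumed to be the parent samples themselves, each parent is assumed to have a single (consensus) row, and the rows are a trimmed version of Table 1 with p4 left out on purpose so that one dataset lacks a parent.

```python
from collections import defaultdict


def row(name, run, par1="", par2="", group=""):
    """Build one metadata row using the Table 1 field names."""
    return {"germplasm_name": name, "dnarun_name": run,
            "germplasm_par1": par1, "germplasm_par2": par2,
            "dnasample_group": group}


def split_dataset(rows, split_columns):
    """Split progeny by the selected columns and pull parents to the top."""
    # Assumption: rows without parent fields are the parent samples.
    parents = {r["germplasm_name"]: r for r in rows if not r["germplasm_par1"]}
    groups = defaultdict(list)
    for r in rows:
        if r["germplasm_par1"]:
            groups[tuple(r[c] for c in split_columns)].append(r)
    with_parents, without_parents = {}, {}
    for key, members in groups.items():
        pairs = {(r["germplasm_par1"], r["germplasm_par2"]) for r in members}
        par1, par2 = next(iter(pairs))
        if len(pairs) == 1 and par1 in parents and par2 in parents:
            # Parents go into the first two rows of the split dataset.
            with_parents[key] = [parents[par1], parents[par2]] + members
        else:
            without_parents[key] = members
    return with_parents, without_parents


rows = [
    row("cross1_p1/p2", "cross1_p1/p2-sample1", "p1", "p2", "cross1"),
    row("cross1_p1/p2", "cross1_p1/p2-sample2", "p1", "p2", "cross1"),
    row("cross2_p3/p4", "cross2_p3/p4-sample1", "p3", "p4", "cross2"),
    row("p1", "p1-sample1"),
    row("p2", "p2-sample2"),
    row("p3", "p3-sample3"),
]
with_parents, without_parents = split_dataset(
    rows, ["germplasm_par1", "germplasm_par2"])
```

Here the (p1, p2) dataset has both parents available and would be eligible for the project file, while the (p3, p4) dataset ends up in `without_parents` because no p4 row is present.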

...

Export Data

This page shows a summary of the actions taken by the user including:

...