...

  • Select a Study (optional). ‘Study’ is a BrAPI term that is equivalent to GOBii’s Experiment. The datasets will now be filtered by the selected study/experiment. Note that you don’t have to select a study. You can proceed to selecting a variantset if you can easily find it in the variantset list without selecting a study.

  • Select a Variantset (required). Variantset is a BrAPI term equivalent to GOBii’s Dataset. In brackets, after the variantset name, you will see a count of the number of markers x the number of samples.

  • Select a Mapset (optional). Note that the mapsets are NOT filtered by the markers in the dataset/variantset (we could do this, but it would be a very long query and likely not worth the wait). So, you will need to know ahead of time which mapset is relevant to the dataset selected. Note also that you do not have to select a mapset. Flapjack handles not having a mapset quite well, and simply orders the markers as they are received at 1 cM intervals onto a ‘synthetic’ map.

  • In the future, we will enable more than one study and dataset to be selected, and the resulting data will show the union of the datasets selected. However, only one map can be selected.

  • Select the ‘eye’ icon next to the variantset selection to preview the dataset selected.

  • To remove a selection, click on the x next to the study, variantset, or mapset name.

  • Note: The associated metadata for the samples will also be used in the downstream steps of this tool as follows:

    • The most important metadata are the parents of the F1 samples: make sure to enter the germplasm_name of parents in the germplasm_parent1 and germplasm_parent2 fields of GOBii-GDM for each F1 sample. This will identify the correct parents against which the F1 samples are compared

    • There may be multiple samples of each germplasm_name and these can be consensus called to give a single parental genotype against which the F1 samples are compared. If you don’t apply consensus calling, and there is more than one parent replicate sample in Flapjack, you will need to manually select which parent to reference and will not be able to take advantage of the automated batch analysis of multiple datasets

    • Several fields for the sample data can be used to split the genotyping data into different datasets (e.g. sample_group and sample_group_cycle, pedigree, germplasm parent 1 and germplasm parent 2 etc.). Enter these fields when loading data to GOBii-GDM.

    • In the future, we will allow metadata for samples to be uploaded to the tool in case these data were not loaded to GOBii or are stored in a different database

...

  • The dnarun_name is the sample name that you see associated with the genotyping data matrix in the tool preview

  • The germplasm_par1 and germplasm_par2 fields identify the parents by their germplasm_name

  • The file can be split into 2 separate datasets for analysis based on any of the following fields: germplasm_name, germplasm_pedigree, dnasample_group, or a combination of the germplasm_par1 and germplasm_par2 fields

  • Consensus calling will be carried out on parents with the same germplasm_name; in this case only p4 has replicate samples with the same germplasm_name. Consensus called parents will be indicated by a star in the output file, e.g. p4*

...

Use this page to filter your data based on marker and sample percent data.

  • For ‘Marker Percent’: enter a percent value and your data will be filtered to only include markers with greater than this percent of data (i.e. non-missing values)

  • For 'Sample Percent': enter a percent value and your data will be filtered to only include samples with greater than this percent of data (i.e. non-missing values)

  • Note: the percent values are based on the original, unfiltered data matrix and are not recalculated following removal of markers or samples

  • Select APPLY. The numbers of markers and samples remaining and excluded after filtering are summarized on the top right of the page. A small preview of the filtered data can also be seen on the page. You can click through the preview pages to see more of the file. The filtered file can be downloaded if desired
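
The filtering rules above can be sketched in a few lines of Python. This is only an illustration, not the tool's actual implementation: the function name `filter_by_percent`, the list-of-lists matrix layout, and the use of `NN` as the missing-data code are assumptions made for the example. Both percentages are computed on the original matrix, and a marker or sample is kept only if its percent of non-missing calls is strictly greater than the value entered.

```python
MISSING = "NN"  # assumed code for a missing genotype call


def filter_by_percent(samples, markers, calls, marker_percent, sample_percent):
    """Filter a non-empty genotype matrix; calls[i][j] is samples[i] at markers[j]."""
    # Percent of non-missing calls per sample, from the original matrix.
    keep_samples = [
        i for i, row in enumerate(calls)
        if 100 * sum(c != MISSING for c in row) / len(row) > sample_percent
    ]
    # Percent of non-missing calls per marker, also from the original matrix
    # (not recalculated after samples have been removed).
    keep_markers = [
        j for j in range(len(markers))
        if 100 * sum(row[j] != MISSING for row in calls) / len(calls) > marker_percent
    ]
    filtered = [[calls[i][j] for j in keep_markers] for i in keep_samples]
    return (
        [samples[i] for i in keep_samples],
        [markers[j] for j in keep_markers],
        filtered,
    )


# Hypothetical data: 3 samples x 4 markers.
samples = ["s1", "s2", "s3"]
markers = ["m1", "m2", "m3", "m4"]
calls = [
    ["AA", "AT", "NN", "TT"],
    ["AA", "NN", "NN", "TT"],
    ["AA", "AT", "NN", "NN"],
]
kept_samples, kept_markers, kept_calls = filter_by_percent(
    samples, markers, calls, marker_percent=60, sample_percent=60
)
```

With these hypothetical values, only s1 (75% of calls present) passes the sample filter, and m3 (no calls present) is the only marker removed.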

...

  • Consensus calling is currently only available using the algorithm ‘majority genotype (favoring homozygotes)’. Only parent replicate samples are consensus called. The parent samples are identified by the fields germplasm_par1 and germplasm_par2, which reference the germplasm_name of the parents

  • The term ‘favoring homozygotes’ means that if there are equal frequencies of homozygous and heterozygous genotype calls, then the homozygous genotype is consensus called. For example: if there are replicate samples AA AA AT and AT, then the consensus call will be AA. However, if there are replicate samples AA AA TT and TT, then the consensus call will be NN or missing, as there is a tie between two different homozygous calls

  • A consensus threshold can optionally be applied if more stringency in the consensus calling is needed, e.g. if the user wants at least 50% of one call to be observed. For example, if the replicate samples are AA AA AT AT TT, and a 50% consensus threshold is applied, then the consensus call will be NN, as less than 50% of the calls are AA. However, AA AA AA AT and TT will return a consensus call of AA, as now more than 50% of the calls are AA

  • After selections are made for consensus calling, the user should select ‘APPLY’. A preview of the consensus calling can be viewed on the screen. Each consensus called parent can be selected from the drop-down menu to see the contributing replicate sample calls. The consensus calls can be edited if the user does not agree with the calls.

  • To see all the consensus calls, click on ‘Download’ to view the consensus calling file in Excel. In the future, we will enable consensus called genotypes to be uploaded again, in case editing of consensus calling is required
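
The ‘majority genotype (favoring homozygotes)’ rule and the optional threshold described above can be illustrated with a short Python sketch. It is not the tool's actual code. The function name `consensus_call` is hypothetical, and several behaviors are assumptions made for the example: missing (NN) replicate calls are ignored when counting, the threshold is compared against the fraction of non-missing calls, and a tie that does not involve exactly one homozygous genotype returns NN.

```python
from collections import Counter

MISSING = "NN"  # assumed code for a missing genotype call


def is_homozygous(call):
    """True for two-allele calls such as AA or TT."""
    return len(call) == 2 and call[0] == call[1]


def consensus_call(calls, threshold=0.0):
    """Majority genotype across replicate calls, favoring homozygotes on ties.

    threshold is a fraction (0.5 means at least 50% of the non-missing
    calls must agree with the winning genotype).
    """
    observed = [c for c in calls if c != MISSING]  # assumption: skip missing calls
    if not observed:
        return MISSING
    counts = Counter(observed)
    top = max(counts.values())
    tied = [genotype for genotype, n in counts.items() if n == top]
    if len(tied) == 1:
        winner = tied[0]
    else:
        # Tie: pick the homozygous genotype only if exactly one is tied.
        homozygous = [g for g in tied if is_homozygous(g)]
        if len(homozygous) != 1:
            return MISSING
        winner = homozygous[0]
    if top / len(observed) < threshold:
        return MISSING
    return winner


print(consensus_call(["AA", "AA", "AT", "AT"]))                  # AA
print(consensus_call(["AA", "AA", "TT", "TT"]))                  # NN
print(consensus_call(["AA", "AA", "AT", "AT", "TT"], 0.5))       # NN
print(consensus_call(["AA", "AA", "AA", "AT", "TT"], 0.5))       # AA
```

The four calls at the end reproduce the worked examples from the bullets above.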

Split Data

The dataset can be split into multiple datasets for downstream analysis using any of the Available Split Categories.

  • Drag and drop the category that you want to split data by from the ‘Available Split Category’ to ‘Selected Split Columns’

  • You may want to select more than one split category. For example: if your data needs to be split by a combination of parents identified in germplasm_par1 and germplasm_par2 fields

  • Select ‘Apply’ to split your data. A ‘Successful’ message will be shown when the data has been split

  • You will see a summary of the number of split datasets that have 2 parents and can be analyzed in Flapjack using pedigree verification. Datasets that do NOT have 2 parents will not be included in the project file.

  • Note: the parents of split datasets will be automatically pulled into the first two rows of each dataset according to the parentage defined in the germplasm_par1 and germplasm_par2 fields. For subsequent batch processing to be successful, all samples within a single split dataset should have the same two parents, as seen in the example file above. If the two parents are not available in the dataset, then subsequent analysis for F1 pedigree verification will not be possible.
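
As a rough illustration of splitting by the combination of germplasm_par1 and germplasm_par2, the Python sketch below groups samples by their parent pair, places the two parents in the first two rows, and sets aside any group whose parents are not present in the data. It is a simplified sketch rather than the tool's implementation: the function name `split_by_parents` and the dictionary record layout are hypothetical, and it assumes each parent is represented by a single (for example, consensus called) sample.

```python
from collections import defaultdict


def split_by_parents(samples):
    """Split sample records into one dataset per (par1, par2) combination.

    Returns (datasets, skipped): datasets maps each parent pair to its rows
    with the two parents first; skipped lists pairs with a missing parent.
    """
    # Assumption: one sample per parent germplasm_name (first one wins).
    by_name = {}
    for sample in samples:
        by_name.setdefault(sample["germplasm_name"], sample)

    # Group progeny by the parent pair recorded in their metadata.
    groups = defaultdict(list)
    for sample in samples:
        par1 = sample.get("germplasm_par1")
        par2 = sample.get("germplasm_par2")
        if par1 and par2:
            groups[(par1, par2)].append(sample)

    datasets, skipped = {}, []
    for (par1, par2), progeny in groups.items():
        if par1 in by_name and par2 in by_name:
            # Parents go in the first two rows, followed by their progeny.
            datasets[(par1, par2)] = [by_name[par1], by_name[par2]] + progeny
        else:
            skipped.append((par1, par2))
    return datasets, skipped


# Hypothetical metadata records.
records = [
    {"dnarun_name": "r1", "germplasm_name": "p1", "germplasm_par1": "", "germplasm_par2": ""},
    {"dnarun_name": "r2", "germplasm_name": "p2", "germplasm_par1": "", "germplasm_par2": ""},
    {"dnarun_name": "r3", "germplasm_name": "f1_a", "germplasm_par1": "p1", "germplasm_par2": "p2"},
    {"dnarun_name": "r4", "germplasm_name": "f1_b", "germplasm_par1": "p1", "germplasm_par2": "p3"},
]
datasets, skipped = split_by_parents(records)
```

In this hypothetical input, the p1 x p2 group has both parents available and is kept, while the p1 x p3 group is skipped because p3 has no sample.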

Export Data

This page shows a summary of the actions taken by the user including:

  • Select ‘Download Flapjack File’ to download the split dataset in a Flapjack project file format

  • Select the downloaded file to automatically open the project file in Flapjack, where you will see the results of your consensus calling and splitting. You will see each split data set listed on the left hand side of the Flapjack application. The consensus called parents are automatically positioned at the top of each dataset. You are now ready for batch analysis for F1 pedigree verification. See the Flapjack help menu for more details.