Haplotype Tool Requirements

Created by Liz, last modified about 3 hours ago



Done

Feature

Notes

Required for v1.0








Imputation

Imputation tool would be separate - looking at STITCH on the R server, and at IMPUTE and Beagle. NO - may be able to do this in the tool.

Use KDCompute for imputation? Let's meet with Andrew to discuss options for KDCompute







Data import

Choose a CSV file, or a transposed HapMap file in GOBii format (with map positions) - DONE

Confirm that the GOBii HapMap file is compatible





Data import

Eventually be able to pull GOBii datasets with a BrAPI call (as Flapjack does)







Data import

Select positions to focus on - chromosome, plus start and stop position - maybe move this to the filtering section

  1. What about loci with no positions? They will not be included in the filtering

  2. Need to be able to accommodate physical AND genetic map positions

yes

cM based distances
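The region selection described in this row could be sketched as follows. This is only an illustration in Python (the tool itself is being written in R); the `Marker` record, the `in_region` helper, and the `"bp"`/`"cm"` unit names are assumptions made for the example, not part of the tool.

```python
from typing import NamedTuple, Optional


class Marker(NamedTuple):
    """Hypothetical marker record; either map position may be absent."""
    name: str
    chrom: Optional[str]
    bp: Optional[int]    # physical position in base pairs, if known
    cm: Optional[float]  # genetic position in centimorgans, if known


def in_region(marker, chrom, start, stop, unit="bp"):
    """Return True if the marker lies in [start, stop] on chrom, on the chosen map."""
    if unit not in ("bp", "cm"):
        raise ValueError("unit must be 'bp' or 'cm'")
    pos = marker.bp if unit == "bp" else marker.cm
    if marker.chrom is None or pos is None:
        return False  # loci without a position on this map are never selected
    return marker.chrom == chrom and start <= pos <= stop


markers = [
    Marker("m1", "1A", 1_200, 0.5),
    Marker("m2", "1A", 58_000, 12.3),
    Marker("m3", None, None, None),   # no position at all
    Marker("m4", "1A", None, 13.0),   # genetic position only
]

selected_bp = [m.name for m in markers if in_region(m, "1A", 1_000, 60_000, unit="bp")]
selected_cm = [m.name for m in markers if in_region(m, "1A", 10.0, 15.0, unit="cm")]
```

Keeping both positions on each marker lets the same filter serve physical and cM-based selections; a marker with only one kind of position is simply skipped when the other map is used.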





Data import

Be able to combine samples from two datasets (merge by marker name) into the same analysis, or import a new dataset and have it as a separate tab for analysis

Josh: I could imagine looking at two sets of lines at the same time if I've decided to bring new material into the breeding program and I want to see how the haplotypes compare in-state to the haplotypes already in the breeding program. Perhaps it would be interesting to see if the new lines shared any potential co-ancestry or offer really novel haplotypes at regions where the breeding program doesn't have much diversity.
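A merge by marker name might look roughly like the sketch below. It is illustrative Python only; the dictionary layout (sample name to marker-to-call mapping), the `"N"` missing code, and the function names are assumptions, and keeping only the markers shared by both datasets is one possible design choice rather than a stated requirement.

```python
def marker_names(dataset):
    """Collect every marker name that appears in a dataset."""
    names = set()
    for calls in dataset.values():
        names.update(calls)
    return names


def merge_by_marker(dataset_a, dataset_b, missing="N"):
    """Combine the samples of two datasets, keeping only markers found in both."""
    clash = set(dataset_a) & set(dataset_b)
    if clash:
        raise ValueError(f"sample names present in both datasets: {sorted(clash)}")
    shared = sorted(marker_names(dataset_a) & marker_names(dataset_b))
    merged = {}
    for dataset in (dataset_a, dataset_b):
        for sample, calls in dataset.items():
            # A sample lacking a call for a shared marker is recorded as missing.
            merged[sample] = {m: calls.get(m, missing) for m in shared}
    return shared, merged


program_lines = {"LineA": {"m1": "A", "m2": "G", "m3": "T"}}
new_material = {"NewX": {"m2": "G", "m3": "C", "m9": "A"}}

shared_markers, merged = merge_by_marker(program_lines, new_material)
```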







FILTERING

Add a button to choose whether heterozygous calls (hets) are considered in clustering - option to change all hets to missing

yes
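The "change all hets to missing" option could be sketched like this. The two-letter call encoding (`"AA"`, `"AG"`) and the `"NN"` missing code are assumptions for the example; the actual import format may encode calls differently.

```python
MISSING = "NN"  # assumed missing-call code for two-letter diploid calls


def is_het(call):
    """A call is heterozygous if it is not missing and its two alleles differ."""
    return len(call) == 2 and call != MISSING and call[0] != call[1]


def hets_to_missing(calls):
    """Return a copy of the calls with every heterozygous call set to missing."""
    return [MISSING if is_het(call) else call for call in calls]


calls = ["AA", "AG", "NN", "GG", "GA"]
cleaned = hets_to_missing(calls)
```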





FILTERING

Have a button to exclude markers below a chosen minor allele frequency (MAF) threshold - DONE
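For reference, a MAF filter can be sketched as below. This is not the tool's implementation; it assumes diploid calls written as two allele letters with `"N"` for a missing allele, and it takes MAF as the frequency of the second most common allele among non-missing alleles.

```python
from collections import Counter


def minor_allele_frequency(calls, missing="N"):
    """Frequency of the second most common allele; 0.0 for monomorphic or empty markers."""
    counts = Counter(allele for call in calls for allele in call if allele != missing)
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0
    return counts.most_common()[1][1] / total


def filter_by_maf(genotypes, threshold):
    """Keep only markers whose MAF is at or above the threshold."""
    return {
        marker: calls
        for marker, calls in genotypes.items()
        if minor_allele_frequency(calls) >= threshold
    }


genotypes = {
    "m1": ["AA", "AA", "AG", "GG", "NN"],  # A=5, G=3 -> MAF 0.375
    "m2": ["CC", "CC", "CC", "CT"],        # C=7, T=1 -> MAF 0.125
    "m3": ["TT", "TT", "NN", "TT"],        # monomorphic -> MAF 0.0
}
kept = filter_by_maf(genotypes, threshold=0.2)
```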







FILTERING

Add ability to exclude a haplotype group from being seen or used in analysis







FILTERING

Have a 'select sub-set of markers' window and/or be able to select a region sub-set







FILTERING

Allow filtering by Samples

yes





FILTERING

Have different windows for each region selected

[ i.e. allow the user to request several non-contiguous genomic regions, and show them separately? ]







FILTERING

Be able to save filtered datasets

yes





Haplotype analysis view

Have an automatic assignment of haplotype number - Abhi suggested CCC?







MAIN DISPLAY





 







INTERACTIVITY

Drop-down with sample names and marker names to highlight sample(s) or marker(s), OR click on something in the view to see the sample or marker name. If we want interactivity, R doesn't provide it - we would need JavaScript. Will continue in R for now. - DONE

Can there be a zoom to see different groups, or can there be a list of groups and how many samples are in each group, with a check box to select?

yes





Trait / Marker design capabilities

  • How to select a better associated marker? Have user classify haplotype groups and give a trait name, e.g. resistant or susceptible. Then search a window and find perfectly diagnostic markers.

  • Add phenotypic score/s and marker scores as separate columns

    Be able to sort the phenotypic data and marker call (option to color code haplotype groups first), and then resort back to haplotype group

    Box plots of trait values by haplotype group

    Add a trait marker or haplotype tag. Create a separate column for the marker or haplotype (don't try to highlight the region - the marker may not even be in the dataset). GOBii has a marker group extract that can be used to provide a marker and favorable allele for a set of samples

Have user classify haplotype groups and give a trait name, e.g. resistant or susceptible. Then search a window and find perfectly diagnostic markers.
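One way to read "perfectly diagnostic" is that no allele at the marker is shared between trait classes. The sketch below follows that reading; it is an assumption, as are the data layout, the `"N"` missing code, and the choice to ignore missing calls as long as every class still has at least one observed call. The markers passed in are assumed to be already restricted to the search window.

```python
def is_diagnostic(calls, labels, missing="N"):
    """True if every trait class has observed calls and no allele is shared across classes."""
    alleles_by_class = {}
    for sample, label in labels.items():
        call = calls.get(sample, missing)
        if call != missing:
            alleles_by_class.setdefault(label, set()).add(call)
    if len(alleles_by_class) < 2 or set(alleles_by_class) != set(labels.values()):
        return False  # fewer than two classes, or a class with no observed call
    seen = set()
    for alleles in alleles_by_class.values():
        if seen & alleles:
            return False  # an allele occurs in more than one class
        seen |= alleles
    return True


# Trait labels as a user might assign them via haplotype groups (hypothetical data).
labels = {"s1": "resistant", "s2": "resistant", "s3": "susceptible", "s4": "susceptible"}
window_genotypes = {
    "m1": {"s1": "A", "s2": "A", "s3": "G", "s4": "G"},
    "m2": {"s1": "A", "s2": "G", "s3": "G", "s4": "G"},
    "m3": {"s1": "C", "s2": "N", "s3": "T", "s4": "T"},
}
diagnostic = [m for m, calls in window_genotypes.items() if is_diagnostic(calls, labels)]
```

Whether a marker with missing calls (like `m3` here) should still count as diagnostic is a decision for the requirements; a stricter version would reject it.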



yes







Add a button to match to a specific line if reference alleles are not available (or if requested by the user).
(Default behaviour - color by major/minor allele, based on observed data.)

yes
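The two coloring modes in this row could be derived along these lines. This is a sketch with hypothetical function names and an assumed `"N"` missing code; it returns category labels rather than actual colors.

```python
from collections import Counter


def major_minor_codes(calls, missing="N"):
    """Label each call 'major' or 'minor' from the observed data; None for missing."""
    observed = [c for c in calls if c != missing]
    if not observed:
        return [None] * len(calls)
    major = Counter(observed).most_common(1)[0][0]
    return [None if c == missing else ("major" if c == major else "minor") for c in calls]


def match_codes(calls, reference_call, missing="N"):
    """Label each call by whether it matches the call of a chosen reference line."""
    if reference_call == missing:
        return [None] * len(calls)
    return [None if c == missing else ("match" if c == reference_call else "mismatch")
            for c in calls]


marker_calls = ["A", "A", "G", "N", "A"]
by_frequency = major_minor_codes(marker_calls)
by_reference = match_codes(marker_calls, reference_call="G")
```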





Additional requests from Josh





COMPARING TWO REGIONS

Josh: In terms of looking at two regions in the same panel of lines, I might be looking to figure out how often two regions are or are not in LD with each other in the breeding program. Perhaps that would be helpful in picking parents that are otherwise equal by phenotype but offer different genetics. 

Here's a screen grab from Excel that clarifies what I'm thinking about when I say I want to define windows in a region and look at how the haplotypes fall out. Windows are based on a 'window size' determined by the user and split evenly across the region. Or perhaps if you want to get really fancy, the tool could determine the windows using LD blocks in the sequence data. 

 





It would also be helpful to look at a whole chromosome partitioned into bins or windows and for each window have a bar that indicates the number of haplotypes in that window. So the whole chromosome looks something like a sideways bar chart. Damien did this for me once (see following images) using sequence data he has for his QTL deployment work, and a greedy algorithm that called groups as different even if they were only different due to hets/missing data/only one SNP. Dima's clustering is much better. But you may find it useful to chat with him as he's thought a little about this before.
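To make the windowed view concrete, here is a minimal sketch that splits a region into fixed-size windows and counts distinct haplotypes in each. It uses naive exact matching, so like the greedy approach mentioned above it treats haplotypes that differ at a single SNP as different groups; it is not the clustering used in the tool. The data layout, the half-open window convention `[lo, hi)`, and all names are assumptions.

```python
def window_bounds(start, stop, window_size):
    """Split [start, stop) into consecutive half-open windows of the given size."""
    bounds = []
    lo = start
    while lo < stop:
        bounds.append((lo, min(lo + window_size, stop)))
        lo += window_size
    return bounds


def haplotypes_per_window(positions, genotypes, start, stop, window_size):
    """Count distinct marker-allele strings among samples in each window."""
    counts = []
    for lo, hi in window_bounds(start, stop, window_size):
        in_window = sorted((m for m, pos in positions.items() if lo <= pos < hi),
                           key=positions.get)
        haplotypes = {"".join(calls[m] for m in in_window) for calls in genotypes.values()}
        counts.append(((lo, hi), len(haplotypes) if in_window else 0))
    return counts


positions = {"m1": 10, "m2": 40, "m3": 60, "m4": 90}
genotypes = {
    "s1": {"m1": "A", "m2": "C", "m3": "G", "m4": "T"},
    "s2": {"m1": "A", "m2": "C", "m3": "G", "m4": "T"},
    "s3": {"m1": "G", "m2": "C", "m3": "G", "m4": "T"},
}
per_window = haplotypes_per_window(positions, genotypes, start=0, stop=100, window_size=50)
```

Each `(window, count)` pair would correspond to one bar in the sideways bar chart; here the second window holds a single haplotype, the kind of low-diversity region worth a closer look.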

 

This view would also require a more automated script to determine the ideal number of groups in each window that makes sense rather than having the user set it for every window. Perhaps that's easier after filtering for missing data and hets. For my purposes, I would look at the whole chromosome or the whole genome of a set of selected parents to get an idea of the haplotypic diversity. I might see a region where genetic variation is low, and investigate further. Maybe there's a trait marker there. But maybe it's drift. So I may choose a few additional parents that balance the haplotypic diversity in the region better to avoid loss of genetic diversity due to drift. 

 



I could imagine the bars could be color coded to indicate the frequency of each haplotype in a window something like this: 
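As a rough sketch of the numbers behind such a color-coded bar, the frequency of each haplotype within one window can be computed as below. The haplotype strings and the function name are illustrative assumptions; in practice the groups would come from the tool's clustering rather than from exact string matches.

```python
from collections import Counter


def haplotype_frequencies(haplotypes):
    """Map each haplotype to its share of samples, most common first."""
    counts = Counter(haplotypes)
    total = sum(counts.values())
    return {hap: n / total for hap, n in counts.most_common()}


# One haplotype string per sample within a single window (hypothetical data).
window_haplotypes = ["ACG", "ACG", "ATG", "ACG"]
frequencies = haplotype_frequencies(window_haplotypes)
```

Each entry would become one colored segment of the bar, with its length proportional to the frequency.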







Export

Be able to export datasets that have been filtered; export alleles and haplotypes

 

yes






























































