Haplotype Tool Requirements
Created by Liz, last modified about 3 hours ago
Done | Feature | Notes | Required for v1.0 | |
---|---|---|---|---|
Imputation | Imputation tool would be separate. Looking at STITCH on the R server, and at IMPUTE and Beagle. NO - may be able to do this in the tool. Use KDCompute for imputation? Let's meet with Andrew to discuss options for KDCompute. | |||
Data import | Choose a CSV file, or a transposed HapMap file in GOBii format (with map positions) - DONE | Confirm the GOBii HapMap file is compatible | ||
Data import | Eventually be able to pull GOBii datasets with a BrAPI call (as Flapjack does) | |||
Data import | Select positions to focus on - chr and position start and stop - maybe move this to filtering section | yes, cM-based distances | ||
Data import | Be able to combine samples from two datasets (merge by marker name) into the same analysis, or import a new dataset and have it as a separate tab for analysis. Josh: I could imagine looking at two sets of lines at the same time if I've decided to bring new material into the breeding program and I want to see how the haplotypes compare in-state to the haplotypes already in the breeding program. Perhaps it would be interesting to see if the new lines share any potential co-ancestry or offer really novel haplotypes at regions where the breeding program doesn't have much diversity. | |||
FILTERING | Add a button for considering hets in clustering or not - option to change all hets to missing | yes | ||
FILTERING | Have a button to exclude markers below a user-set MAF threshold - DONE | |||
FILTERING | Add ability to exclude a haplotype group from being seen or used in analysis | |||
FILTERING | Have a 'select sub-set of markers' window and/or be able to select a region sub-set | |||
FILTERING | Allow filtering by Samples | yes | ||
FILTERING | Have different windows for each region selected [ i.e. allow the user to request several non-contiguous genomic regions, and show them separately? ] | |||
FILTERING | Be able to save filtered datasets | yes | ||
Haplotype analysis view | Have an automatic assignment of haplotype number - Abhi suggested CCC? | |||
MAIN DISPLAY | | |||
INTERACTIVITY | Drop-down with sample names and marker names to highlight one or more samples or markers, OR click on something in the view to see the sample or marker name. If we want interactivity, R doesn't provide it - we would need JavaScript. Will continue in R for now. - DONE Can there be a zoom to see different groups, or can there be a list of groups and how many samples are in each group, with a check box to select? | yes | ||
Trait / Marker design capabilities | Have the user classify haplotype groups and give a trait name, e.g. resistant or susceptible. Then search a window and find perfectly diagnostic markers. | yes | ||
Add a button to match to a specific line if reference alleles are not available (or if requested by the user). | yes | |||
Additional requests from Josh | COMPARING TWO REGIONS. Josh: In terms of looking at two regions in the same panel of lines, I might be looking to figure out how often two regions are or are not in LD with each other in the breeding program. Perhaps that would be helpful in picking parents that are otherwise equal by phenotype but offer different genetics. Here's a screen grab from Excel that clarifies what I'm thinking about when I say I want to define windows in a region and look at how the haplotypes fall out. Windows are based on a 'window size' determined by the user and split evenly across the region. Or perhaps if you want to get really fancy, the tool could determine the windows using LD blocks in the sequence data.
It would also be helpful to look at a whole chromosome partitioned into bins or windows and for each window have a bar that indicates the number of haplotypes in that window. So the whole chromosome looks something like a sideways bar chart. Damien did this for me once (see following images) using sequence data he has for his QTL deployment work, and a greedy algorithm that called groups as different even if they were only different due to hets, missing data, or a single SNP. Dima's clustering is much better. But you may find it useful to chat with him as he's thought a little about this before.
This view would also require a more automated script to determine the ideal number of groups in each window that makes sense rather than having the user set it for every window. Perhaps that's easier after filtering for missing data and hets. For my purposes, I would look at the whole chromosome or the whole genome of a set of selected parents to get an idea of the haplotypic diversity. I might see a region where genetic variation is low, and investigate further. Maybe there's a trait marker there. But maybe it's drift. So I may choose a few additional parents that balance the haplotypic diversity in the region better to avoid loss of genetic diversity due to drift.
I could imagine the bars could be color coded to indicate the frequency of each haplotype in a window something like this: | |||
Export | Be able to export datasets that have been filtered; export alleles and haplotypes | yes | ||
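As an illustration of the FILTERING rows on hets and MAF, here is a minimal sketch in Python. It is not the tool's R code. It assumes each genotype call is a two-character string such as "AA" or "AG" and that "NN" marks missing data; the function names and the data layout are hypothetical.

```python
from collections import Counter

MISSING = "NN"  # assumed missing-call code; adjust to the dataset's convention


def hets_to_missing(calls):
    """Replace heterozygous calls (two different alleles) with the missing code."""
    return [MISSING if c != MISSING and c[0] != c[1] else c for c in calls]


def minor_allele_frequency(calls):
    """Return the MAF of one marker, ignoring missing calls."""
    counts = Counter(allele for c in calls if c != MISSING for allele in c)
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0  # entirely missing or monomorphic
    # Frequency of the second most common allele
    return sorted(counts.values())[-2] / total


def filter_by_maf(genotypes, min_maf):
    """Keep markers whose MAF is at least min_maf.

    genotypes: {marker_name: [call for each sample]} (hypothetical layout)
    """
    return {
        marker: calls
        for marker, calls in genotypes.items()
        if minor_allele_frequency(calls) >= min_maf
    }


genotypes = {
    "m1": ["AA", "AA", "AG", "GG"],
    "m2": ["CC", "CC", "CC", "NN"],
    "m3": ["TT", "TT", "TT", "TA"],
}
kept = filter_by_maf(genotypes, min_maf=0.2)
print(sorted(kept))
```

Whether hets are converted to missing before or after the MAF filter changes the allele counts, so the order of the two steps would need to be decided for the tool.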
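The "perfectly diagnostic markers" request in the Trait / Marker design row could be read as follows: a marker is diagnostic when no genotype call is shared between samples of different trait classes. The Python sketch below works under that assumption. Treating missing calls ("NN") as uninformative is also an assumption, and all names are hypothetical.

```python
MISSING = "NN"  # assumed missing-call code


def diagnostic_markers(genotypes, sample_class, markers_in_window):
    """Return markers whose calls never overlap between trait classes.

    genotypes: {marker_name: {sample_name: call}} (hypothetical layout)
    sample_class: {sample_name: trait label, e.g. "resistant"}
    markers_in_window: marker names inside the search window
    """
    all_labels = set(sample_class.values())
    hits = []
    for marker in markers_in_window:
        calls_by_class = {}
        for sample, label in sample_class.items():
            call = genotypes[marker].get(sample, MISSING)
            if call != MISSING:
                calls_by_class.setdefault(label, set()).add(call)
        # Every class needs at least one observed call to be judged
        if set(calls_by_class) != all_labels:
            continue
        seen = set()
        shared = False
        for calls in calls_by_class.values():
            if seen & calls:
                shared = True
                break
            seen |= calls
        if not shared:
            hits.append(marker)
    return hits


sample_class = {
    "s1": "resistant",
    "s2": "resistant",
    "s3": "susceptible",
    "s4": "susceptible",
}
genotypes = {
    "m1": {"s1": "AA", "s2": "AA", "s3": "GG", "s4": "GG"},
    "m2": {"s1": "CC", "s2": "TT", "s3": "TT", "s4": "TT"},
    "m3": {"s1": "AA", "s2": "NN", "s3": "CC", "s4": "CC"},
}
found = diagnostic_markers(genotypes, sample_class, ["m1", "m2", "m3"])
print(found)
```

A stricter variant could reject any marker with missing calls, which would drop m3 in this example.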
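Josh's request for evenly split windows with a haplotype count per window can be sketched as below. This is a rough Python illustration, not the tool's method: it groups samples by exact match of their calls within a window, which will over-split on hets and missing data in the same way as the greedy approach mentioned above, so it stands in for a proper clustering step. The function name, data layout, and half-open window boundaries are assumptions.

```python
from collections import Counter


def window_haplotypes(markers, calls, start, stop, window_size):
    """Count exact-match haplotypes in evenly sized windows.

    markers: list of (marker_name, position), sorted by position
    calls: {sample_name: {marker_name: call}} (hypothetical layout)
    Windows are half-open: start <= position < end.
    Returns a list of (win_start, win_end, Counter of haplotype strings).
    """
    results = []
    win_start = start
    while win_start < stop:
        win_end = min(win_start + window_size, stop)
        names = [n for n, pos in markers if win_start <= pos < win_end]
        if names:
            haps = Counter(
                "|".join(calls[sample][n] for n in names) for sample in calls
            )
        else:
            haps = Counter()  # no markers fall in this window
        results.append((win_start, win_end, haps))
        win_start = win_end
    return results


markers = [("m1", 10), ("m2", 40), ("m3", 60), ("m4", 90)]
calls = {
    "s1": {"m1": "AA", "m2": "CC", "m3": "GG", "m4": "TT"},
    "s2": {"m1": "AA", "m2": "CC", "m3": "GG", "m4": "AA"},
    "s3": {"m1": "GG", "m2": "CC", "m3": "GG", "m4": "CC"},
}
windows = window_haplotypes(markers, calls, start=0, stop=100, window_size=50)
for win_start, win_end, haps in windows:
    # One text bar per window: its length is the number of distinct haplotypes
    print(f"{win_start:>4}-{win_end:<4} {'#' * len(haps)} {dict(haps)}")
```

The per-window counters hold both the number of distinct haplotypes (bar length) and each haplotype's frequency, which is what a color-coded bar would need.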