Haplotype Tool Requirements

Created by Liz, last modified about 3 hours ago

Done	Feature	Notes	Required for v1.0

	Imputation	Imputation tool would be separate: looking at STITCH on the R server, and at Impute and Beagle. NO - MAY BE ABLE TO DO THIS IN THE TOOL. Use KDCompute for imputation? Let's meet with Andrew to discuss options for KDCompute.
	Data import	Choose CSV file, or transposed hapmap in GOBii format (with map positions) - DONE	confirm GOBii hapmap file is compatible
	Data import	Eventually be able to pull GOBii datasets with a BrAPI call (same as Flapjack has)
	Data import	Select positions to focus on (chr and position start and stop); maybe move this to the filtering section. What about loci with no positions? They will not be included in the filtering. Need to be able to accommodate physical AND genetic map positions.	yes cM based distances
	Data import	Be able to combine samples from two datasets (merge by marker name) into the same analysis, or import a new dataset and have it as a separate tab for analysis. Josh: I could imagine looking at two sets of lines at the same time if I've decided to bring new material into the breeding program and I want to see how the haplotypes compare in-state to the haplotypes already in the breeding program. Perhaps it would be interesting to see if the new lines shared any potential co-ancestry or offer really novel haplotypes at regions where the breeding program doesn't have much diversity.
	FILTERING	Add a button for considering hets in clustering or not - option to change all hets to missing	yes
	FILTERING	Have a button to exclude markers with a certain MAF - DONE
	FILTERING	Add ability to exclude a haplotype group from being seen or used in analysis
	FILTERING	Have a 'select sub-set of markers' window and/or be able to select a region sub-set
	FILTERING	Allow filtering by Samples	yes
	FILTERING	Have different windows for each region selected [i.e. allow the user to request several non-contiguous genomic regions, and show them separately?]
	FILTERING	Be able to save filtered datasets	yes
	Haplotype analysis view	Have an automatic assignment of haplotype number - Abhi suggested CCC?
	MAIN DISPLAY
	INTERACTIVITY	Drop-down with sample names and marker names to highlight sample/s or marker/s, OR click on something in the view to see the sample or marker name. If we want interactivity, R doesn't provide it; we would need JavaScript. Will continue in R for now. - DONE. Can there be a zoom to see different groups, or can there be a list of groups and how many samples are in each group, with a check box to select?	yes
	Trait / Marker design capabilities	How to select a better associated marker? Have user classify haplotype groups and give a trait name, e.g. resistant or susceptible. Then search a window and find perfectly diagnostic markers. Add phenotypic score/s and marker scores as separate columns. Be able to sort the phenotypic data and marker call (option to color code haplotype groups first), and then re-sort back to haplotype group. Box plots of trait values by haplotype group. Add a trait marker or haplotype tag. Create a separate column for the marker or haplotype (don't try to highlight the region - the marker may not even be in the dataset). GOBii has a marker group extract that can be used to provide a marker and favorable allele for a set of samples.	yes
		Add a button to match to a specific line if we don't have ref alleles (or if requested by user). (Default behaviour: color by major-minor allele, based on observed data.)	yes
	Additional requests from Josh	COMPARING TWO REGIONS. Josh: In terms of looking at two regions in the same panel of lines, I might be looking to figure out how often two regions are or are not in LD with each other in the breeding program. Perhaps that would be helpful in picking parents that are otherwise equal by phenotype but offer different genetics. Here's a screen grab from Excel that clarifies what I'm thinking about when I say I want to define windows in a region and look at how the haplotypes fall out. Windows are based on a 'window size' determined by the user and split evenly across the region. Or perhaps, if you want to get really fancy, the tool could determine the windows using LD blocks in the sequence data. It would also be helpful to look at a whole chromosome partitioned into bins or windows and, for each window, have a bar that indicates the number of haplotypes in that window, so the whole chromosome looks something like a sideways bar chart. Damien did this for me once (see following images) using sequence data he has for his QTL deployment work, and a greedy algorithm that called groups as different even if they were only different due to hets/missing data/only one SNP. Dima's clustering is much better. But you may find it useful to chat with him as he's thought a little about this before. This view would also require a more automated script to determine the ideal number of groups in each window that makes sense, rather than having the user set it for every window. Perhaps that's easier after filtering for missing data and hets. For my purposes, I would look at the whole chromosome or the whole genome of a set of selected parents to get an idea of the haplotypic diversity. I might see a region where genetic variation is low, and investigate further. Maybe there's a trait marker there. But maybe it's drift. So I may choose a few additional parents that balance the haplotypic diversity in the region better to avoid loss of genetic diversity due to drift. I could imagine the bars could be color coded to indicate the frequency of each haplotype in a window, something like this:
	Export	Be able to export datasets that have been filtered; export alleles and haplotypes	yes
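
Sketch for the FILTERING rows (hets to missing, MAF cut-off). The tool itself is being written in R; this is only a small Python illustration of the intended behaviour, not the tool's code. The call encoding ('A' for a homozygous call, 'A/G' for a het, 'N' for missing) and all function names are assumptions made for the example.

```python
from collections import Counter

MISSING = "N"  # assumed missing-data code


def is_het(call):
    """Assumed encoding: a het is two different alleles joined by '/', e.g. 'A/G'."""
    alleles = call.split("/")
    return len(alleles) == 2 and alleles[0] != alleles[1]


def hets_to_missing(calls):
    """The 'change all hets to missing' option, applied to one marker's calls."""
    return [MISSING if is_het(c) else c for c in calls]


def minor_allele_frequency(calls):
    """Frequency of the second most common allele; 0.0 if monomorphic or all missing."""
    counts = Counter()
    for call in calls:
        if call == MISSING:
            continue
        alleles = call.split("/")
        if len(alleles) == 1:
            alleles = alleles * 2  # a homozygous 'A' contributes two A alleles
        counts.update(alleles)
    total = sum(counts.values())
    if len(counts) < 2:
        return 0.0
    return sorted(counts.values())[-2] / total


def filter_by_maf(genotypes, min_maf):
    """Keep only markers whose MAF is at least min_maf.

    genotypes: dict of marker name -> list of calls, one per sample.
    """
    return {
        marker: calls
        for marker, calls in genotypes.items()
        if minor_allele_frequency(calls) >= min_maf
    }


# Made-up example data: three markers scored on four samples.
genotypes = {
    "m1": ["A", "A", "G", "A/G"],
    "m2": ["C", "C", "C", "N"],
    "m3": ["T", "G", "T", "G"],
}
kept = filter_by_maf(genotypes, min_maf=0.05)
```

With this data the monomorphic marker m2 is dropped and m1 and m3 are kept. Whether MAF should be computed before or after hets are set to missing is a design decision the table above leaves open.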

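Sketch for Josh's windowed view (number of haplotypes per window along a region). Again a Python illustration only, with assumed names and data layout. It groups samples by exact match of their calls within each window, which is the naive behaviour Josh describes as splitting groups on a single SNP, het, or missing call; it is not Dima's clustering and would need to be replaced by it in the real tool.

```python
from collections import Counter


def window_haplotypes(positions, calls_by_sample, start, end, window_size):
    """Count exact-match haplotypes in fixed-size windows over [start, end).

    positions: marker positions, in the same order as each sample's calls.
    calls_by_sample: dict of sample name -> list of single-character calls.
    Returns a list of (window_start, window_end, Counter), where the Counter
    maps each haplotype string to the number of samples carrying it.
    The last window may be shorter than window_size.
    """
    results = []
    w_start = start
    while w_start < end:
        w_end = min(w_start + window_size, end)
        # Indices of markers that fall inside this half-open window.
        idx = [i for i, p in enumerate(positions) if w_start <= p < w_end]
        if idx:
            counts = Counter(
                "".join(calls[i] for i in idx)
                for calls in calls_by_sample.values()
            )
        else:
            counts = Counter()  # no markers in this window
        results.append((w_start, w_end, counts))
        w_start = w_end
    return results


# Made-up example: four markers, three samples, two windows of 100 units.
positions = [10, 20, 110, 120]
calls_by_sample = {
    "s1": ["A", "A", "G", "G"],
    "s2": ["A", "A", "G", "T"],
    "s3": ["A", "C", "G", "G"],
}
windows = window_haplotypes(positions, calls_by_sample, 0, 200, 100)

# Text version of the 'sideways bar chart': one '#' per distinct haplotype.
for w_start, w_end, counts in windows:
    print(f"{w_start}-{w_end}\t{'#' * len(counts)}")
```

The per-haplotype counts in each window are what the color-coded bars would be drawn from. Positions here could be physical or cM, as long as start, end, and window size use the same units.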