Indel Management

Target release	v3.0
Epic	https://gobiiproject.atlassian.net/browse/GE-326
Document status	DRAFT
Document owner	Yaw Nti-Addae
Designer	Joshua Lamos-Sweeney
Tech lead	Joshua Lamos-Sweeney
Technical writers
QA

Objective

Store INDELs from VCF, HapMap, and InterTek formats for the 2-letter and 4-letter nucleotide data types.

Joshua Lamos-Sweeney, Yaw Nti-Addae: up to what INDEL size must be supported, or does size not matter for the look-up table solution?

Requirements

	Requirement	User Story	Importance	Release
1	Replace INDELs with Missing		HIGH	2.2.3
2	Store INDELs from VCF		MEDIUM	3.0
3	Manage + and -	+ and - should be stored as is and extracted as is	TEST
4	Store INDELs from all other formats		HIGH	3.0
5	Extract data with INDELs	Allow the user to specify whether to output INDELs; if not, all INDELs are replaced with N/N or NN	MEDIUM	3.0
6	Web service extract	Modify web service calls to handle INDELs. Allow the user to specify whether to output INDELs; if not, all INDELs are replaced with N/N or NN	MEDIUM	3.0
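
As an illustration of requirements 5 and 6, the sketch below shows one way the "output INDELs or replace them" switch could behave. It is not the GDM implementation. The INDEL definitions are still an open question, so the detection rule here is an assumption: a call counts as an INDEL if any allele is "+" or "-", or, for separated calls, if any allele is longer than one base. The names `is_indel_call`, `extract_calls`, and `output_indels` are hypothetical.

```python
from typing import Iterable, List

MISSING = "N"
# Assumption: "+" and "-" are the insertion/deletion symbols (requirement 3).
INDEL_SYMBOLS = {"+", "-"}


def is_indel_call(call: str, sep: str = "/") -> bool:
    """Return True if the call looks like an INDEL under the assumed rule."""
    if sep and sep in call:
        alleles = call.split(sep)
        # Separated call: a symbol allele or a multi-base allele marks an INDEL.
        return any(a in INDEL_SYMBOLS or len(a) > 1 for a in alleles)
    # Unseparated call such as "AT" or "+-": only the symbols can be detected.
    return any(ch in INDEL_SYMBOLS for ch in call)


def extract_calls(calls: Iterable[str], output_indels: bool, sep: str = "/") -> List[str]:
    """Return calls as stored, or with INDEL calls replaced by N/N or NN."""
    if output_indels:
        return list(calls)  # stored as is, extracted as is
    result = []
    for call in calls:
        if is_indel_call(call, sep):
            separated = bool(sep) and sep in call
            result.append(f"{MISSING}{sep}{MISSING}" if separated else MISSING * 2)
        else:
            result.append(call)
    return result
```

For example, `extract_calls(["A/T", "+/-", "A/ACG", "AT", "+-"], output_indels=False)` yields `["A/T", "N/N", "N/N", "AT", "NN"]`, while `output_indels=True` returns the calls unchanged.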

User interaction and design

Open Questions

Question	Answer	Date Answered
Need INDEL definitions from Liz	draft definitions here

Sample InterTek files with INDELs

Note: the exact layout of the Intertek files may have changed (i.e., the coordinates of the first data point), but the genotype data show examples of how INDELs could look.

Sample VCF and other file formats with INDELs

Downstream Pipelines

Pipeline	System/Tool	Notes
QC	KDcompute
DArTview	DArTview
Extract UI	GDM	Add separator appropriately (standard is “/”)
BrAPI extracts	GDM	Add separator appropriately (standard is “/”)
Flapjack bytes
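
The separator note for the Extract UI and BrAPI extracts matters once alleles can be longer than one base: an unseparated call such as "AACG" no longer says where one allele ends and the next begins. The sketch below shows one way a separator could be applied on output. The helper names `join_alleles` and `add_separator` are hypothetical, and the rule that only two-character unseparated calls are split is an assumption made for illustration.

```python
from typing import Sequence


def join_alleles(alleles: Sequence[str], sep: str = "/") -> str:
    """Join alleles of any length into one call, e.g. ["A", "ACG"] -> "A/ACG"."""
    return sep.join(alleles)


def add_separator(call: str, sep: str = "/") -> str:
    """Insert sep into an unseparated two-character call such as "AT".

    Calls that already contain sep, or that are not exactly two characters
    long, are returned unchanged because they cannot be split safely.
    """
    if sep in call or len(call) != 2:
        return call
    return call[0] + sep + call[1]
```

With the default separator, `add_separator("AT")` gives `"A/T"` and `join_alleles(["A", "ACG"])` gives `"A/ACG"`.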