Indel Management

Target release

3.0

Epic

https://gobiiproject.atlassian.net/browse/GE-326

https://ebsproject.atlassian.net/browse/SM-419

Document status

DRAFT

Document owner

@Yaw Nti-Addae

Designer

@Joshua Lamos-Sweeney

Tech lead

@Joshua Lamos-Sweeney

Technical writers

 

QA

 

Objective

Store INDELs from VCF, HapMap, and Intertek formats for 2-letter and 4-letter nucleotide data types.

@Joshua Lamos-Sweeney @Yaw Nti-Addae Up to what size of indel must be supported, or does size not matter for the look-up table solution?
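
One possible reading of the look-up table approach raised in the question above, sketched in Python. This page does not specify the design, so the per-marker table, the function names encode_marker and decode_marker, and the "/" separator default are all assumptions made for illustration. Under these assumptions each genotype cell holds small integer codes, so indel length would affect only the table and not the cell width.

```python
from typing import Dict, List, Tuple


def encode_marker(calls: List[str], sep: str = "/") -> Tuple[List[str], List[Tuple[int, ...]]]:
    """Encode one marker's genotype calls against a per-marker allele table.

    Hypothetical helper: each distinct allele string (a single base, a
    multi-base indel, or the symbols + and -) gets the next free integer code.
    """
    table: List[str] = []          # code -> allele string, in first-seen order
    index: Dict[str, int] = {}     # allele string -> code
    encoded: List[Tuple[int, ...]] = []
    for call in calls:
        codes = []
        for allele in call.split(sep):
            if allele not in index:
                index[allele] = len(table)
                table.append(allele)
            codes.append(index[allele])
        encoded.append(tuple(codes))
    return table, encoded


def decode_marker(table: List[str], encoded: List[Tuple[int, ...]], sep: str = "/") -> List[str]:
    """Rebuild the original calls, so alleles come back exactly as stored."""
    return [sep.join(table[code] for code in codes) for codes in encoded]


calls = ["A/A", "A/ACGTT", "+/-", "-/-"]
table, encoded = encode_marker(calls)
restored = decode_marker(table, encoded)
```

In this sketch the round trip returns the input unchanged, which is the behaviour requirement 3 asks for with + and -.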

Requirements

 

Requirement

User Story

Importance

Jira Issue

Release

1

Replace INDELs with Missing

High

 

2.2.3

2

Store INDELs from VCF

 

Medium

 

3.0

3

Manage + and -

The + and - symbols should be stored as is and extracted as is.

test

 

 

4

Store INDELs from all other formats

 

High

 

3.0

5

Extract data with INDELs

Allow the user to specify whether to output INDELs. If not, all INDELs are replaced with N/N or NN.

Medium

 

3.0

6

Web service extract

Modify web service calls to handle INDELs. Allow the user to specify whether to output INDELs. If not, all INDELs are replaced with N/N or NN.

Medium

 

3.0
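
The extract option in requirements 5 and 6 could behave roughly as sketched below. Because INDEL definitions are still an open question on this page, the rule used here (any allele that is not a single A, C, G, T, or N counts as an INDEL) is a placeholder assumption, and the names is_indel_call, mask_indels, and include_indels are hypothetical.

```python
from typing import List

# Assumed set of non-indel allele symbols; the real definition is still open.
SNP_ALLELES = {"A", "C", "G", "T", "N"}


def is_indel_call(call: str, sep: str = "/") -> bool:
    """Return True if any allele in the call is not a single A/C/G/T/N.

    With an empty separator the call is read one character per allele, so
    only +/- style indels can be detected in that form.
    """
    alleles = call.split(sep) if sep else list(call)
    return any(allele.upper() not in SNP_ALLELES for allele in alleles)


def mask_indels(calls: List[str], include_indels: bool, sep: str = "/") -> List[str]:
    """Return calls unchanged, or with indel calls replaced by N/N (or NN)."""
    if include_indels:
        return list(calls)
    missing = sep.join(["N", "N"])  # "N/N" with sep="/", "NN" with sep=""
    return [missing if is_indel_call(call, sep) else call for call in calls]


separated = mask_indels(["A/A", "A/ACGT", "+/-", "C/T"], include_indels=False)
unseparated = mask_indels(["AA", "A+", "--"], include_indels=False, sep="")
```

Here separated is ["A/A", "N/N", "N/N", "C/T"] and unseparated is ["AA", "NN", "NN"]. Passing include_indels=True returns the calls as stored.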

User interaction and design

 

Open Questions

Question

Answer

Date Answered

Need INDEL definitions from Liz

draft definitions here

 

 

Sample Intertek files with INDELs

Note: the exact layout of the Intertek files may have changed (i.e., the coordinates of the first data point), but the genotype data show examples of how the indels could look.

 

Sample VCF and other file formats with INDELs

 

 

Downstream Pipelines

Pipeline

System/Tool

Notes

QC

KDcompute

 

DArTview

DArTview

 

Extract UI

GDM

Add separator appropriately (standard is “/”)

BrAPI extracts

GDM

Add separator appropriately (standard is “/”)

Flapjack bytes