Append Datasets

Target release	2.3
Epic	GDM-170
Document status	DRAFT
Document owner	Deb Weigand
Designer
Tech lead
Technical writers
QA

Objective

To be able to append a dataset to an existing dataset so that extracts are associated with the expected dnarun IDs

Success metrics

Goal	Metric
To be able to access the dataset wizard from a dataset page in the data loader via an 'Append' button (this can sit underneath the dataset wizard button)	Be able to override the lock on a dataset that has already been loaded by clicking an 'Append' button, so that the dataset wizard is made available
To be able to append a dataset to an existing dataset using the dataset wizard in one of two directions, where 'direction' means by markers or by samples (dnaruns)	Two datasets with common samples or markers are loaded to a single dataset name
To be able to receive appropriate error messages if samples or markers associated with the new matrix are not contained within the existing matrix	If matrices cannot be appended (because samples or markers are not contained within the first matrix, or are already present in it), the user receives the error message "some samples or markers are not contained within the first dataset and have been rejected" or "some samples or markers were already present in the first dataset and have been rejected"
To be able to extract data so that two datasets loaded to a single dataset name look like a single dataset, with dnaruns and markers aligned	Common markers and samples across the two datasets are aligned in the extract
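
The two error messages in the table above could be produced by a check along the following lines. This is a minimal Python sketch of one reading of the metric, not the loader's implementation: it assumes that items on the matching dimension must already exist in the first dataset, and that items on the appended dimension must not. The function name `check_append` and its parameters are hypothetical.

```python
from typing import List, Sequence

NOT_CONTAINED = ("some samples or markers are not contained within "
                 "the first dataset and have been rejected")
ALREADY_PRESENT = ("some samples or markers were already present in "
                   "the first dataset and have been rejected")


def check_append(existing_match: Sequence[str], incoming_match: Sequence[str],
                 existing_added: Sequence[str], incoming_added: Sequence[str]) -> List[str]:
    """Return the error messages an append attempt would trigger (hypothetical helper).

    *_match: names on the dimension used for matching (e.g. markers).
    *_added: names on the dimension being extended (e.g. dnaruns).
    """
    known_match = set(existing_match)
    known_added = set(existing_added)
    messages = []
    # Assumption: every incoming item on the matching dimension must exist already.
    if any(name not in known_match for name in incoming_match):
        messages.append(NOT_CONTAINED)
    # Assumption: existing data is never replaced, so repeated items are rejected.
    if any(name in known_added for name in incoming_added):
        messages.append(ALREADY_PRESENT)
    return messages
```

An empty list means the append would be accepted under these assumptions.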

Assumptions

A sample matches an existing sample when the combination of sample_name, sample_num, and dnarun_name is identical
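
The matching rule above can be thought of as a composite key. The sketch below is illustrative only: `SampleKey` and `is_matching_sample` are hypothetical names, and treating sample_num as a string is an assumption, since this document does not state its type.

```python
from typing import NamedTuple


class SampleKey(NamedTuple):
    """Hypothetical composite key built from the three matching fields."""
    sample_name: str
    sample_num: str   # assumption: type is not specified in this document
    dnarun_name: str


def is_matching_sample(existing: SampleKey, incoming: SampleKey) -> bool:
    # Tuple equality compares every field, so a difference in any one of
    # sample_name, sample_num, or dnarun_name means the samples do not match.
    return existing == incoming
```

Because the key is a tuple, it can also be used directly as a dictionary key or set member when looking up existing samples.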

Milestones

Requirements

#	Requirement	User Story	Importance	Jira Issue	Notes
1	Need to be able to append a dataset to an existing dataset and have the dnaruns and markers align appropriately	(a) For large sequencing experiments the data is returned in separate files, usually separated by chromosome. Sometimes these are too big to load in one go, so each chromosome is loaded separately, currently to different datasets within the same experiment. Even though the dnarun names are the same across these datasets, each dataset is given a separate dataset dnarun_name ID, and when extracted the data are not aligned under a common dnarun_name/sample name. Appending a dataset to an existing dataset would resolve this. (b) Some vendors may only run one plate of samples at a time and return data sequentially. All plates would be run with the same set of markers. The plate 1 samples may be loaded to a dataset, and when plate 2 is returned the new data can be matched by markers and appended as additional samples. (c) Due to lack of availability of marker inventory, vendors may not be able to run all requested markers at once. They could run all samples against the available markers, and at a later stage run the same samples against the remaining markers when they become available. The second dataset can be appended to the first by matching on samples and appending the additional markers.	High
2	Access the dataset wizard from the dataset page in the loader UI, even if a dataset has already been loaded, by clicking an 'Append' button	The user cannot currently add a dataset to an existing dataset because the dataset wizard is locked in the loader UI	High
3	Extracted data needs to be aligned to the appropriate dnarun_names and markers
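
The two append directions described in requirement 1 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the GDM loader: the `Dataset` class, the marker-by-dnarun matrix layout, and the function names `append_samples` and `append_markers` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Dataset:
    """Hypothetical in-memory genotype matrix: rows are markers, columns are dnaruns."""
    markers: List[str]
    dnaruns: List[str]
    calls: List[List[str]]  # calls[i][j] is the call for markers[i] and dnaruns[j]


def append_samples(first: Dataset, second: Dataset) -> Dataset:
    """Match by markers and add the second dataset's dnaruns as new columns."""
    if second.markers != first.markers:
        raise ValueError("markers must match the first dataset, in the same order")
    merged_calls = [row + extra for row, extra in zip(first.calls, second.calls)]
    return Dataset(first.markers, first.dnaruns + second.dnaruns, merged_calls)


def append_markers(first: Dataset, second: Dataset) -> Dataset:
    """Match by dnaruns and add the second dataset's markers as new rows."""
    if second.dnaruns != first.dnaruns:
        raise ValueError("dnaruns must match the first dataset, in the same order")
    return Dataset(first.markers + second.markers, first.dnaruns,
                   first.calls + second.calls)
```

In this sketch, the plate-by-plate scenario corresponds to `append_samples`, while the chromosome-by-chromosome and delayed-marker scenarios correspond to `append_markers`.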

User interaction and design

append

Open Questions

Question	Answer	Date Answered
If there is overlap in both the sample and marker dimensions, how do we know which dimension the user is trying to update?	Josh suggested separate 'Append markers to samples' and 'Append samples to markers' buttons so that the intended dimension is explicit
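
To illustrate why the question above arises, the sketch below tries to infer the direction from overlap alone and shows the case where it cannot. It is a hypothetical Python illustration; `infer_direction` and its return labels are not part of any existing design.

```python
from typing import Sequence


def infer_direction(existing_markers: Sequence[str], existing_samples: Sequence[str],
                    incoming_markers: Sequence[str], incoming_samples: Sequence[str]) -> str:
    """Guess the append direction from which dimension overlaps (hypothetical helper)."""
    markers_overlap = bool(set(existing_markers) & set(incoming_markers))
    samples_overlap = bool(set(existing_samples) & set(incoming_samples))
    if markers_overlap and samples_overlap:
        # Overlap alone cannot decide; the user has to choose explicitly.
        return "ambiguous"
    if markers_overlap:
        return "append_samples"   # matched by markers, new samples added
    if samples_overlap:
        return "append_markers"   # matched by samples, new markers added
    return "no_overlap"
```

The "ambiguous" result is the case that separate buttons would avoid, because the user states the direction up front.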

Out of Scope

We will not sort samples or markers to match the samples or markers in an existing dataset. Samples and markers must be supplied in the same order as in the first dataset, the one being appended to.

We will not replace any existing data, i.e. sample/marker combinations already in the existing dataset. These data will be rejected with an appropriate error.
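
Because no sorting is performed, a loader would need to detect when the incoming order differs from the first dataset. The following Python sketch shows one way to locate the first point of divergence so it can be reported to the user; `first_order_mismatch` is a hypothetical helper, not an existing function.

```python
from typing import Optional, Sequence


def first_order_mismatch(existing: Sequence[str], incoming: Sequence[str]) -> Optional[int]:
    """Return the first index where the two name lists differ, or None if identical."""
    for index, (old, new) in enumerate(zip(existing, incoming)):
        if old != new:
            return index
    if len(existing) != len(incoming):
        # One list is a prefix of the other; they diverge where the shorter one ends.
        return min(len(existing), len(incoming))
    return None
```

A result of None means the incoming names are in the same order as the first dataset; any integer identifies the position to include in the error message.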

Genomic Data Manager