DNA Sample Wizard

Definitions

There are two options for loading germplasm and DNA sample information with the DNA Sample Wizard:

Option 1: Load germplasm and DNA sample information and submit them together as one job.
Option 2: Load germplasm and DNA sample information separately.

Option 2 is recommended because it makes problems in the germplasm file, the DNA sample file, or both easier to identify. Once the germplasm has loaded successfully, loading and submitting the DNA sample information is straightforward. The instructions below therefore describe option 2.

Field Descriptions

Germplasm Metadata

Primary fields

name: A required field for the germplasm name, for example the most commonly used name or the default name. The germplasm table has a one-to-many relationship with the DNA sample table; that is, many DNA samples can be associated with a single germplasm name.
external_code: A required code describing the unit of material from which the sample was generated, most likely a PlotID or a PlantID. The code should be meaningful in adjacent germplasm or sample tracking database systems. Each germplasm name can have several external_codes, but each external_code can be linked to only one germplasm name.
species_name: An optional field for the species name. This is a CV term and must be added under Define | Controlled Vocabulary to keep naming consistent. The species_name in the input file should exactly match the germplasm_species entry in the CV table.
type_name: An optional field for the type, or generation, of the germplasm, for example accession, inbred_line, f1_hybrid, f2, f3, f4, or f5. This is a CV term and must be added under Define | Controlled Vocabulary to keep naming consistent, so the type_name in the input file should exactly match the germplasm_type entry in the CV table.
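The rules for the two required germplasm fields can be checked before loading. The following is a minimal Python sketch, not part of the DNA Sample Wizard: it assumes the germplasm file has already been read into a list of dictionaries keyed by column name, and the helper name check_germplasm_rows and the column names name and external_code are assumptions. Row numbers in its messages are 0-based positions in that list.

```python
from collections import defaultdict


def check_germplasm_rows(rows):
    """Return a list of problems found in germplasm rows (hypothetical helper).

    Each row is a dict keyed by column name; "name" and "external_code" are
    assumed column names. Two rules are checked: both fields are required,
    and each external_code may be linked to only one germplasm name
    (a name may still have several external codes).
    """
    problems = []
    names_by_code = defaultdict(set)
    for i, row in enumerate(rows):
        name = (row.get("name") or "").strip()
        code = (row.get("external_code") or "").strip()
        if not name or not code:
            problems.append(f"row {i}: name and external_code are required")
            continue
        names_by_code[code].add(name)
    # An external code that points at two or more names breaks the rule.
    for code, names in sorted(names_by_code.items()):
        if len(names) > 1:
            problems.append(f"external_code {code!r} is linked to {sorted(names)}")
    return problems


rows = [
    {"name": "B73", "external_code": "PLOT-1"},
    {"name": "B73", "external_code": "PLOT-2"},   # fine: one name, two codes
    {"name": "Mo17", "external_code": "PLOT-2"},  # PLOT-2 now has two names
    {"name": "", "external_code": "PLOT-3"},      # missing name
]
for problem in check_germplasm_rows(rows):
    print(problem)
```

With the illustrative rows above, the sketch reports the missing name in row 3 and the external code PLOT-2 being linked to both B73 and Mo17.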

Properties (all optional)

germplasm_heterotic_group: The heterotic group of the germplasm within its species, for example NSS, SSS, A, or B for maize.
germplasm_id: A higher-level identifier for the germplasm, for example MGID.
germplasm_subsp: The subspecies grouping of the germplasm. This differs by crop; examples are dent, flint, sweet, or pop for maize; indica or japonica for rice; and bread wheat or durum wheat for wheat.
par1: The germplasm name for parent 1 of the germplasm (in a biparental cross this would be the female).
par2: The germplasm name for parent 2 of the germplasm (in a biparental cross this would be the male).
par3: The germplasm name for parent 3 of the germplasm.
par4: The germplasm name for parent 4 of the germplasm.
pedigree: The pedigree for the germplasm name.
seed_source_id: Seed source ID for the germplasm.

DNASample Metadata

Primary fields

name: A required field for the sample name. This is usually the name sent to the lab for processing. Note: a unique sample within a GDM Crop Database is defined by the combination of project_name, dnasample_name, and dnasample_number, so the dnasample_name does not need to be unique within or across projects. The dnasample name in a project can also be the same as the germplasm name, which is often the case for legacy data that predates sample tracking or LIMS systems.
uuid: A required field for a unique ID that identifies a sample or the sub-samples generated from a plot or plant. We advocate using a Universally Unique Identifier (UUID), a 128-bit number used to identify information in computer systems, e.g. 123e4567-e89b-12d3-a456-426655440000. A UUID helps identify and track samples generated across multiple institutes and systems. Because long identifiers may be impractical for vendors, the sample name can be sent to the vendor and the uuid maintained internally.
platename: An optional field for the name of the plate that holds the sample. This can be a number (1, 2, 3, 4, etc.) or a name given by the lab.
num: A required field for the numerical order of the sample within a project, for example 1-96 for a 96-well plate. The combination of dnasample name and num must be unique within the project.
well_col: An optional field for the plate column coordinate of the sample, for example 1-12 for a 96-well plate.
well_row: An optional field for the plate row coordinate of the sample, for example A-H for a 96-well plate.
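The DNA sample rules above (a uuid per sample, a unique combination of dnasample name and num within a project, and well coordinates on a 96-well plate) can be prepared or checked in a script before loading. This is a minimal Python sketch with hypothetical helper names; it uses the standard uuid module to generate version 4 UUIDs, and it assumes samples are numbered down plate columns (A1, B1, ..., H1, A2, ...) unless by_column=False is passed. The fill order is an assumption that varies between labs.

```python
import uuid


def assign_uuids(samples):
    """Give each sample that lacks a uuid a new random (version 4) UUID."""
    for sample in samples:
        if not sample.get("uuid"):
            sample["uuid"] = str(uuid.uuid4())
    return samples


def repeated_name_num(samples):
    """Return (name, num) pairs that occur more than once in one project."""
    seen, repeats = set(), []
    for sample in samples:
        key = (sample["name"], int(sample["num"]))
        if key in seen and key not in repeats:
            repeats.append(key)
        seen.add(key)
    return repeats


def well_for_num(num, rows="ABCDEFGH", n_cols=12, by_column=True):
    """Map a 1-based sample num to (well_row, well_col), e.g. 1 -> ("A", 1).

    Defaults describe a 96-well plate (rows A-H, columns 1-12). Whether
    samples fill down columns or across rows is an assumption; labs differ.
    """
    n_rows = len(rows)
    if not 1 <= num <= n_rows * n_cols:
        raise ValueError(f"num {num} does not fit on a {n_rows}x{n_cols} plate")
    i = num - 1
    if by_column:
        return rows[i % n_rows], i // n_rows + 1
    return rows[i // n_cols], i % n_cols + 1


samples = [{"name": "S1", "num": 1}, {"name": "S1", "num": 2}, {"name": "S1", "num": "1"}]
print(repeated_name_num(samples))         # [('S1', 1)]
print(well_for_num(9))                    # ('A', 2)
print(well_for_num(9, by_column=False))   # ('A', 9)
```

Existing uuids are left untouched, so internally maintained identifiers survive a rerun of assign_uuids.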

Properties (optional)

ref_sample: A standard "reference" sample against which all other germplasm is compared, also called a "gold standard" line. Because this is not a CV term, the crop community needs to agree on standards for naming reference samples.
sample_group: A grouping of germplasm that is useful to the breeder for analysis, for example a population and its parents that need to be grouped for data analysis.
sample_group_cycle: The cycle of a germplasm grouping, for example different generations of a population and its parents, which allows further grouping within the germplasm group.
sample_parent_prop: The type of parent, for example female/male or DP/RP (donor/recurrent parent).
sample_type: The type of tissue sampled, for example leaf, seed, bulk seed, or bulk plant. If available, you can map the term to the Plant Ontology (PO); for example, a leaf sample could be mapped to the PO term leaf, PO:0025034, recorded as PO:0025034 or leaf_PO:0025034.
trial_name: The name of the field trial (or fieldbook) that the sample comes from.

DNArun Metadata

Primary fields

name: A required field for the name of the sample as returned from the vendor and associated with the genotyping data. This can be the same as the name sent to the vendor, or it may have a vendor ID concatenated to it. The vendor provides the translation between the name sent to the vendor (dnasample_name) and the name returned (dnarun_name), e.g. in a 'key' file.
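Because a dnasample_name need not be unique on its own, it can be safer to key the vendor's translation on the sample name and number together. A minimal Python sketch, assuming the key file is a CSV with the hypothetical columns dnasample_name, dnasample_num, and dnarun_name; real vendor key files vary, and the run names in the example are made up.

```python
import csv
import io


def read_vendor_key(text):
    """Build {(dnasample_name, num): dnarun_name} from a vendor key file.

    The column names dnasample_name, dnasample_num and dnarun_name are
    hypothetical; rename them to match the vendor's actual key file.
    """
    key = {}
    for row in csv.DictReader(io.StringIO(text)):
        sample = (row["dnasample_name"].strip(), int(row["dnasample_num"]))
        run_name = row["dnarun_name"].strip()
        # The same sample mapped to two different run names is ambiguous.
        if sample in key and key[sample] != run_name:
            raise ValueError(f"sample {sample} has two run names")
        key[sample] = run_name
    return key


key_text = "dnasample_name,dnasample_num,dnarun_name\nS1,1,S1_VND001\nS1,2,S1_VND002\n"
key = read_vendor_key(key_text)
print(key[("S1", 2)])  # S1_VND002
```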

Properties (optional)

barcode: A barcode assigned to a sample, for example for sequencing or other genotyping in which samples are pooled.

Loading Germplasm Metadata

Select Wizards | DNA Sample Wizard.
Select PI, Project, and Experiment.
Select the file to load. For details, see Selecting the germplasm file to load below.

Selecting the germplasm file to load

For local files, click Browse to locate your file, or drag and drop the file into the file list box. If you select the wrong file, select the checkbox next to it, then click Remove Selected File(s).
For large files, load them from the files folder in your crop server environment: create a new folder under the files folder, then enter the folder name in the Remote Path field.
If you are loading many files with exactly the same format, create a template at the end of the wizard process and save it for future use. Use templates with caution: they match on row and column positions, not on field names. When you use a template, make sure your metadata fields and data are in exactly the same columns as when the template was made.
Select the file format from the File Format drop-down menu.
Click Preview Data. For local files, you must log in to your crop server.
The top left-hand section of your file appears in the preview section. Expand the columns to view the headers.
Select the header position from the Header Position drop-down menu. For all marker, mapset, and sample files, the headers must be in the TOP position.
Select the Field header coordinate by clicking the row that contains the column headers in the file preview. The row number appears in the Field header coordinate field; the row count starts at 0 (the top row), as shown in the screenshot above.
Click Next to map the data to the database fields.
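Because a template matches on column positions rather than field names, it can help to compare a new file's header row with the headers the template was built from before reusing it. A minimal Python sketch, assuming a tab-delimited file and that you kept a list of the original header names; check_against_template is a hypothetical helper, and header_row is 0-based like the Field header coordinate.

```python
import csv
import io


def check_against_template(text, template_headers, header_row=0, delimiter="\t"):
    """Return (position, expected, actual) for each header that moved.

    Templates key off column positions, so a renamed or reordered column
    would load into the wrong database field without any warning.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if header_row >= len(rows):
        raise ValueError(f"file has no row {header_row}")
    headers = rows[header_row]
    mismatches = []
    for pos, expected in enumerate(template_headers):
        actual = headers[pos] if pos < len(headers) else None  # None: column missing
        if actual != expected:
            mismatches.append((pos, expected, actual))
    return mismatches


template = ["name", "external_code", "species_name"]
good = "name\texternal_code\tspecies_name\nB73\tPLOT-1\tZea mays\n"
swapped = "external_code\tname\tspecies_name\nPLOT-1\tB73\tZea mays\n"
print(check_against_template(good, template))     # []
print(check_against_template(swapped, template))  # [(0, 'name', 'external_code'), (1, 'external_code', 'name')]
```

An empty list means the new file lines up column for column with the template's original layout.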

Mapping Germplasm Metadata

The first two tables are mainly for mapping germplasm metadata. Mandatory fields are highlighted in bluish green (cyan). The left box shows the terms from your data file.
Drag and drop your germplasm name and germplasm external code from Data file fields to name and external_code, respectively, in the Germplasm Information table.
If you have species_name and type_name information for your germplasm, drag and drop it in the same way, as shown in the screenshot below:

Note that germplasm species_name and type_name are both CV terms, so make sure the values for these two fields in your data file exactly match the entries in the CV table.
Use the Property table below the Germplasm Information table to map terms related to germplasm properties or metadata.
Drag and drop your additional germplasm fields from Data file fields into the Property field, and click Next.
Skip mapping terms to the DNA Sample Information table, and click Next.
Click Finish to load the germplasm file to the database. You may need to log in to the server to submit the file.
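Since species_name and type_name must match the CV table exactly, a quick pre-check can catch near misses such as differences in case or stray spaces. A minimal Python sketch, assuming you have a list of the CV terms from Define | Controlled Vocabulary; cv_mismatches is a hypothetical helper, and treating an exact match as case- and whitespace-sensitive is an assumption.

```python
def cv_mismatches(values, cv_terms):
    """Return {value: hint} for values that are not exact CV matches.

    The hint names the CV term the value would match if case and
    surrounding whitespace were ignored, or says it is not in the CV.
    """
    exact = set(cv_terms)
    folded = {term.strip().casefold(): term for term in cv_terms}
    report = {}
    for value in values:
        if value in exact or value in report:
            continue
        near = folded.get(value.strip().casefold())
        report[value] = f"did you mean {near!r}?" if near else "not in CV"
    return report


cv = ["accession", "inbred_line", "f1_hybrid"]
print(cv_mismatches(["inbred_line", "F1_Hybrid", " accession", "landrace"], cv))
```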

Loading DNA Sample Metadata

Select Wizards | DNA Sample Wizard.
Select PI, Project, and Experiment.
Select the file to load. For details, see Selecting the germplasm file to load above.

Mapping DNA Sample Metadata

Click Next without mapping any germplasm terms to go directly to the DNA Sample Information table.
The mandatory fields in both the DNA Sample Information and DNArun/DS_DNArun Information tables are highlighted in bluish green (cyan). The left box shows the terms from your data file.
Drag and drop your dnasample_name, germplasm_external_code, dnasample_UUID, and dnasample_number from Data file fields to name, external_code, uuid, and number, respectively, in the DNA Sample Information table.
If dnasample_platename, dnasample_well_col, and dnasample_well_row are available in your data file, drag and drop them in the same way, as shown in the screenshot below:
Drag and drop your DNArun_name, dnasample_name, and dnasample_number from Data file fields to name, dnasample_name, and number, respectively, in the DNArun/DS_DNArun Information table.
If dnasample_platename, dnasample_well_col, and dnasample_well_row are available in your data file, drag and drop them in the same way, as shown in the screenshot above.
Use the Property table below the DNA Sample Information and DNArun/DS_DNArun Information tables to map terms related to DNA sample and DNArun properties.
Drag and drop your additional DNA sample and DNArun fields from Data file fields into the Property field, and click Next.
Skip mapping terms to DNA Sample Information table and just click Next.
Click Finish to load the DNA sample file to the database. You may need to log in to the server to submit the file.
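Loading the germplasm first (option 2) also makes it possible to check, before clicking Finish, that every DNA sample's external_code refers to germplasm that is already loaded. A minimal Python sketch, assuming the loaded external codes are taken from the germplasm file you loaded earlier; unmatched_external_codes is a hypothetical helper, not a wizard feature.

```python
def unmatched_external_codes(sample_rows, loaded_codes):
    """List (row index, external_code) for DNA samples with no loaded germplasm.

    Each DNA sample is linked to germplasm through external_code, so any
    code missing from loaded_codes points at a typo or a germplasm row
    that still needs to be loaded. Row indexes are 0-based.
    """
    loaded = set(loaded_codes)
    return [
        (i, row.get("external_code"))
        for i, row in enumerate(sample_rows)
        if row.get("external_code") not in loaded
    ]


samples = [
    {"name": "S1", "external_code": "PLOT-1"},
    {"name": "S2", "external_code": "PLOT-9"},  # not in the germplasm file
    {"name": "S3"},                             # external_code missing
]
print(unmatched_external_codes(samples, ["PLOT-1", "PLOT-2"]))  # [(1, 'PLOT-9'), (2, None)]
```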