DNA Sample Wizard

Definitions

There are two options for loading germplasm and DNA sample information with the DNA Sample Wizard:

Option 1: Load the germplasm and DNA sample information and submit them together as one job.
Option 2: Load the germplasm and DNA sample information separately.

Option 2 is recommended because it makes it easier to identify problems in the germplasm file, the DNA sample file, or both. Once the germplasm has loaded successfully, the DNA sample information can be loaded and submitted. The instructions below therefore describe Option 2.

Field Descriptions

Germplasm Metadata

Primary fields

*name: The germplasm name, for example the most commonly used name or the default name. The germplasm table has a one-to-many relationship with the dnasample table; that is, many DNA samples can be associated with a single germplasm name.
*external_code: The code used in adjacent germplasm or sample tracking database systems. For example, this could be GID, PlotID, or SampleID. Each germplasm name must have a unique external_code.
species_name: The species name. This is a case-sensitive CV term and must be added to the CV tables first to keep naming consistent. The species_name in the germplasm template must therefore match the germplasm_species CV table exactly.
type_name: The type or generation of germplasm, for example accession, inbred_line, f1_hybrid, f2, f3, f4, or f5. This is also a case-sensitive CV term, so the type_name in the germplasm template must match the germplasm_type CV table exactly.
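Because name and external_code are mandatory and species_name and type_name must match the CV tables exactly, checking the file before opening the wizard can save a failed load. The following is a minimal Python sketch of such a pre-check, assuming a tab-delimited file whose headers are the field names above. The check_germplasm_rows function and the SPECIES_CV and TYPE_CV sets are hypothetical stand-ins, not part of GOBII; replace the sets with the terms actually stored in your CV tables. The sketch treats any repeated external_code as an error, which is one reading of the uniqueness rule above.

```python
import csv
import io

# Hypothetical CV contents for illustration only; replace these sets with the
# terms actually stored in your germplasm_species and germplasm_type CV tables.
SPECIES_CV = {"Zea mays", "Oryza sativa"}
TYPE_CV = {"accession", "inbred_line", "f1_hybrid", "f2"}


def check_germplasm_rows(rows):
    """Return a list of problems found in germplasm rows (dicts keyed by column header)."""
    problems = []
    first_seen = {}  # external_code -> row number where it first appeared
    for i, row in enumerate(rows, start=1):
        name = (row.get("name") or "").strip()
        code = (row.get("external_code") or "").strip()
        # name and external_code are the two mandatory (*) germplasm fields.
        if not name or not code:
            problems.append(f"row {i}: name and external_code are mandatory")
            continue
        if code in first_seen:
            problems.append(f"row {i}: external_code {code!r} duplicates row {first_seen[code]}")
        else:
            first_seen[code] = i
        # CV terms are case sensitive, so compare exactly (no lower-casing).
        for column, cv_terms in (("species_name", SPECIES_CV), ("type_name", TYPE_CV)):
            value = row.get(column)
            if value and value not in cv_terms:
                problems.append(f"row {i}: {column} {value!r} is not in the CV table")
    return problems


# Example: a duplicated code and a species name that differs from the CV term only in case.
sample_file = (
    "name\texternal_code\tspecies_name\ttype_name\n"
    "B73\tGID001\tZea mays\tinbred_line\n"
    "Mo17\tGID001\tzea mays\tinbred_line\n"
)
rows = list(csv.DictReader(io.StringIO(sample_file), delimiter="\t"))
for problem in check_germplasm_rows(rows):
    print(problem)
```

Both problems in the example are on row 2: the reused code GID001 and "zea mays", which would be rejected because the CV comparison is case sensitive.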

Properties (optional)

germplasm_heterotic_group: The heterotic group of the germplasm within its species, for example NSS, SSS, A, or B for maize.
germplasm_id: A higher-level ID, for example MGID.
germplasm_subsp: The subspecies grouping of the germplasm. This differs by crop; for example, dent, flint, sweet, or pop for maize; indica or japonica for rice; and bread wheat or durum wheat for wheat.
par1: The germplasm name for parent 1 of the germplasm (in a biparental cross this would be the female).
par2: The germplasm name for parent 2 of the germplasm (in a biparental cross this would be the male).
par3: The germplasm name for parent 3 of the germplasm.
par4: The germplasm name for parent 4 of the germplasm.
pedigree: The pedigree for the germplasm name.
seed_source_id: Seed source ID for the germplasm.

DNASample Metadata

Primary fields

*name: The name of the sample that gets sent to the lab for processing. We recognize that there are multiple levels of samples that can be tracked by a lab or LIMS system (batches/bags/sub samples, etc.), but for our purposes, we assume that the sample is in a plate or in tubes ready for processing in the laboratory so that the allele data can be connected back at the sample level within any project. Note: A unique sample in the GOBII system is defined by the unique combination of the project_name, dnasample_name, and dnasample_number so the dnasample_name does not need to be unique within a project or across projects. For this reason, the dnasample name in a project can also be the same name as the germplasm name. This is often the case for legacy data existing before sample tracking or LIMS systems were in place.
platename: The plate name that the sample is in. This can be a number (1,2,3,4, etc.), or a name given by the lab.
*num: The numerical order of the sample within a project, for example 1-96 for a 96-well plate. Each sample needs a unique number within a project unless the sample names are all unique within the project. Even then, it is good practice to assign consecutive sample numbers to the samples in a project to simplify sorting during post-processing.
well_col: The plate column coordinate for the sample, for example 1-12 for a 96-well plate.
well_row: The plate row coordinate for the sample, for example A-H for a 96-well plate.
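The sample rules above can be checked in the same way before loading. This Python sketch uses a hypothetical check_samples function (not a GOBII tool). It treats the combination of project name, sample name, and sample number as the sample identity, applies the rule that num must be unique when sample names repeat, and assumes 96-well plates when checking well_row and well_col; adjust WELL_ROWS and WELL_COLS for other plate formats.

```python
from collections import Counter

# 96-well plate layout from the field descriptions (rows A-H, columns 1-12).
WELL_ROWS = set("ABCDEFGH")
WELL_COLS = range(1, 13)


def check_samples(project_name, samples):
    """Check one project's DNA sample rows (dicts with name, num, optional well_row/well_col)."""
    problems = []
    # A unique sample is the combination of project name, sample name, and sample number.
    keys = Counter((project_name, s["name"], int(s["num"])) for s in samples)
    for key, count in keys.items():
        if count > 1:
            problems.append(f"{key} appears {count} times")
    # num must be unique within the project unless every sample name is unique.
    names = [s["name"] for s in samples]
    nums = [int(s["num"]) for s in samples]
    if len(set(names)) < len(names) and len(set(nums)) < len(nums):
        problems.append("sample names repeat, so each num must be unique in the project")
    # Well coordinates, when present, must fall on a 96-well plate.
    for s in samples:
        row, col = s.get("well_row"), s.get("well_col")
        if row and row not in WELL_ROWS:
            problems.append(f"sample {s['name']} num {s['num']}: bad well_row {row!r}")
        if col and not (str(col).isdigit() and int(col) in WELL_COLS):
            problems.append(f"sample {s['name']} num {s['num']}: bad well_col {col!r}")
    return problems


samples = [
    {"name": "B73", "num": "1", "well_row": "A", "well_col": "1"},
    {"name": "B73", "num": "1", "well_row": "A", "well_col": "13"},
    {"name": "Mo17", "num": "2", "well_row": "I", "well_col": "2"},
]
for problem in check_samples("demo_project", samples):
    print(problem)
```

A set is used for WELL_ROWS rather than the string "ABCDEFGH" because a substring test would wrongly accept values such as "AB".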

Properties (optional)

ref_sample: A standard "Reference" sample against which all other germplasms are compared. This is also called a "gold standard" line. As this is not a CV term, the crop community needs to decide on standards for naming ref samples.
sample_group: A grouping of germplasm that is useful to the breeder for analysis, for example a population and its parents that need to be grouped for data analysis.
sample_group_cycle: The cycle of the germplasm grouping, for example different generations of a population and its parents, which allow further grouping within the germplasm group.
sample_parent_prop: The type of parent, for example female/male or DP/RP (donor/recurrent parent).
sample_type: The type of tissue sampled, for example leaf, seed, bulk seed, or bulk plant. If available, you can map the term to the Plant Ontology (PO). For example, a leaf sample could be mapped to the PO term leaf, recorded either as its identifier PO:0025034 or as leaf_PO:0025034.
trial_name: The name of the field trial (or fieldbook) that the sample comes from.

DNArun Metadata

Primary fields

*name: The name of the sample when it is returned from the vendor and associated with the genotyping data. This can be the same as the name sent to the vendor, or it may have been concatenated with a vendor ID.

Properties (optional)

barcode: A barcode assigned to a sample. For example, for sequencing or other genotyping where samples are pooled.

"*" Mandatory fields

Loading Germplasm Metadata

Select Wizards | DNA Sample Wizard.
Select the PI, Project, and Experiment.
Select the file to load. For more information, refer to Selecting the germplasm file to load.

Selecting the germplasm file to load

For local files, click Browse to browse to your file, or drag and drop the file into the file list box. If you select the wrong file, check the checkbox next to the file, then click Remove Selected File(s).
For large files, load them from the files folder in your crop server environment. Create a new folder under the files folder, then copy the folder name to the Remote Path field.
If you are loading many files with exactly the same format, create a template at the end of the wizard process and save it for future use. Use templates with caution: they map fields by row and column position, not by matching field names. When you use a template, make sure your metadata fields and data are in exactly the same columns as in the file the template was made from.
Select the file format from the File Format drop-down menu. .txt and .csv files are supported for mapset files.
Click Preview Data. For local files, you must log into your crop server. For remote files already placed in the crop server files folder, you are already logged in and do not need to log in again.
The top-left section of your file is now shown in the preview section. Expand the columns to view the headers.
Select the header position from the Header Position drop-down menu. For all mapset files, the headers must be in the TOP position.
Select the Field header coordinate by clicking the row of column headers in the file preview. The row number is returned in the Field header coordinate field. The row count starts at 0, as shown in the screenshot above.
Click Next to map data to the database fields.
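Because a saved template matches columns by position rather than by name, a file with the same headers in a different order would be mapped silently to the wrong fields. One way to guard against this, sketched below in Python, is to compare the header row of the new file with the header row of the file the template was built from. The template_mismatches helper is hypothetical; it reports every difference, including extra trailing columns, so you can decide which ones matter.

```python
from itertools import zip_longest


def template_mismatches(template_header, new_header):
    """List (position, expected, found) for each column whose header differs.

    Positions are 0-based. A column missing on either side is reported as None.
    """
    return [
        (position, expected, found)
        for position, (expected, found) in enumerate(zip_longest(template_header, new_header))
        if expected != found
    ]


template_header = ["name", "external_code", "species_name", "type_name"]
new_header = ["external_code", "name", "species_name", "type_name", "pedigree"]
for position, expected, found in template_mismatches(template_header, new_header):
    print(f"column {position}: template expects {expected!r}, file has {found!r}")
```

In this example the first two columns are swapped, so reusing the template unchanged would load external codes as germplasm names.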

Mapping Germplasm Metadata

The first two tables are mainly for mapping germplasm metadata. Mandatory fields are highlighted in bluish green (cyan). The left box shows the terms from your data file.
Drag and drop your germplasm name and germplasm external code from Data file fields to name and external_code, respectively, in the Germplasm Information table.
If your data file contains species_name and type_name information for your germplasm, drag and drop those fields in the same way, as shown in the screenshot below:

Note that both germplasm species_name and germplasm type_name are CV terms. Make sure that the values for these two fields in your data file exactly match the entries in the CV tables, including case.
Use the Property table below the Germplasm Information table to map the terms related to germplasm properties or metadata.
Drag and drop your additional germplasm fields from Data file fields into the Property field and click Next.
Skip mapping terms to the DNA Sample Information table and click Next.
Click Finish to load the germplasm file to the database. You may need to log into the server to submit the file.

Loading DNA Sample Metadata

Select Wizards | DNA Sample Wizard.
Select the PI, Project, and Experiment.
Select the file to load. For more information, refer to Selecting the DNA sample file to load.

Selecting the DNA sample file to load

For local files, click Browse to browse to your file, or drag and drop the file into the file list box. If you select the wrong file, check the checkbox next to the file, then click Remove Selected File(s).
For large files, load them from the files folder in your crop server environment. Create a new folder under the files folder, then copy the folder name to the Remote Path field.
If you are loading many files with exactly the same format, create a template at the end of the wizard process and save it for future use. Use templates with caution: they map fields by row and column position, not by matching field names. When you use a template, make sure your metadata fields and data are in exactly the same columns as in the file the template was made from.
Select the file format from the File Format drop-down menu. .txt and .csv files are supported for mapset files.
Click Preview Data. For local files, you must log into your crop server. For remote files already placed in the crop server files folder, you are already logged in and do not need to log in again.
The top-left section of your file is now shown in the preview section. Expand the columns to view the headers.
Select the header position from the Header Position drop-down menu. For all mapset files, the headers must be in the TOP position.
Select the Field header coordinate by clicking the row of the column headers in the file preview. The row number is returned in the Field header coordinate field. The row count starts at 0.
Click Next to map data to the database fields as shown in the screenshot below:
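The Field header coordinate counts rows from 0, so a header on the third line of the file is reported as row 2. If your file has a few lines of notes above the headers, the short Python sketch below can help predict which row number the wizard should report. It assumes a tab-delimited .txt file, and find_header_row is a hypothetical helper, not part of the wizard.

```python
import csv
import io


def find_header_row(text, required_column, delimiter="\t"):
    """Return the 0-based index of the first row containing required_column, or None."""
    for index, row in enumerate(csv.reader(io.StringIO(text), delimiter=delimiter)):
        if required_column in row:
            return index
    return None


dna_sample_file = (
    "plate export\n"
    "lab: example\n"
    "dnasample_name\tdnasample_number\tgermplasm_external_code\n"
    "S1\t1\tGID001\n"
)
print(find_header_row(dna_sample_file, "dnasample_name"))  # 2
```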

Mapping DNA Sample Metadata

Click Next without mapping any germplasm terms to go directly to the DNAsample Information table.
The mandatory fields for both the DNASample Information and DNArun/DS_DNArun Information tables are highlighted in bluish green (cyan). The left box shows the terms from your data file.
Drag and drop your dnasample_name, germplasm_external_code, and dnasample_number from Data file fields to name, external_code, and number, respectively, in the DNAsample Information table.
If dnasample_platename, dnasample_well_col, and dnasample_well_row are available in your data file, drag and drop them in the same way, as shown in the screenshot below:
Drag and drop your DNArun_name, dnasample_name, and dnasample_number from Data file fields to name, dnasample_name, and number, respectively, in the DNArun/DS_DNArun Information table.
If dnasample_platename, dnasample_well_col, and dnasample_well_row are available in your data file, drag and drop them in the same way, as shown in the screenshot above.
Use the Property table below the DNAsample Information and DNArun/DS_DNArun Information tables to map terms related to DNA sample and DNArun properties.
Drag and drop your additional DNA sample and DNArun fields from Data file fields into the Property field and click Next.
Click Finish to load the DNA sample file to the database. You may need to log into the server to submit the file.
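Judging from the mapping step above, where the germplasm external code is mapped to external_code in the DNAsample Information table, each DNA sample is linked to its germplasm through that code. Because Option 2 loads the germplasm first, it may be worth confirming before submitting that every external code in the DNA sample file also appears in the germplasm file you loaded. The Python sketch below assumes the germplasm rows use an external_code column and the sample rows a germplasm_external_code column; both column names and the unknown_external_codes helper are illustrative assumptions.

```python
def unknown_external_codes(germplasm_rows, sample_rows,
                           germplasm_col="external_code",
                           sample_col="germplasm_external_code"):
    """Return, sorted, the sample external codes that have no matching germplasm row."""
    known = {row[germplasm_col] for row in germplasm_rows}
    referenced = {row[sample_col] for row in sample_rows}
    return sorted(referenced - known)


germplasm_rows = [{"name": "B73", "external_code": "GID001"},
                  {"name": "Mo17", "external_code": "GID002"}]
sample_rows = [{"dnasample_name": "S1", "germplasm_external_code": "GID001"},
               {"dnasample_name": "S2", "germplasm_external_code": "GID003"}]
print(unknown_external_codes(germplasm_rows, sample_rows))  # ['GID003']
```

Any code this check reports would either need its germplasm loaded first or a correction in the DNA sample file.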