Marker Wizard

Definitions

Use the Marker Wizard to load markers, associated metadata (for example: sequence, primers, probes, assay type), and marker mapsets. When you are loading marker metadata, you do not need to have a project defined first; you can preload your markers and metadata ahead of any project. Then, when you create your project, you only need to load the marker names.

Field Descriptions

Marker Metadata

Primary fields

The information in these fields can only be added during the initial upload of the data. The data in these fields cannot be updated or added to after the initial load of the markers to the database. Some of the fields in this table are required and are marked with an asterisk ( * ).

  • name: A required field for the marker name. It must be associated with a platform. The marker name must be unique within the platform. Refer to the Platforms definition.
  • ref: An optional field for the reference allele for the marker. 
  • alts: An optional field for the alternate allele(s) for the marker. Alternate alleles can be separated by any of several characters, so the following entries are all allowed: C; C/G; C\G; C|G; C//G; C,G; C G. The following entries will not give the expected results: CG; /C/G; /C; C/.
  • sequence: An optional field used to store the underlying sequence that the marker primers and probes hybridize to. It is also called a context sequence or a sequence footprint.
  • reference_name: The name of the reference genome that a marker was called against. This is relevant for markers where the name is a concatenation of the chromosome name and position. In these cases, the marker name is based on a specific reference genome. The reference genome must first be created using the Define → References tool in the Loader. Then the exact text (case-sensitive) of the Reference Name must be entered in this column.
  • strand_name: This is the strand on which the marker is designed. This is a CV term to ensure strand names are consistent. For example: TOP, BOTT, forward, +, or marker_prop. The exact CVs available for this field can be obtained by exporting the list of terms from the Controlled Vocabulary tool.
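
The separator rules for the alts field can be checked before loading. The following Python sketch is illustrative only and is not the Loader's own parser: the name split_alts and the choice to raise an error on a leading or trailing separator are assumptions.

```python
import re

# Separators listed as allowed for the alts field: / \ | , and space.
# A run of separators (such as //) is treated as a single separator.
_SEPARATORS = re.compile(r"[/\\|, ]+")


def split_alts(value):
    """Split an alts entry into individual alleles (hypothetical helper)."""
    value = value.strip()
    if not value:
        # alts is optional, so an empty entry yields no alleles.
        return []
    tokens = _SEPARATORS.split(value)
    if "" in tokens:
        # Happens for entries such as "/C/G", "/C" or "C/".
        raise ValueError(f"leading or trailing separator in {value!r}")
    return tokens
```

With this sketch, an entry of CG comes back as the single allele CG, which may be why that entry does not give the expected result when two alleles were intended.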

Properties

The information in all of these fields is optional; it is possible to add new properties data or to update existing properties data at any time.

  • clone_id: This is used for DArT markers only. It is the same as the MarkerName for DArT-silico and DArT-seq/SNP markers.

  • clone_id_pos: This is used for DArT markers only. It is the SNP position in the sequence.

  • gene_annotation: For example, transcription factor.
  • gene_id: This is the gene from which the marker was designed. For example: Cicer_ein2.
  • genome_build: This is the genome build on which the marker was designed.
  • marker_dom: This is used to note whether a marker is dominant. Various conventions can be used, e.g. putting "D", "yes", or "1" for dominant markers and leaving the field blank for all others, or writing "dominant" if dominant, "co-dominant" if known to be co-dominant, and blank or "unknown" if unknown. There is currently no controlled vocabulary for this field.
  • polymorphism_type: This describes the underlying polymorphism, not the marker assay used to interrogate the polymorphism. For example: SNP, SSR, or indel.
  • primer_forw1: This is the first forward primer. Primers and probes used have generic names. The crop community should decide how they should be used.
  • primer_forw2: This is the second forward primer.
  • primer_rev1: This is the first reverse primer.
  • primer_rev2: This is the second reverse primer.
  • probe1: This is the first probe.
  • probe2: This is the second probe.
  • source: This is the source that describes how the marker was discovered, such as a manuscript or journal article. For example: sequencing of candidate genes (Gujaria et al. 2011).
  • strand_data_read: This is the strand that is used to read out the genotyping data results. For example: in TOP or BOTT format, or F (forward). 
  • synonym: This is another name the marker is known by.
  • typeofrefallele_alleleorder: This is used to describe the order of the alleles, e.g. due to phasing, TOP, or the first allele being the positive trait.


 Fields marked with " * " are mandatory


Linkage Group

  • name: A required field for the linkage group or chromosome name in which the marker is mapped. It can be a number, or, for example, Chromosome 1, Chromosome 2, ..., ChromosomeN, LG01, LG02, etc.
  • start: An optional field describing the beginning position of the chromosome or linkage group. It is usually 0 (zero), but can be a negative value for some genetic maps, and can have decimal places.
  • stop: An optional field describing the end position of the chromosome or linkage group. It can have several decimal places.

Marker - Linkage Group (Position of Marker on Linkage Group)

  • linkage_group_name: A required field for the linkage group name in which the marker is mapped. It is the same field as in the linkage group table, and connects Linkage Group and Marker tables.
  • marker_name: A required field for the marker name. It must match the marker name in the marker table.
  • start: A required field for the start position of the marker on the linkage group. Note that the start and stop positions are the same for SNPs (with on-base positions), but can differ for indels. The start of an indel is the nucleotide before the deletion. The start position can be a negative value and have decimal places.
  • stop: A required field for the stop position of the marker on the linkage group. It can be the same as the start position. The stop position can be a negative value and have decimal places.
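
The position rules above can be checked before loading. The Python sketch below is illustrative only: the check_marker_position helper and its is_snp flag are hypothetical, and it assumes that stop is never smaller than start, which the field descriptions imply but do not state.

```python
def check_marker_position(start, stop, is_snp):
    """Return a list of problems found for one marker position (hypothetical helper)."""
    problems = []
    if stop < start:
        # Assumption: a marker cannot end before it begins.
        problems.append("stop is smaller than start")
    if is_snp and start != stop:
        # SNPs occupy a single position, so start and stop are the same.
        problems.append("SNP start and stop differ")
    return problems
```

Negative and decimal positions pass through unchanged, since both are allowed for genetic maps.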

Loading Marker Metadata

  1. When loading marker metadata, the platform that the markers belong to needs to already be established. Define your platform using the Platforms form. If your metadata also contains map data, first create your mapset in Create Mapset. For additional information, refer to Mapsets.
  2. Select Wizards | Marker Wizard.
  3. The first form in the marker wizard has several drop-down menu options for selection of entities in which to load your marker data, most of which are not used for loading marker metadata; you only need to select the platform. If your metadata also contains map information, select the mapset also.
  4. Select the file to load. For additional information, refer to Selecting the file to load below.


Selecting the file to load

  1. For local files, click Browse to browse to your file, or drag and drop the file into the file list box. If you select the wrong file, check the checkbox next to the file, then click Remove Selected File(s).
  2. For large files, load them from the files folder in your crop server environment. Create a new folder under the files folder, then copy the folder name to the Remote Path field. Give each data file a unique folder name. Avoid special characters in the folder name; underscore ( _ ) and hyphen ( - ) are allowed.
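
The folder-naming rule can be checked with a short Python sketch. It assumes that ASCII letters and digits are the only other permitted characters; this guide only states that underscore and hyphen are allowed, and the function name is hypothetical.

```python
import re

# Assumption: ASCII letters and digits are acceptable. Underscore and hyphen
# are the only punctuation the guide explicitly allows.
_SAFE_FOLDER = re.compile(r"[A-Za-z0-9_-]+")


def is_safe_folder_name(name):
    """Return True if the folder name contains no special characters."""
    return _SAFE_FOLDER.fullmatch(name) is not None
```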


  1. When loading many files with the exact same format, create a template at the end of the wizard process, then save it for future use. Use templates with caution; they do not key off matching field names, but do key off row and column positions. Ensure that when you use a template, your metadata fields and data are in the exact same columns as when your template was made.
  2. Select the file format from the File Format drop-down menu.
  3. Click Preview Data.
  4. The top left-hand section of your file is now viewable in the preview section. Expand the columns to view the headers. 
  5. Select the header position from the Header Position drop-down menu. For all mapset, sample and marker files, the headers must be in the TOP position. 
  6. Select the Field header coordinate by clicking the row of the column headers in the file preview. The row number is returned in the Field header coordinate field. The row count starts at 0.
  7. Click Next to map data to the database fields.
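
Because a template keys off row and column positions, it can help to confirm that a new file has the same headers in the same columns as the file the template was built from. The Python sketch below assumes tab-delimited text; the function names, and the idea of comparing against a saved header list, are illustrative and not part of the Loader.

```python
import csv
import io
from itertools import zip_longest


def read_header(text, header_row=0, delimiter="\t"):
    """Return the header cells found at a zero-based row index."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    return rows[header_row]


def mismatched_columns(template_header, new_header):
    """List zero-based column positions where the two headers differ."""
    pairs = zip_longest(template_header, new_header)
    return [i for i, (old, new) in enumerate(pairs) if old != new]
```

An empty list from mismatched_columns means the new file has the same field in every column position. The header_row argument is zero-based, like the Field header coordinate.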

Mapping Marker Metadata

  1. Metadata for the markers is loaded to the first two tables. The first table is called 'Marker Information' and contains the key fields, of which only name (the marker name) is required. The Property table can be used to store extra property fields; you can add your own property fields in Define | Controlled Vocabulary. Drag and drop your marker name from Data file fields to name in the Marker Information table, then drag and drop your other input file fields to the relevant fields in the marker tables.
  2. Drag and drop fields to the third table, DS Marker Information, ONLY if your file contains genotyping data and the markers will be associated with the dataset (DS).
  3. You can parse field entries using standard separators. For example: a field entry [A/G] can be parsed out into ref using a start position of [ and an end position of /, and into alts using a start position of / and an end position of ]. You can also concatenate two fields together by dragging and dropping two fields from Data file fields into the Header field.




  4. Click Next to map data to any mapset fields.
  5. Click Finish to load the file to the database.
  6. You will receive an email confirmation that says SUCCESS or FAIL; a FAIL message means the data has failed validation.
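
The parsing described in step 3 amounts to taking the text between a start delimiter and an end delimiter. A Python sketch of the same idea follows; between is a hypothetical helper, not the wizard's implementation, and it uses the first occurrence of each delimiter.

```python
def between(text, start, end):
    """Return the text after the first `start` and before the next `end`."""
    begin = text.index(start) + len(start)
    finish = text.index(end, begin)
    return text[begin:finish]


entry = "[A/G]"
ref = between(entry, "[", "/")   # text between [ and /
alt = between(entry, "/", "]")   # text between / and ]
```

If either delimiter is missing, str.index raises ValueError, which makes malformed entries easy to spot before loading.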

Loading Markers to a Mapset


  1. When loading markers to a mapset, the mapset and the platform that the markers belong to need to already be established. Create your mapset in Create Mapset and your platform in Define Platform.
  2. Select Wizards | Marker Wizard.
  3. The first form in the marker wizard has several drop-down menu options for selection of entities in which to load your marker data, most of which are not used for loading markers to mapsets; you only need to select the pre-defined platform and mapset.



  4. Select the file to load.

Selecting the file to load - see previous section

Mapping Markers to Mapsets

  1. Drag and drop your marker name from Data file fields to name in the Marker Information table. You do not need to complete any other fields in these tables to load a map. However, if this is the first time loading this marker list, ensure the key fields are loaded, as these cannot currently be updated. 



  2. Click Next. The next page contains the Map Information for Linkage Group and Marker - Linkage Group, which contain fields for the position of the marker on the linkage group.
  3. For the Linkage Group table, from Data file fields, drag and drop the linkage group name to name, and the start and stop positions for the entire linkage group, if available. 
  4. For the Marker - Linkage Group table, from Data file fields, drag and drop the marker_name, linkage_group_name, and the start and stop positions for the marker on the linkage group. Marker start and stop positions are the same for SNPs, since on-base coordinates are used; for indels, CNVs, MNPs, and rearrangements, start and stop positions differ.



  5. Click Finish to load the file into the database. 
  6. You are asked if you want to save a template to enable future loading of files with the same format. Note that the template uses row and column positions to load data, not field headers, so ensure files loaded with the template have exactly the same order of fields in the file.