AGP Validation

AGP file structure and content can be validated using the agp_validate program. The on-line version of agp_validate checks that the input AGP file conforms to the AGP Specification, checks the file for internal consistency, and generates a report of component, gap, scaffold and object statistics. The command-line version of agp_validate performs all the same checks as the on-line version, and also has options for additional checks that compare the AGP against sequences in FASTA files (see below).

AGP validation on-line

The on-line AGP Validation form can be used to run agp_validate on an uploaded AGP file or on AGP text pasted into the form.

agp_validate command-line program

agp_validate is available by anonymous FTP. Copy the appropriate version for your platform, then uncompress the file, rename it to "agp_validate", and set the "execute" permission (see platform-specific details).

usage overview: agp_validate [-options] [FASTA files...] [AGP files...]

  • When run without any options, agp_validate performs a large number of validations on the input AGP files (see below) and also generates a report of component, gap, scaffold and object statistics.
  • If component FASTA sequence files are provided, agp_validate will also check that component spans do not exceed the sequence length.
  • If the component sequences are available in GenBank, then agp_validate can perform additional checks using the sequence lengths, versions, and taxonomy ID retrieved from GenBank (-alt and -species options).
  • If FASTA sequences for the assembled objects are provided, agp_validate can also check that the sequences match what can be constructed from the AGP and the component sequences (-comp option).
  • Information on all the available options can be obtained by executing agp_validate with the -help option.
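The object-sequence comparison behind the -comp option amounts to rebuilding each object from its AGP lines and component sequences. The Python sketch below illustrates that idea only; it is not agp_validate's code, and the names build_object and matches_fasta are hypothetical. It assumes that the AGP lines for one object are already split into 9 columns and sorted by part_number, that component sequences are held in a dictionary keyed by component_id, that gaps (component_type N or U) are rendered as runs of N, and that any orientation other than "-" is treated as forward.

```python
# Hypothetical sketch of the -comp idea, not agp_validate's implementation.
# Assumptions: rows are 9-column AGP lines for ONE object in part_number order;
# gaps become runs of 'N'; any orientation other than '-' is read as forward.

GAP_TYPES = ("N", "U")  # component_type values that mark gap lines
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def build_object(rows, components):
    """Assemble an object sequence from its AGP rows and a dict of component sequences."""
    pieces = []
    for cols in rows:
        if cols[4] in GAP_TYPES:
            # Gap line: column 6b holds gap_length
            pieces.append("N" * int(cols[5]))
        else:
            # Component line: columns 6a, 7a, 8a, 9a
            comp_id, beg, end, orient = cols[5], int(cols[6]), int(cols[7]), cols[8]
            span = components[comp_id][beg - 1:end]  # AGP coordinates are 1-based, inclusive
            pieces.append(reverse_complement(span) if orient == "-" else span)
    return "".join(pieces)

def matches_fasta(rows, components, object_seq):
    # Compare case-insensitively, since FASTA files often soft-mask with lowercase
    return build_object(rows, components).upper() == object_seq.upper()

# Example data (made up for illustration)
rows = [
    ["chr1", "1", "4", "1", "W", "ctg1", "1", "4", "+"],
    ["chr1", "5", "7", "2", "N", "3", "scaffold", "yes", "paired-ends"],
    ["chr1", "8", "10", "3", "W", "ctg2", "2", "4", "-"],
]
components = {"ctg1": "ACGTAA", "ctg2": "GGATC"}
print(build_object(rows, components))  # ACGTNNNATC
```

Treating "?", "0" and "na" orientations as forward is a simplification made for this sketch; a real comparison would need to decide how to handle those cases.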

Validations performed by agp_validate

Error level violations reported include:

  • Incorrect number of columns: there should be 9 tab-separated columns.
  • Non-positive integers in the following columns:
    • 2: object_beg
    • 3: object_end
    • 4: part_number
    • 6b: gap_length
    • 7a: component_beg
    • 8a: component_end
  • object_end is less than object_beg.
  • component_end is less than component_beg.
  • The length of the component span (columns 7a and 8a) does not match the length of the object span (columns 2 and 3).
  • The gap length (column 6b) does not match the length of the object span (columns 2 and 3).
  • Linkage=yes with a gap_type other than scaffold or repeat.
  • Object does not start with an object_beg coordinate of 1.
  • Object has ranges that are non-sequential and/or overlapping.
  • Object does not start with a part_number of 1.
  • Object has non-sequential lines and/or lines mixed with other objects.
  • Multiple objects with the same object name (column 1).
  • Component orientation of 0 or na used for a non-singleton scaffold.
  • Invalid terms or symbols in the following columns:
    • 5: component_type
    • 7b: gap_type
    • 8b: linkage
    • 9a: orientation
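Several of the error-level checks above can be expressed as simple per-line and per-object tests. The Python sketch below covers only a subset of them (column count, positive integers, begin/end order, span lengths, the linkage rule as stated on this page, and the object_beg and part_number sequence). It is a hypothetical illustration, not agp_validate's implementation; check_line and check_object are made-up names, and the sketch assumes lines have already been split on tabs and grouped by object.

```python
# Hypothetical sketch of a subset of the error-level checks listed above.
# Column numbers in messages are 1-based, matching the list (6b, 7a, ...).

GAP_TYPES = ("N", "U")  # component_type values that mark gap lines

def check_line(cols):
    """Return error messages for one AGP line already split on tabs."""
    if len(cols) != 9:
        return [f"expected 9 tab-separated columns, found {len(cols)}"]
    errors = []
    is_gap = cols[4] in GAP_TYPES
    int_cols = {2: "object_beg", 3: "object_end", 4: "part_number"}
    if is_gap:
        int_cols[6] = "gap_length"
    else:
        int_cols.update({7: "component_beg", 8: "component_end"})
    values = {}
    for col, name in int_cols.items():
        text = cols[col - 1]
        if text.isascii() and text.isdigit() and int(text) > 0:
            values[name] = int(text)
        else:
            errors.append(f"{name} (column {col}) is not a positive integer: {text!r}")
    if errors:
        return errors  # the remaining checks need valid integers
    obj_len = values["object_end"] - values["object_beg"] + 1
    if obj_len < 1:
        errors.append("object_end is less than object_beg")
    if is_gap:
        if obj_len >= 1 and values["gap_length"] != obj_len:
            errors.append("gap_length does not match the object span")
        # Linkage rule as stated on this page: yes only with scaffold or repeat
        if cols[7] == "yes" and cols[6] not in ("scaffold", "repeat"):
            errors.append("linkage=yes with a gap_type other than scaffold or repeat")
    else:
        comp_len = values["component_end"] - values["component_beg"] + 1
        if comp_len < 1:
            errors.append("component_end is less than component_beg")
        elif obj_len >= 1 and comp_len != obj_len:
            errors.append("component span length does not match the object span")
    return errors

def check_object(rows):
    """Check that one object's lines start at 1 and are contiguous and in order."""
    errors = []
    expected_beg, expected_part = 1, 1
    for cols in rows:
        beg, end, part = int(cols[1]), int(cols[2]), int(cols[3])
        if beg != expected_beg:
            errors.append(f"part {part}: object_beg is {beg}, expected {expected_beg}")
        if part != expected_part:
            errors.append(f"part_number is {part}, expected {expected_part}")
        # Resynchronise on the current line so one problem is not reported repeatedly
        expected_beg, expected_part = end + 1, part + 1
    return errors
```

check_object resynchronises after each line, so a single gap or overlap in the coordinates is reported once rather than on every following line.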

Warning level violations reported include:

  • Gap at the beginning or the end of an object.
  • Consecutive gap lines of the same type.
  • Overlapping spans used for a given component_id.
  • Non-draft component_id used more than once.
  • Non-draft component spans out of order.
  • Extra tab character at the end of the line.
  • Component type is not consistent with the line format.
  • Component type is not consistent with the component_id accession.
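A few of the warning-level checks can be sketched in the same way. The Python below is a hypothetical illustration, not agp_validate's code: object_warnings flags a gap at the start or end of an object and consecutive gap lines with the same gap_type, and overlapping_component_spans flags overlapping spans of the same component_id. It assumes lines are already split into 9 columns and that the per-object check receives one object's lines in order.

```python
# Hypothetical sketch of some warning-level checks; not agp_validate's code.
from collections import defaultdict

GAP_TYPES = ("N", "U")  # component_type values that mark gap lines

def is_gap(cols):
    return cols[4] in GAP_TYPES

def object_warnings(rows):
    """Warnings for one object's lines, given in part_number order."""
    warnings = []
    if rows and (is_gap(rows[0]) or is_gap(rows[-1])):
        warnings.append("gap at the beginning or end of the object")
    for prev, cur in zip(rows, rows[1:]):
        # Column 7b (index 6) holds gap_type on gap lines
        if is_gap(prev) and is_gap(cur) and prev[6] == cur[6]:
            warnings.append(f"consecutive gaps of type {cur[6]} at part {cur[3]}")
    return warnings

def overlapping_component_spans(all_rows):
    """Warn when spans of the same component_id overlap, across all objects."""
    spans = defaultdict(list)
    for cols in all_rows:
        if not is_gap(cols):
            spans[cols[5]].append((int(cols[6]), int(cols[7])))
    warnings = []
    for comp_id, ranges in spans.items():
        ranges.sort()
        max_end = 0  # furthest component_end seen so far for this component
        for beg, end in ranges:
            if beg <= max_end:
                warnings.append(f"{comp_id}: span {beg}-{end} overlaps another span")
            max_end = max(max_end, end)
    return warnings
```

Sorting each component's spans by start and tracking the furthest end seen so far finds every span that overlaps an earlier one without comparing all pairs.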

Additional errors and warnings reported when optional validations are invoked:

  • Invalid component_id. [-alt or -g option]
  • Component is not in GenBank. [-alt option]
  • component_id is ambiguous without an explicit version. [-alt option]
  • component_end is greater than the sequence length. [-alt option, or FASTA files provided]
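The component_end check against FASTA files reduces to comparing each component_end with the length of the matching sequence. The Python sketch below is a hypothetical illustration, not agp_validate's code; read_fasta_lengths and check_component_lengths are made-up names. It assumes the first word of each FASTA header is exactly the component_id used in the AGP, and it simply skips components with no FASTA sequence.

```python
# Hypothetical sketch of the component_end vs. sequence length check.
# Assumption: the first word of each FASTA header equals the AGP component_id.

GAP_TYPES = ("N", "U")  # component_type values that mark gap lines

def read_fasta_lengths(lines):
    """Map each FASTA record's first header word to its sequence length."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths

def check_component_lengths(agp_rows, lengths):
    """Report component lines whose component_end exceeds the sequence length."""
    errors = []
    for cols in agp_rows:
        if cols[4] in GAP_TYPES:
            continue  # gap lines have no component sequence
        comp_id, comp_end = cols[5], int(cols[7])
        # This sketch skips components that have no FASTA record
        if comp_id in lengths and comp_end > lengths[comp_id]:
            errors.append(f"{comp_id}: component_end {comp_end} exceeds "
                          f"sequence length {lengths[comp_id]}")
    return errors

# Example data (made up for illustration)
fasta = [">ctg1 example record", "ACGT", "AA", ">ctg2", "GGATC"]
print(read_fasta_lengths(fasta))  # {'ctg1': 6, 'ctg2': 5}
```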


Last updated: 2017-11-01T20:27:36Z