BankIt Submission Help: Protein FASTA

The format of the protein FASTA file is similar to the format of the nucleotide FASTA file.

Like the nucleotide FASTA file, the protein FASTA file contains a Sequence_ID followed by the sequence data. Unlike the nucleotide FASTA file, it does not include the organism name or any other source modifiers.

For the protein FASTA definition line, start with a > followed by the Sequence_ID of the nucleotide sequence that translates to the protein sequence.

Use the same Sequence_ID in the protein FASTA file that you used for the corresponding sequence in the nucleotide FASTA file.
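The matching requirement can be checked before submission. The following is a minimal Python sketch, not part of BankIt: the function names fasta_ids and unmatched_protein_ids are hypothetical, and it assumes the Sequence_ID is the first whitespace-delimited token after the > on each definition line.

```python
def fasta_ids(text):
    """Return the Sequence_IDs found on the definition lines of FASTA text.

    Assumption: the Sequence_ID is the first whitespace-delimited token
    after the '>' character.
    """
    ids = []
    for line in text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            ids.append(fields[0] if fields else "")
    return ids


def unmatched_protein_ids(nucleotide_fasta, protein_fasta):
    """Return protein Sequence_IDs that have no counterpart in the nucleotide FASTA."""
    nucleotide_ids = set(fasta_ids(nucleotide_fasta))
    return [pid for pid in fasta_ids(protein_fasta) if pid not in nucleotide_ids]
```

Any Sequence_ID returned by unmatched_protein_ids appears in the protein file only, so that protein cannot be paired with a nucleotide sequence.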

There must NOT be a space between the > and the Sequence_ID.

There must be a hard return between the >Sequence_ID and the actual protein sequence.
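The two rules above can be checked mechanically. The following is a minimal Python sketch under stated assumptions, not an official BankIt validator: the function name check_protein_fasta is hypothetical, and it treats any text after the Sequence_ID on the definition line as a problem, on the assumption that a protein definition line carries only the Sequence_ID.

```python
def check_protein_fasta(text):
    """Return a list of formatting problems found in protein FASTA text."""
    problems = []
    lines = text.splitlines()
    for number, line in enumerate(lines, start=1):
        if not line.startswith(">"):
            continue  # only definition lines are inspected
        rest = line[1:]
        if rest == "" or rest[0].isspace():
            # Rule: no space between '>' and the Sequence_ID.
            problems.append(f"line {number}: no Sequence_ID directly after '>'")
        elif len(rest.split()) > 1:
            # Assumption: nothing else belongs on a protein definition line,
            # so extra text usually means the hard return is missing.
            problems.append(f"line {number}: extra text after the Sequence_ID")
        # lines[number] is the line that follows (number is 1-based).
        next_line = lines[number] if number < len(lines) else ""
        if not next_line.strip() or next_line.startswith(">"):
            problems.append(f"line {number}: no sequence on the line after the definition line")
    return problems
```

An empty list means no problems were detected by these particular checks; it does not guarantee that BankIt will accept the file.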

Format of a protein FASTA definition line showing placement of spaces and hard returns

Correct IUPAC codes for amino acids can be found in the GenBank Submissions Handbook.
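A quick check for stray characters can catch typing errors before submission. The following is a minimal Python sketch: the name unexpected_residues and the ASSUMED_CODES set are illustrative assumptions (the 20 standard one-letter codes plus the ambiguity codes B, Z, and X), and the handbook remains the authoritative list of accepted codes.

```python
# Assumption: the 20 standard one-letter amino acid codes plus the ambiguity
# codes B, Z, and X. Consult the handbook for the authoritative list.
ASSUMED_CODES = set("ACDEFGHIKLMNPQRSTVWY" + "BZX")


def unexpected_residues(sequence):
    """Return (position, character) pairs for characters outside ASSUMED_CODES.

    Whitespace and line breaks are ignored; positions are 1-based and count
    residues only.
    """
    cleaned = "".join(sequence.split()).upper()
    return [(i, ch) for i, ch in enumerate(cleaned, start=1) if ch not in ASSUMED_CODES]
```

Extend ASSUMED_CODES if the handbook lists additional codes that apply to your sequences.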

Sample Protein FASTA
>Seq1
LYLIFGAWAGMVGTALSLLIRAELGQPGTLLGDDQIYNVIVTAHAFVMIFFMVMPIMIGGFGNWLVPLMI
GAPDMAFPRMNNMSFWLLPPSFLLLLASSTVEAGAGTGWTVYPPLAGNLAHAGASVDLAIFSLHLAGVSS
ILGAINFITTAINMKPPTLSQYQTPLFVWSVLITAVLLLLSLPVLAAGITMLLTDRNLNTTFFDPAGGGD
PVLYQHLFWFFGHPEVYILIL

>Seq2
VGTALXLLIRAELXQPGALLGDDQIYNVVVTAHAFVMIFFMVMPIMIGGFGNWLVPLMIGAPDMAFPRMN
NMSFWLLPPSFLLLMASSTVEAGAGTGWTVYPPLAGNLAHAGASVDLAIFSLHLAGISSILGAINFITTA
INMKPPALSQYQTPLFVWSVLITAVLLLLSLPVLAAGITMLLTDRNLNTTFFDPAGGGDPVLYQHLFWFF
GHPEVYILIL

Sample Protein FASTA File
sample file

For Barcode submissions, you may provide a file of protein sequences in FASTA format, but this protein FASTA file is not required.