To compare the performance of prokaryotic taxonomy classifiers using curated 16S full-length rRNA sequences

Comput Biol Med. 2022 Jun:145:105416. doi: 10.1016/j.compbiomed.2022.105416. Epub 2022 Mar 17.

Abstract

Background: Taxonomic assignment is a vital step in the analytic pipeline of bacterial 16S ribosomal RNA (rRNA) sequencing. Over the past decade, most research in this field used next-generation sequencing technology to target V3∼V4 regions to analyze bacterial composition. However, focusing on only one or two hypervariable regions limited the taxonomic resolution to the species level. In recent years, third-generation sequencing technology has allowed researchers to easily access full-length prokaryotic 16S sequences and presented an opportunity to attain greater taxonomic depth. However, the accuracy of current taxonomic classifiers in analyzing 16S full-length sequence analysis remains unclear.

Objective: The purpose of this study is to compare the accuracy of several widely-used 16S sequence classifiers and to indicate the most suitable 16S training dataset for each classifier.

Methods: Both curated 16S full-length sequences and cross-validation datasets were used to validate the performance of seven classifiers, including QIIME2, mothur, SINTAX, SPINGO, Ribosomal Database Project (RDP), IDTAXA, and Kraken2. Different sequence training datasets, such as SILVA, Greengenes, and RDP, were used to train the classification models.

Results: The accuracy of each classifier to the species levels were illustrated. According to the experimental results, using RDP sequences as the training data, SINTAX and SPINGO provided the highest accuracy, and were recommended for the task of classifying prokaryotic 16S full-length rRNA sequences.

Conclusion: The performance of the classifiers was affected by sequence training datasets. Therefore, different classifiers should use the most suitable 16S training data to improve the accuracy and taxonomy resolution in the taxonomic assignment.

Keywords: 16S full-Length; Metagenomics; Sequence classifier; Taxonomic assignment; Third generation sequencing.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bacteria* / genetics
  • High-Throughput Nucleotide Sequencing*
  • Phylogeny
  • RNA, Ribosomal, 16S / genetics

Substances

  • RNA, Ribosomal, 16S