MicroBIGG-E report

MicroBIGG-E record accession, organism, location, and biosample information

The downloaded MicroBIGG-E package contains a MicroBIGG-E data report in JSON Lines format at the following location in the package:

ncbi_dataset/data/data_report.jsonl

Each line of the MicroBIGG-E data report file is a hierarchical JSON object that represents a single MicroBIGG-E record. The schema of the MicroBIGG-E record is defined in the tables below, where each row describes either a single field in the report or a sub-structure (a collection of fields). The outermost structure of the report is MicroBiggeReport.
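
Because each line is a complete JSON object, the file can be streamed one record at a time instead of being loaded whole. The Python sketch below uses only the standard library; the names parse_jsonl and iter_microbigge_records are hypothetical and are not part of any NCBI tool.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Iterable, Iterator


def parse_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded record per non-blank line of JSON Lines text."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline at end of file
        try:
            yield json.loads(line)
        except json.JSONDecodeError as err:
            raise ValueError(f"line {line_number}: invalid JSON") from err


def iter_microbigge_records(path: str | Path) -> Iterator[dict]:
    """Stream MicroBiggeReport records from a data_report.jsonl file."""
    with open(path, encoding="utf-8") as handle:
        yield from parse_jsonl(handle)
```

For example, iterating over iter_microbigge_records("ncbi_dataset/data/data_report.jsonl") yields one dictionary per MicroBIGG-E record, shaped like the sample report. Splitting the line parser from the file handling keeps the parser easy to test on in-memory strings.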

Table fields that include a Table Field Mnemonic can be used with the dataformat command-line tool's --fields option. Refer to the dataformat CLI tool reference to see how you can use this tool to transform MicroBIGG-E data reports from JSON Lines to tabular formats.

Sample report

{
  "amrFinderPlus": {
    "dbVersion": "2020-01-06.1",
    "type": "COMBINED",
    "version": "3.6.7"
  },
  "amrMethod": "BLASTP",
  "biosample": {
    "accession": "SAMN07179453",
    "assembly": "GCA_009287105.1",
    "geographicOrigin": "United Kingdom: United Kingdom",
    "source": "human",
    "type": "clinical"
  },
  "class": "COPPER/SILVER",
  "closestReferenceSequenceComparison": {
    "accession": "SPD96882.1",
    "alignLength": 491,
    "name": "copper/silver sensor histidine kinase SilS",
    "percentCoverage": 100,
    "percentIdentical": 98.57
  },
  "element": {
    "length": 493,
    "name": "copper/silver sensor histidine kinase SilS",
    "referenceLength": 491,
    "symbol": "silS"
  },
  "location": {
    "accessionVersion": "AAMJFE010000009.1",
    "range": [
      {
        "begin": "3794",
        "end": "5275",
        "orientation": "minus"
      }
    ]
  },
  "readToAssemblyCoverage": {
    "assembly": 52,
    "contig": 55,
    "ratio": 1.05769
  },
  "subclass": "COPPER/SILVER",
  "subtype": "METAL",
  "targetAcc": "PDT000214120.2",
  "taxonomy": {
    "group": "Salmonella enterica",
    "scientificName": "Salmonella enterica subsp. enterica serovar Rissen"
  },
  "type": "STRESS"
}
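
Fields are nested, so reading a value means walking through its parent structures, for example element.symbol or taxonomy.group. Some fields defined in the tables, such as isPlus and biosample.collectionDate, do not appear in this sample at all, so code that reads these records should not assume every field is present. The Python sketch below uses a hypothetical helper, get_path, over a trimmed copy of the sample record.

```python
from typing import Any


def get_path(record: dict, dotted: str, default: Any = None) -> Any:
    """Follow a dotted path such as 'element.symbol' through nested dicts."""
    node: Any = record
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return default  # the field (or one of its parents) is absent
        node = node[key]
    return node


# Trimmed copy of the sample report above.
sample = {
    "element": {"symbol": "silS", "length": 493, "referenceLength": 491},
    "taxonomy": {"group": "Salmonella enterica"},
    "biosample": {"accession": "SAMN07179453", "type": "clinical"},
    "type": "STRESS",
}

print(get_path(sample, "element.symbol"))                    # silS
print(get_path(sample, "biosample.collectionDate", "n/a"))   # n/a
print(get_path(sample, "isPlus", False))                     # False
```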

MicroBiggeReport Structure

| Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
|---|---|---|---|---|---|
| targetAcc | target-accession | Target accession | string | | |
| element | | | Element | | |
| location | | | SeqRangeSet | The range of the gene | |
| type | type | Type | string | | AMR; STRESS |
| subtype | subtype | Subtype | string | | AMR; METAL |
| class | class | Class | string | | GLYCOPEPTIDE; COPPER/SILVER |
| subclass | subclass | Subclass | string | | VANCOMYCIN; COPPER/SILVER |
| amrMethod | amr-method | AMR method | string | | EXACTP |
| isPlus | is-plus | Is plus | bool | | |
| closestReferenceSequenceComparison | | | ClosestReference | | |
| taxonomy | | | Taxonomy | | |
| biosample | | | Biosample | | |
| readToAssemblyCoverage | | | ReadToAssemblyCoverage | | |
| amrFinderPlus | | | AmrFinderPlus | | |
| genesOnContig (repeated) | coming soon | coming soon | string | | |
| genesOnIsolate (repeated) | coming soon | coming soon | string | | |
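
The Table Field Mnemonic column links each dataformat --fields name to a path in the JSON record (for example, elem-symbol to element.symbol). The Python sketch below mimics that kind of column selection for a few fields so the correspondence is concrete. MNEMONIC_PATHS, lookup, and to_tsv are hypothetical names, the mapping covers only a handful of mnemonics taken from the tables in this document, and the sketch does not describe how the dataformat tool itself works.

```python
from __future__ import annotations

import csv
import io

# Hypothetical subset of mnemonic -> JSON path pairs copied from the tables in
# this document; it only mimics column selection and is not the dataformat tool.
MNEMONIC_PATHS = {
    "target-accession": "targetAcc",
    "type": "type",
    "class": "class",
    "elem-symbol": "element.symbol",
    "closest-ref-pct-ident": "closestReferenceSequenceComparison.percentIdentical",
    "tax-group": "taxonomy.group",
    "biosample-accession": "biosample.accession",
}


def lookup(record: dict, dotted: str) -> str:
    """Walk a dotted path; a missing field yields an empty string."""
    node = record
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return ""
        node = node[key]
    return str(node)


def to_tsv(records: list[dict], fields: list[str]) -> str:
    """Render the selected mnemonics as tab-separated text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(fields)  # this sketch uses the mnemonics as column headers
    for record in records:
        writer.writerow([lookup(record, MNEMONIC_PATHS[name]) for name in fields])
    return out.getvalue()


record = {"targetAcc": "PDT000214120.2", "element": {"symbol": "silS"},
          "taxonomy": {"group": "Salmonella enterica"}}
print(to_tsv([record], ["target-accession", "elem-symbol", "tax-group"]), end="")
```

Writing the mnemonics themselves as the header row is a choice made for this sketch. Missing fields become empty cells rather than errors, since optional fields may be absent from a record.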

AmrFinderPlus Structure

| Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
|---|---|---|---|---|---|
| version | amrfinderplus-version | AMRFinderPlus version | string | | |
| type | amrfinderplus-type | AMRFinderPlus type | string | | |
| dbVersion | amrfinderplus-db-version | AMRFinderPlus database version | string | | |

Biosample Structure

| Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
|---|---|---|---|---|---|
| geographicOrigin | biosample-geo-origin | BioSample geographic origin | string | | Denmark; not determined |
| source | biosample-source | BioSample source | string | | |
| type | biosample-type | BioSample type | string | | clinical; environmental/other |
| accession | biosample-accession | BioSample accession | string | | SAMN00808999 |
| assembly | biosample-assembly | BioSample assembly accession | string | | GCA_000395725.1 |
| collectionDate | biosample-collection-date | BioSample collection date | string | | |
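
The sample's geographicOrigin value, "United Kingdom: United Kingdom", looks like the INSDC country:region convention, whereas other values such as Denmark or not determined have no region part. The Python sketch below assumes that convention; split_geo_origin and count_by_country are hypothetical helpers that separate the country from any region and tally records per country, skipping records that omit the field.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def split_geo_origin(value: str) -> tuple[str, str | None]:
    """Split 'Country: region' into (country, region); region is None if absent."""
    country, _, region = value.partition(":")
    return country.strip(), (region.strip() or None)


def count_by_country(records: Iterable[dict]) -> Counter:
    """Tally records by the country part of biosample.geographicOrigin."""
    counts: Counter = Counter()
    for record in records:
        origin = record.get("biosample", {}).get("geographicOrigin")
        if origin:  # skip records that omit the field
            counts[split_geo_origin(origin)[0]] += 1
    return counts


print(split_geo_origin("United Kingdom: United Kingdom"))  # ('United Kingdom', 'United Kingdom')
print(split_geo_origin("not determined"))                  # ('not determined', None)
```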

ClosestReference Structure

| Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
|---|---|---|---|---|---|
| accession | closest-ref-accession | Closest reference accession | string | | |
| name | closest-ref-name | Closest reference name | string | | |
| percentCoverage | closest-ref-pct-coverage | Closest reference percent coverage | float | | |
| percentIdentical | closest-ref-pct-ident | Closest reference percent identity | float | | |
| alignLength | closest-ref-align-len | Closest reference alignment length | int32 | | |
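
Percent identity and percent coverage against the closest reference are natural fields to filter on. The Python sketch below keeps records that meet both thresholds; the function name passes_thresholds and the 90% defaults are illustrative assumptions, not values recommended by NCBI or AMRFinderPlus.

```python
def passes_thresholds(record: dict, min_identity: float = 90.0,
                      min_coverage: float = 90.0) -> bool:
    """True if the closest-reference comparison meets both percent thresholds."""
    comparison = record.get("closestReferenceSequenceComparison")
    if not comparison:
        return False  # no comparison to judge, so the record is excluded
    # A missing value is treated as 0, the default for a numeric field.
    identity = float(comparison.get("percentIdentical", 0.0))
    coverage = float(comparison.get("percentCoverage", 0.0))
    return identity >= min_identity and coverage >= min_coverage


sample = {"closestReferenceSequenceComparison": {"percentCoverage": 100, "percentIdentical": 98.57}}
print(passes_thresholds(sample))                     # True
print(passes_thresholds(sample, min_identity=99.0))  # False
```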

Element Structure

| Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
|---|---|---|---|---|---|
| symbol | elem-symbol | Element symbol | string | | vanS-A; copB |
| name | elem-name | Element name | string | | VanA-type vancomycin resistance histidine kinase VanS; copper/silver-translocating P-type ATPase CopB |
| length | elem-length | Element length | int32 | | |
| referenceLength | elem-ref-length | Element reference length | int32 | | |

ReadToAssemblyCoverage Structure

| Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
|---|---|---|---|---|---|
| contig | read-assm-coverage-contig | Read-to-Assembly-Coverage contig | uint32 | | |
| assembly | read-assm-coverage-assembly | Read-to-Assembly-Coverage assembly | uint32 | | |
| ratio | read-assm-coverage-ratio | Read-to-Assembly-Coverage ratio | float | | |
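
The tables do not define ratio, but the sample record is consistent with ratio = contig / assembly: 55 / 52 = 1.057692..., which rounds to the reported 1.05769. The Python sketch below recomputes the ratio under that inferred definition; coverage_ratio is a hypothetical name, and the definition should be confirmed against NCBI documentation before relying on it.

```python
def coverage_ratio(cov: dict) -> float:
    """Contig coverage divided by assembly coverage (definition inferred from the sample)."""
    if cov["assembly"] == 0:
        raise ValueError("assembly coverage is zero; ratio is undefined")
    return cov["contig"] / cov["assembly"]


sample_cov = {"assembly": 52, "contig": 55, "ratio": 1.05769}
print(round(coverage_ratio(sample_cov), 5))  # 1.05769
```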

Taxonomy Structure

| Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
|---|---|---|---|---|---|
| group | tax-group | Taxonomic group | string | | Enterococcus faecium |
| scientificName | tax-name | Taxonomic name | string | | Enterococcus faecium EnGen0172 |

BioProject Structure

A BioProject is a collection of biological data related to a single initiative, originating from a single organization or from a consortium. A BioProject record gives users a single place to find links to the diverse data types generated for that project. The record can be retrieved from NCBI BioProject.

| Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
|---|---|---|---|---|---|
| accession | accession | Accession | string | BioProject accession | PRJEB35387 |
| title | title | Title | string | Title of the BioProject provided by the submitter | Sciurus carolinensis (grey squirrel) genome assembly, mSciCar1 |
| parentAccessions (repeated) | parent-accessions | Parent Accessions | string | BioProject accession containing multiple children BioProjects | ["PRJNA489243","PRJEB33226","PRJEB40665"] |

BioProjectLineage Structure

| Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
|---|---|---|---|---|---|
| bioprojects (repeated) | lineage- | Lineage | BioProject | A BioProject is a collection of biological data related to a single initiative, originating from a single organization or from a consortium | |
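
A BioProjectLineage holds a list of BioProject structures, and each BioProject can name its parents in parentAccessions. The Python sketch below shows two hypothetical lookups over that structure: which projects in a lineage list no parents, and which parents a given project lists. It assumes the lineage is a dict shaped like the tables above; the accessions in the example are placeholders, not real BioProjects.

```python
from __future__ import annotations


def root_projects(lineage: dict) -> list[str]:
    """Accessions in a BioProjectLineage that list no parent BioProjects."""
    return [bp["accession"] for bp in lineage.get("bioprojects", [])
            if not bp.get("parentAccessions")]


def parents_of(lineage: dict, accession: str) -> list[str]:
    """Parent accessions recorded for one BioProject, or [] if it is not present."""
    for bp in lineage.get("bioprojects", []):
        if bp.get("accession") == accession:
            return list(bp.get("parentAccessions", []))
    return []


# Placeholder lineage: CHILD_PROJECT lists PARENT_PROJECT as its parent.
lineage = {"bioprojects": [
    {"accession": "CHILD_PROJECT", "parentAccessions": ["PARENT_PROJECT"]},
    {"accession": "PARENT_PROJECT"},
]}
print(root_projects(lineage))                # ['PARENT_PROJECT']
print(parents_of(lineage, "CHILD_PROJECT"))  # ['PARENT_PROJECT']
```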

Range Structure

A 1-based range on a sequence record.

| Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
|---|---|---|---|---|---|
| begin | start | Start | uint64 | | |
| end | stop | Stop | uint64 | | |
| orientation | orientation | Orientation | Orientation | | |
| order | coming soon | coming soon | uint32 | Currently present in gene reports; kept in this specification until it is removed from that report | |
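
Because begin and end are 1-based, converting them to the 0-based, half-open indexing used by Python slices means subtracting one from begin only. The sketch below assumes end is inclusive (the document states only that ranges are 1-based) and converts the string-encoded uint64 values found in the JSON. Under that assumption, the sample location 3794 to 5275 spans 1,482 nucleotides, or 494 codons, which is consistent with a 493-residue element plus a stop codon if element length is counted in amino acids. The Range class is a hypothetical Python model, not an NCBI library type.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    begin: int  # 1-based position of the first base
    end: int    # 1-based position of the last base (assumed inclusive)
    orientation: str = "none"

    @classmethod
    def from_json(cls, obj: dict) -> Range:
        # uint64 values arrive as JSON strings such as "3794", so convert explicitly
        return cls(int(obj["begin"]), int(obj["end"]), obj.get("orientation", "none"))

    def __len__(self) -> int:
        return self.end - self.begin + 1

    def to_python_slice(self) -> slice:
        # 1-based inclusive [begin, end] -> 0-based half-open [begin - 1, end)
        return slice(self.begin - 1, self.end)


r = Range.from_json({"begin": "3794", "end": "5275", "orientation": "minus"})
print(len(r), len(r) // 3)  # 1482 494
```

For a minus-orientation range, the slice returns the plus-strand bases; the gene sequence itself is the reverse complement of that slice.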

SeqRangeSet Structure

| Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
|---|---|---|---|---|---|
| accessionVersion | accession | Sequence Accession | string | NCBI Accession.version of the sequence | |
| range (repeated) | range- | | Range | Series of intervals on the accessionVersion above | |

Orientation Enumeration

| Name | Number | Description |
|---|---|---|
| none | 0 | |
| plus | 1 | |
| minus | 2 | |
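
In the JSON report the orientation is written by name (the sample uses "minus"). A range whose orientation is none may omit the field entirely, because none is the enumeration's zero value. The Python sketch below models the enumeration with the standard library's IntEnum and accepts a name, a number, or a missing value; Orientation, parse_orientation, and the strand symbols are illustrative choices.

```python
from __future__ import annotations

from enum import IntEnum


class Orientation(IntEnum):
    none = 0
    plus = 1
    minus = 2


# Conventional one-character strand symbols (an illustrative choice).
STRAND_SYMBOL = {Orientation.none: ".", Orientation.plus: "+", Orientation.minus: "-"}


def parse_orientation(value: str | int | None) -> Orientation:
    """Accept an enum name ('minus'), a number (2), or None for an omitted field."""
    if value is None:
        return Orientation.none  # zero value; may be left out of the JSON
    if isinstance(value, str):
        return Orientation[value]
    return Orientation(value)


o = parse_orientation("minus")
print(o.name, STRAND_SYMBOL[o])  # minus -
```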

Scalar Value Types

| Protocol buffers type | Notes | C++ | Python | Java | Go |
|---|---|---|---|---|---|
| double | | double | float | double | float64 |
| float | | float | float | float | float32 |
| int32 | Uses variable-length encoding. Inefficient for encoding negative numbers; if your field is likely to have negative values, use sint32 instead. | int32 | int | int | int32 |
| int64 | Uses variable-length encoding. Inefficient for encoding negative numbers; if your field is likely to have negative values, use sint64 instead. | int64 | int/long | long | int64 |
| uint32 | Uses variable-length encoding. | uint32 | int/long | int | uint32 |
| uint64 | Uses variable-length encoding. | uint64 | int/long | long | uint64 |
| sint32 | Uses variable-length encoding. Signed int value. These encode negative numbers more efficiently than regular int32s. | int32 | int | int | int32 |
| sint64 | Uses variable-length encoding. Signed int value. These encode negative numbers more efficiently than regular int64s. | int64 | int/long | long | int64 |
| fixed32 | Always four bytes. More efficient than uint32 if values are often greater than 2^28. | uint32 | int | int | uint32 |
| fixed64 | Always eight bytes. More efficient than uint64 if values are often greater than 2^56. | uint64 | int/long | long | uint64 |
| sfixed32 | Always four bytes. | int32 | int | int | int32 |
| sfixed64 | Always eight bytes. | int64 | int/long | long | int64 |
| bool | | bool | bool | boolean | bool |
| string | A string must always contain UTF-8 encoded or 7-bit ASCII text. | string | str/unicode | String | string |
| bytes | May contain any arbitrary sequence of bytes. | string | str | ByteString | []byte |
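
The sample report writes Range begin and end (uint64) as strings such as "3794", but ReadToAssemblyCoverage values (uint32) as bare numbers. That pattern matches the proto3 JSON mapping, in which 64-bit integer types are written as decimal strings while 32-bit integers and floating-point values are written as JSON numbers. The helper below, whose name decode_scalar is hypothetical, normalizes a raw JSON value given its protocol buffers type from the table above; it assumes the report follows that mapping.

```python
# Protocol buffers scalar types grouped by how the proto3 JSON mapping writes them.
STRING_ENCODED_INTS = {"int64", "uint64", "sint64", "fixed64", "sfixed64"}
NUMBER_ENCODED_INTS = {"int32", "uint32", "sint32", "fixed32", "sfixed32"}
FLOATS = {"float", "double"}


def decode_scalar(proto_type: str, value):
    """Normalize a raw JSON value to a Python type based on its protobuf scalar type."""
    if proto_type in STRING_ENCODED_INTS or proto_type in NUMBER_ENCODED_INTS:
        return int(value)  # int() accepts both 3794 and "3794"
    if proto_type in FLOATS:
        return float(value)  # float() also accepts "NaN", "Infinity", "-Infinity"
    return value  # bool, string, and bytes (base64 text) are returned unchanged


print(decode_scalar("uint64", "3794"), decode_scalar("uint32", 55), decode_scalar("float", 100))
# 3794 55 100.0
```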
Generated October 22, 2021