Table 5.

Fields used to categorize information in Gene records.

Field name / Definition [including field abbreviations] / Examples
Name subcategory
Disease name or phenotype of mutants
Disease or phenotype associated with the record.
[DIS]
Find the genes that contribute to SCID.
SCID[dis]
Gene name
A symbol for the gene. Includes preferred symbols, aliases, and locus tags.
[SYM][SYMB][GN][GENE NAME]
Genes with a symbol starting with smt.
smt*[sym]
Preferred symbol
The preferred symbol for the gene, not including aliases or locus tags.
[PREF]
Genes with a preferred symbol starting with smt.
smt*[pref]
Gene full name
Only the full name of the gene.
[GFN][GENEFULLNAME]
Find genes with a full gene name of tumor protein p53:
tumor protein p53[gene full name]
Gene/protein name
The short or full name of the gene or any of its protein products (when applicable).
[TITL][TITLE][TF][TI][Protein]
Find genes that have the word kinase in GO annotation but do not have the word kinase in the name.
kinase[gene ontology] NOT kinase[gene/protein name]
Protein full name
Only the full name of the protein products.
[PFN][PROTEINFULLNAME]
Find genes with a full protein name of glutathione S-transferase M1:
glutathione S-transferase M1[protein full name]
Location subcategory
Base position
Base position, relative to a genomic accession. This is supported only for the reference assembly. To identify the genomic sequence, include either the chromosome together with the organism (or Taxonomy ID), or the chromosome accession itself. Unplaced and unlocalized scaffold accessions may also be queried by base position.
The query should define a range of at least 100 kb, specified as a pair of integers (start:end). Results include genes that lie either partly or completely within the range.
[CHRPOS][CPOS][CPOSITION]
9606[taxid] AND 12[chromosome] AND 9100000:9200000[chrpos]
NC_000012[nucl_accn] AND 9100000:9200000[chrpos]
NW_004668236[nucl_accn] AND 900000:1000000[chrpos]
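The base-position rules (a start:end pair of integers spanning at least 100 kb, plus either chromosome and organism/Taxonomy ID or a chromosome accession) can be captured in a small query builder. This is a minimal Python sketch: the function `chrpos_query` and its parameters are hypothetical, and it only assembles query strings in the form of the examples above; it does not contact NCBI.

```python
from typing import Optional

MIN_SPAN = 100_000  # the table asks for a range of at least 100 kb


def chrpos_query(start: int, end: int, *, accession: Optional[str] = None,
                 taxid: Optional[int] = None,
                 chromosome: Optional[str] = None) -> str:
    """Hypothetical helper: build a Gene [chrpos] range query string."""
    if start < 0 or end <= start:
        raise ValueError("range must be a pair of integers with start < end")
    if end - start < MIN_SPAN:
        raise ValueError("range should span at least 100 kb")
    rng = f"{start}:{end}[chrpos]"
    if accession is not None:
        # A chromosome (or scaffold) accession identifies the sequence by itself.
        return f"{accession}[nucl_accn] AND {rng}"
    if taxid is not None and chromosome is not None:
        # Otherwise the organism and the chromosome must both be given.
        return f"{taxid}[taxid] AND {chromosome}[chromosome] AND {rng}"
    raise ValueError("give an accession, or both taxid and chromosome")


print(chrpos_query(9_100_000, 9_200_000, taxid=9606, chromosome="12"))
print(chrpos_query(9_100_000, 9_200_000, accession="NC_000012"))
```

Raising an error for spans under 100 kb mirrors the table's guidance; the server may treat such ranges differently, so the check is a convenience rather than a documented constraint of the API.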
Chromosome
Chromosome location of the gene. The value follows the convention of the source genome: if the source uses III, then III (not 3) is indexed in this field.
[CHRM][CHR][CHROMOSOME]
Retrieve records that contain the word kinase for genes located on chromosome III: kinase AND III[chr]
Retrieve records containing the words zinc and finger that are of human origin but not on chromosome 19: zinc finger NOT 19[chr] AND "Homo sapiens"[orgn]
Default map location
A map location in the units standard for the genome. For example, for human it is the cytogenetic band; for mouse it is the MGI map (centiMorgans). This is processed as a text field, so range queries are not implemented. For range queries, use Genome Data Viewer.
Rat genes mapped to 18q:
rat[orgn] AND 18q[default map location]
Sequence subcategory
In Gene, this means searching by sequence identifier, not by the sequence itself; sequence searches are handled by BLAST.
Nucleotide accession
An accession for a nucleotide sequence.
[NACC][NUCL_ACCN]
There are instances where the same accession is applied to both nucleotide and protein sequences. To restrict an accession to nucleotide, use this field. (Accessions beginning with BC are not among these shared accessions.)
BC052629[NACC]
Protein accession
An accession for a protein sequence.
[PACC][PROT_ACCN]
There are instances where the same accession is applied to both nucleotide and protein sequences. To restrict an accession to protein, use this field. (Accessions beginning with three letters are not among these shared accessions.)
AAH52629[PACC]
Nucleotide or protein accession
A sequence accession of any type.
[ACCN]
Find all the genes encoded in accession AE003828:
AE003828
Miscellaneous subcategory (alphabetical)
Assembly-specific gene annotations
Find records annotated on one assembly but not on another.
[Assembly Name]
1. Find cow genes annotated on UMD_3.1 but not on Btau_4.6.1:
Bos_taurus_UMD_3.1[assembly name] NOT Btau_4.6.1[assembly name] AND alive[property]
2. Find human genes annotated on GRCh38 but not on HuRef:
GRCh38[assembly name] NOT HuRef[assembly name] AND alive[property]
3. Find human genes on alternate GRCh38 assemblies but not on the primary assembly:
(txid9606[orgn] AND alt_ref_loci_*[assembly name]) NOT "primary assembly"[assembly name] AND alive[property]
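The first two assembly examples share one pattern: annotated on assembly A, NOT annotated on assembly B, restricted to current records with alive[property]. A minimal Python sketch of that pattern follows; the helpers `field` and `assembly_diff_query` are hypothetical, and quoting multi-word terms simply follows the style of the table's own examples (e.g. "primary assembly"[assembly name]).

```python
def field(term: str, tag: str) -> str:
    """Hypothetical helper: qualify a term with a field tag, quoting phrases."""
    if any(ch.isspace() for ch in term):
        # Multi-word values are quoted, as in "primary assembly"[assembly name].
        term = f'"{term}"'
    return f"{term}[{tag}]"


def assembly_diff_query(on: str, not_on: str) -> str:
    """Genes annotated on `on` but not on `not_on`, limited to live records."""
    return (f"{field(on, 'assembly name')} NOT "
            f"{field(not_on, 'assembly name')} AND alive[property]")


print(assembly_diff_query("Bos_taurus_UMD_3.1", "Btau_4.6.1"))
print(assembly_diff_query("GRCh38", "HuRef"))
```

The third example, which wraps an organism restriction and a wildcard in parentheses, does not fit this two-argument shape and is easier to write by hand.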
Creation date
Date the record was created.
[cd][cdat][creation date]
Xenopus records created between February 5, 2004 and February 12, 2004:
2004/2/5:2004/2/12[cd] AND xenopus[orgn]
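Date fields such as [cd], [disdate], and [md] take a range written as YYYY/M/D:YYYY/M/D. A minimal Python sketch that formats such a range from `datetime.date` values; the function name `date_range_query` is hypothetical, and dropping leading zeros only copies the style of the examples in this table.

```python
from datetime import date


def date_range_query(start: date, end: date, tag: str = "cd") -> str:
    """Hypothetical helper: format a start:end range for a Gene date field."""
    if end < start:
        raise ValueError("end date precedes start date")

    def fmt(d: date) -> str:
        # Matches the table's examples, e.g. 2004/2/5 (no zero padding).
        return f"{d.year}/{d.month}/{d.day}"

    return f"{fmt(start)}:{fmt(end)}[{tag}]"


q = date_range_query(date(2004, 2, 5), date(2004, 2, 12)) + " AND xenopus[orgn]"
print(q)  # 2004/2/5:2004/2/12[cd] AND xenopus[orgn]
print(date_range_query(date(2006, 1, 1), date(2006, 12, 31), "disdate"))
```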
Date discontinued
The date on which the record was discontinued.
[DIS_DATE][DDAT][DISCONTINUED][DISDATE]
Records discontinued between January 1, 2006 and December 31, 2006: 2006/1/1:2006/12/31[disdate]
Domain name
Conserved domain and protein family names.
[DOMAINNAME][DOM]
Retrieve records associated with the A2M family:
A2M[domainname]
EC/RN number
Enzyme Commission (EC) identifier for a product of the gene, indexed without the EC prefix.
[ECNO][EC]
Retrieve records where proteins have an EC number of 1.9.3.1: 1.9.3.1[ECNO]
Exon count
The number of distinct, non-overlapping RefSeq exons annotated for all RNA products of this gene interval, based on annotation in this priority: reference assembly first, alternate assembly second.
This field can be queried by either a single integer value or a range.
[XC][NUMEXONS]
Retrieve human records with one exon:
human[orgn] AND 1[exoncount]
Retrieve all records with 10 to 20 exons:
10:20[exoncount]
Filter
Find records with a relationship to other data in Gene. For more examples of use of filters, see the Preview/Index section.
Retrieve records of mouse kinase genes with expression data stored in GEO:
mouse[orgn] AND gene_geoprofiles[filter] AND kinase
Gene ID
Gene identifier.
This field can be queried by either a single integer value or a range.
[UID][ID][GeneID]
Many integer identifiers have overlapping number spaces. To find the gene record that corresponds to the human BRCA1 gene by GeneID, use this field:
672[GeneID]
Gene length
Gene length based on annotation in this priority: reference assembly first, alternate assembly second. If the gene has multiple placements, all of them on non-reference assemblies, the longest value among those placements is used.
This field can be queried by either a single integer value or a range.
[GL][GENELEN]
Retrieve all records with a gene span less than or equal to 5 kb:
1:5000[genelength]
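Gene ID, exon count, and gene length all accept either a single integer or a low:high range. A minimal Python sketch of a builder for both forms; the name `int_field_query` is hypothetical, and the tag strings are the ones used in this table's examples.

```python
from typing import Optional


def int_field_query(tag: str, low: int, high: Optional[int] = None) -> str:
    """Hypothetical helper: single value (high=None) or low:high range."""
    if high is None:
        return f"{low}[{tag}]"
    if high < low:
        raise ValueError("range must be written low:high with low <= high")
    return f"{low}:{high}[{tag}]"


print("human[orgn] AND " + int_field_query("exoncount", 1))
print(int_field_query("exoncount", 10, 20))
print(int_field_query("genelength", 1, 5000))
print(int_field_query("GeneID", 672))
```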
Gene Ontology
GO terms applied to this gene, and the GO identifier as an integer. The terms include the component, function, and process categories.
[GO][GENE ONTOLOGY]
Rat genes with GO terms starting with “kinase signaling”:
kinase signaling*[gene ontology] rat[orgn]
Any gene with the GO ID GO:0004872:
4872[GO]
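Because the [GO] field indexes the identifier as an integer, a full ID such as GO:0004872 must have its prefix and leading zeros dropped. A minimal Python sketch of that conversion; the function `go_query` is hypothetical, and it assumes the standard seven-digit GO ID format.

```python
import re


def go_query(go_id: str) -> str:
    """Hypothetical helper: turn 'GO:0004872' into the integer query '4872[GO]'."""
    # Accept an optional 'GO:' prefix followed by the seven-digit identifier.
    m = re.fullmatch(r"(?:GO:)?(\d{7})", go_id.strip(), flags=re.IGNORECASE)
    if m is None:
        raise ValueError(f"not a GO identifier: {go_id!r}")
    return f"{int(m.group(1))}[GO]"  # int() strips the leading zeros


print(go_query("GO:0004872"))  # 4872[GO]
```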
Group
Query terms to retrieve a set of genes with a specified relationship to another gene.
Pseudogenes related to the same functional gene with GeneID = 11727:
"related functional gene 11727"[Group]
MIM
Identifier assigned to human genes and phenotypes by OMIM.
[MIM]
Retrieve records that contain the MIM number 181510: 181510[MIM]
Modification date
Last date the record was modified.
[MODDATE][MDAT][LMOD][DATE][UPDATED][MD]
Retrieve records for genes from eubacterial genomes last modified between March 10, 2004 and January 1, 2010:
eubacteria[orgn] AND 2004/3/10:2010/1/1[md]
Retrieve records from sea urchins modified in the last 30 days:
echinoidea[orgn] AND "last 30 days"[mdat]
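When a query is passed in a URL instead of typed into the search box, its spaces are encoded as + and brackets and quotes are percent-encoded. As a sketch, assuming the query is sent to NCBI's E-utilities ESearch endpoint with db=gene, Python's standard `urlencode` produces that encoding. The helper `esearch_url` is hypothetical, and the code only builds the URL; it sends no request.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term: str, db: str = "gene") -> str:
    """Hypothetical helper: build (but do not fetch) an ESearch URL."""
    # urlencode uses quote_plus: space -> '+', '"' -> '%22', '[' -> '%5B', ']' -> '%5D'
    return f"{ESEARCH}?{urlencode({'db': db, 'term': term})}"


url = esearch_url('echinoidea[orgn] AND "last 30 days"[mdat]')
print(url)
```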
Organism
Scientific and common names of the organism.
[ORGN]
Find all records in Gene for the pig:
pig[organism]
Property
An attribute of a Gene record based on its content. See Properties.
[prop][property]
Mouse records with transcript variants:
mouse[orgn] AND "has transcript variants"[property]
PubMed UID
PubMed identifier.
[PMID]
Many integer identifiers have overlapping number spaces. To find the Gene record(s) that correspond to a PubMed article, use this field:
12477932[PMID]
Taxonomy ID
Identifier for the species or strain in the NCBI Taxonomy database. HINT: txid{value} also works, e.g., txid9606.
[TAXID][TID]
Find all records in Gene for the pig:
9823[taxid]
Alternatively:
txid9823
Text word
Any word in the record.
[TEXT][WORD][AB][TXT]
Retrieve records that contain “32” and also contain the words threonine, serine, and kinase: serine AND threonine AND kinase AND 32[TEXT]

From: Gene Help: Integrated Access to Genes of Genomes in the Reference Sequence Collection

Gene Help [Internet].

NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.