MedGen FAQ

How can I find the names of disorders that are caused by a particular gene?

By submitting a query based on gene symbol.

If you enter a gene symbol in MedGen's query box, the results will include all records with that term anywhere in the text. To limit the results to disorders thought to be affected by altered function of that gene, click the link at the top of the page that reads something like "See GENE SYMBOL in MedGen (2)". The number in parentheses at the end of that phrase is the number of records in MedGen reported as being caused by altered function of GENE SYMBOL (e.g., CFTR).

By starting with the Gene database.

Within the phenotype section of Gene, there is a list of names of disorders with links to MedGen. Alternatively, within a Gene record, follow the MedGen link in the Related Information section at the right.

When I search by a MIM number, why do I sometimes get multiple records?

There are two major data flows that manage relationships between MIM numbers and records in MedGen. One is the daily update provided by GTR- and ClinVar-related data flows from OMIM. The second is the semi-annual update from UMLS to MedGen. In the former data flow, the relationship of MedGen record to MIM number is 1:1. In the latter data flow, the MIM number may be reported for more than one concept UID (CUI), so a search by that MIM number can return multiple records.

Why aren't all terms from SNOMED CT in MedGen?

MedGen includes terms and their identifiers from SNOMED CT based only on the semi-annual releases from UMLS. Thus MedGen may be up to 6 months out of date. MedGen also limits its scope to concepts of interest to Medical Genetics. Thus there are some SNOMED CT terms that are not included, no matter how long they have been established, because they are out of scope, e.g. immunologic factors.

How are references chosen for the Recent clinical studies section in MedGen?

The citations listed in the Recent clinical studies section are not curated; they are retrieved computationally using the Clinical Queries tool maintained by PubMed. The query used is the preferred name of the record.

How are relationships between MedGen and PubMed computed?

The links between records in MedGen and PubMed are generated by a combination of curation and computation. For those that are computed, the preferred term in MedGen is used to query PubMed, limiting either to matches in the title and abstract of the paper or, once articles have been indexed with MeSH terms, to articles indexed as having a genetic component. When non-informative terms are identified, they are added to a 'stop list' to prevent future false positives.

How can I extract a report of MedGen identifiers and their relationships to MIM numbers and HPO identifiers?

There are multiple ways to access these data.

MIM. If the starting point is the MIM number, the file mim2gene_medgen on Gene's ftp site (ftp://ftp.ncbi.nih.gov/gene/DATA/mim2gene_medgen) reports the MedGen concept identifiers that match the phenotype records. Not all MIM numbers have a corresponding record in MedGen, because genes are out of scope.

HPO. If the focus is data from HPO, then there are two files on MedGen's ftp site

The README files at both sites provide all the details.
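
As a minimal sketch of working with the MIM file, the Python below maps MIM numbers to MedGen concept identifiers. It assumes the file is tab-delimited, has a header line beginning with #, lists its columns in the order MIM number, GeneID, type, source, MedGen CUI, marks phenotype rows with the value phenotype, and writes missing values as a hyphen. Those layout details, the function name mim_to_medgen, and the sample rows are assumptions for illustration only; check them against the README before relying on them.

```python
from collections import defaultdict


def mim_to_medgen(lines, mim_col=0, type_col=2, cui_col=4):
    """Map each phenotype MIM number to the set of MedGen CUIs listed for it.

    Column positions are assumptions; adjust them to match the README.
    """
    mapping = defaultdict(set)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header line
        fields = line.split("\t")
        if len(fields) <= max(mim_col, type_col, cui_col):
            continue  # skip rows that are too short to parse
        if fields[type_col] != "phenotype":
            continue  # gene rows have no MedGen record
        cui = fields[cui_col]
        if cui and cui != "-":
            mapping[fields[mim_col]].add(cui)
    return dict(mapping)


# Made-up rows for illustration only; these are not real identifiers.
sample = [
    "#MIM number\tGeneID\ttype\tSource\tMedGenCUI\tComment\n",
    "100001\t-\tphenotype\t-\tC0000001\t-\n",
    "100001\t-\tphenotype\t-\tC0000002\t-\n",
    "100002\t11\tgene\t-\t-\t-\n",
    "100003\t-\tphenotype\t-\t-\t-\n",
]
print(mim_to_medgen(sample))
```

With a downloaded copy of the file, the same function could be called on an open file handle instead of the sample list. A MIM number that is reported for more than one CUI appears as a set with several members.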

How can I extract a report of MedGen identifiers and their relationships to other concepts (such as hierarchies)?

MedGen's ftp site provides a compressed file named MGREL.RRF.gz or, in the csv folder, a series of files (split to make them manageable) named MGREL_number.csv. The fourth column, REL, includes the values PAR for parent, CHD for child, and SIB for sibling. These can be used in conjunction with the CUI1 and CUI2 values to construct hierarchies. The usage in MedGen for the REL column is consistent with that of UMLS: https://www.nlm.nih.gov/research/umls/knowledge_sources/metathesaurus/release/abbreviations.html
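
One way to turn the REL, CUI1, and CUI2 values into a hierarchy is sketched below in Python. It is an illustration under stated assumptions, not a description of how MedGen builds its own hierarchies: it assumes the decompressed file is pipe-delimited with a header line beginning with #, that CUI1 is the first column and CUI2 the fifth, and that REL is read the UMLS way, as the relationship of CUI2 to CUI1 (so a PAR row means CUI2 is a parent of CUI1). The function names and sample rows are made up; confirm the column layout and direction against the README.

```python
from collections import defaultdict


def build_parent_map(lines, cui1_col=0, rel_col=3, cui2_col=4, sep="|"):
    """Return {child CUI: set of parent CUIs} from MGREL-style rows.

    Assumes REL states how CUI2 relates to CUI1 (the UMLS reading).
    """
    parents = defaultdict(set)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header line
        fields = line.split(sep)
        if len(fields) <= max(cui1_col, rel_col, cui2_col):
            continue  # skip rows that are too short to parse
        cui1, rel, cui2 = fields[cui1_col], fields[rel_col], fields[cui2_col]
        if rel == "PAR":
            parents[cui1].add(cui2)  # CUI2 is a parent of CUI1
        elif rel == "CHD":
            parents[cui2].add(cui1)  # CUI2 is a child of CUI1
        # SIB and other REL values are ignored in this sketch
    return parents


def ancestors(parents, cui):
    """Collect every CUI reachable by repeatedly following parent links."""
    seen = set()
    stack = [cui]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Made-up rows for illustration only; unused columns are filled with "-".
rows = [
    "#CUI1|AUI1|STYPE1|REL|CUI2|AUI2|RELA|\n",
    "C0000002|-|-|PAR|C0000001|-|-|\n",
    "C0000002|-|-|CHD|C0000003|-|-|\n",
    "C0000003|-|-|SIB|C0000004|-|-|\n",
]
parent_map = build_parent_map(rows)
print(sorted(ancestors(parent_map, "C0000003")))
```

For the split files in the csv folder, the same functions could be reused with sep="," on simple rows, although a proper CSV parser would be safer if any field contains quoted commas.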

