
BioC API for PubMed

Accessing PubMed in BioC format (PMC articles are covered on a separate page)

All PubMed articles are available in BioC format, which provides a large collection of research articles for text mining and information retrieval research. BioC is a simple format designed for straightforward text processing. The articles are available as BioC XML or BioC JSON, in either Unicode or ASCII.

If you use this resource, please cite:

  • Comeau DC, Wei CH, Dogan RI, and Lu Z. PMC text mining subset in BioC: about 3 million full text articles and growing, Bioinformatics, 2019

Instructions

https://0-www-ncbi-nlm-nih-gov.brum.beds.ac.uk/research/bionlp/RESTful/pubmed.cgi/BioC_[format]/[PMID]/[encoding]
The parameters are:
  • format: xml or json
  • PMID: the PubMed identifier of the article, e.g. 17299597
  • encoding: unicode or ascii
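
As a minimal sketch of how the template above might be filled in programmatically, the following Python helper assembles a request URL from the three parameters. The function name build_bioc_url, the constant names, and the validation rules are illustrative assumptions, not part of the API; the base URL is copied from the template on this page and may need adjusting for your own access route.

```python
# Hypothetical helper: names and validation rules are assumptions for illustration.
# The base URL is copied from the template on this page; change it if you reach
# the service through a different host.
BASE_URL = (
    "https://0-www-ncbi-nlm-nih-gov.brum.beds.ac.uk"
    "/research/bionlp/RESTful/pubmed.cgi"
)

FORMATS = ("xml", "json")          # allowed values for [format]
ENCODINGS = ("unicode", "ascii")   # allowed values for [encoding]


def build_bioc_url(pmid, fmt="xml", encoding="unicode", base_url=BASE_URL):
    """Return the request URL for one article, following the documented template."""
    if fmt not in FORMATS:
        raise ValueError(f"format must be one of {FORMATS}, got {fmt!r}")
    if encoding not in ENCODINGS:
        raise ValueError(f"encoding must be one of {ENCODINGS}, got {encoding!r}")
    pmid_text = str(pmid).strip()
    # Assumption: a PMID is written as a plain run of ASCII digits.
    if not (pmid_text.isascii() and pmid_text.isdigit()):
        raise ValueError(f"PMID must be numeric, got {pmid!r}")
    return f"{base_url}/BioC_{fmt}/{pmid_text}/{encoding}"


if __name__ == "__main__":
    print(build_bioc_url(17299597))
    print(build_bioc_url(17299597, fmt="json", encoding="ascii"))
```

Validating the parameters before building the URL keeps malformed requests from reaching the server; fetching the result can then be done with any HTTP client.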

Sample URL:
https://0-www-ncbi-nlm-nih-gov.brum.beds.ac.uk/research/bionlp/RESTful/pubmed.cgi/BioC_xml/17299597/unicode

Same article in ASCII:
https://0-www-ncbi-nlm-nih-gov.brum.beds.ac.uk/research/bionlp/RESTful/pubmed.cgi/BioC_xml/17299597/ascii
No Unicode-to-ASCII translation is perfect, but we have found this one useful.

JSON instead of XML:
https://0-www-ncbi-nlm-nih-gov.brum.beds.ac.uk/research/bionlp/RESTful/pubmed.cgi/BioC_json/17299597/unicode
BioC JSON follows the same structure as BioC XML.
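
Because the JSON mirrors the XML structure, a collection can be walked document by document and passage by passage. The sketch below assumes the usual BioC JSON key names ("documents", "passages", "infons", "offset", "text"); check them against an actual response before relying on them. The sample data and the function name iter_passages are made up for illustration and are not taken from a real PubMed record.

```python
import json

# Hypothetical, hand-written sample; not a real response from the service.
SAMPLE = """
{
  "source": "hypothetical",
  "documents": [
    {
      "id": "12345",
      "passages": [
        {"offset": 0, "infons": {"type": "title"}, "text": "A made-up title."},
        {"offset": 17, "infons": {"type": "abstract"}, "text": "A made-up abstract."}
      ]
    }
  ]
}
"""


def iter_passages(collection):
    """Yield (document id, passage type, offset, text) for every passage.

    Assumes the key names listed in the lead-in; missing keys fall back to
    None or an empty value instead of raising.
    """
    for document in collection.get("documents", []):
        for passage in document.get("passages", []):
            yield (
                document.get("id"),
                passage.get("infons", {}).get("type"),
                passage.get("offset"),
                passage.get("text", ""),
            )


collection = json.loads(SAMPLE)
rows = list(iter_passages(collection))

if __name__ == "__main__":
    for doc_id, passage_type, offset, text in rows:
        print(doc_id, passage_type, offset, text)
```

Using .get with defaults keeps the loop tolerant of documents that lack a field, which is convenient when processing many records in bulk.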

More information

General information about BioC XML structure:
ftp://ftp.ncbi.nlm.nih.gov/pub/wilbur/BioC-PMC/BioC.dtd

Specific information about BioC PubMed:
ftp://ftp.ncbi.nlm.nih.gov/pub/wilbur/BioC-PMC/pubmed.key

Main BioC web page:
http://bioc.sourceforge.net

Caution

If you experience any problems, please share them with us: donald.comeau@nih.gov or zhiyong.lu@nih.gov.