- Bioinformatics with Python Cookbook
- Tiago Antao
- 2025-02-27 03:42:00
There's more...
There are many more databases at NCBI. You will probably want to check the Sequence Read Archive (SRA) database (previously known as Short Read Archive) if you are working with NGS data. The SNP database contains information on single-nucleotide polymorphisms (SNPs), whereas the protein database has protein sequences, and so on. A full list of databases in Entrez is linked in the See also section of this recipe.
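The list of Entrez databases can also be retrieved programmatically. The following is a minimal sketch, not a definitive implementation: the function name list_entrez_databases is hypothetical, and the Bio.Entrez module is passed in as a parameter (an assumption made here only so that the function can be tried without network access). It relies on Entrez.einfo and Entrez.read, and on the DbList key in the parsed result.

```python
def list_entrez_databases(entrez):
    """Return the sorted names of the databases reported by Entrez.

    `entrez` is expected to be the Bio.Entrez module with Entrez.email
    already set; taking it as a parameter is a choice of this sketch.
    """
    handle = entrez.einfo()  # EInfo without a db argument lists all databases
    try:
        result = entrez.read(handle)  # parsed XML; 'DbList' holds the names
    finally:
        handle.close()
    return sorted(result['DbList'])


# Possible usage (requires Biopython and network access):
# from Bio import Entrez
# Entrez.email = 'put@your.email.here'  # placeholder address
# print(list_entrez_databases(Entrez))
```

Expect names such as sra, snp, protein, and pubmed in the result, although the exact list depends on what NCBI offers at the time of the query.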
Another NCBI database that you probably already know is PubMed, which indexes scientific and medical citations and abstracts, and links to full texts. You can also access it via Biopython. Furthermore, GenBank records often contain links to PubMed. For example, we can follow these links from our previous record, as shown here:
from Bio import Medline

# Entrez and rec come from the earlier steps of this recipe
refs = rec.annotations['references']
for ref in refs:
    # Only references that carry a PubMed identifier can be looked up
    if ref.pubmed_id != '':
        print(ref.pubmed_id)
        handle = Entrez.efetch(db='pubmed', id=[ref.pubmed_id], rettype='medline', retmode='text')
        records = Medline.parse(handle)
        for med_rec in records:
            for k, v in med_rec.items():
                print('%s: %s' % (k, v))
This will take all reference annotations, check whether they have a PubMed identifier, and then access the PubMed database to retrieve the records, parse them, and then print them.
The output per record is a Python dictionary. Note that there are many references to external databases on a typical GenBank record.
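The code above sends one request to NCBI per reference. As a variation, the identifiers can be collected first and fetched in a single call. This is only a sketch under stated assumptions: the function names collect_pubmed_ids and fetch_medline_records and the email parameter are hypothetical, and the identifiers are passed to Entrez.efetch as one comma-separated string.

```python
def collect_pubmed_ids(refs):
    """Return the non-empty PubMed IDs of the references, without duplicates."""
    pmids = []
    for ref in refs:
        pmid = getattr(ref, 'pubmed_id', '')
        if pmid and pmid not in pmids:
            pmids.append(pmid)
    return pmids


def fetch_medline_records(pmids, email):
    """Fetch all given PubMed IDs with one efetch call and parse the result."""
    if not pmids:
        return []  # nothing to fetch, so do not contact NCBI at all
    # Imported here so that collect_pubmed_ids can be used without Biopython
    from Bio import Entrez, Medline
    Entrez.email = email
    handle = Entrez.efetch(db='pubmed', id=','.join(pmids),
                           rettype='medline', retmode='text')
    try:
        # Medline.parse is lazy, so consume it before the handle is closed
        return list(Medline.parse(handle))
    finally:
        handle.close()


# Possible usage with the record from this recipe (requires network access):
# pmids = collect_pubmed_ids(rec.annotations['references'])
# for med_rec in fetch_medline_records(pmids, 'put@your.email.here'):
#     print(med_rec.get('PMID'), med_rec.get('TI'))
```

Each parsed record behaves like a dictionary keyed by MEDLINE tags, such as PMID for the identifier and TI for the title, so get can be used to read fields that may be absent.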
Of course, there are many other biological databases outside NCBI, such as Ensembl (http://www.ensembl.org) and UCSC Genome Bioinformatics (http://genome.ucsc.edu/). Python support for these databases varies a lot.
An introductory recipe on biological databases would not be complete without at least a passing reference to BLAST. The Basic Local Alignment Search Tool (BLAST) is an algorithm that assesses the similarity of sequences. NCBI provides a service that allows you to compare your sequence of interest against its own database. Of course, you can have a local BLAST database instead of using NCBI's service. Biopython provides extensive support for this, but as this recipe is introductory, I will just refer you to the Biopython tutorial.