Question

How to find all human tRNA sequences using Bio.Entrez?

8.7 years ago

natasha.sernova ★ 4.0k

Dear all,

A simple Google search gave me the following:

http://gtrnadb.ucsc.edu/genomes/eukaryota/Hsapi19/hg19-tRNAs.fa

https://www.ncbi.nlm.nih.gov/genome/51

https://www.ncbi.nlm.nih.gov/gene/?term=tRNA%20AND%20human

But how can I do this programmatically in Python and find all human tRNA sequences using Bio.Entrez?

I’ve read the Biopython Cookbook and found the following example for the Arabidopsis thaliana chromosomes.

17.2.2 Annotated Chromosomes

Continuing from the previous example, let’s also show the tRNA genes. We’ll get their locations by parsing the GenBank files for the five Arabidopsis thaliana chromosomes. You’ll need to download these files from the NCBI FTP site ftp://ftp.ncbi.nlm.nih.gov/genomes/Arabidopsis_thaliana, and preserve the subdirectory names or edit the paths below:

from reportlab.lib.units import cm
from Bio import SeqIO
from Bio.Graphics import BasicChromosome

entries = [("Chr I", "CHR_I/NC_003070.gbk"),
       ("Chr II", "CHR_II/NC_003071.gbk"),
       ("Chr III", "CHR_III/NC_003074.gbk"),
       ("Chr IV", "CHR_IV/NC_003075.gbk"),
       ("Chr V", "CHR_V/NC_003076.gbk")]

max_len = 30432563 #Could compute this
telomere_length = 1000000 #For illustration

chr_diagram = BasicChromosome.Organism()
chr_diagram.page_size = (29.7*cm, 21*cm) #A4 landscape

for index, (name, filename) in enumerate(entries):
  record = SeqIO.read(filename,"genbank")
  length = len(record)
  features = [f for f in record.features if f.type=="tRNA"]
  #Record an Artemis style integer color in the feature's qualifiers,
  #1 = Black, 2 = Red, 3 = Green, 4 = blue, 5 =cyan, 6 = purple
  for f in features: f.qualifiers["color"] = [index+2]
  cur_chromosome = BasicChromosome.Chromosome(name)
  #Set the scale to the MAXIMUM length plus the two telomeres in bp,
  #want the same scale used on all five chromosomes so they can be
  #compared to each other
  cur_chromosome.scale_num = max_len + 2 * telomere_length

  #Add an opening telomere
  start = BasicChromosome.TelomereSegment()
  start.scale = telomere_length
  cur_chromosome.add(start)

  #Add a body - again using bp as the scale length here.
  body = BasicChromosome.AnnotatedChromosomeSegment(length, features)
  body.scale = length
  cur_chromosome.add(body)

  #Add a closing telomere
  end = BasicChromosome.TelomereSegment(inverted=True)
  end.scale = telomere_length
  cur_chromosome.add(end)

  #This chromosome is done
  chr_diagram.add(cur_chromosome)

chr_diagram.draw("tRNA_chrom.pdf", "Arabidopsis thaliana")

It might warn you about the labels being too close together - have a look at the forward strand (right hand side) of Chr I, but it should create a colorful PDF file.

Is this a good approach for the human genome? And how would I do it with Bio.Entrez?

Many thanks!

Natasha

tRNA-sequence human Bio.Entrez • 2.5k views

score 1 · Accepted Answer · 2016-12-08

8.7 years ago

Neilfws 49k

First you need to figure out the Entrez query to return what you want. This search term gets human tRNA identifiers from the nucleotide database:

Homo sapiens[porgn] AND biomol_trna[PROP]

Then read the Bio.Entrez documentation to see how it implements ESearch and EFetch (Entrez.esearch and Entrez.efetch). The code you posted draws chromosome diagrams from local GenBank files and is not related to this task.
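
As a rough illustration of the search-then-fetch approach described above, here is a minimal sketch. It assumes Biopython is installed and NCBI is reachable. The function names, the output file name, the batch size and the use of the Entrez history server are choices made for this example, not something prescribed by the answer. The search term is the one given above.

```python
"""Sketch: download human tRNA records from NCBI with Bio.Entrez."""

# Search term taken from the answer above.
QUERY = "Homo sapiens[porgn] AND biomol_trna[PROP]"


def batch_starts(total, batch_size):
    """Return the retstart offsets needed to page through `total` records."""
    return list(range(0, total, batch_size))


def fetch_trna_fasta(email, out_path="human_tRNA.fasta", batch_size=200):
    """Run the search, then save every hit as FASTA. Returns the hit count.

    `out_path` and `batch_size` are illustrative defaults, not recommendations.
    """
    # Imported here so the helper above works without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks every client to identify itself

    # ESearch: run the query and keep the result set on the history server.
    handle = Entrez.esearch(db="nucleotide", term=QUERY, usehistory="y")
    search = Entrez.read(handle)
    handle.close()
    total = int(search["Count"])

    # EFetch: pull the records back in batches using WebEnv and QueryKey.
    with open(out_path, "w") as out:
        for start in batch_starts(total, batch_size):
            handle = Entrez.efetch(
                db="nucleotide",
                rettype="fasta",
                retmode="text",
                retstart=start,
                retmax=batch_size,
                webenv=search["WebEnv"],
                query_key=search["QueryKey"],
            )
            out.write(handle.read())
            handle.close()
    return total
```

Calling fetch_trna_fasta("your.name@example.org") (a placeholder address) would write the sequences to human_tRNA.fasta and return the number of hits. Fetching in batches through the history server avoids passing a long list of identifiers in each request. For a large result set you may also want to pause between requests to stay within NCBI's usage limits.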