Question

How to extract common exons of multiple transcripts of the same gene

0

Entering edit mode

9.9 years ago

simonjean434 ▴ 70

Hello,

I would like to extract common exons of multiple transcripts of the same gene from gencode.v19.annotation.gtf. Is there any easy way to do that?

Thanks in advance for your help

Jean

RNA-Seq • 3.4k views

ADD COMMENT • link updated 2.7 years ago by Ram 44k • written 9.9 years ago by simonjean434 ▴ 70

Ram · Answer 1 · 2014-12-29

Perhaps make two BED files of genes and exons, and then map exons to genes:

$ wget -qO- ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_19/gencode.v19.annotation.gff3.gz \
   | gunzip --stdout - \
   | awk '$3=="gene"' - \
   | convert2bed -i gff - \
   > genes.bed

$ wget -qO- ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_19/gencode.v19.annotation.gff3.gz \
   | gunzip --stdout - \
   | awk '$3=="exon"' - \
   | convert2bed -i gff - \
   > exons.bed

$ bedmap --echo --echo-map genes.bed exons.bed > genes_with_mapped_exons.bed

Once you have this list of exon IDs that correspond to each gene, you could perhaps use a Perl or Python script to process each gene-line, parse out each exon's coordinates to a key-value pair in a hash table, and find the common exons based on a duplicate key. Print those exons or exon IDs out to standard output, along with the gene ID, for instance.

Ram · Answer 2 · 2014-12-29

1

Entering edit mode

9.9 years ago

Tariq Daouda ▴ 220

Hi,

I'm the developer of pyGeno. pyGeno is designed to simplify this kind of operations and allow you to go further if needed. Here's an example that works for the Gene TPST2. Of course it would work for any other gene, and you could also loop through all the genes of the genome if you would like to:

from pyGeno.Genome import *

ref = Genome(name = "GRCh37.75")
gene = ref.get(Gene, name = "TPST2")[0]
#you can also make a query using the Ensembl Id 
#gene = ref.get(Gene, id = "ENSG00000128294")[0]

exons = {}

for trans in gene.get(Transcript) :
   for exon in trans.exons :
     if exon.id not in exons :
        exons[exon.id] = exon

print "Exons in common"

for e in exons :
  print e
  print 'whole sequence', e.sequence
  print 'coding sequence', e.CDS

pyGeno uses Ensembl annotations.

Hope that helps.

Cheers

ADD COMMENT • link updated 2.7 years ago by Ram 44k • written 9.9 years ago by Tariq Daouda ▴ 220

0

Entering edit mode

Hi,

I tried importing the pyGeno module. I am getting the following error:

Traceback (most recent call last):
  File "<pyshell#9>", line 1, in <module>
    from pyGeno.Genome import *
  File "C:\Python27\lib\site-packages\pyGeno\__init__.py", line 3, in <module>
    from configuration import pyGeno_init
ImportError: No module named 'configuration'

Please help me. Thank you.

ADD REPLY • link updated 2.7 years ago by Ram 44k • written 8.6 years ago by s.yaaminee • 0