extracting snoRNA intronic sequences
2
0
Entering edit mode
9.9 years ago

Hi bioinformaticians,

I am building an index for RNA-Seq purposes. It would be composed of the intronic sequences where snoRNA are encoded. The problem is that, using the ensEMBL Perl API, I haven't been able to find a way to extract the sequences. The snoRNA genes are not linked to the introns. This suggests that I would be needed to generate a slice. If anyone has any advice, suggestion or alternative to this method, it would be greatly appreciated.

Thanks for your help

ensEMBL RNA-Seq • 2.1k views
ADD COMMENT
1
Entering edit mode

Are you only interested in intronic snoRNAs or instead the full sequence of introns harboring snoRNAs?

ADD REPLY
0
Entering edit mode

I would be interested in the sequence harboring the snoRNA between two exon sequences. So I guess the full sequence of introns harboring snoRNAs would be it.

ADD REPLY
4
Entering edit mode
9.9 years ago

I would do something like the following:

  1. Use biomart to get snoRNA coordinates (in BED format or something similar).
  2. Use biomart to get the coordinates of all introns (again, in something like BED format).
  3. Use bedtools to intersect the two, writing intronic coordinates wholly encompassing snoRNA coordinates.
  4. Use bedtools getfasta with the resulting BED file to get a fasta file of intronic sequences harboring snoRNAs.

That shouldn't even require writing any new code :)

ADD COMMENT
0
Entering edit mode

Thank you. I had not thought about using bedtools to complete this.

ADD REPLY
1
Entering edit mode
9.9 years ago
Tariq Daouda ▴ 220

Hi,

I wrote a python module that would allow you to do that easily. http://pyGeno.iric.ca

from pyGeno.Genome import *

ref = Genome(name = "GRCh37.75")
chro = Genome,get(Chromosome, number = "22")
intronSeq = chro.sequence[x1:x2]

If you know which exons you are interested in you could also do:

exon1 = chro.get(Exon, id = "ENS...")
exon2 = chro.get(Exon, id = "ENS...")
intronSeq = chro.sequence[exon1.end:exon2.start]

If you want to do it by transcripts:

trans = ref.get(Transcript, id = "ENST...") or ref.get(Transcript, name = "whatever-001")

chro = trans.chromosome
intronSeq = chro.sequence[ trans.exons[0].end : trans.exons[1].start ]

This assumes that you are working on the human genome GRCh37.75. If you need something else let me know I would be happy to show you how to import any genome made available by Ensembl onto pyGeno.

Hope that helps,

Cheers

ADD COMMENT

Login before adding your answer.

Traffic: 1750 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6