Question

extracting snoRNA intronic sequences

0

9.9 years ago

faba_mon_frere • 0

Hi bioinformaticians,

I am building an index for RNA-Seq purposes. It would be composed of the intronic sequences in which snoRNAs are encoded. The problem is that, using the Ensembl Perl API, I haven't been able to find a way to extract these sequences: the snoRNA genes are not linked to the introns that contain them. This suggests that I would need to generate a slice myself. If anyone has any advice, suggestions or an alternative to this method, it would be greatly appreciated.

Thanks for your help

ensEMBL RNA-Seq • 2.1k views

ADD COMMENT • link updated 2.7 years ago by Ram 44k • written 9.9 years ago by faba_mon_frere • 0

1

Are you interested only in the intronic snoRNAs themselves, or in the full sequence of the introns harboring them?

ADD REPLY • link 9.9 years ago by Devon Ryan 104k

0

I would be interested in the sequence harboring the snoRNA between two exon sequences. So I guess the full sequence of introns harboring snoRNAs would be it.

ADD REPLY • link 9.9 years ago by faba_mon_frere • 0

1

9.9 years ago

Tariq Daouda ▴ 220

Hi,

I wrote a Python module that would allow you to do that easily: http://pyGeno.iric.ca

from pyGeno.Genome import *

ref = Genome(name = "GRCh37.75")
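# get() is called on the genome object; as far as I know it returns a list of matches, so take the first
chro = ref.get(Chromosome, number = "22")[0]
# x1 and x2 are placeholders for the intron's start and end coordinates
intronSeq = chro.sequence[x1:x2]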

If you know which exons you are interested in you could also do:

# [0] takes the first match, assuming get() returns a list
exon1 = chro.get(Exon, id = "ENS...")[0]
exon2 = chro.get(Exon, id = "ENS...")[0]
intronSeq = chro.sequence[exon1.end:exon2.start]

If you want to do it by transcripts:

trans = ref.get(Transcript, id = "ENST...")[0]
# or, by name: trans = ref.get(Transcript, name = "whatever-001")[0]

chro = trans.chromosome
intronSeq = chro.sequence[ trans.exons[0].end : trans.exons[1].start ]

This assumes that you are working on the human genome GRCh37.75. If you need something else, let me know; I would be happy to show you how to import any genome made available by Ensembl into pyGeno.
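The transcript example above only takes the gap between the first two exons. As a rough sketch of how this generalizes to every intron of a transcript, the helper below works on plain (start, end) pairs and does not depend on pyGeno. The function name `introns_from_exons` is made up for illustration, and the coordinates are assumed to be 0-based and half-open, which is an assumption rather than something confirmed in this thread.

```python
def introns_from_exons(exons):
    """Return the (start, end) gaps between consecutive exons.

    exons: iterable of (start, end) pairs on the same chromosome,
    assumed 0-based and half-open (hypothetical convention).
    """
    # Sort by start so the result does not depend on strand or input order.
    ordered = sorted(exons)
    introns = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        # Skip touching or overlapping exons: there is no intron between them.
        if next_start > prev_end:
            introns.append((prev_end, next_start))
    return introns


# Hypothetical exon coordinates, deliberately out of order.
example_exons = [(100, 200), (500, 650), (300, 400)]
print(introns_from_exons(example_exons))
```

Each returned pair could then be used to slice the chromosome sequence in the same way as in the examples above.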

Hope that helps,

Cheers

ADD COMMENT • link updated 2.7 years ago by Ram 44k • written 9.9 years ago by Tariq Daouda ▴ 220

score 4 · Accepted Answer · 2015-01-07

4

9.9 years ago

Devon Ryan 104k

I would do something like the following:

Use biomart to get snoRNA coordinates (in BED format or something similar).
Use biomart to get the coordinates of all introns (again, in something like BED format).
Use bedtools to intersect the two, writing intronic coordinates wholly encompassing snoRNA coordinates.
Use bedtools getfasta with the resulting BED file to get a fasta file of intronic sequences harboring snoRNAs.

That shouldn't even require writing any new code :)
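For anyone who would rather script it, the containment test behind the intersect step can be sketched in plain Python. This is only an illustration of the logic (keep each intron that wholly encompasses at least one snoRNA), not a replacement for bedtools; the function name and the example coordinates are made up, intervals are assumed to be BED-like (chrom, start, end) tuples, and strand is ignored.

```python
def introns_containing_snornas(introns, snornas):
    """Keep introns that wholly contain at least one snoRNA.

    Both arguments are lists of (chrom, start, end) tuples in BED-like
    coordinates. Strand is ignored in this sketch.
    """
    # Group snoRNA intervals by chromosome for quick lookup.
    by_chrom = {}
    for chrom, start, end in snornas:
        by_chrom.setdefault(chrom, []).append((start, end))

    kept = []
    for chrom, i_start, i_end in introns:
        candidates = by_chrom.get(chrom, [])
        # An intron qualifies if some snoRNA lies entirely inside it.
        if any(i_start <= s and e <= i_end for s, e in candidates):
            kept.append((chrom, i_start, i_end))
    return kept


# Hypothetical coordinates for illustration only.
example_introns = [("chr1", 1000, 5000), ("chr1", 6000, 7000), ("chr2", 1000, 5000)]
example_snornas = [("chr1", 2000, 2100), ("chr1", 6900, 7050)]
print(introns_containing_snornas(example_introns, example_snornas))
```

The second chr1 intron is dropped because its snoRNA runs past the intron end, and the chr2 intron is dropped because no snoRNA lies on that chromosome. This linear scan is fine for a few thousand intervals; for anything genome-scale, bedtools will be much faster.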

ADD COMMENT • link 9.9 years ago by Devon Ryan 104k

0

Thank you. I had not thought about using bedtools to complete this.

ADD REPLY • link 9.9 years ago by faba_mon_frere • 0