Question

Repeat Regions In Rna-Seq Data

12.1 years ago

abi ▴ 410

Hi all,

The genome of the bacterium that we work with is flooded with repeat regions. We are interested in one such repeat region, which we believe might be transcribed. We have an RNA-seq data set in which we could look for this repeat. The problem is that reads which map to such repeats are either discarded by the aligners (because of multiple hits) or assigned to one location chosen at random. We are also looking at wet lab possibilities (in vitro transcription, Northerns), but it would be great if we could catch transcription at such repeat regions from our RNA-seq data. Is there a way to handle such repeat regions with cDNA reads?

Thanks,
Abi

rna-seq • 4.4k views

ADD COMMENT • link updated 12.1 years ago by swbarnes2 14k • written 12.1 years ago by abi ▴ 410

score 0 · Answer 1 · 2013-03-20

12.1 years ago

Josh Herr 5.8k

First of all I laughed out loud when I read

the bacterium that we work with is flooded with repeat regions

because, as someone who works with both plant and fungal genomes, the idea of a bacterium with repeats comparable to eukaryotes is comical. So, you're in a good position, as this shouldn't be too difficult.

You didn't provide us with some valuable information: Do you have a sequenced genome for your bacterium? Have you even tried to use any aligners to see how this repeat maps? BLAST? If you can place the region on a sequenced genome, you can computationally predict transcription start sites and promoter regions and see whether this is the case. It's possible something could be transcribed, but do you know if this transcript makes a protein? (I doubt it if it's a true repeat, but you should look.)

There are MANY ways to study this through wet lab experiments but that is for another community and not directly bioinformatics.

ADD COMMENT • link 12.1 years ago by Josh Herr 5.8k

Thanks, Josh, for your response. OK, I did overestimate the repeats in this bacterial genome compared to fungal and plant genomes, although it has many more than other species like E. coli or Salmonella. Here is some background info on what we already have or did.

The sequence of the organism is available. This repeat (around 200 bp) occurs 19 times in our ca. 2 Mb genome. Computationally, no sigma 70 promoter or rho-independent terminator could be detected, at least by BPROM or TransTerm, which suggests that it might not be transcribed.

I did use a couple of aligners (bowtie, bwa) with our Illumina single-end read data and could not see reads aligned to this region. I asked myself: could it be that there are actually reads from one of the 19 repeat copies that is transcribed, but because they count as multiple hits they are discarded and not reported?

No, this region does not code for a protein in any of the 6 frames.

What did you mean by BLAST? Take the reads discarded by the aligner, BLAST them against our genome, and see if any of them hit our repeat region? That could be done; I will check on this. Sorry for not providing details in my initial post.
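One way to check the multiple-hit question is to look in the aligner's SAM output for reads that were placed in a repeat copy but given a low mapping quality, since many viewers and filters hide those. The snippet below is only a minimal sketch: the function name, the interval names and the toy SAM lines are made up for illustration, it assumes a single reference sequence, it only tests the leftmost mapped position, and it assumes the aligner marks ambiguous placements with MAPQ 0, which varies between aligners and settings.

```python
def ambiguous_reads_in_repeats(sam_lines, repeat_intervals, max_mapq=0):
    """Count mapped reads with MAPQ <= max_mapq whose leftmost position
    falls inside each repeat copy.

    sam_lines: iterable of SAM-format text lines (header lines allowed).
    repeat_intervals: list of (name, start, end), 1-based inclusive.
    Assumption: one reference sequence, so RNAME is not checked.
    """
    counts = {name: 0 for name, _, _ in repeat_intervals}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blank/header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, pos, mapq = int(fields[1]), int(fields[3]), int(fields[4])
        if flag & 0x4:  # SAM flag bit 0x4: read is unmapped
            continue
        if mapq > max_mapq:  # confidently placed read, not of interest here
            continue
        for name, start, end in repeat_intervals:
            if start <= pos <= end:
                counts[name] += 1
    return counts


# Toy data (hypothetical coordinates and read names).
example_sam = [
    "@SQ\tSN:chr\tLN:2000000",
    "r1\t0\tchr\t1050\t0\t50M\t*\t0\t0\t*\t*",
    "r2\t0\tchr\t1100\t37\t50M\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "r4\t16\tchr\t90020\t0\t50M\t*\t0\t0\t*\t*",
]
repeats = [("copy_1", 1000, 1199), ("copy_2", 90000, 90199)]
counts = ambiguous_reads_in_repeats(example_sam, repeats)
print(counts)
```

If every copy comes back with zero low-MAPQ reads, it would still be worth checking how the aligner was told to treat multi-mapping reads before concluding that nothing is transcribed.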

ADD REPLY • link 12.1 years ago by abi ▴ 410

Thanks for providing more information. I was only joking about the number of repeats; I just think it shouldn't be too difficult to figure out. So, do you have the repeat as a read in your RNA-Seq dataset? Have you determined this, or do you just want to look? Yes, I would try to use BLAST to see if your repeat is found in the transcripts. Do you have a transcription start site near the repeat?
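As a rough stand-in for the BLAST check, the raw reads can be screened directly against the repeat sequence, which sidesteps whatever the aligner does with multiple hits. This is a sketch only, not BLAST itself: it uses exact shared k-mers on either strand, and the function names, the toy sequences and the value of k are all assumptions for illustration.

```python
def reverse_complement(seq):
    """Return the reverse complement; unknown characters become N."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp.get(base, "N") for base in reversed(seq.upper()))


def kmers(seq, k):
    """Return the set of all k-mers in seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reads_hitting_repeat(reads, repeat_seq, k=20):
    """Return names of reads sharing at least one exact k-mer with the
    repeat on either strand. reads: iterable of (name, sequence)."""
    targets = kmers(repeat_seq, k) | kmers(reverse_complement(repeat_seq), k)
    return [name for name, seq in reads if kmers(seq, k) & targets]


# Toy example with a made-up 30 bp "repeat" and a small k.
toy_repeat = "ACGTTGCAAGGCTTAACCGGATCGATTACG"
toy_reads = [
    ("read_fwd", "TTGCAAGGCTTAACC"),                      # from the forward strand
    ("read_rev", reverse_complement(toy_repeat[10:28])),  # from the reverse strand
    ("read_other", "GGGGGGGGGGGGGGGG"),                   # unrelated sequence
]
hits = reads_hitting_repeat(toy_reads, toy_repeat, k=8)
print(hits)
```

Exact k-mer matching will miss reads with sequencing errors inside every k-mer window, so a smaller k or a real BLAST search is the more sensitive option.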

ADD REPLY • link 12.1 years ago by Josh Herr 5.8k

score 0 · Answer 2 · 2013-03-20

12.1 years ago

swbarnes2 14k

Longer reads or paired end reads help, because you might get a unique anchor in part of your sequence, but other than that, there's not much you can do, other than wait a few years for longer read technology.
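The unique-anchor idea can be illustrated with a toy genome: a read that lies entirely inside the repeat matches every copy, while a read that runs into the flanking sequence matches only one. The sketch below uses made-up sequences and a hypothetical helper, and it only counts exact forward-strand matches, so it is an illustration rather than a model of a real aligner.

```python
def count_placements(genome, read):
    """Count exact forward-strand occurrences of read in genome,
    allowing overlapping matches."""
    count = 0
    start = genome.find(read)
    while start != -1:
        count += 1
        start = genome.find(read, start + 1)
    return count


# Toy genome: two identical repeat copies separated by unique sequence.
repeat = "ACGTTGCAAGGCTTAACCGG"
spacer = "CATCATCATG"
genome = "TTTTTTTTTT" + repeat + spacer + repeat + "GAGAGAGAGA"

inside_read = repeat[2:18]                  # lies entirely within the repeat
anchored_read = (spacer + repeat)[6:26]     # 4 bp of unique flank + 16 bp of repeat

print(count_placements(genome, inside_read))    # ambiguous: matches both copies
print(count_placements(genome, anchored_read))  # unique: anchored by the flank
```

With a repeat of around 200 bp, a read has to be long enough to reach the flank from inside the repeat, which is why short single-end reads from the middle of a copy stay ambiguous.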

ADD COMMENT • link 12.1 years ago by swbarnes2 14k

That is exactly our problem: we have single-end Illumina reads. You are right that longer reads could extend into neighboring unique regions and hence be reported. Thanks anyway!

ADD REPLY • link 12.1 years ago by abi ▴ 410