Hi all, The genome of the bacterium that we work with is flooded with repeat regions. We are interested in one such repeat region, which we believe might be transcribed. We have an RNA-seq data set in which we could look for this repeat. The problem is that reads which map to such repeats are either discarded by the aligners (due to multiple hits) or assigned to one location chosen at random. We are also looking at wet-lab possibilities (in vitro transcription, Northerns), but it would be great if we could detect transcription of such repeat regions from our RNA-seq data. Is there a way to handle such repeat regions when aligning cDNA reads? Thanks, Abi
Thanks Josh for your response. OK, I did overestimate the repeats in this bacterial genome compared to fungal and plant genomes, although there are many more than in other species like E. coli or Salmonella. Here is some background on what we already have or did. The genome sequence of the organism is available. This repeat (around 200 bp) occurs 19 times in our ca. 2 Mb genome. Computationally, no sigma 70 promoter or rho-independent terminator could be detected, at least by BPROM or TransTerm, which suggests that the repeat might not be transcribed. I used a couple of aligners (bowtie, bwa) with our Illumina single-end read data and could not see reads aligned to this region. I asked myself: could there actually be reads that align to any of the 19 repeats, which might be transcribed, but which are discarded and not reported because they count as multiple hits? No, this region does not code for any protein in any of the 6 frames. What did you mean by BLAST? Take the reads discarded by the aligner, BLAST them against our genome, and see if any of them hit our repeat region? That could be done; I will check on this. Sorry for not providing details in my initial post.
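One alignment-free way to test the "discarded as multiple hits" idea is to screen the raw reads directly against the repeat sequence, which sidesteps whatever multi-mapping policy the aligner applies. A minimal sketch, assuming the reads are in plain 4-line FASTQ format and the roughly 200 bp repeat is available as a string. The function names, the k-mer size of 20, and the threshold of 3 shared k-mers are illustrative choices, not anything taken from an existing pipeline:

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (N stays N)."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def kmers(seq, k):
    """Return the set of all k-mers in seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def read_fastq_sequences(lines):
    """Yield (read_id, sequence) from an iterable of 4-line FASTQ records."""
    lines = iter(lines)
    for header in lines:
        header = header.strip()
        if not header:
            continue  # tolerate blank lines between records
        seq = next(lines, "").strip()
        next(lines, "")  # '+' separator line
        next(lines, "")  # quality line
        yield header[1:].split()[0], seq


def screen_reads(fastq_lines, repeat_seq, k=20, min_shared=3):
    """Return IDs of reads sharing at least min_shared k-mers with the repeat.

    Both strands of the repeat are indexed, so reads from either
    orientation are caught. k and min_shared are assumed defaults.
    """
    repeat_index = kmers(repeat_seq, k) | kmers(reverse_complement(repeat_seq), k)
    hits = []
    for read_id, seq in read_fastq_sequences(fastq_lines):
        shared = len(kmers(seq, k) & repeat_index)
        if shared >= min_shared:
            hits.append(read_id)
    return hits
```

Passing an open FASTQ file handle and the repeat sequence to `screen_reads` returns the IDs of candidate reads. Exact k-mer matching is strict, so reads with sequencing errors or from diverged repeat copies may need a smaller `k` or a lower `min_shared`; an empty result here would support the view that the repeat is simply not represented in the data.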
Thanks for providing more information. I was only joking about the number of repeats; I just think it shouldn't be too difficult to figure out. So, do you have the repeat as a read in your RNA-Seq dataset? Have you determined this, or are you just wanting to look? Yes, I would try using BLAST to see if your repeat is found in the transcripts. Do you have a transcription start site near the repeat?
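If the reads are BLASTed against the repeat, the hits can be tallied from BLAST's tabular output. A minimal sketch, assuming the standard 12-column tabular format (`-outfmt 6`: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) with reads as queries and the repeat as subject. The function name and the identity and length cut-offs are hypothetical and would need tuning to the read length:

```python
from collections import Counter


def count_repeat_hits(blast_lines, min_identity=95.0, min_length=30):
    """Count distinct reads hitting each subject on each strand.

    Expects BLAST 12-column tabular lines. A hit is treated as
    minus-strand when sstart > send. Each read is counted once per
    (subject, strand) pair. The cut-offs are assumed defaults.
    """
    seen = set()
    counts = Counter()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # skip malformed lines
        read_id, subject = fields[0], fields[1]
        identity = float(fields[2])
        aln_length = int(fields[3])
        sstart, send = int(fields[8]), int(fields[9])
        if identity < min_identity or aln_length < min_length:
            continue
        strand = "+" if sstart <= send else "-"
        key = (read_id, subject, strand)
        if key not in seen:
            seen.add(key)
            counts[(subject, strand)] += 1
    return counts
```

The result maps each `(subject, strand)` pair to the number of distinct reads that passed the filters. If the library is strand-specific, a strong skew toward one strand would hint at the orientation of any transcript; with an unstranded library the split carries no such information.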