Question

Get an rRNA FASTA file for a particular bacterium

3.4 years ago

basuanubhav ▴ 140

Hey all, I was trying to find a way to get all rRNA (5S, 16S and 23S) FASTA sequences for a particular bacterium (B. thetaiotaomicron VPI-5482, the type strain). I want this file so that I can use something like bowtie2 to map the FASTQ reads from my RNA-seq experiment and keep the unaligned reads for downstream processing (DGE etc.). It would be nice if someone could also supply the required command/code.
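If you would rather build the rRNA FASTA yourself, here is a minimal Python sketch. It assumes you have downloaded the genome FASTA and a matching GFF3 annotation for VPI-5482 (for example from NCBI). It keeps only features whose type column is `rRNA`, reverse-complements minus-strand features, and writes a multi-FASTA that you could then index with `bowtie2-build`. The function names are hypothetical.

```python
"""Sketch: extract rRNA sequences from a genome FASTA + GFF3 annotation."""
from pathlib import Path

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(path):
    """Return {sequence_id: sequence} for a (multi-)FASTA file."""
    seqs, name, chunks = {}, None, []
    for line in Path(path).read_text().splitlines():
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []  # ID = first word of header
        elif line.strip():
            chunks.append(line.strip())
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def rrna_records(gff_path, genome):
    """Yield (header, sequence) for every GFF3 feature of type 'rRNA'."""
    for line in Path(gff_path).read_text().splitlines():
        if line.startswith("##FASTA"):
            break  # embedded FASTA section: no more feature lines
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "rRNA":
            continue
        seqid, strand, attrs = cols[0], cols[6], cols[8]
        start, end = int(cols[3]), int(cols[4])
        seq = genome[seqid][start - 1:end]  # GFF3 coordinates are 1-based, inclusive
        if strand == "-":
            seq = seq.translate(COMPLEMENT)[::-1]  # reverse complement
        fields = dict(kv.split("=", 1) for kv in attrs.split(";") if "=" in kv)
        label = fields.get("ID", f"{seqid}:{start}-{end}")
        product = fields.get("product", "rRNA")
        yield f"{label} {product} {seqid}:{start}-{end}({strand})", seq


def write_rrna_fasta(genome_fasta, gff3, out_fasta, width=60):
    """Write all rRNA features to out_fasta; return how many were written."""
    genome = read_fasta(genome_fasta)
    n = 0
    with open(out_fasta, "w") as out:
        for header, seq in rrna_records(gff3, genome):
            out.write(f">{header}\n")
            for i in range(0, len(seq), width):
                out.write(seq[i:i + width] + "\n")
            n += 1
    return n
```

For example, `write_rrna_fasta("genome.fna", "genome.gff3", "Bt_rRNA.fa")` (hypothetical file names) returns the number of records written. Annotations differ in how they label rRNA, so check a few records by eye before relying on the output.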

Also, after running FastQC, I saw that my files had ~90% duplicated reads. That seems far too high to me, but I have only dealt with human RNA-seq data before, so it may be different for bacteria. I took a few of the overrepresented reads and found that they map to rRNA genes in my bacterium. This does not necessarily mean that all the duplicated reads are rRNA, but should I remove the reads mapping to rRNA or just leave them in my analysis? Another option would be to remove the rRNA genes from the GTF file before running featureCounts, or to remove the rRNA genes from the count matrix after running featureCounts. The last option that comes to mind is to remove the rRNA genes after DESeq2 normalization, but I don't know if that would be the right thing to do.
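For the option of dropping rRNA genes from the annotation before featureCounts, here is a minimal Python sketch. It assumes a GTF in which rRNA genes are marked either by a feature type of `rRNA` or by a `gene_biotype`/`transcript_biotype` attribute equal to `rRNA` (Ensembl-style naming; NCBI and other sources may differ, so check your file). Every line belonging to a flagged `gene_id` is removed. The function names are hypothetical.

```python
import re

# Matches key "value" pairs in the GTF attribute column.
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def gtf_attributes(field):
    """Parse a GTF attribute column into a dict of key -> value."""
    return dict(ATTR_RE.findall(field))


def drop_rrna_from_gtf(lines, biotype_keys=("gene_biotype", "transcript_biotype")):
    """Return (kept_lines, removed_gene_ids) with every line of an rRNA gene removed.

    A gene is treated as rRNA if any of its lines has feature type 'rRNA' or
    one of `biotype_keys` equal to 'rRNA'. These attribute names are an
    assumption (Ensembl-style); check what your GTF actually uses.
    """
    rrna_genes = set()
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = gtf_attributes(cols[8])
        gene_id = attrs.get("gene_id")
        is_rrna = cols[2] == "rRNA" or any(attrs.get(k) == "rRNA" for k in biotype_keys)
        if gene_id and is_rrna:
            rrna_genes.add(gene_id)

    kept = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and gtf_attributes(cols[8]).get("gene_id") in rrna_genes:
            continue  # line belongs to an rRNA gene: drop it
        kept.append(line)  # comments and non-rRNA features are kept
    return kept, rrna_genes
```

You would pass in the lines of your GTF (e.g. from `readlines()`), write `kept` to a new GTF for featureCounts, and use `removed` as a quick sanity check on which gene IDs were dropped.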

Anyway, any help with this would be much appreciated. I have seen a few discussion posts on this and there doesn't seem to be a clear consensus.

Thanks in advance.

[Screenshots: general stats 1, general stats 2, seq dups, seq dups 2, overrepresented seqs]

I am uploading snapshots of my MultiQC report. As you can see, all my files have >90% duplicated reads, and most of the duplicates fall in the >10k duplication-level bin. What do you reckon I should do?

Reply by basuanubhav ▴ 140, 3.4 years ago

For B. thetaiotaomicron VPI-5482, a multi-FASTA file with the rRNA sequences can be found here

That FASTA file will include rRNA, tRNA and other ncRNA sequences.

About the mapping strategy prior to DGE analysis: simply map your reads against the reference genome and then remove the rRNA genes from the expression matrix prior to normalization.
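The approach suggested here (count against the full annotation, then drop rRNA rows before normalization) could look roughly like this in Python. It assumes a standard featureCounts table (comment line, then `Geneid Chr Start End Strand Length` followed by one column per sample) and a set of rRNA gene IDs you have already collected from the annotation; the helper names are hypothetical. `rrna_fraction` also reports how much of each library was assigned to rRNA, which is a quick check on whether the depletion worked.

```python
import csv


def read_featurecounts(path):
    """Return (comment_rows, header, rows) from a featureCounts output table.

    Assumes the usual layout: '#' comment line(s), then a header
    Geneid, Chr, Start, End, Strand, Length, <one column per sample>.
    """
    with open(path, newline="") as fh:
        all_rows = [r for r in csv.reader(fh, delimiter="\t") if r]
    comments = [r for r in all_rows if r[0].startswith("#")]
    table = [r for r in all_rows if not r[0].startswith("#")]
    return comments, table[0], table[1:]


def split_rrna(rows, rrna_ids):
    """Split count rows into (non-rRNA rows, rRNA rows) by Geneid."""
    keep = [r for r in rows if r[0] not in rrna_ids]
    rrna = [r for r in rows if r[0] in rrna_ids]
    return keep, rrna


def rrna_fraction(rows, rrna_ids, first_count_col=6):
    """Per-sample fraction of assigned counts that fall on rRNA genes."""
    n = len(rows[0]) - first_count_col
    total, rrna = [0] * n, [0] * n
    for r in rows:
        for j, x in enumerate(r[first_count_col:]):
            total[j] += int(x)
            if r[0] in rrna_ids:
                rrna[j] += int(x)
    return [a / t if t else 0.0 for a, t in zip(rrna, total)]


def write_featurecounts(path, comments, header, rows):
    """Write a (filtered) table back out in the same tab-separated layout."""
    with open(path, "w", newline="") as fh:
        w = csv.writer(fh, delimiter="\t", lineterminator="\n")
        w.writerows(comments + [header] + rows)
```

The filtered table (header plus the non-rRNA rows) can then be loaded into DESeq2 as usual, so the rRNA counts never enter the normalization.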

Reply by andres.firrincieli 3.9k, 3.4 years ago

Thanks, I will map my files as they are and see what I get. Then I will remove rRNA genes prior to normalization. Hope I get something!

Also, can I ask how you got the multi-FASTA file? Is it on some website like Ensembl, or did you make it yourself from a FASTA file with all RNAs?

Reply by basuanubhav ▴ 140, 3.4 years ago

We had a similar problem. The rRNA depletion step failed, with 80-90% of the reads mapping to rRNA, leaving us with only 1.6-2.0 million reads per sample. Despite this, about 7% of genes were differentially expressed (LFC 1.5 and FDR 0.01). As GenoMax said, nothing is lost if the effect is consistent across all your samples.

Reply by andres.firrincieli 3.9k, 3.4 years ago

So I tried running bowtie2 on just one file to see how it looks. I ran it on Galaxy (because I fear it would take ages on my local system) with default parameters against two FASTA files: one with the chromosomal sequence and one custom FASTA with just the rRNAs.

It looks like 60% of the reads mapped to rRNAs. That would leave 40% of the reads mapping against the genome, so I will have ~5 million reads mapping to the stuff I want.
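To put numbers on this, here is a small Python sketch that reads the summary bowtie2 prints to stderr for a single-end (`-U`) run against an rRNA-only index; the unaligned count is the number of non-rRNA reads carried forward. The parser assumes the usual unpaired summary wording; paired-end summaries are worded differently and are not handled. For example, a hypothetical library of 12.5 million reads with 60% rRNA leaves 5 million reads, consistent with the ~5 million quoted above. Function names are hypothetical.

```python
import re


def parse_bowtie2_unpaired_summary(text):
    """Parse bowtie2's single-end (-U) alignment summary from stderr.

    Returns (total_reads, unaligned_reads, overall_rate_percent).
    Paired-end summaries use different wording and are not handled here.
    """
    total = int(re.search(r"^(\d+) reads; of these:", text, re.M).group(1))
    unaligned = int(
        re.search(r"^\s*(\d+) \([\d.]+%\) aligned 0 times", text, re.M).group(1)
    )
    rate = float(re.search(r"^([\d.]+)% overall alignment rate", text, re.M).group(1))
    return total, unaligned, rate


def reads_left_after_rrna(total_reads, rrna_fraction):
    """Reads expected to remain when `rrna_fraction` of the library is rRNA."""
    return round(total_reads * (1 - rrna_fraction))
```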

Not too bad, I guess, haha. I will just align everything with bowtie2 and not bother removing rRNA genes from the GTF file.

Thanks a lot!

[Screenshots: map against rRNA, map against genome]

Reply by basuanubhav ▴ 140, 3.4 years ago

You should find out if anything was done to deplete rRNA during library preparation. If it was, then the duplicates you are seeing may not necessarily be rRNA (unless the depletion did not work). If they are indeed rRNA, then you may have only ~1M usable reads per sample, which may or may not be enough for what you are trying to do. At least the effect seems to be consistent across all samples.

Before you do any additional manipulations, go ahead and analyze the data (align/count etc.) and see what you get. You can indeed filter out rRNA reads at some point in the analysis.
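If you do end up filtering at the read level, one common pattern is subtractive mapping: align to the rRNA sequences, keep only the reads that fail to align (bowtie2's `--un-gz` option), then align those to the genome. Below is a Python sketch that builds these commands for single-end reads, assuming bowtie2 is on your PATH and a genome index already exists; all paths and function names are hypothetical, and paired-end data would need `-1`/`-2` with `--un-conc-gz` instead. `run` only prints the commands unless `dry_run=False`.

```python
import subprocess
from pathlib import Path


def rrna_filter_commands(rrna_fasta, genome_index, fastq, outdir, threads=4):
    """Build bowtie2 commands for subtractive rRNA filtering of single-end reads.

    1. index the rRNA multi-FASTA,
    2. align reads to it, keeping only reads that do NOT align (--un-gz),
    3. align those remaining reads to the (pre-built) genome index.
    """
    out = Path(outdir)
    sample = Path(fastq).name.split(".")[0]  # e.g. "S1" from "S1.fastq.gz"
    rrna_idx = str(out / "rrna_idx")
    non_rrna = str(out / f"{sample}.non_rRNA.fastq.gz")
    return [
        ["bowtie2-build", str(rrna_fasta), rrna_idx],
        ["bowtie2", "-p", str(threads), "-x", rrna_idx, "-U", str(fastq),
         "--un-gz", non_rrna, "-S", "/dev/null"],  # discard the rRNA alignments
        ["bowtie2", "-p", str(threads), "-x", str(genome_index), "-U", non_rrna,
         "-S", str(out / f"{sample}.genome.sam")],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

In a real pipeline you would build the rRNA index once rather than once per sample.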

Reply by GenoMax 151k, 3.4 years ago

So, the sequencing company states on their website that they used a Ribo-Zero Plus kit. I emailed them to ask about any quality control they have done. Let's see. In the meantime, I will go ahead with the analysis and hope that not all the dups are rRNA.

Thanks

Reply by basuanubhav ▴ 140, 3.4 years ago