rRNA contamination after filtering
6 weeks ago

Hi everyone! I am re-analyzing a dataset from GEO. I noticed an unusual GC content distribution in some samples, and after quantifying them with salmon only about 20-30% of reads were mapped. Suspecting rRNA contamination, I filtered the FASTQs with bbduk.sh against this rRNA FASTA sequence, after which the mapping rate rose to ~40%. I also ran STAR on the filtered reads, and it reports about 90% of reads mapping to the genome.
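For readers unfamiliar with how this kind of filtering decides what to remove: bbduk.sh screens reads by k-mer matching against a reference rather than by full alignment. The snippet below is a toy, pure-Python illustration of that idea only, not bbduk's actual implementation; the function names, `k` and `min_hits` are hypothetical parameters of this sketch. In this toy version a read counts as a match if it shares at least `min_hits` distinct k-mers (from either strand) with the reference, which is a much weaker criterion than an end-to-end alignment.

```python
# Toy k-mer based read screening (the general idea behind tools like bbduk.sh).
# NOT bbduk's real algorithm; k and min_hits are hypothetical sketch parameters.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(seq: str) -> str:
    """Reverse complement; any non-ACGT base becomes N."""
    return "".join(COMPLEMENT.get(b, "N") for b in reversed(seq.upper()))


def kmers(seq: str, k: int) -> set:
    """All distinct length-k substrings of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_reference_kmers(ref_seqs, k: int) -> set:
    """Index k-mers from both strands of every reference (e.g. rRNA) sequence."""
    index = set()
    for s in ref_seqs:
        index |= kmers(s, k)
        index |= kmers(revcomp(s), k)
    return index


def split_reads(reads, ref_index, k: int, min_hits: int = 1):
    """Return (matched, unmatched); a read matches if it shares >= min_hits
    distinct k-mers with the reference index."""
    matched, unmatched = [], []
    for read in reads:
        hits = len(kmers(read, k) & ref_index)
        (matched if hits >= min_hits else unmatched).append(read)
    return matched, unmatched


if __name__ == "__main__":
    ref = build_reference_kmers(["ACGTACGTGGCCTTAA"], k=5)
    matched, unmatched = split_reads(["ACGTACGT", "TTTTTTTTTT", "TTAAGGCCAC"], ref, k=5)
    print(matched)    # ['ACGTACGT', 'TTAAGGCCAC']  (the second one matches the reverse strand)
    print(unmatched)  # ['TTTTTTTTTT']
```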

Next, I decided to keep only reads present in the transcriptome by running bbduk.sh against the gencode.v29 transcript FASTA sequences. About 55% of reads matched the transcriptome (which I also find strange), and the subsequent salmon quantification again reports only about 45% of reads mapped, the rest being multi-mapped. According to bbduk's report, most of the reads that matched the transcriptome fall on the RNA5S1 gene or other rRNA genes, which are probably causing the multi-mapping.
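One way to check how much of the salmon quantification is absorbed by rRNA is to sum `NumReads` over rRNA transcripts in `quant.sf` (columns `Name`, `Length`, `EffectiveLength`, `TPM`, `NumReads`). This is a minimal sketch under stated assumptions: the index was built from the GENCODE transcript FASTA without salmon's `--gencode` option, so each `Name` is the full pipe-separated GENCODE header with the gene name in field 6 and the transcript biotype in field 8; the set of biotypes treated as rRNA is also an assumption. Because salmon apportions multi-mapping reads fractionally between transcripts, `NumReads` can be non-integer, hence the float sums.

```python
import csv

# Assumption: GENCODE biotypes counted as "rRNA" for this check.
RRNA_BIOTYPES = {"rRNA", "rRNA_pseudogene", "Mt_rRNA"}


def rrna_fraction(quant_sf_path):
    """Return (fraction of NumReads on rRNA transcripts, {gene_name: rRNA reads}).

    Assumes Name is a full GENCODE header such as
    ENST...|ENSG...|OTTHUMG...|OTTHUMT...|TX-NAME|GENE|LENGTH|BIOTYPE|
    """
    total = 0.0
    rrna = 0.0
    per_gene = {}
    with open(quant_sf_path, newline="") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            n = float(row["NumReads"])
            total += n
            fields = row["Name"].split("|")
            # fields[5] = gene name, fields[7] = transcript biotype (assumed layout)
            if len(fields) >= 8 and fields[7] in RRNA_BIOTYPES:
                rrna += n
                per_gene[fields[5]] = per_gene.get(fields[5], 0.0) + n
    return (rrna / total if total else 0.0), per_gene
```

Sorting `per_gene` by value shows which rRNA loci (e.g. RNA5S1) dominate a given sample.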

Here is my question: do I also need to filter out reads from these multi-mapping rRNA genes, or can I just continue with my ~40% of mapped reads and do the differential expression analysis (DEA)? I have never seen a salmon mapping rate below 84% before.
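Before deciding, it can help to tabulate the mapping rate of every sample side by side, since the concern is that only some samples are affected. A minimal sketch, assuming each sample has its own salmon output directory containing `aux_info/meta_info.json` with a `percent_mapped` field (as written by recent salmon versions); the 70% cut-off is an arbitrary, hypothetical value for illustration, not a recommended threshold.

```python
import json
from pathlib import Path


def mapping_rates(quant_dirs):
    """Map each salmon output directory name to the percent_mapped it reports."""
    rates = {}
    for d in quant_dirs:
        meta_path = Path(d) / "aux_info" / "meta_info.json"
        meta = json.loads(meta_path.read_text())
        rates[Path(d).name] = float(meta["percent_mapped"])
    return rates


def flag_low(rates, threshold=70.0):
    """Return the sorted names of samples whose mapping rate is below threshold.

    threshold is a hypothetical cut-off chosen only for illustration.
    """
    return sorted(name for name, pct in rates.items() if pct < threshold)
```

If the low rates cluster in a subset of samples (or in one experimental group), that points to a sample-level problem rather than a dataset-wide one.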

Your suggestions are appreciated. Thanks in advance!

Tags: QC, contamination, ribosomal-RNA

From the GEO record you linked above:

We performed ribosomal RNA (rRNA) depleted RNA sequencing (RNA-seq)

It looks like the depletion did not work for some (?) samples. If that is the case, you will need to account for those samples in some way. Trying to remove the rRNA counts is not going to affect the counts for the real genes.
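To illustrate the last point on a gene-level count table: dropping rRNA rows leaves the raw counts of every other gene untouched, and the per-sample rRNA fraction is one simple way to quantify which samples the depletion failed for. This is a sketch with a hypothetical table layout (gene name mapped to a list of per-sample counts) and hypothetical gene names; how to then account for the affected samples (excluding them, or modelling the difference) is left open, as in the answer. Note that library-size or size-factor normalization computed on the reduced table will differ from one computed on the full table, since rRNA counts no longer contribute to the totals.

```python
def split_rrna(counts, rrna_genes):
    """Split {gene: [count per sample]} into (non-rRNA table, per-sample rRNA fraction).

    Counts of the kept genes are copied unchanged; only rRNA rows are dropped.
    """
    n_samples = len(next(iter(counts.values())))
    kept = {g: list(c) for g, c in counts.items() if g not in rrna_genes}
    totals = [sum(c[i] for c in counts.values()) for i in range(n_samples)]
    rrna_totals = [
        sum(c[i] for g, c in counts.items() if g in rrna_genes)
        for i in range(n_samples)
    ]
    fractions = [r / t if t else 0.0 for r, t in zip(rrna_totals, totals)]
    return kept, fractions


if __name__ == "__main__":
    # Hypothetical two-sample table: sample 2 looks like a failed depletion.
    counts = {"RNA5S1": [10, 600], "GAPDH": [50, 50], "ACTB": [40, 350]}
    kept, fractions = split_rrna(counts, {"RNA5S1"})
    print(kept)       # {'GAPDH': [50, 50], 'ACTB': [40, 350]}
    print(fractions)  # [0.1, 0.6]
```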

