Hi, I have a basic question about RNA-seq analysis. If the read alignment rate is about 40-50% (from bowtie, hisat2, or another aligner), would it be appropriate to increase the sequencing depth to get enough aligned reads for the analysis? Or would such a low alignment rate introduce bias, so that we should abandon these samples? Thank you!
The sample is rice, which has a high-quality reference genome. I used bowtie2 for the alignment; the summary is:
82280146 reads; of these:
  82280146 (100.00%) were paired; of these:
    41474464 (50.41%) aligned concordantly 0 times
    12443024 (15.12%) aligned concordantly exactly 1 time
    28362658 (34.47%) aligned concordantly >1 times
    ----
    41474464 pairs aligned concordantly 0 times; of these:
      1444965 (3.48%) aligned discordantly 1 time
    ----
    40029499 pairs aligned 0 times concordantly or discordantly; of these:
      80058998 mates make up the pairs; of these:
        73562998 (91.89%) aligned 0 times
        414858 (0.52%) aligned exactly 1 time
        6081142 (7.60%) aligned >1 times
55.30% overall alignment rate
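For anyone wondering how the overall rate follows from the individual lines, here is a minimal Python sketch that parses the summary above and recomputes it. It assumes the standard bowtie2 paired-end summary layout; the function names are made up for this illustration and are not part of bowtie2.

```python
# Summary text copied from the post above (indentation does not matter here).
SUMMARY = """\
82280146 reads; of these:
82280146 (100.00%) were paired; of these:
41474464 (50.41%) aligned concordantly 0 times
12443024 (15.12%) aligned concordantly exactly 1 time
28362658 (34.47%) aligned concordantly >1 times
----
41474464 pairs aligned concordantly 0 times; of these:
1444965 (3.48%) aligned discordantly 1 time
----
40029499 pairs aligned 0 times concordantly or discordantly; of these:
80058998 mates make up the pairs; of these:
73562998 (91.89%) aligned 0 times
414858 (0.52%) aligned exactly 1 time
6081142 (7.60%) aligned >1 times
55.30% overall alignment rate
"""

# Text fragments that identify each line of interest (assumed layout).
KEYS = {
    "reads; of these:": "pairs",
    ") aligned concordantly exactly 1 time": "conc_1",
    ") aligned concordantly >1 times": "conc_multi",
    ") aligned discordantly 1 time": "disc_1",
    ") aligned exactly 1 time": "mate_1",
    ") aligned >1 times": "mate_multi",
    ") aligned 0 times": "mate_0",
}


def parse_bowtie2_summary(text):
    """Return the leading count of each recognised summary line."""
    counts = {}
    for line in text.splitlines():
        for fragment, name in KEYS.items():
            if fragment in line:
                counts[name] = int(line.split()[0])
    return counts


def overall_alignment_rate(counts):
    """Aligned mates divided by all mates (two mates per pair)."""
    aligned_pairs = counts["conc_1"] + counts["conc_multi"] + counts["disc_1"]
    aligned_mates = 2 * aligned_pairs + counts["mate_1"] + counts["mate_multi"]
    return aligned_mates / (2 * counts["pairs"])


counts = parse_bowtie2_summary(SUMMARY)
print(f"overall alignment rate: {overall_alignment_rate(counts):.2%}")
print(f"mates not aligned at all: {counts['mate_0'] / (2 * counts['pairs']):.2%}")
```

On this summary the recomputed rate matches the reported 55.30%. Note that only 15.12% of pairs align concordantly exactly once, while 34.47% align concordantly more than once.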
The alignment rate is low because the samples are contaminated with a bacterium. I just want to know whether such reads are still suitable for downstream analysis.
Would you mind adding the hisat2 alignment summary here?
I've added the information above.
It would help to know which species you're working with, since a mapping rate of 40% would seem very low in human or mouse but not in a species that is less well annotated. The tissue you are working with obviously also plays into that evaluation.
Please be as complete as possible and add information such as:
Thank you for your suggestion.
I don't think bowtie2 is a suitable aligner here: it is not splice-aware, and I assume rice RNA-seq data contains spliced reads.
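As a rough illustration of switching to a splice-aware aligner, here is a small Python sketch that assembles a hisat2 call for paired-end reads. The index prefix and file names are placeholders, and it assumes an index has already been built with hisat2-build; the helper function is hypothetical, only the hisat2 options themselves (-p, -x, -1, -2, -S, --summary-file) are real.

```python
import shlex


def hisat2_command(index_prefix, reads_1, reads_2, out_sam, threads=4):
    """Build (but do not run) a paired-end hisat2 command line."""
    return [
        "hisat2",
        "-p", str(threads),       # number of alignment threads
        "-x", index_prefix,       # prefix of the index made by hisat2-build
        "-1", reads_1,            # first mates
        "-2", reads_2,            # second mates
        "-S", out_sam,            # SAM output
        "--summary-file", out_sam + ".summary.txt",  # alignment summary
    ]


# Placeholder names, not taken from the original post.
cmd = hisat2_command("rice_index", "sample_R1.fastq.gz",
                     "sample_R2.fastq.gz", "sample.sam")
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` once hisat2 is installed; the summary file then gives numbers comparable to the bowtie2 summary in the question.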
In case of bacterial contamination, you can use e.g. BBSplit to separate out the reads originating from the bacterium. When continuing with the "host" reads, you may want to control for the bacterial influence (directly on gene expression, or indirectly through distortion of the fragment ratios in the library). You can include it as a factor in your DE model and check it, as Devon suggested, with a PCA or an NLDA.
Do the samples have a sufficient read length, i.e. > 50 bp? In my experience with downloaded data, low mapping rates can be primarily due to short reads (e.g. 36 bp or 25 bp).
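One possible way to turn the contamination into something a DE model can use is a per-sample contamination fraction. The sketch below is only an illustration: the sample names, conditions and read counts are invented, and it assumes you already have host and bacterial read counts per sample (for instance after splitting the reads).

```python
def contamination_fraction(host_reads, bacterial_reads):
    """Fraction of assigned reads that came from the bacterium."""
    total = host_reads + bacterial_reads
    if total == 0:
        raise ValueError("no assigned reads for this sample")
    return bacterial_reads / total


# Invented example values: (condition, host reads, bacterial reads).
samples = {
    "ctrl_1": ("control", 38_000_000, 2_000_000),
    "ctrl_2": ("control", 30_000_000, 10_000_000),
    "trt_1": ("treated", 22_000_000, 18_000_000),
    "trt_2": ("treated", 36_000_000, 4_000_000),
}

# Print a small sample table with the covariate as an extra column.
print("sample\tcondition\tcontamination")
for name, (condition, host, bacterial) in samples.items():
    frac = contamination_fraction(host, bacterial)
    print(f"{name}\t{condition}\t{frac:.3f}")
```

The resulting column can be added to the sample sheet and used as a covariate in the design of whichever DE tool you use, and compared against the leading principal components to see how much of the variance it explains.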
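A quick way to check read lengths before aligning is to tabulate them straight from the FASTQ file. This is a minimal sketch that assumes plain four-line FASTQ records (no wrapped sequences); the function names and the 50 bp default cutoff are just for illustration.

```python
import gzip
import io
from collections import Counter


def open_fastq(path):
    """Open a FASTQ file as text, transparently handling .gz files."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def read_lengths(handle, max_reads=100_000):
    """Count sequence lengths over the first max_reads records."""
    lengths = Counter()
    for i, line in enumerate(handle):
        if i // 4 >= max_reads:
            break
        if i % 4 == 1:  # second line of each record is the sequence
            lengths[len(line.strip())] += 1
    return lengths


def fraction_shorter_than(lengths, cutoff=50):
    """Share of sampled reads shorter than cutoff bases."""
    total = sum(lengths.values())
    short = sum(n for length, n in lengths.items() if length < cutoff)
    return short / total if total else 0.0


# Tiny in-memory example: one 36 bp read and one 60 bp read.
example = io.StringIO(
    "@r1\n" + "A" * 36 + "\n+\n" + "I" * 36 + "\n"
    "@r2\n" + "C" * 60 + "\n+\n" + "I" * 60 + "\n"
)
lengths = read_lengths(example)
print(sorted(lengths.items()))
print(fraction_shorter_than(lengths))
```

For a real file, replace the in-memory example with `open_fastq("sample_R1.fastq.gz")` (a placeholder name) and look at how many reads fall below the cutoff.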