Hi,
I have an RNA-seq dataset from a non-model organism and need to do a de novo assembly. After running Trinity, I checked the quality of the assembled transcriptome by aligning the reads back to it with both bowtie and bowtie2. The problem is that the "aligned concordantly exactly 1 time" percentage is very low. For instance, from bowtie2:
40191191 reads; of these:
  40191191 (100.00%) were paired; of these:
    1107263 (2.75%) aligned concordantly 0 times
    811254 (2.02%) aligned concordantly exactly 1 time
    38272674 (95.23%) aligned concordantly >1 times
    ----
    1107263 pairs aligned concordantly 0 times; of these:
      3136 (0.28%) aligned discordantly 1 time
    ----
    1104127 pairs aligned 0 times concordantly or discordantly; of these:
      2208254 mates make up the pairs; of these:
        585954 (26.53%) aligned 0 times
        74874 (3.39%) aligned exactly 1 time
        1547426 (70.07%) aligned >1 times
99.27% overall alignment rate
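To make the numbers in this summary easier to compare across samples, here is a minimal Python sketch that pulls the three concordant categories out of the text and works out how much of the concordant alignment is multi-mapped. It assumes the summary has been pasted in as a string; the function name `concordant_counts` and the variable names are made up for illustration and are not part of bowtie2.

```python
import re

# Concordant part of the bowtie2 summary, copied from the output above.
SUMMARY = """\
40191191 reads; of these:
  40191191 (100.00%) were paired; of these:
    1107263 (2.75%) aligned concordantly 0 times
    811254 (2.02%) aligned concordantly exactly 1 time
    38272674 (95.23%) aligned concordantly >1 times
"""


def concordant_counts(text):
    """Return pair counts for the three concordant categories (hypothetical helper)."""
    patterns = {
        "zero": r"(\d+) \([\d.]+%\) aligned concordantly 0 times",
        "unique": r"(\d+) \([\d.]+%\) aligned concordantly exactly 1 time",
        "multi": r"(\d+) \([\d.]+%\) aligned concordantly >1 times",
    }
    return {name: int(re.search(p, text).group(1)) for name, p in patterns.items()}


counts = concordant_counts(SUMMARY)

# The three categories should add up to the total number of pairs.
total_pairs = sum(counts.values())

# Share of concordantly aligned pairs that hit more than one location.
multi_share = counts["multi"] / (counts["unique"] + counts["multi"])

print(f"total pairs: {total_pairs}")
print(f"multi-mapped share of concordant pairs: {multi_share:.2%}")
```

Running the same function over each sample's summary would show whether the high multi-mapping fraction is consistent across the whole dataset or specific to a few libraries.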
A similar pattern shows up when using BUSCO to check the quality of the assembly, where most of the complete BUSCOs are duplicated:
C:92.4%[S:5.8%,D:86.6%],F:1.3%,M:6.3%,n:4596
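For anyone who wants to put the BUSCO line into numbers, the following is a rough Python sketch that splits the one-line summary into its fields and estimates how many BUSCOs were found duplicated. It assumes the summary string has exactly the layout shown above; `parse_busco` is a hypothetical helper, not a BUSCO function.

```python
import re

# BUSCO one-line summary, copied from the line above.
BUSCO = "C:92.4%[S:5.8%,D:86.6%],F:1.3%,M:6.3%,n:4596"


def parse_busco(summary):
    """Split the summary into percentages plus the BUSCO count n (hypothetical helper)."""
    fields = dict(re.findall(r"([CSDFMn]):([\d.]+)", summary))
    result = {key: float(value) for key, value in fields.items() if key != "n"}
    result["n"] = int(fields["n"])
    return result


b = parse_busco(BUSCO)

# Complete = single-copy + duplicated, so the duplicated share of
# complete BUSCOs is D / C.
dup_share_of_complete = b["D"] / b["C"]

# Approximate number of duplicated BUSCOs out of the n searched.
approx_dup_buscos = round(b["D"] / 100 * b["n"])

print(f"duplicated share of complete BUSCOs: {dup_share_of_complete:.1%}")
print(f"approx. duplicated BUSCOs: {approx_dup_buscos} of {b['n']}")
```

Re-running this after a redundancy-reduction step such as the planned Corset clustering would give a simple before/after comparison of the duplicated fraction.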
I used BBDuk and Trimmomatic to remove adapters and low-quality reads. For the cleaned reads, FastQC confirmed that adapters were removed and per-base quality was good, but the “Sequence Duplication Levels” module failed. I assume this is very common with RNA-seq data?
Can someone suggest what could be improved in this case? Can I continue with this assembly? My ultimate plan is to run Corset to remove the redundancy, use RSEM to quantify expression levels, and then perform DE analysis. Thanks in advance for the help!
Yes, that is very common for RNA-seq data. However, a possible cause of that
38272674 (95.23%) aligned concordantly >1 times
is the presence of rRNA reads in your libraries. Long story short, the rRNA depletion during library preparation may have failed.

I tried SortMeRNA to filter out the rRNA reads, but for each sample it flagged less than 1% of the reads (mostly 0.5 - 0.7%) as rRNA. I had assumed the rRNA reads would already have been removed when I ran BBDuk at the beginning of my analysis. Should I still be suspicious about rRNA reads?
No worries, less than 1% is fine. rRNA is not the problem.