Hi,
I'm analyzing my first RNA-seq data set, obtained from a bacterial strain under two conditions: control vs. stress.
Because the bowtie2 mapping rate for the stress condition (57%) was much lower than for the control (86%), I looked at the FastQC results and also inspected the reads manually.
This is the per-sequence GC content plot for the control condition. It looks fine to me, with a peak around the genomic GC content (52%).
However, the per-sequence GC content for the stress condition looks clearly abnormal: there is a very thick tail toward low GC.
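For anyone who wants to reproduce this kind of distribution outside FastQC, here is a minimal sketch of a per-read GC histogram. It is not FastQC's own code; the function names `gc_percent` and `gc_histogram`, the 5% bin width, and the toy reads are assumptions made only for illustration.

```python
from collections import Counter

def gc_percent(seq):
    """Return the GC content of one read as a percentage (N bases are ignored)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0

def gc_histogram(seqs, bin_width=5):
    """Count reads per GC bin; each key is the lower edge of a bin."""
    hist = Counter()
    for seq in seqs:
        # Clamp 100% GC into the last bin so the bins cover 0..100.
        b = min(int(gc_percent(seq) // bin_width) * bin_width, 100 - bin_width)
        hist[b] += 1
    return dict(sorted(hist.items()))

# Toy reads (hypothetical, not taken from the actual data set).
reads = ["ACGTACGTGC", "AAAAAAAAAT", "GGGGCCCCGG", "AAAACAAAAA"]
hist = gc_histogram(reads)
for lower_edge, count in hist.items():
    print(f"{lower_edge:3d}-{lower_edge + 5:3d}% GC: {count}")
```

A thick low-GC tail like the one described above would show up here as unexpectedly large counts in the lowest bins.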
Also, when I checked the sequences from the stress condition, I found that many of them look very odd, like the ones below.
A00718:140:HWK2FDSXX:3:1414:5059:20807 1:N:0:CCTATGCC+CAAGCTTA
CAAAAAAAAAGAACAAGCAAAAGAACACAACAAAAAAAAAAAAAACAAAAAAAAAAAAACAAAAAACAAAAAAAAAAAAAAAAAAAAAAAAACAAAAAAAT
A00718:140:HWK2FDSXX:3:1561:21287:5572 1:N:0:CCTATGCC+CAAGCTTA
GTTTTTTTTGTCTTTTGTGTTTTGTTGTGGTTGGTTTGCTTTGTTGTTTTTGTGGGTTGGTTGTTTTTTTTTTTGTTGTTTGTTTTTTTTTTTTTTTGGTT
A00718:140:HWK2FDSXX:3:1436:24948:22310 1:N:0:CCTATGCC+CAAGCTTA
CAGCACCACACAAGCAGACCCCTGCGCACAAACACGAAACCCACCCGCCCGGGCCCCCGCGCCCGCCGGGGGGGGGGGGGGGGGGGCGGGGGGGGGGGGGC
A00718:140:HWK2FDSXX:3:1337:16532:12273 1:N:0:CCTATGCC+CAAGCTTA
CACCAACAAAAAAAAACCAAAACACAAAAAACCAAAACCAAAAAACAAAAACAAAAAAAAAAAACCAAAAAAAAAAAAACAACAAAAAAACAAAAGAACAA
Note that the plots and sequences above were obtained after trimming with bbduk.
What might have gone wrong in the stress condition?
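To put a rough number on how common such reads are, one option is to flag reads dominated by a single base or containing a long homopolymer run. The sketch below is only an illustration: the function names and the thresholds (`max_fraction=0.7`, `max_run=15`) are arbitrary assumptions, not established cutoffs, and the "normal" read is made up.

```python
from collections import Counter

def dominant_base_fraction(seq):
    """Fraction of the read made up of its single most common base."""
    if not seq:
        return 0.0
    counts = Counter(seq.upper())
    return counts.most_common(1)[0][1] / len(seq)

def longest_homopolymer(seq):
    """Length of the longest run of one repeated base."""
    best = run = 0
    prev = None
    for base in seq.upper():
        run = run + 1 if base == prev else 1
        prev = base
        best = max(best, run)
    return best

def is_low_complexity(seq, max_fraction=0.7, max_run=15):
    """Flag a read if one base dominates it or it has a long homopolymer run.

    Both thresholds are assumed values chosen for illustration.
    """
    return (dominant_base_fraction(seq) > max_fraction
            or longest_homopolymer(seq) > max_run)

# First read shown in the post (mostly A).
odd_read = ("CAAAAAAAAAGAACAAGCAAAAGAACACAACAAAAAAAAAAAAAACAAAAAAAAAAAAAC"
            "AAAAAACAAAAAAAAAAAAAAAAAAAAAAAAACAAAAAAAT")
# Made-up read with a mixed base composition, for comparison.
normal_read = "ATGGCGTACGTTAGCCTGAACGTTGCAATCGGATCCGTAAGCTTGCA"

for name, seq in [("odd", odd_read), ("normal", normal_read)]:
    print(name, round(dominant_base_fraction(seq), 2),
          longest_homopolymer(seq), is_low_complexity(seq))
```

Running this over all reads of each condition and comparing the flagged fraction would show whether the low-complexity reads are specific to the stress library.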
Thanks.
Maybe do some taxonomic analysis of your data (blobtools, kraken, centrifuge). The low mapping rate is concerning.
Thanks for your suggestion.
Because I found that ~10% of the reads from the stress condition mapped to rRNA genes (despite rRNA depletion), I ran a taxonomic classification of randomly sampled reads against the Silva SSU rRNA database using the classify.seqs command in Mothur.
The result showed little or no contamination.
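For reference, the random sampling step can be done in a single pass with reservoir sampling. The sketch below is one possible way to do it and not necessarily how the reads were sampled here; `read_fastq`, `reservoir_sample`, the fixed seed, and the in-memory toy FASTQ are all assumptions for illustration.

```python
import io
import random

def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual

def reservoir_sample(records, k, seed=0):
    """Uniformly sample up to k records in one pass (Algorithm R)."""
    rng = random.Random(seed)
    sample = []
    for i, rec in enumerate(records):
        if i < k:
            sample.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = rec
    return sample

# Toy in-memory FASTQ with 10 made-up records.
toy_fastq = "".join(
    f"@read{i}\nACGTACGTAC\n+\nIIIIIIIIII\n" for i in range(10)
)
all_records = list(read_fastq(io.StringIO(toy_fastq)))
sample = reservoir_sample(read_fastq(io.StringIO(toy_fastq)), k=3, seed=42)
for header, seq, _ in sample:
    print(header, seq)
```

With a real file, the `io.StringIO` object would be replaced by an open file handle (or a `gzip.open(..., "rt")` handle for compressed reads), and the sampled records written out as FASTA for classification.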
Thanks.