Question

Abnormal per sequence GC content and weird sequences in bacterial RNA-seq

5.1 years ago

ikangkim ▴ 50

Hi,

I'm analyzing my first RNA-seq results, which were obtained from a bacterial strain under two conditions: control vs. stress.

Because the bowtie2 mapping rate for the stress condition was much lower (57%) than for the control (86%), I looked into the FastQC results and also checked the sequences manually.

This is the per-sequence GC content plot for the control condition. I don't see any problem here: there is a single peak around the genomic GC content (52%).

[Figure: Control]

However, the per-sequence GC content for the stress condition looks very abnormal: there is a very thick tail toward low GC.

[Figure: Stress]
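To put numbers on the low-GC tail rather than judging it from the plot alone, the per-read GC content can be binned directly from the FASTQ file. The following is only a minimal sketch: the function names and the 10% bin width are assumptions made for illustration, and this is not how FastQC itself computes its plot.

```python
import gzip
from collections import Counter


def gc_percent(seq):
    """Return the GC content of one read as a percentage (0-100).

    Only A/C/G/T are counted, so N calls do not dilute the value.
    """
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / called


def gc_histogram(seqs, bin_width=10):
    """Count reads per GC bin; keys are the lower edge of each bin."""
    hist = Counter()
    for seq in seqs:
        # Reads with exactly 100% GC are clamped into the top bin.
        lower = min(int(gc_percent(seq) // bin_width) * bin_width,
                    100 - bin_width)
        hist[lower] += 1
    return dict(sorted(hist.items()))


def read_fastq_sequences(path):
    """Yield the sequence line of each 4-line FASTQ record (plain or .gz)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:
                yield line.strip()
```

Running `gc_histogram(read_fastq_sequences(path))` on the control and stress files (the path is a placeholder) would give two tables that can be compared bin by bin.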

Also, when I checked the sequences from the stress condition, I found that many of them look very odd, like the ones below.

A00718:140:HWK2FDSXX:3:1414:5059:20807 1:N:0:CCTATGCC+CAAGCTTA
CAAAAAAAAAGAACAAGCAAAAGAACACAACAAAAAAAAAAAAAACAAAAAAAAAAAAACAAAAAACAAAAAAAAAAAAAAAAAAAAAAAAACAAAAAAAT
A00718:140:HWK2FDSXX:3:1561:21287:5572 1:N:0:CCTATGCC+CAAGCTTA
GTTTTTTTTGTCTTTTGTGTTTTGTTGTGGTTGGTTTGCTTTGTTGTTTTTGTGGGTTGGTTGTTTTTTTTTTTGTTGTTTGTTTTTTTTTTTTTTTGGTT
A00718:140:HWK2FDSXX:3:1436:24948:22310 1:N:0:CCTATGCC+CAAGCTTA
CAGCACCACACAAGCAGACCCCTGCGCACAAACACGAAACCCACCCGCCCGGGCCCCCGCGCCCGCCGGGGGGGGGGGGGGGGGGGCGGGGGGGGGGGGGC
A00718:140:HWK2FDSXX:3:1337:16532:12273 1:N:0:CCTATGCC+CAAGCTTA
CACCAACAAAAAAAAACCAAAACACAAAAAACCAAAACCAAAAAACAAAAACAAAAAAAAAAAACCAAAAAAAAAAAAACAACAAAAAAACAAAAGAACAA

Note that the plots and sequences above were obtained after trimming (with bbduk).
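One way to estimate how many reads look like the examples above is to flag reads dominated by a single base or containing a long homopolymer run. This is a rough sketch only: the function names and both thresholds (70% for the dominant base, 15 bases for the run length) are arbitrary assumptions and would need tuning against real data.

```python
from collections import Counter


def dominant_base_fraction(seq):
    """Fraction of the read made up of its single most common base."""
    if not seq:
        return 0.0
    counts = Counter(seq.upper())
    return max(counts.values()) / len(seq)


def longest_homopolymer(seq):
    """Length of the longest run of one repeated base in the read."""
    best = run = 0
    prev = None
    for base in seq.upper():
        run = run + 1 if base == prev else 1
        prev = base
        best = max(best, run)
    return best


def is_low_complexity(seq, max_fraction=0.7, max_run=15):
    """Flag a read if one base dominates it or it has a long homopolymer.

    Both thresholds are illustrative assumptions, not established cutoffs.
    """
    return (dominant_base_fraction(seq) >= max_fraction
            or longest_homopolymer(seq) >= max_run)
```

Counting the flagged reads in each sample would show whether the low-complexity fraction is large enough to account for the gap in mapping rate between the two conditions.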

What might have gone wrong in the stress condition?

Thanks.

RNA-Seq sequencing • 1.7k views

ADD COMMENT • link 5.1 years ago by ikangkim ▴ 50

Maybe do some taxonomic analysis of your data (blobtools, kraken, centrifuge). The low mapping rate is concerning.

ADD REPLY • link 5.1 years ago by cschu181 ★ 2.8k

Thanks for your suggestion.

Because I found that ~10% of the reads from the stress condition mapped to rRNA genes (despite rRNA depletion), I performed taxonomic classification of randomly sampled reads against the SILVA SSU rRNA database using the classify.seqs command in mothur.

The result showed little or no contamination.
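For reference, the random sampling step can be done in a single pass over the FASTQ file with reservoir sampling. This is a sketch of one possible approach, not the exact procedure used here; the function names and the fixed seed are assumptions for illustration.

```python
import random


def fastq_records(lines):
    """Group an iterable of lines into 4-line FASTQ records (tuples)."""
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            yield tuple(record)
            record = []


def reservoir_sample(records, k, seed=0):
    """Return k records chosen uniformly at random in one pass.

    If fewer than k records exist, all of them are returned.
    """
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    sample = []
    for i, record in enumerate(records):
        if i < k:
            sample.append(record)
        else:
            # Keep the new record with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = record
    return sample
```

The sampled records would then be converted to FASTA before being passed to classify.seqs.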

Thanks.

ADD REPLY • link 5.1 years ago by ikangkim ▴ 50