Hi!
I have been doing some quality control on my samples after quantification with Salmon gave mapping rates of around 30-50% for most samples (STAR aligns 70-90% of reads for the same samples). After compiling the FastQC reports with MultiQC, I found that the GC content plots for most samples contain two peaks, as shown below, in both the trimmed and untrimmed reads. However, FastQC reports no adapter content, and I see no sign of genomic DNA contamination when I inspect the BAM files in SeqMonk.
I really don't know what could be causing this, or whether this "non-normal" GC content distribution is driving the poor Salmon mapping rates downstream. Any advice would be greatly appreciated. For reference, I am working with human normal and cancer patient samples. Thanks in advance for your help!
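One way to check the two-peak pattern independently of the MultiQC plot is to compute the per-read GC percentage directly from a FASTQ file and look for local maxima. The sketch below assumes plain 4-line FASTQ records (optionally gzipped) and counts N bases in the read length. It is a rough check, not FastQC's exact "Per sequence GC content" calculation. The function names and the peak-detection rule (triangular smoothing plus a 5% height cutoff) are illustrative choices, not part of any tool.

```python
import gzip
from collections import Counter
from typing import Iterable, Iterator, List, TextIO


def open_fastq(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTQ file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_sequences(handle: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line (2nd line) of every 4-line FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 1:
            yield line.strip()


def gc_percent(seq: str) -> int:
    """GC content of one read, rounded to a whole percentage (0-100)."""
    seq = seq.upper()
    if not seq:
        return 0
    return round(100 * (seq.count("G") + seq.count("C")) / len(seq))


def gc_histogram(seqs: Iterable[str]) -> List[int]:
    """Number of reads at each GC percentage from 0 to 100."""
    counts = Counter(gc_percent(s) for s in seqs)
    return [counts.get(p, 0) for p in range(101)]


def gc_peaks(hist: List[int], min_fraction: float = 0.05) -> List[int]:
    """GC percentages at local maxima of the smoothed histogram.

    A triangular kernel (1, 2, 3, 2, 1) damps single-bin noise; maxima lower
    than min_fraction of the tallest smoothed bin are ignored.
    """
    kernel = [1, 2, 3, 2, 1]
    half = len(kernel) // 2
    smooth = []
    for i in range(len(hist)):
        total = 0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(hist):
                total += w * hist[j]
        smooth.append(total)
    top = max(smooth) if smooth else 0
    peaks = []
    for i in range(1, len(smooth) - 1):
        if (smooth[i] > smooth[i - 1] and smooth[i] >= smooth[i + 1]
                and smooth[i] > 0 and smooth[i] >= min_fraction * top):
            peaks.append(i)
    return peaks
```

For example, `gc_peaks(gc_histogram(read_sequences(open_fastq("sample_R1.fastq.gz"))))` (hypothetical file name) returns one GC value for a unimodal sample and two for a sample like the ones described here.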
Could PCR bias be causing the abnormal GC content distribution?
Which overrepresented sequences? Did you blast them? What's the difference between the red and orange lines?
I haven't BLASTed the overrepresented sequences yet; I'm still trying to figure out how to pull them out. The red lines are samples that failed the FastQC "Per sequence GC content" check, while the orange lines are samples that received a warning for it.
The FastQC report lists the overrepresented sequences in a table, from which you can simply copy and paste the sequences into BLAST.
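With many samples, copying from each report by hand gets tedious. A minimal sketch, assuming the standard FastQC output layout: each `*_fastqc.zip` contains a `fastqc_data.txt` in which the table sits between a `>>Overrepresented sequences` line and the next `>>END_MODULE`, with tab-separated columns Sequence, Count, Percentage, Possible Source. The helper names and the FASTA header format are hypothetical. The resulting multi-FASTA text can be pasted into a BLAST web search in one go.

```python
import zipfile
from typing import Iterable, List, Tuple

Row = Tuple[str, int, float, str]


def fastqc_data_lines(zip_path: str) -> List[str]:
    """Read fastqc_data.txt out of a FastQC *_fastqc.zip archive."""
    with zipfile.ZipFile(zip_path) as zf:
        name = next(n for n in zf.namelist() if n.endswith("fastqc_data.txt"))
        return zf.read(name).decode("utf-8").splitlines()


def parse_overrepresented(lines: Iterable[str]) -> List[Row]:
    """Rows of the 'Overrepresented sequences' module:
    (sequence, count, percentage, possible source)."""
    rows: List[Row] = []
    in_module = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(">>Overrepresented sequences"):
            in_module = True
            continue
        if not in_module:
            continue
        if line.startswith(">>END_MODULE"):
            break
        if line.startswith("#") or not line.strip():
            continue  # skip the column header and blank lines
        fields = line.split("\t")
        source = fields[3] if len(fields) > 3 else ""
        rows.append((fields[0], int(fields[1]), float(fields[2]), source))
    return rows


def to_fasta(rows: List[Row], sample: str) -> str:
    """Format rows as multi-FASTA, one record per overrepresented sequence."""
    out = []
    for i, (seq, count, pct, _source) in enumerate(rows, start=1):
        out.append(f">{sample}_overrep{i} count={count} pct={pct}")
        out.append(seq)
    return "\n".join(out) + ("\n" if out else "")
```

A sample whose module passed with no table yields an empty string, so the per-sample outputs can be concatenated safely.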