Hello,
I'm currently working on RNA-seq data from A. thaliana, and I have questions about quality and GC content. I guess it's normal to have a higher GC% in RNA-seq data than in the genome itself, since coding sequences usually show a bias toward GC. However, A. thaliana has a genomic GC content of 36% and my samples go up to 51-53%. Isn't that a bit too much?
I'm wondering because, although the sequencing quality looked OK in the FastQC reports, I have a very low mapping rate, around 10-20% of reads. Only one sample maps at over 60%, and that one has a GC content of 44%.
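For anyone who wants to double-check the GC% outside of FastQC, here is a minimal sketch in Python. It assumes plain, uncompressed 4-line FASTQ records; the function name `gc_percent` and the two toy reads are made up for illustration, and N bases are left out of the denominator, which may differ slightly from how FastQC counts.

```python
import io


def gc_percent(fastq_handle):
    """Return the GC percentage over all A/C/G/T bases in a FASTQ stream.

    Assumes strict 4-line FASTQ records (header, sequence, '+', qualities).
    """
    gc = 0
    total = 0
    for i, line in enumerate(fastq_handle):
        if i % 4 == 1:  # second line of each record holds the sequence
            seq = line.strip().upper()
            gc += seq.count("G") + seq.count("C")
            # Count only unambiguous bases so that N does not dilute the result
            total += sum(seq.count(base) for base in "ACGT")
    return 100.0 * gc / total if total else 0.0


# Toy example with two hypothetical reads: one all G/C, one all A/T
example = io.StringIO("@r1\nGGCC\n+\nIIII\n@r2\nATAT\n+\nIIII\n")
print(gc_percent(example))  # 50.0
```

On a real file, the same function could be called on an opened handle, e.g. `with open("sample.fastq") as fh: gc_percent(fh)`, where `sample.fastq` is a placeholder name.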
I tried mapping with bowtie2 and subread-align, both with default parameters (meaning 0 mismatches allowed in the seed for bowtie2 and up to 3 mismatches per alignment for subread-align).
I'm a bit confused here. Does anyone have an idea?
EDIT
I tried aligning to the TAIR10 assembly instead of Araport11 and now >90% of reads map for each sample! I'm still confused but at least it works...
Is it paired-end data? If yes, you could try aligning the two reads of each pair separately as single-end data. If the alignment rate then looks reasonable, you can try increasing the maximum fragment size (see the aligners' advanced parameters). Furthermore, I would recommend the STAR aligner. I don't know much about subread-align, but bowtie2 is designed for DNA data and is not splice-aware.