I have rRNA-depleted RNA-seq data from mouse. I wanted to map the reads to the mm10 transcriptome using hisat2. My script is as follows:

    hisat2 -x /data/Hisat2Index/In \
        --no-spliced-alignment \
        --maxins 600 \
        --mp 1,0 \
        -1 UM_1_val_1.fq -2 /UM_2_val_2.fq \
        -S UM_mp_1_0_hisat2.sam \
        --summary-file UM_hisat2_mp_1_0_summary.txt
My mapping output is as follows:

    19754290 reads; of these:
      19754290 (100.00%) were paired; of these:
        13774555 (69.73%) aligned concordantly 0 times
        2766396 (14.00%) aligned concordantly exactly 1 time
        3213339 (16.27%) aligned concordantly >1 times
        ----
        13774555 pairs aligned concordantly 0 times; of these:
          570850 (4.14%) aligned discordantly 1 time
        ----
        13203705 pairs aligned 0 times concordantly or discordantly; of these:
          26407410 mates make up the pairs; of these:
            22991167 (87.06%) aligned 0 times
            1623547 (6.15%) aligned exactly 1 time
            1792696 (6.79%) aligned >1 times
    41.81% overall alignment rate
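For anyone reading the summary, here is a minimal Python sketch of how the overall alignment rate can be recomputed from the individual counts. It assumes the rate is the number of aligned mates divided by the total number of mates, which reproduces the figure reported here. The function name and its parameter names are made up for illustration and are not part of hisat2.

```python
def overall_alignment_rate(total_pairs, conc_once, conc_multi,
                           disc_once, mates_once, mates_multi):
    """Return the overall alignment rate (in percent) for paired-end reads.

    Assumption: the rate is aligned mates / total mates.
    """
    # Every pair that aligned (concordantly or discordantly) contributes two mates.
    aligned_mates = 2 * (conc_once + conc_multi + disc_once)
    # Mates from the remaining pairs that aligned on their own.
    aligned_mates += mates_once + mates_multi
    return 100.0 * aligned_mates / (2 * total_pairs)


# Counts copied from the mapping summary in this post.
rate = overall_alignment_rate(
    total_pairs=19754290,
    conc_once=2766396,
    conc_multi=3213339,
    disc_once=570850,
    mates_once=1623547,
    mates_multi=1792696,
)
print(f"{rate:.2f}% overall alignment rate")
```

With these counts the result rounds to 41.81%, which matches the last line of the summary. Most of the loss comes from the 22991167 mates that did not align at all.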
I checked that all the adapters are removed. The FastQC result is good, and the BLAST results show there is no contamination. I also tried two different reference transcriptomes, mm10 from GENCODE and from Ensembl, and the mapping rate is similar.
Can anyone give any suggestions? Thanks very much!
I have not tried that, but I think I figured out the reason. Looking at the FastQC report, I found a severe GC bias, which suggests there are a lot of duplicates in my FASTQ files. That is probably the reason for the low mapping rate.
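One rough way to check this hypothesis outside of FastQC is to count exact duplicate sequences and the mean GC fraction directly from the FASTQ lines. The sketch below is only an approximation under simple assumptions: records are exactly four lines each, duplicates are exact sequence matches, and all sequences fit in memory. It is not FastQC's own algorithm, and all function names are hypothetical.

```python
from collections import Counter


def fastq_sequences(lines):
    """Yield the sequence line of each record, assuming 4 lines per record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def gc_fraction(seq):
    """Return the fraction of G and C bases in one read (0.0 for an empty read)."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def duplicate_fraction(seqs):
    """Return the fraction of reads that repeat an already seen sequence."""
    counts = Counter(seqs)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - len(counts) / total


def summarize(lines):
    """Return read count, duplicate fraction, and mean GC fraction."""
    seqs = list(fastq_sequences(lines))
    mean_gc = sum(gc_fraction(s) for s in seqs) / len(seqs) if seqs else 0.0
    return {
        "reads": len(seqs),
        "duplicate_fraction": duplicate_fraction(seqs),
        "mean_gc": mean_gc,
    }
```

To try it, open a file such as UM_1_val_1.fq and pass the file handle to summarize(). For a file of this size, passing only the first few hundred thousand lines (for example with itertools.islice) keeps memory use modest.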
Can you put up the FastQC GC content plot?
Please go to this link: https://ibb.co/D5d2VHQ