I am aligning my RNA-seq data to a reference genome using HISAT2 for the first time, and I have what I am sure is a basic question. However, I am still confused after reading a number of online resources.
Broadly, my pipeline goes from FastQC to Rcorrector to Trimmomatic to HISAT2, and I am not certain exactly how to interpret my results.
From aligning my trimmed reads I get an output something like:
23113803 reads; of these:
  23113803 (100.00%) were paired; of these:
    21488690 (92.97%) aligned concordantly 0 times
    753270 (3.26%) aligned concordantly exactly 1 time
    871843 (3.77%) aligned concordantly >1 times
    ----
    21488690 pairs aligned concordantly 0 times; of these:
      5618651 (26.15%) aligned discordantly 1 time
    ----
    15870039 pairs aligned 0 times concordantly or discordantly; of these:
      31740078 mates make up the pairs; of these:
        2583394 (8.14%) aligned 0 times
        14947960 (47.09%) aligned exactly 1 time
        14208724 (44.77%) aligned >1 times
94.41% overall alignment rate
I am a bit confused as to how to interpret these outputs. Is there an 'ideal' percentage of reads that align 0 times, exactly 1 time, and >1 times?
Also, how should I interpret a high overall alignment rate combined with a high percentage of read pairs that aligned concordantly 0 times? Thank you in advance for any help!
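For reference, the overall alignment rate in the summary above can be reproduced from the individual counts. The sketch below is a minimal illustration, not HISAT2's own code: it assumes the rate is the number of aligned mates (two per concordantly or discordantly aligned pair, plus mates that aligned on their own) divided by the total number of mates. The variable and function names are made up for this example; the counts are copied from the summary.

```python
# Counts copied from the HISAT2 summary above (names are illustrative only).
total_pairs = 23113803
conc_1 = 753270          # pairs aligned concordantly exactly 1 time
conc_multi = 871843      # pairs aligned concordantly >1 times
disc_1 = 5618651         # pairs aligned discordantly 1 time
mates_1 = 14947960       # mates (from unaligned pairs) aligned exactly 1 time
mates_multi = 14208724   # mates (from unaligned pairs) aligned >1 times


def overall_alignment_rate(total_pairs, conc_1, conc_multi, disc_1,
                           mates_1, mates_multi):
    """Percentage of mates that aligned in any way (assumed definition)."""
    # Each aligned pair contributes two aligned mates.
    aligned_mates = 2 * (conc_1 + conc_multi + disc_1) + mates_1 + mates_multi
    return 100.0 * aligned_mates / (2 * total_pairs)


overall = overall_alignment_rate(total_pairs, conc_1, conc_multi, disc_1,
                                 mates_1, mates_multi)
# Percentage of pairs that aligned concordantly at least once.
concordant = 100.0 * (conc_1 + conc_multi) / total_pairs

print(f"overall alignment rate: {overall:.2f}%")
print(f"concordant pair rate:   {concordant:.2f}%")
```

With these counts the sketch gives 94.41% overall, matching the last line of the summary, while only about 7.03% of pairs aligned concordantly. Most of the overall rate therefore comes from discordant pairs and from mates that aligned individually.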
Is this a model organism? What is the read length, is the reference genome of high quality, and is this any kind of low-input RNA-seq? Why did you use Rcorrector and Trimmomatic? Standard RNA-seq typically does not require any pre-processing prior to alignment.
Thank you for the response! This is not a model organism but a closely related wild canid, aligned to the CanFam3 genome. I used Rcorrector and Trimmomatic to perform k-mer filtering and remove a small amount of adapter contamination. The Illumina RNA-seq library prep was performed at a genomics core, and the QC before and after sequencing did not suggest any sample or protocol issues. FastQC results suggest that read quality is good, though there is some evidence of repetitive reads. Libraries were run on a NextSeq 2x75 bp Mid Output flow cell.
I have also run the HISAT2 alignment on the raw data before trimming and am pasting the alignment summary below: