Interpreting Hisat2 alignment output
5.7 years ago
Whirlingdaf ▴ 60

I am aligning my RNA-seq data to a reference genome using HISAT2 for the first time, and I have what I am sure is a basic question. However, I am still confused after reading a number of online resources.

Broadly, my pipeline goes from FastQC to Rcorrector to Trimmomatic to HISAT2, and I am not certain exactly how to interpret my results.

From aligning my trimmed reads I get an output something like:

23113803 reads; of these:
    23113803 (100.00%) were paired; of these:
    21488690 (92.97%) aligned concordantly 0 times
    753270 (3.26%) aligned concordantly exactly 1 time
    871843 (3.77%) aligned concordantly >1 times
    ----
    21488690 pairs aligned concordantly 0 times; of these:
      5618651 (26.15%) aligned discordantly 1 time
    ----
    15870039 pairs aligned 0 times concordantly or discordantly; of these:
      31740078 mates make up the pairs; of these:
        2583394 (8.14%) aligned 0 times
        14947960 (47.09%) aligned exactly 1 time
        14208724 (44.77%) aligned >1 times
94.41% overall alignment rate
  

I am a bit confused as to how to interpret these outputs and wonder if there is an 'ideal' percentage of reads that have been aligned 0, exactly 1, and > 1 time?

I would also like to know how to interpret a high overall alignment rate alongside a high percentage of pairs that aligned concordantly 0 times. Thank you in advance for any help!

hisat RNA-Seq alignment • 11k views

Model organism? What is the read length, is the reference genome of high quality, and is this any kind of low-input RNA-seq? Why did you use this corrector and Trimmomatic? Standard RNA-seq typically does not require any pre-processing prior to alignment.


Thank you for the response! This is not a model organism but a closely related wild canid, aligned to the CanFam3 genome. I used Rcorrector and Trimmomatic to perform k-mer filtering and remove a small amount of adapter contamination. The Illumina RNA-seq library prep was performed at a genomics core, and the QC before and after sequencing did not suggest any sample or protocol issues. FastQC results suggest that read quality is good, though there is some evidence of repetitive reads. Libraries were run on a NextSeq 2x 75bp Mid Output Flow Cell.

I have also run the HISAT2 alignment on the raw data before trimming and am pasting the alignment summary below:

26411123 reads; of these:
    26411123 (100.00%) were paired; of these:
    5170797 (19.58%) aligned concordantly 0 times
    14261798 (54.00%) aligned concordantly exactly 1 time
    6978528 (26.42%) aligned concordantly >1 times
    ----
    5170797 pairs aligned concordantly 0 times; of these:
      51392 (0.99%) aligned discordantly 1 time
    ----
    5119405 pairs aligned 0 times concordantly or discordantly; of these:
      10238810 mates make up the pairs; of these:
        7175529 (70.08%) aligned 0 times
        2030231 (19.83%) aligned exactly 1 time
        1033050 (10.09%) aligned >1 times
86.42% overall alignment rate

5.7 years ago

Explanation of HISAT2 summary statistics

HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes (as well as to a single reference genome).

The below explanation was originally posted by me on biostars.org 😎

The summary looks like this

HISAT2 summary stats:
            Total pairs: 11587225
                    Aligned concordantly or discordantly 0 time: 4464083 (38.53%)
                    Aligned concordantly 1 time: 2195620 (18.95%)
                    Aligned concordantly >1 times: 4877336 (42.09%)
                    Aligned discordantly 1 time: 50186 (0.43%)
            Total unpaired reads: 8928166
                    Aligned 0 time: 8019048 (89.82%)
                    Aligned 1 time: 304653 (3.41%)
                    Aligned >1 times: 604465 (6.77%)
            Overall alignment rate: 65.40%
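For anyone who wants to pull these numbers into a script, here is a minimal Python sketch of a parser for the summary layout shown above. The names `parse_summary` and `COUNT_LINE` are my own illustration and not part of HISAT2, and the sketch assumes the layout is exactly as printed here.

```python
import re

SUMMARY = """\
HISAT2 summary stats:
    Total pairs: 11587225
        Aligned concordantly or discordantly 0 time: 4464083 (38.53%)
        Aligned concordantly 1 time: 2195620 (18.95%)
        Aligned concordantly >1 times: 4877336 (42.09%)
        Aligned discordantly 1 time: 50186 (0.43%)
    Total unpaired reads: 8928166
        Aligned 0 time: 8019048 (89.82%)
        Aligned 1 time: 304653 (3.41%)
        Aligned >1 times: 604465 (6.77%)
    Overall alignment rate: 65.40%
"""

# Matches "label: count", optionally followed by "(xx.xx%)".
# The header line and the final percentage-only line do not match and are skipped.
COUNT_LINE = re.compile(
    r"^\s*(?P<label>[^:]+):\s*(?P<count>\d+)(?:\s+\((?P<pct>[\d.]+)%\))?\s*$"
)


def parse_summary(text):
    """Return {section: {label: count}} for the paired and unpaired blocks."""
    stats = {}
    section = None
    for line in text.splitlines():
        match = COUNT_LINE.match(line)
        if match is None:
            continue
        label = match.group("label").strip()
        count = int(match.group("count"))
        if label.startswith("Total"):
            # "Total pairs" / "Total unpaired reads" open a new section.
            section = label
            stats[section] = {"total": count}
        elif section is not None:
            stats[section][label] = count
    return stats


stats = parse_summary(SUMMARY)
print(stats["Total pairs"]["total"])
print(stats["Total unpaired reads"]["Aligned 0 time"])
```

The counts are kept as integers so that the arithmetic in the description can be repeated on them directly.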

Description

1. Total pairs: 11587225

Total reads = 11587225 * 2 = 23174450 (matches total number of reads in the sample)

2. Aligned concordantly or discordantly 0 time: 4464083 (38.53%)

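These are pairs that did not align as a pair (neither concordantly nor discordantly) : 4464083 * 2 (paired end) = 8928166 reads. They are not necessarily unmapped: each mate is aligned again on its own and counted under "Total unpaired reads" (point 6).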

 ( 8928166 /  23174450 (Total reads) ) * 100 ~ 38.53%

3. Aligned concordantly 1 time: 2195620 (18.95%)

These are uniquely mapped reads : 2195620 * 2 (paired end) = 4391240

( 4391240 /  23174450 (Total reads) ) * 100 ~ 18.95%

4. Aligned concordantly >1 times: 4877336 (42.09%)

These are multi mapped reads : 4877336 * 2 = 9754672

( 9754672 /  23174450 (Total reads) ) * 100 ~ 42.09%

5. Aligned discordantly 1 time: 50186 (0.43%)

Discordant aligned : 50186 * 2 = 100372

( 100372 /  23174450 (Total reads) ) * 100 ~ 0.43%

6. Total unpaired reads: 8928166

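These are the individual mates of the pairs from point 2 that did not align as a pair (4464083 * 2 = 8928166). Each mate is aligned on its own as an unpaired read.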

  • Aligned 0 time: 8019048 (89.82%)

    (8019048 / 8928166 ) * 100 = 89.82% i.e. 89.82% of the unpaired reads did not align at all

  • Aligned 1 time: 304653 (3.41%)

    (304653 / 8928166 ) * 100 = 3.41% i.e. 3.41% of the unpaired reads aligned once

  • Aligned >1 times: 604465 (6.77%)

    (604465 / 8928166 ) * 100 = 6.77% i.e. 6.77% of the unpaired reads are multi mapped
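The percentages in points 2 to 6 can be rechecked with a few lines of Python. This is only a sketch: the counts are copied from the summary above, while the dictionary labels and the helper `percentages` are made up for illustration.

```python
total_pairs = 11587225
pair_counts = {
    "aligned 0 times as a pair": 4464083,
    "concordant, 1 time": 2195620,
    "concordant, >1 times": 4877336,
    "discordant, 1 time": 50186,
}

total_unpaired = 8928166
unpaired_counts = {
    "aligned 0 times": 8019048,
    "aligned 1 time": 304653,
    "aligned >1 times": 604465,
}


def percentages(counts, total):
    """Percentage of `total` for each category, rounded to 2 decimals."""
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


# Each block is a complete partition of its total.
assert sum(pair_counts.values()) == total_pairs
assert sum(unpaired_counts.values()) == total_unpaired

# The unpaired reads are exactly the mates of the pairs that did not align as a pair.
assert 2 * pair_counts["aligned 0 times as a pair"] == total_unpaired

pair_pct = percentages(pair_counts, total_pairs)
unpaired_pct = percentages(unpaired_counts, total_unpaired)
print(pair_pct)
print(unpaired_pct)
```

Dividing pairs by total pairs gives the same percentage as dividing reads by total reads, because both numerator and denominator are doubled.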

7. Overall alignment rate: 65.40%

Calculation as explained below

PAIRED READS

  • Aligned concordantly 1 time: 2195620 * 2 = 4391240

  • Aligned concordantly >1 times: 4877336 * 2 = 9754672

  • Aligned discordantly 1 time: 50186 * 2 = 100372

UNPAIRED READS

  • Aligned 1 time: 304653

  • Aligned >1 times: 604465


Total = 4391240 + 9754672 + 100372 + 304653 + 604465 = 15155402

Overall Alignment Rate = (15155402 / 23174450) * 100 = 65.40%
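The same calculation as a small Python sketch, using the counts from the summary above. The function name `overall_alignment` is hypothetical. It also shows how the reads add up: the only reads that never aligned are the 8019048 unpaired reads that aligned 0 times, not the 8928166 mates of the pairs that failed to align as a pair.

```python
def overall_alignment(total_pairs, conc_1, conc_multi, disc_1,
                      unpaired_0, unpaired_1, unpaired_multi):
    """Return (aligned_reads, unaligned_reads, rate_percent) for a paired-end run."""
    total_reads = 2 * total_pairs
    # Every pair that aligned (concordantly or discordantly) contributes two reads.
    aligned_as_pairs = 2 * (conc_1 + conc_multi + disc_1)
    # Mates of pairs that did not align as a pair are counted one by one.
    aligned_unpaired = unpaired_1 + unpaired_multi
    aligned = aligned_as_pairs + aligned_unpaired
    unaligned = unpaired_0
    # Aligned and unaligned reads together must account for every input read.
    assert aligned + unaligned == total_reads
    return aligned, unaligned, 100 * aligned / total_reads


aligned, unaligned, rate = overall_alignment(
    total_pairs=11587225,
    conc_1=2195620,
    conc_multi=4877336,
    disc_1=50186,
    unpaired_0=8019048,
    unpaired_1=304653,
    unpaired_multi=604465,
)
print(aligned, unaligned, f"{rate:.2f}%")
```

With these numbers, aligned plus unaligned is 15155402 + 8019048 = 23174450, which is the total number of input reads.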



Thank you, this is a helpful breakdown!


Hello lakhujanivijay - What about the total of unmapped reads?

I compute it as the sum of the unmapped paired reads and the unmapped unpaired reads:

Total unmapped reads: 8928166 + 8019048 = 16947214.

When I add the unmapped reads to the (overall) mapped reads, the result is higher than the total number of input reads:

Total reads = overall mapped + overall unmapped = 15155402 + 16947214 = 32102616, while the total number of input reads is 23174450.

Do you have any idea?

Cheers, ~DD
