Hi everyone,
I'm analyzing RNA-seq data using HISAT2 and encountered an interesting discrepancy between paired-end and single-end alignment rates. Here's a summary of the alignment results for the same dataset:
Paired-End Alignment:
9203743 reads; of these:
9203743 (100.00%) were paired; of these:
1983452 (21.55%) aligned concordantly 0 times
1137435 (12.36%) aligned concordantly exactly 1 time
6082856 (66.09%) aligned concordantly >1 times
----
1983452 pairs aligned concordantly 0 times; of these:
14966 (0.75%) aligned discordantly 1 time
----
1968486 pairs aligned 0 times concordantly or discordantly; of these:
3936972 mates make up the pairs; of these:
3246272 (82.46%) aligned 0 times
158490 (4.03%) aligned exactly 1 time
532210 (13.52%) aligned >1 times
82.36% overall alignment rate
Single-End Alignment (Separate for Forward and Reverse Reads):
Forward Read:
9203743 reads; of these:
9203743 (100.00%) were unpaired; of these:
1536447 (16.69%) aligned 0 times
802504 (8.72%) aligned exactly 1 time
6864792 (74.59%) aligned >1 times
83.31% overall alignment rate
Reverse Read:
9203743 reads; of these:
9203743 (100.00%) were unpaired; of these:
1796682 (19.52%) aligned 0 times
1271662 (13.82%) aligned exactly 1 time
6135399 (66.66%) aligned >1 times
80.48% overall alignment rate
Observations:
Why is the paired-end alignment rate lower than the single-end alignment rate for reverse reads?
What could be causing the reverse reads to have a lower alignment rate in the paired-end analysis?
Are there specific quality or technical issues that might affect the reverse reads more than the forward reads?
Any insights or suggestions on what might be causing these discrepancies and how to address them would be greatly appreciated!
I don't see much of a discrepancy. The read pairs align at an 82.4% rate, while the forward and reverse reads align at 83.3% and 80.5%. That's not all that different. Which of these numbers are concerning to you?
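To make the comparison concrete, here is a minimal sketch that recomputes the three overall rates from the counts quoted above. It assumes the paired-end overall rate is calculated per mate, i.e. every mate counts as aligned unless it ends up in the final "aligned 0 times" line; the helper name `overall_rate` is made up for illustration.

```python
# Counts copied from the HISAT2 summaries quoted in this thread.
PAIRS = 9_203_743                 # read pairs in the dataset
PAIRED_UNALIGNED_MATES = 3_246_272  # mates that "aligned 0 times" in paired-end mode
FORWARD_UNALIGNED = 1_536_447     # forward reads that "aligned 0 times" (single-end)
REVERSE_UNALIGNED = 1_796_682     # reverse reads that "aligned 0 times" (single-end)


def overall_rate(total_reads: int, unaligned: int) -> float:
    """Percentage of reads that aligned at least once (hypothetical helper)."""
    return 100.0 * (total_reads - unaligned) / total_reads


# Assumption: in paired-end mode the denominator is the number of mates (2 per pair).
paired = overall_rate(2 * PAIRS, PAIRED_UNALIGNED_MATES)
forward = overall_rate(PAIRS, FORWARD_UNALIGNED)
reverse = overall_rate(PAIRS, REVERSE_UNALIGNED)

print(f"paired-end: {paired:.2f}%")
print(f"forward:    {forward:.2f}%")
print(f"reverse:    {reverse:.2f}%")
```

Under that assumption the recomputed values match the reported ones, and the paired-end rate sits between the two single-end rates, which is what you would expect if it is essentially a per-mate figure.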
OK, thanks for the confirmation. Is a paired-end alignment rate of 82.36% sufficient for DEG (differentially expressed gene) analysis via htseq-count? My htseq-count summary is as follows:
no_feature 573785
ambiguous 35368
too_low_aQual 212148
not_aligned 1349251
alignment_not_unique 6478457
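As a rough way to read that summary, the sketch below tallies the five special counters and reports each one's share of their sum. It deliberately does not estimate how many reads were assigned to genes, because the per-gene counts are not shown in the post; the dictionary name `special_counters` is just for illustration.

```python
# Special counters copied from the htseq-count summary in this thread.
special_counters = {
    "no_feature": 573_785,
    "ambiguous": 35_368,
    "too_low_aQual": 212_148,
    "not_aligned": 1_349_251,
    "alignment_not_unique": 6_478_457,
}

# Sum of the special counters only; gene-assigned counts are not included
# because they are not given in the post.
total_special = sum(special_counters.values())

# Share of each counter relative to the sum of the special counters.
shares = {name: 100.0 * count / total_special for name, count in special_counters.items()}

for name, pct in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<22}{special_counters[name]:>10}  {pct:6.2f}%")
```

The largest counter by far is alignment_not_unique, which is in line with the roughly 66% of pairs that HISAT2 reported as aligning concordantly more than once. With htseq-count's default settings those alignments are not counted toward any feature, so the multi-mapping fraction may matter more for the DEG analysis than the overall alignment rate.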