Low overall alignment rate when running HISAT2
8.3 years ago
sslee1015 • 0

Hi, I'm fairly new to RNA-seq, and I don't know how to explain these results.

Here is a representative example of my HISAT2 command:

hisat2 -x mm10/genome -1 sample1_R1.fastq -2 sample1_R2.fastq -S sample1out.sam

The reference genome is mouse (mm10), and the mm10 directory contains the .ht2 index files. sample1_R1.fastq holds the first mates of the paired-end reads, and sample1_R2.fastq holds the second mates. For sample 1, I received 8 FASTQ files, 4 for R1 and 4 for R2, so I concatenated the R1 files into one file and the R2 files into another, and used those two files as input to hisat2. This was my hisat2 summary:
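Because the inputs were built by concatenating several files, one thing that may be worth ruling out is that the R1 and R2 files ended up in a different order, which would break the pairing. The following is a minimal sketch of such a check, assuming plain (uncompressed) 4-line FASTQ records and read names that match between mates apart from an optional /1 or /2 suffix. The function names are hypothetical and not part of HISAT2.

```python
import itertools


def read_ids(fastq_path):
    """Yield the read identifier of each 4-line FASTQ record."""
    with open(fastq_path) as handle:
        # Ignore blank lines so a trailing empty line does not shift the records.
        lines = (line for line in handle if line.strip())
        for line_number, line in enumerate(lines):
            if line_number % 4 == 0:
                # Drop the leading '@' and any comment after whitespace.
                name = line[1:].split()[0]
                # Drop an old-style mate suffix so both mates compare equal.
                if name.endswith(("/1", "/2")):
                    name = name[:-2]
                yield name


def first_mismatch(r1_path, r2_path):
    """Return (record_index, id1, id2) for the first unpaired record, else None."""
    pairs = itertools.zip_longest(read_ids(r1_path), read_ids(r2_path))
    for index, (id1, id2) in enumerate(pairs):
        if id1 != id2:
            return index, id1, id2
    return None
```

Calling `first_mismatch("sample1_R1.fastq", "sample1_R2.fastq")` on the concatenated files should return `None` if every record pairs up. Anything else points at the first record where the two files disagree, including the case where one file is longer than the other.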

32832172 reads; of these:
  32832172 (100.00%) were paired; of these:
    32326312 (98.46%) aligned concordantly 0 times
    393332 (1.20%) aligned concordantly exactly 1 time
    112528 (0.34%) aligned concordantly >1 times
    ----
    32326312 pairs aligned concordantly 0 times; of these:
      6101 (0.02%) aligned discordantly 1 time
    ----
    32320211 pairs aligned 0 times concordantly or discordantly; of these:
      64640422 mates make up the pairs; of these:
        64313845 (99.49%) aligned 0 times
        208508 (0.32%) aligned exactly 1 time
        118069 (0.18%) aligned >1 times
2.06% overall alignment rate

2.06% seems really low. Did I do something wrong?
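For reference, the 2.06% figure can be reproduced from the counts in the summary. The sketch below assumes the overall rate is counted per read (mate) rather than per pair: each concordantly or discordantly aligned pair contributes two reads, and mates that aligned on their own contribute one each. That assumption is consistent with the reported number, but the function itself is only illustrative.

```python
# Counts copied from the hisat2 summary above.
total_pairs = 32832172
concordant_once = 393332
concordant_multi = 112528
discordant_once = 6101
mates_once = 208508
mates_multi = 118069


def overall_alignment_rate(total_pairs, concordant_once, concordant_multi,
                           discordant_once, mates_once, mates_multi):
    """Fraction of individual reads (mates) that aligned at least once."""
    aligned_pairs = concordant_once + concordant_multi + discordant_once
    # Each aligned pair contributes two reads; singly aligned mates contribute one.
    aligned_reads = 2 * aligned_pairs + mates_once + mates_multi
    return aligned_reads / (2 * total_pairs)


rate = overall_alignment_rate(total_pairs, concordant_once, concordant_multi,
                              discordant_once, mates_once, mates_multi)
print(f"{rate:.2%}")  # prints 2.06%
```

So roughly 1.35 million of the 65.7 million individual reads aligned anywhere, which is why the summary looks so alarming.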

RNA-Seq hisat hisat2 alignment • 7.1k views

Something must be terribly wrong. 32 million read pairs and only 0.39 million mapped? Please post the FastQC report.


The sample might not be what you think it is, so you may be aligning to the wrong genome. Try BLASTing a few of the unmapped reads.
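One way to get a handful of unmapped reads for a BLAST search is to pull them out of the SAM file that hisat2 wrote. This is a minimal sketch, assuming an uncompressed SAM file such as sample1out.sam. The function name and the default of 5 reads are illustrative choices, not anything prescribed by HISAT2 or BLAST.

```python
def unmapped_reads_as_fasta(sam_path, limit=5):
    """Return up to `limit` unmapped reads from a SAM file as FASTA text."""
    records = []
    with open(sam_path) as sam:
        for line in sam:
            if line.startswith("@"):
                continue  # skip header lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 11:
                continue  # not a complete alignment record
            name, flag, seq = fields[0], int(fields[1]), fields[9]
            # Bit 0x4 of the SAM FLAG marks a read as unmapped.
            if flag & 0x4 and seq != "*":
                # Bits 0x40 and 0x80 mark the first and second mate of a pair.
                if flag & 0x40:
                    suffix = "/1"
                elif flag & 0x80:
                    suffix = "/2"
                else:
                    suffix = ""
                records.append(f">{name}{suffix}\n{seq}")
                if len(records) >= limit:
                    break
    return "\n".join(records)
```

The text returned by `unmapped_reads_as_fasta("sample1out.sam")` can be pasted into a nucleotide BLAST search to see which organism, if any, the reads resemble.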


FastQ Screen should give you a quick and easy way of telling which genome a sequence file comes from.


You have some serious problems at the 5' end of your reads. The first 3 bases are 100% GC and 70% G. A disturbed GC profile is pretty normal at the start of an RNA-seq read, but I've never seen anything this extreme before. If you look at your enriched k-mers, you'll see a massive enrichment for all the different homopolymer runs, in particular homo-G at the start.
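The per-position composition described here can also be checked directly on the FASTQ file, without going through the report. The following is a rough sketch, assuming an uncompressed 4-line FASTQ file. The function name, the 10-position window, and the cap on the number of reads are arbitrary illustrative choices.

```python
from collections import Counter


def base_fractions(fastq_path, n_positions=10, max_reads=100000):
    """Return one {base: fraction} dict per position for the first n_positions."""
    counts = [Counter() for _ in range(n_positions)]
    with open(fastq_path) as handle:
        for line_number, line in enumerate(handle):
            if line_number // 4 >= max_reads:
                break  # a sample of reads is enough to see a strong bias
            if line_number % 4 != 1:
                continue  # only the second line of each record is sequence
            seq = line.strip().upper()
            for pos, base in enumerate(seq[:n_positions]):
                counts[pos][base] += 1
    fractions = []
    for counter in counts:
        total = sum(counter.values())
        if total:
            fractions.append({base: counter[base] / total for base in "ACGTN"})
        else:
            fractions.append({})
    return fractions
```

For a library with the problem described above, the entries for the first three positions would show G plus C summing to about 1.0, with G alone near 0.7.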

You could try clipping off the first 10 or so bases of each read and see if that helps, but I'd be a bit nervous because you don't know the cause. There is definitely something wrong with either the libraries or the sequencing. I would contact your sequencing company and discuss it with them.
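As a quick experiment, the clipping could be sketched as below, assuming uncompressed 4-line FASTQ input. The function name and output path are hypothetical. Every record is kept, so applying it to both the R1 and R2 files leaves the pairing intact, although reads shorter than the clip length end up empty. The HISAT2 manual also lists a `-5`/`--trim5` option that trims bases from the 5' end at alignment time, which may be simpler than rewriting the files.

```python
def clip_five_prime(in_path, out_path, n_bases=10):
    """Write a copy of a FASTQ file with the first n_bases removed from each read."""
    n_records = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        while True:
            header = src.readline()
            if not header:
                break  # end of file
            seq = src.readline().rstrip("\n")
            plus = src.readline()
            qual = src.readline().rstrip("\n")
            dst.write(header)
            # Sequence and quality strings must be clipped by the same amount.
            dst.write(seq[n_bases:] + "\n")
            dst.write(plus)
            dst.write(qual[n_bases:] + "\n")
            n_records += 1
    return n_records
```

If the alignment rate stays near 2% after clipping, the problem is unlikely to be confined to the first few bases.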


Did you trim off adapters and poly-A tails? Which sequencer was used? What is your read length?


I believe the company that gave us the raw reads did that for me already.
