Low mapping rate with splice-aware aligners (kallisto, HISAT2) but not bowtie2
4.6 years ago
a.palmer ▴ 20

Hi there,

Recently I've been processing paired-end mRNA-seq data from an experiment in C. elegans.

When I've tried aligning my reads to the reference transcriptome, both kallisto and HISAT2 return extremely low alignment rates (~0.3%).

kallisto code:

kallisto index -i transcriptome.idx Caenorhabditis_elegans.WBcel235.cdna.all.fa

kallisto quant -i transcriptome.idx -o out/S996-1 data/S996-1-R1.fastq data/S996-1-R2.fastq --fr-stranded

HISAT2 code:

hisat2-build Caenorhabditis_elegans.WBcel235.cdna.all.fa hisat_index

hisat2 -x hisat_index -1 S996-1-R1.fastq -2 S996-1-R2.fastq -S S996-1.sam

Strangely though, when using bowtie2 I achieved ~85% alignment.

bowtie2 code:

bowtie2-build Caenorhabditis_elegans.WBcel235.cdna.all.fa bowtie_index

bowtie2 -x bowtie_index -1 S996-1-R1.fastq -2 S996-1-R2.fastq -S S996-1.sam

I don't understand what is causing this difference - my read quality is normal and I'm using default parameters.

Any help would be greatly appreciated!

Alex

RNA-Seq kallisto HISAT2 bowtie2 • 2.9k views

Please add code. Anecdotal error descriptions are hard to debug. "cDNA reference genome" is unclear: there is a reference genome and a reference transcriptome, and cDNA and genome are mutually exclusive. Please add all command lines, including how the indices were created.


Hi, I've amended my post - hopefully it makes things clearer


Thanks. Can you also post the alignment summaries from HISAT2 and bowtie2 (the one printed to screen when the alignment finishes, which reports how many reads mapped concordantly, discordantly, etc.)? Did you manipulate the FASTQ files before alignment (trimming, reordering, things like that)?
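For the reordering question, a minimal Python sketch of a sync check between the two FASTQ files is given below. It assumes plain 4-line FASTQ records and read names that match between mates apart from an optional `/1` or `/2` suffix. The function names `read_ids` and `first_mismatch` are hypothetical helpers made up for this illustration, not part of any aligner.

```python
import gzip
import itertools


def read_ids(path):
    """Yield the read ID of each 4-line FASTQ record, without a /1 or /2 suffix."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        # Every 4th line, starting with the first, is a header line.
        for header in itertools.islice(handle, 0, None, 4):
            fields = header[1:].split()
            if not fields:
                continue  # tolerate a trailing blank line
            read_id = fields[0]
            if read_id.endswith(("/1", "/2")):
                read_id = read_id[:-2]
            yield read_id


def first_mismatch(r1_path, r2_path):
    """Return (record_number, id1, id2) for the first out-of-sync pair, or None."""
    pairs = itertools.zip_longest(read_ids(r1_path), read_ids(r2_path))
    for n, (id1, id2) in enumerate(pairs, start=1):
        if id1 != id2:
            return n, id1, id2
    return None
```

Calling `first_mismatch("S996-1-R1.fastq", "S996-1-R2.fastq")` returns `None` when both files list the same reads in the same order. A tuple means the mates diverge at that record, which can happen if a trimming step drops reads from only one file.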


Sorry for the slow response. The only modification to my files was removing the sequencing adapters specific to each run (multiplexed sequencing).


Bowtie2 output:

24626899 reads; of these:
  24597942 (99.88%) were paired; of these:
    5060101 (20.57%) aligned concordantly 0 times
    18743985 (76.20%) aligned concordantly exactly 1 time
    793856 (3.23%) aligned concordantly >1 times
    ----
    5060101 pairs aligned concordantly 0 times; of these:
      452691 (8.95%) aligned discordantly 1 time
    ----
    4607410 pairs aligned 0 times concordantly or discordantly; of these:
      9214820 mates make up the pairs; of these:
        7479978 (81.17%) aligned 0 times
        1557955 (16.91%) aligned exactly 1 time
        176887 (1.92%) aligned >1 times
  28957 (0.12%) were unpaired; of these:
    28919 (99.87%) aligned 0 times
    35 (0.12%) aligned exactly 1 time
    3 (0.01%) aligned >1 times
84.75% overall alignment rate


HISAT2 output:

24626899 reads; of these:
  24626899 (100.00%) were paired; of these:
    24565513 (99.75%) aligned concordantly 0 times
    34258 (0.14%) aligned concordantly exactly 1 time
    27128 (0.11%) aligned concordantly >1 times
    ----
    24565513 pairs aligned concordantly 0 times; of these:
      496 (0.00%) aligned discordantly 1 time
    ----
    24565017 pairs aligned 0 times concordantly or discordantly; of these:
      49130034 mates make up the pairs; of these:
        49104613 (99.95%) aligned 0 times
        13478 (0.03%) aligned exactly 1 time
        11943 (0.02%) aligned >1 times
0.30% overall alignment rate
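To see how the "overall alignment rate" relates to the individual counts in the two summaries above, here is a small Python sketch. It assumes the rate is the number of aligned mates divided by the total number of mates, with each aligned pair counted as two mates. That assumption reproduces both reported figures, but the function `overall_alignment_rate` and its parameter names are made up for this illustration.

```python
def overall_alignment_rate(pairs, conc_once, conc_multi, disc_once,
                           mates_once, mates_multi,
                           unpaired=0, unpaired_once=0, unpaired_multi=0):
    """Return the overall alignment rate in percent from summary counts."""
    # Each pair contributes two mates; unpaired reads contribute one each.
    total_mates = 2 * pairs + unpaired
    # Pairs aligned concordantly or discordantly count both of their mates.
    aligned_mates = (2 * (conc_once + conc_multi + disc_once)
                     + mates_once + mates_multi
                     + unpaired_once + unpaired_multi)
    return 100.0 * aligned_mates / total_mates


# Counts copied from the bowtie2 summary.
bowtie2_rate = overall_alignment_rate(
    pairs=24597942, conc_once=18743985, conc_multi=793856, disc_once=452691,
    mates_once=1557955, mates_multi=176887,
    unpaired=28957, unpaired_once=35, unpaired_multi=3)

# Counts copied from the HISAT2 summary (no unpaired reads reported).
hisat2_rate = overall_alignment_rate(
    pairs=24626899, conc_once=34258, conc_multi=27128, disc_once=496,
    mates_once=13478, mates_multi=11943)

print(f"bowtie2: {bowtie2_rate:.2f}%")
print(f"HISAT2:  {hisat2_rate:.2f}%")
```

With these inputs the sketch gives 84.75% for bowtie2 and 0.30% for HISAT2, matching the last line of each summary.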

4.6 years ago
a.palmer ▴ 20

I solved the problem - it turns out that splice-aware aligners such as kallisto and HISAT2 require a genomic index, rather than one based entirely off the transcriptome. After I changed this, my overall alignment rate increased from 0.30% to ~75% using HISAT2.


Kallisto in fact needs the transcriptome, not the genome. I guess that using a transcriptome reference but genome splice sites caused the trouble in HISAT2; still, kallisto should have been fine with the transcriptome.
