Hi there,
Recently I've been processing paired-end mRNA-seq data from an experiment in C. elegans.
When I've tried aligning my reads to the reference transcriptome, both kallisto and HISAT2 return extremely low alignment rates (~0.3%).
kallisto code:
kallisto index -i transcriptome.idx Caenorhabditis_elegans.WBcel235.cdna.all.fa
kallisto quant -i transcriptome.idx -o out/S996-1 data/S996-1-R1.fastq data/S996-1-R2.fastq --fr-stranded
HISAT2 code:
hisat2-build Caenorhabditis_elegans.WBcel235.cdna.all.fa hisat_index
hisat2 -x hisat_index -1 S996-1-R1.fastq -2 S996-1-R2.fastq -S S996-1.sam
Strangely though, when using bowtie2 I achieved ~85% alignment.
bowtie2 code:
bowtie2-build Caenorhabditis_elegans.WBcel235.cdna.all.fa bowtie_index
bowtie2 -x bowtie_index -1 S996-1-R1.fastq -2 S996-1-R2.fastq -S S996-1.sam
I don't understand what is causing this difference - my read quality is normal and I'm using default parameters.
Any help would be greatly appreciated!
Alex
Please add code. Anecdotal error descriptions are hard to debug. "cDNA reference genome" is unclear: there is a reference genome and there is a reference transcriptome (cDNA), and the two are mutually exclusive. Please add all command lines, including how the indices were created.
Hi, I've amended my post - hopefully it makes things clearer
Thanks. Can you also post the alignment summaries from HISAT2 and bowtie2 (the one printed to the screen when the alignment finishes, reporting how many reads mapped concordantly, discordantly, etc.)? Did you manipulate the FASTQ files before alignment (trimming, reordering, things like that)?
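On the reordering question, one quick check is whether the two mate files still list reads in the same order after trimming. The following is only a rough sketch, not a replacement for a proper QC tool: it assumes uncompressed FASTQ with exactly four lines per record, and the function names `read_ids` and `first_mismatch` are made up for illustration.

```python
import itertools

def read_ids(lines):
    """Yield the read ID from each FASTQ record (assumes 4 lines per record)."""
    for i, line in enumerate(lines):
        if i % 4 == 0:
            # Header looks like '@ID description'; drop '@' and the description.
            fields = line.strip()[1:].split()
            name = fields[0] if fields else ""
            # Drop an old-style '/1' or '/2' mate suffix if present.
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name

def first_mismatch(r1_lines, r2_lines):
    """Return (record_index, id1, id2) for the first out-of-sync pair, else None."""
    pairs = itertools.zip_longest(read_ids(r1_lines), read_ids(r2_lines))
    for n, (a, b) in enumerate(pairs):
        if a != b:
            return n, a, b
    return None

# Example use on real files (paths taken from the commands in the post):
# with open("S996-1-R1.fastq") as r1, open("S996-1-R2.fastq") as r2:
#     print(first_mismatch(r1, r2))
```

A result of `None` means every record in R1 has a same-named partner at the same position in R2. Anything else gives the index of the first record where the files diverge, including the case where one file is shorter than the other.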
Sorry for the slow response. The only modification to my files was to remove the sequencing adapters specific to each run (multiplexed sequencing).
Bowtie2 output:
24626899 reads; of these:
  24597942 (99.88%) were paired; of these:
    5060101 (20.57%) aligned concordantly 0 times
    18743985 (76.20%) aligned concordantly exactly 1 time
    793856 (3.23%) aligned concordantly >1 times
    ----
    5060101 pairs aligned concordantly 0 times; of these:
      452691 (8.95%) aligned discordantly 1 time
    ----
    4607410 pairs aligned 0 times concordantly or discordantly; of these:
      9214820 mates make up the pairs; of these:
        7479978 (81.17%) aligned 0 times
        1557955 (16.91%) aligned exactly 1 time
        176887 (1.92%) aligned >1 times
  28957 (0.12%) were unpaired; of these:
    28919 (99.87%) aligned 0 times
    35 (0.12%) aligned exactly 1 time
    3 (0.01%) aligned >1 times
84.75% overall alignment rate
HISAT2 output:
24626899 reads; of these:
  24626899 (100.00%) were paired; of these:
    24565513 (99.75%) aligned concordantly 0 times
    34258 (0.14%) aligned concordantly exactly 1 time
    27128 (0.11%) aligned concordantly >1 times
    ----
    24565513 pairs aligned concordantly 0 times; of these:
      496 (0.00%) aligned discordantly 1 time
    ----
    24565017 pairs aligned 0 times concordantly or discordantly; of these:
      49130034 mates make up the pairs; of these:
        49104613 (99.95%) aligned 0 times
        13478 (0.03%) aligned exactly 1 time
        11943 (0.02%) aligned >1 times
0.30% overall alignment rate
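For anyone reading these summaries, the "overall alignment rate" on the last line appears to be counted per mate rather than per pair: every mate that aligned in any way, divided by all mates plus unpaired reads. Below is a small sketch that reproduces both reported figures from the counts above. The helper name `overall_rate` is made up for illustration and the formula is inferred from the numbers, not taken from either tool's documentation.

```python
def overall_rate(pairs, unaligned_mates, unpaired=0, unaligned_unpaired=0):
    """Percentage of individual reads (mates plus unpaired) that aligned at all."""
    total = 2 * pairs + unpaired              # each pair contributes two mates
    unaligned = unaligned_mates + unaligned_unpaired
    return 100.0 * (total - unaligned) / total

# Counts copied from the bowtie2 summary above.
bowtie2 = overall_rate(pairs=24597942, unaligned_mates=7479978,
                       unpaired=28957, unaligned_unpaired=28919)

# Counts copied from the HISAT2 summary above (no unpaired reads reported).
hisat2 = overall_rate(pairs=24626899, unaligned_mates=49104613)

print(f"bowtie2: {bowtie2:.2f}%  HISAT2: {hisat2:.2f}%")
```

Both values round to the rates the tools printed (84.75% and 0.30%), so the two summaries are at least measuring the same thing and the gap between them is real.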