Hello,
I've created a small reference FASTA file that consists of the combined sequences of two exons from two different genes where I know the fusion has occurred. I uploaded that reference file and a FASTQ file into Galaxy and ran the tool with all default settings. The alignment worked perfectly.
I'm now trying to reproduce the same alignment on the command line, and I've downloaded and installed HISAT2 version 2.0.5.
Here are the steps that I followed:
Indexed the reference with HISAT2
hisat2-build /data/HISAT2/BAG4_ref.fasta BAG4_ref_indexed
Performed the alignment
hisat2 -x /data/HISAT2/index/BAG4_ref_indexed -U /data/HISAT2/IonXpress_011.fq -S /data/HISAT2/Bag4.sam
samtools view -bS Bag4.sam > Bag4.bam
Here are the stats:
345063 reads; of these:
345063 (100.00%) were unpaired; of these:
344481 (99.83%) aligned 0 times
578 (0.17%) aligned exactly 1 time
4 (0.00%) aligned >1 times
0.17% overall alignment rate
This command-line method produces a much smaller BAM file: 5,091 K vs. 19,349 K (Galaxy).
What am I doing wrong? Please help me diagnose the problem.
Thanks
Click on the information icon (an "(i)") on the history item and see if the exact command that was used is included. I try to make that available on the instances that I administer, perhaps others do as well.
@Devon. When I click on "i", Dataset information, Job information and Tool parameters are displayed. I don't see any commands. Here is what it says for Tool version.
Don't go on the size of the file alone. That is never a good statistic for file comparison. Do you have a similar summary stat for Galaxy's alignment as the one posted above?
I ran: samtools flagstat glaxy.bam and it looks like the stats are comparable:
I don't understand why there is such a difference in size for two bam files that are results of exactly the same alignment. Thanks
It is possible that the unmapped reads are being written to the output file on Galaxy but not in your alignment.
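One way to check this is to count how many records in each output carry the "unmapped" flag (bit 0x4 of the SAM FLAG field). Below is a minimal sketch that does this for a plain-text SAM file using only awk; the function name count_sam_records is made up for illustration. For a BAM file, samtools view -c -f 4 file.bam should give the unmapped count directly.

```shell
# Count mapped vs. unmapped records in a SAM file by testing FLAG bit 0x4.
# count_sam_records is a hypothetical helper name, not part of samtools or HISAT2.
count_sam_records() {
    awk -F '\t' '
        /^@/ { next }                      # skip header lines
        {
            # FLAG is column 2; bit 0x4 set means the read is unmapped
            if (int($2 / 4) % 2 == 1) unmapped++
            else mapped++
        }
        END { printf "mapped=%d unmapped=%d\n", mapped, unmapped }
    ' "$1"
}
```

Running count_sam_records /data/HISAT2/Bag4.sam on the command-line output, and the equivalent on the Galaxy output, would show whether one file contains unmapped records and the other does not. Note that this counts SAM records, so a read with several reported alignments is counted once per record.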
The alignment numbers don't look that different
That is poor alignment in both cases BTW.
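To put a number on that, the overall rate can be recomputed from the counts in the summary posted in the question (reads aligned exactly once plus reads aligned more than once, divided by total reads). A small sketch, with alignment_rate as a made-up helper name:

```shell
# Recompute the overall alignment rate from HISAT2 summary counts.
# Arguments: total reads, reads aligned exactly 1 time, reads aligned >1 times.
# alignment_rate is a hypothetical helper, not a HISAT2 command.
alignment_rate() {
    awk -v total="$1" -v unique="$2" -v multi="$3" \
        'BEGIN { printf "%.2f%%\n", 100 * (unique + multi) / total }'
}

# Counts taken from the summary in the question: 345063 reads, 578 + 4 aligned
alignment_rate 345063 578 4
```

With those counts this prints 0.17%, matching the reported overall alignment rate: only 582 of 345,063 reads aligned.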