A year ago I started learning bioinformatics from scratch, with zero background in Unix systems. Back then, I ran some alignments with tophat2 on RNA-seq data from Mus musculus heart.
Two or three months ago, I accidentally erased my entire home directory on my cluster account (I made a very stupid move: I created a file named $HOME, and you can imagine what happened).
I only had a backup of the raw reads, not of the alignments. All the files recording what I had done were erased too, and I had nothing on paper. The only thing that survived was the samstat output from the alignments.
I was angry, but not super angry, because it would take only 16 hours to align everything again. But then I became pretty confused, because the output was different: as you can see in the two samstat output files, the MAPQ values are different now.
Using the most recent genome or the same one I used before makes no difference.
I am not sure of the exact tophat command line I used, but I am using pretty much the same thing: 0 mismatches, the same GTF file, --coverage-search, the same genome, and everything else on default.
But what caught my attention was the strange samstat output in the Base Quality Distributions. (EDIT: As you can see in the first plot, my first alignment got 95% of reads with MAPQ > 30 in samstat, but the second got only 85%. In the second plot (Base quality distributions), the values are the same for all bases in the first alignment, and completely different in the second.)
These two samstat outputs were generated using the same accepted_hits.bam.
Does anyone have any idea what could be happening?
[1] old x new alignment samstat output
http://s28.postimg.org/wunusnl71/Untitled.png
[2] old x new alignment samstat output
http://s16.postimg.org/rqs9id1cl/Untitled2.png
Thank you
You have two plots generated with different tools, neither of which is particularly informative, plus they display different things.
You need to spend more time formulating your question so that it is something other than "strange output". What does that even mean?
I edited the post to try to be clearer. The two plots were generated using samstat, on the same accepted_hits.bam. I am just trying to figure out why all bases have the same MAPQ number in the second plot. I need to reproduce this error so I can move on.
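
One way to cross-check the 95% vs 85% figures independently of samstat is to tally the MAPQ column straight from the SAM records. Below is a minimal sketch in Python, not samstat's own method: the function name mapq_fractions and the three bins are my own choices for illustration, and samstat's exact binning may differ. It only assumes the standard SAM column layout (FLAG in column 2, MAPQ in column 5, and FLAG bit 0x4 meaning unmapped).

```python
from collections import Counter


def mapq_fractions(sam_lines):
    """Return the fraction of records in each MAPQ bin.

    sam_lines: iterable of SAM text lines (header lines are skipped).
    The bin names and the MAPQ >= 30 cutoff are illustrative assumptions.
    """
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and SAM header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2: bitwise FLAG
        mapq = int(fields[4])  # column 5: mapping quality
        if flag & 0x4:         # bit 0x4 set: read is unmapped
            counts["unmapped"] += 1
        elif mapq >= 30:
            counts["mapq>=30"] += 1
        else:
            counts["mapq<30"] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: n / total for name, n in counts.items()}


# Tiny made-up example records (not real data), just to show the call.
example = [
    "@HD\tVN:1.0",
    "r1\t0\tchr1\t100\t50\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchr1\t200\t3\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r4\t16\tchr1\t300\t50\t4M\t*\t0\t0\tACGT\tIIII",
]
print(mapq_fractions(example))
```

On real data, the lines could come from the text output of `samtools view accepted_hits.bam`, for example by passing an open file or sys.stdin to the function. Running it on both the old-style and new alignments (if both are available) would show whether the MAPQ difference is in the BAM itself or only in how the plots were produced.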