Hi there,
A while ago I attended a training course on next-generation sequencing (NGS) analysis. Recently I tried to repeat the same analysis on the same genome sequence on a different machine, but when I compare the samtools flagstat results for the merged files, they are different.
Methods: I ran the BWA alignment (aln with -q 20) to get .sai files, then converted these to .sam files with bwa sampe, and finally converted those to sorted .bam files with samtools (/samtools view -bS | /samtools sort -). So my question is: what went wrong? Is it normal to get different results each time I run an alignment? If so, why am I getting fewer properly paired reads in my recent analysis?
Recent analysis, samtools flagstat of the final merged .bam file:
253230842 + 0 in total (QC-passed reads + QC-failed reads)
493185 + 0 duplicates
236903690 + 0 mapped (93.55%:nan%)
253230842 + 0 paired in sequencing
126615421 + 0 read1
126615421 + 0 read2
232042648 + 0 properly paired (91.63%:nan%)
234896807 + 0 with itself and mate mapped
2006883 + 0 singletons (0.79%:nan%)
1946513 + 0 with mate mapped to a different chr
778386 + 0 with mate mapped to a different chr (mapQ>=5)
During the training, samtools flagstat results for the same genome sequence:
335964388 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
313032349 + 0 mapped (93.17%:nan%)
335964388 + 0 paired in sequencing
167982194 + 0 read1
167982194 + 0 read2
305304034 + 0 properly paired (90.87%:nan%)
309497738 + 0 with itself and mate mapped
3534611 + 0 singletons (1.05%:nan%)
2984962 + 0 with mate mapped to a different chr
1131183 + 0 with mate mapped to a different chr (mapQ>=5)
Thanks a lot in advance
Please edit your question to make it shorter. It is way too long and unreadable. Also please only ask one question per post.
Sorry about the lengthy post. I have deleted the second question.
Please post both command lines that created the two sam files. There is a good chance that you have generated different alignments either based on different builds or different inputs.
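One quick check that points toward different inputs rather than aligner variability: the two flagstat reports do not agree on the total number of reads, and that number does not depend on the alignment at all. Below is a minimal Python sketch that parses a subset of the lines from the two reports in the question and prints the differences. The names `parse_flagstat`, `RECENT`, `TRAINING` and `TOTAL` are only illustrative, and the parser assumes the older flagstat text layout shown in the question.

```python
import re

# Lines copied from the two flagstat reports in the question (subset).
RECENT = """\
253230842 + 0 in total (QC-passed reads + QC-failed reads)
236903690 + 0 mapped (93.55%:nan%)
232042648 + 0 properly paired (91.63%:nan%)
2006883 + 0 singletons (0.79%:nan%)
"""

TRAINING = """\
335964388 + 0 in total (QC-passed reads + QC-failed reads)
313032349 + 0 mapped (93.17%:nan%)
305304034 + 0 properly paired (90.87%:nan%)
3534611 + 0 singletons (1.05%:nan%)
"""

TOTAL = "in total (QC-passed reads + QC-failed reads)"

# "<passed> + <failed> <category>", optionally followed by a "(..%..)" suffix.
LINE = re.compile(r"^(\d+) \+ (\d+) (.+?)(?: \([^()]*%[^()]*\))?$")


def parse_flagstat(text):
    """Map each flagstat category to its QC-passed read count."""
    counts = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            counts[m.group(3)] = int(m.group(1))
    return counts


recent = parse_flagstat(RECENT)
training = parse_flagstat(TRAINING)

for key in recent:
    if key in training:
        diff = recent[key] - training[key]
        print(f"{key:46s} {recent[key]:>11d} {training[key]:>11d} {diff:>+12d}")

# Fractions are computed against each run's own total, so they stay
# comparable even though the totals differ.
for name, run in (("recent", recent), ("training", training)):
    frac = run["properly paired"] / run[TOTAL]
    print(f"{name}: properly paired fraction = {frac:.4f}")
```

The totals differ by more than 80 million reads, so the two BAM files were not built from the same set of reads (for example, a lane or a file may be missing from one merge). The properly paired count is lower in the recent run simply because there are fewer reads overall; as a fraction it is slightly higher.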
Hi Istvan, and thanks for the editing. The top flagstat results were generated from a script that the instructor provided back then (I do not have it now since it is their script). That being said, I wrote the steps down in my notes and followed the bwa manual. Regarding the input, I brought my own data to work on, on an external hard disk (zipped files). We downloaded the same tools on my machine and on the machine provided back then. However, I am not sure which build the script was pointing at (it might be different).
I meant the bottom results were generated from a script they provided to speed up the process of generating the results, but during the training I used the same pipeline I was trained on.
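If the original script is gone, the BAM headers may still tell you which reference each run used. `samtools view -H file.bam` prints the header, where the @SQ lines list the reference sequences (SN, LN) and the @PG lines, if the tools recorded any, describe the programs that were run. A rough Python sketch for comparing two such headers follows; the header text here is made-up toy data, not taken from the files in this thread, and the function names are only illustrative.

```python
# Toy header text standing in for the output of `samtools view -H`
# on each BAM file. Sequence names, lengths and versions are invented.
HEADER_A = (
    "@HD\tVN:1.0\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:1000\n"
    "@SQ\tSN:chr2\tLN:800\n"
    "@PG\tID:bwa\tPN:bwa\tVN:toy-1\n"
)
HEADER_B = (
    "@HD\tVN:1.0\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:1000\n"
    "@SQ\tSN:chr2\tLN:750\n"
    "@SQ\tSN:chrM\tLN:16\n"
    "@PG\tID:bwa\tPN:bwa\tVN:toy-2\n"
)


def parse_sam_header(text):
    """Return ({sequence name: length}, [program records]) from SAM header text."""
    refs, programs = {}, []
    for line in text.splitlines():
        fields = line.split("\t")
        # Header fields after the record type are TAG:VALUE pairs.
        tags = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
        if fields[0] == "@SQ":
            refs[tags["SN"]] = int(tags["LN"])
        elif fields[0] == "@PG":
            programs.append(tags)
    return refs, programs


def diff_references(refs_a, refs_b):
    """List sequences unique to each header and shared names whose length differs."""
    only_a = sorted(set(refs_a) - set(refs_b))
    only_b = sorted(set(refs_b) - set(refs_a))
    changed = sorted(n for n in set(refs_a) & set(refs_b) if refs_a[n] != refs_b[n])
    return only_a, only_b, changed


refs_a, progs_a = parse_sam_header(HEADER_A)
refs_b, progs_b = parse_sam_header(HEADER_B)

only_a, only_b, changed = diff_references(refs_a, refs_b)
print("only in A:", only_a)
print("only in B:", only_b)
print("length differs:", changed)
print("program versions:", [p.get("VN") for p in progs_a], [p.get("VN") for p in progs_b])
```

Any sequence that appears in only one header, or whose length differs, means the two alignments were made against different reference builds, which by itself can change the mapped and properly paired counts.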