Dear all,
Just to confirm: it is normal to have a different number of reads when aligning the same reads against two different genomes?
I think the answer is YES because the aligner generates different hits (reads) according to the reference genome. Also, the number of reads of the starting fastq file is different from that of the SAM file, thus SAM does not report a number that is tethered to the starting file.
For instance, I have:
x=$(zcat <file>_1.fq.gz | wc -l)
L=$(($x/4))
echo $L
1 190 389 447 # same number on the paired file
samtools flagstat <align_first_reference>.sam | grep 'total' | cut -d+ -f1
2 383 795 990
samtools flagstat <align_second_reference>.sam | grep 'total' | cut -d+ -f1
2 381 177 452
Thank you.
You also need to account for secondary alignments that are bound to be different as well.