Hi. I used samtools rmdup to remove PCR duplicates from the sorted.bam of my data. However, some of the genes lost most of their aligned reads during this step (see the per-gene read counts below). Is that expected? Also, the more reads aligned to a gene, the higher the percentage of reads removed by rmdup. Why does the command remove so many reads? It seems unlikely that my data contain that many duplicates.
before after
235132 15438
2410 1535
1740 1489
2926 2493
636 548
2666 2258
1866 1581
2390 2009
1040 885
8019 3467
1668 1418
2218 1928
2011 1730
4902 1924
120 103
14634 4432
25263 3206
1047 844
36094 4895
9222 6558
177499 19560
390835 25276
240 195
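To put numbers on that trend, here is a minimal sketch in Python that computes the percentage of reads removed for a few rows copied from the table above. The function name `percent_removed` is only an illustration of the arithmetic and is not part of samtools.

```python
# (before, after) read counts per gene, copied from a few rows of the table above
counts = [
    (235132, 15438),
    (2410, 1535),
    (120, 103),
    (390835, 25276),
]


def percent_removed(before, after):
    """Percentage of reads dropped between the 'before' and 'after' counts."""
    return 100.0 * (before - after) / before


# Print rows from the lowest to the highest 'before' count to show the trend
for before, after in sorted(counts):
    pct = percent_removed(before, after)
    print(f"{before:>7} -> {after:>6}  removed {pct:5.1f}%")
```

For these rows, the genes with a few hundred or a few thousand reads lose roughly 14% to 36% of them, while the genes with hundreds of thousands of reads lose more than 90%.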
What type of experiment did you run? WGS, capture?
The next step is to call SNPs. Any comments?