Dear All,
I am trying to find the PCR duplicate reads in a BAM/SAM file. Picard can mark duplicates in the BAM/SAM file.
How can I see the marked duplicates in the BAM file? I tried checking for the SAM flag "1024", which decodes to "read is PCR or optical duplicate".
$ samtools flagstat Sample_WES01/WES01.clean.dedup.recal.bam
71753231 + 0 in total (QC-passed reads + QC-failed reads)
12384215 + 0 duplicates
71185962 + 0 mapped (99.21%:-nan%)
71753231 + 0 paired in sequencing
35881695 + 0 read1
35871536 + 0 read2
70159253 + 0 properly paired (97.78%:-nan%)
70654767 + 0 with itself and mate mapped
531195 + 0 singletons (0.74%:-nan%)
425872 + 0 with mate mapped to a different chr
287474 + 0 with mate mapped to a different chr (mapQ>=5)
$
I extracted the flag column from the BAM file and tried grepping for "1024", but I couldn't see any matches.
Will I be able to see duplicate reads in IGV?
Awesome Pierre :).
I tried "samtools view -f 1024 in.bam". Then I extracted the unique flags from the output and got the following 16 flags:
"1089,1097,1105,1107,1121,1123,1137,1145,1153,1161,1169,1171,1185,1187,1201,1209"
I checked all of the above flags, and each one decodes to "read is PCR or optical duplicate" in addition to other properties.
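For anyone else who hits this: the likely reason grepping for "1024" found nothing is that the FLAG column stores the sum of all bits set for a read, so a duplicate that is also paired, first in pair, and so on shows up as a larger number such as 1123. Below is a minimal Python sketch of that decoding, assuming the standard bit meanings from the SAM specification. The helper names (decode_flag, is_duplicate) are my own for illustration and are not part of samtools or Picard.

```python
# Bit values and meanings of the SAM FLAG field (from the SAM specification).
SAM_FLAG_BITS = {
    1: "read paired",
    2: "read mapped in proper pair",
    4: "read unmapped",
    8: "mate unmapped",
    16: "read reverse strand",
    32: "mate reverse strand",
    64: "first in pair",
    128: "second in pair",
    256: "not primary alignment",
    512: "read fails platform/vendor quality checks",
    1024: "read is PCR or optical duplicate",
    2048: "supplementary alignment",
}
DUPLICATE = 1024


def decode_flag(flag):
    """Return the descriptions of every bit set in a SAM flag (hypothetical helper)."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


def is_duplicate(flag):
    """True if the duplicate bit (1024) is set, whatever other bits are set."""
    return bool(flag & DUPLICATE)


# The 16 unique flags reported in this thread.
observed = [1089, 1097, 1105, 1107, 1121, 1123, 1137, 1145,
            1153, 1161, 1169, 1171, 1185, 1187, 1201, 1209]

for flag in observed:
    print(flag, is_duplicate(flag), "; ".join(decode_flag(flag)))
```

This is the same bitwise test that "samtools view -f 1024" applies, which is why that command finds the reads while a literal text match on "1024" does not.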
HISEQ:137:C6W39ACXX:7:1314:15234:3404 1123 chrM 1 15 57S44M = 41 141 TCAGGGCCATAAAG
HISEQ:137:C6W39ACXX:7:1314:15234:3404 1171 chrM 41 60 101M = 1 -141 CTCCATGCATTTGGT
HISEQ:137:C6W39ACXX:7:2102:20584:10431 1187 chrM 1 60 11S90M = 112 212 ACATCACGATGGATCA
HISEQ:137:C6W39ACXX:7:1113:11949:62990 1209 chrM 10 60 101M = 10 0 TCTATCACCCTATTAAC
HISEQ:137:C6W39ACXX:7:2311:11970:3501 1169 chrM 15 60 101M = 16193 16079 CACCCTATTAACCAC
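The same check can be run over SAM text records like the ones above. This is a rough sketch only: count_duplicates is a name I made up, the records are the truncated ones pasted in this thread, and it assumes the flag is the second whitespace-separated field and that header lines start with "@".

```python
DUPLICATE = 1024  # SAM flag bit for "read is PCR or optical duplicate"


def count_duplicates(lines):
    """Return (duplicate_records, total_records) for an iterable of SAM text lines."""
    duplicates = 0
    total = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.split()  # real SAM is tab-separated; split() handles tabs and spaces
        flag = int(fields[1])  # FLAG is the second column
        total += 1
        if flag & DUPLICATE:
            duplicates += 1
    return duplicates, total


# The five records from this thread (sequences are truncated as posted).
records = [
    "HISEQ:137:C6W39ACXX:7:1314:15234:3404 1123 chrM 1 15 57S44M = 41 141 TCAGGGCCATAAAG",
    "HISEQ:137:C6W39ACXX:7:1314:15234:3404 1171 chrM 41 60 101M = 1 -141 CTCCATGCATTTGGT",
    "HISEQ:137:C6W39ACXX:7:2102:20584:10431 1187 chrM 1 60 11S90M = 112 212 ACATCACGATGGATCA",
    "HISEQ:137:C6W39ACXX:7:1113:11949:62990 1209 chrM 10 60 101M = 10 0 TCTATCACCCTATTAAC",
    "HISEQ:137:C6W39ACXX:7:2311:11970:3501 1169 chrM 15 60 101M = 16193 16079 CACCCTATTAACCAC",
]

print(count_duplicates(records))
```

To run it on a whole file, the output of "samtools view in.bam" could be piped in and sys.stdin passed to the function instead of the list.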