Hey!
I am working with whole genome sequencing data from bacterial samples, sequenced on Illumina. After receiving the paired-end .fastq files, I aligned them to my reference genome with bowtie2 to get a SAM file, converted that to BAM, and sorted the BAM file. I then filtered the sorted BAM file to keep only reads with a Phred score >30. This filtered BAM file is what I have used for my downstream analysis.
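In case it helps, here is roughly the order of steps, sketched in Python as a thin wrapper around the command-line tools. The file names, the index prefix, and the helpers `build_commands` and `run_pipeline` are placeholders made up for this post. I am also assuming the Q>30 filter corresponds to `samtools view -q`, which filters on mapping quality (MAPQ) and keeps reads at or above the threshold, so treat this as a sketch and not the exact commands.

```python
import subprocess


def build_commands(index_prefix, reads_1, reads_2, out_prefix, min_mapq=30):
    """Return argv lists for: align -> SAM to BAM -> sort -> MAPQ filter.

    All paths are hypothetical placeholders. min_mapq is passed to
    `samtools view -q`, which keeps alignments with MAPQ >= min_mapq
    (an assumption about how the Q>30 filter was done).
    """
    sam = f"{out_prefix}.sam"
    bam = f"{out_prefix}.bam"
    sorted_bam = f"{out_prefix}.sorted.bam"
    filtered_bam = f"{out_prefix}.sorted.q{min_mapq}.bam"
    return [
        # Paired-end alignment against a prebuilt bowtie2 index
        ["bowtie2", "-x", index_prefix, "-1", reads_1, "-2", reads_2, "-S", sam],
        # SAM -> BAM
        ["samtools", "view", "-b", "-o", bam, sam],
        # Coordinate sort
        ["samtools", "sort", "-o", sorted_bam, bam],
        # Keep alignments at or above the mapping-quality threshold
        ["samtools", "view", "-b", "-q", str(min_mapq), "-o", filtered_bam, sorted_bam],
    ]


def run_pipeline(commands):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Calling `run_pipeline(build_commands("ref_index", "sample_1.fastq", "sample_2.fastq", "sample"))` would run the four steps, assuming bowtie2 and samtools are on the PATH. Note that nothing in these steps looks at duplicates.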
My question is whether the final output would differ if I also ran Picard's MarkDuplicates. I read up on how MarkDuplicates works: as I understand it, it identifies optical artifacts and PCR duplicates, and a Q>15 threshold comes into how a read pair is considered. I hope I have got that right! So does filtering at Q>30 mean I have already taken care of duplicates, or are these two completely different quality checks?
Please advise whether I can go ahead with the Q>30 filtered files or whether I also need to run MarkDuplicates. It would be great if you could give me a simple explanation.
Thanks in advance!! :)