Input and output of Picard MarkDuplicates for DeepVariant
2.9 years ago
amy__ ▴ 220

Hi, I'm building a WES pipeline to identify variants in human sequences. So far I am using (in order of use):

  1. Read QC and trimming: fastq

  2. Alignment: bwa index, bwa mem, samtools view, samtools sort, and samtools index.

  3. Remove PCR duplicates: picard markduplicates?

When it comes to removing PCR duplicates, I have seen that Picard's MarkDuplicates identifies (marks) duplicates:

java -jar MarkDuplicates.jar I=PE_samtoolssorted.bam O=markedduplicates.bam M=markedduplicatesmetrics.txt

However, when it comes to actually removing the PCR duplicates that are found, I have read online that just adding REMOVE_DUPLICATES=true removes them. Is that right?

java -jar MarkDuplicates.jar I=PE_samtoolssorted.bam O=removedduplicates.bam M=markedduplicatesmetrics.txt REMOVE_DUPLICATES=true

Will the output of this be a sorted BAM file with the PCR duplicates removed?

Would this removedduplicates.bam file be the input for a variant caller like DeepVariant, which requires a sorted BAM file?

And if so, is it this removedduplicates.bam file that needs indexing for input into DeepVariant, rather than the original PE_samtoolssorted.bam?

Thanks! Sorry if confusing. Amy

markduplicates picard bam sam deepvariant

According to the documentation, yes: REMOVE_DUPLICATES=true writes an alignment file from which the duplicate reads have been removed rather than just flagged. MarkDuplicates should keep the sort order of its input, so if PE_samtoolssorted.bam was coordinate-sorted the output should be too; if it is not, sort it first. Either way, you need to index the resulting file. That sorted, indexed, duplicate-free file (not the original PE_samtoolssorted.bam) is the one to use for downstream analyses, assuming it is the appropriate input for the steps you want to perform.
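
Whether a re-sort is needed can be read from the BAM header, since the SO: field on the @HD line records the sort order. Below is a minimal sketch of that check. The function name sort_order_from_header is made up for illustration, the file names are the ones from the question, and the samtools calls in the comments assume samtools is on your PATH; the runnable part only parses a sample header so it works without any BAM file.

```shell
#!/bin/sh
# Sketch only: sort_order_from_header is a hypothetical helper, not part of
# samtools or Picard. It reads a SAM header on stdin and prints the value of
# the SO: tag found on the @HD line (e.g. "coordinate", "queryname", "unsorted").
sort_order_from_header() {
  awk -F'\t' '
    $1 == "@HD" {
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^SO:/) { print substr($i, 4); exit }
      }
    }'
}

# Demo on a hand-written header, so no BAM file is needed:
printf '@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n' | sort_order_from_header

# With real data (assumes samtools is installed; file names from the question):
#   samtools view -H removedduplicates.bam | sort_order_from_header
# If the result is not "coordinate", sort before indexing:
#   samtools sort -o removedduplicates.sorted.bam removedduplicates.bam
# Then index whichever file you pass to the variant caller:
#   samtools index removedduplicates.bam
```

If the header has no SO: field at all, the function prints nothing, and sorting again before indexing is the safe choice.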
