4.5 years ago
vinayjrao
Hi,
I am analyzing whole-exome data to identify InDels and SNPs that differ between healthy controls and diseased patients. I'm new to exome analysis and would like to know whether I should remove all duplicates or only sequencing duplicates. My concern is that removing all duplicates could also discard signal from genuine genomic duplication events.
Thanks in advance.
If you have no experience, I recommend using a standard, published workflow for exome data analysis rather than putting things together yourself. For inspiration, see e.g. https://github.com/gatk-workflows/gatk4-exome-analysis-pipeline
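On the duplicates question itself, it may help to see what duplicate marking usually keys on. Tools such as Picard MarkDuplicates flag reads by alignment coordinates (same reference, same 5' start, same orientation), not by sequence similarity. Reads from a genuinely duplicated genomic region generally start at many different positions, so they are not flagged wholesale. Below is a minimal toy sketch of that idea, not the actual Picard implementation: it treats reads as single-end, and the field names (`pos5`, `qual_sum`) and the example reads are made up for illustration.

```python
from collections import defaultdict

def mark_duplicates(reads):
    """Return the names of reads flagged as duplicates.

    Simplified illustration: reads sharing chromosome, 5' start position
    and strand form one group; the read with the highest summed base
    quality is kept and the rest are flagged. Real tools work on read
    pairs and unclipped coordinates.
    """
    groups = defaultdict(list)
    for r in reads:
        groups[(r["chrom"], r["pos5"], r["strand"])].append(r)

    duplicates = set()
    for group in groups.values():
        best = max(group, key=lambda r: r["qual_sum"])
        for r in group:
            if r is not best:
                duplicates.add(r["name"])
    return duplicates

# Hypothetical toy reads: r1 and r2 share coordinates and strand,
# r3 starts 12 bp downstream, r4 is on the opposite strand.
reads = [
    {"name": "r1", "chrom": "chr1", "pos5": 1000, "strand": "+", "qual_sum": 3500},
    {"name": "r2", "chrom": "chr1", "pos5": 1000, "strand": "+", "qual_sum": 3400},
    {"name": "r3", "chrom": "chr1", "pos5": 1012, "strand": "+", "qual_sum": 3300},
    {"name": "r4", "chrom": "chr1", "pos5": 1000, "strand": "-", "qual_sum": 3600},
]

print(sorted(mark_duplicates(reads)))  # ['r2']
```

Only `r2` is flagged here; overlapping reads with different start positions or orientation are kept, which is why extra copies of a region still show up as increased depth after duplicate marking.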