9.2 years ago
liuyifan2014
Hi everyone, do you usually filter out duplicates when preprocessing your raw metagenomic data? Since some organisms carry multiple copies of the same gene (like 16S), I am afraid that removing duplicates will lose some information.
Best
As you have correctly identified, there are arguments for and against removing dupes in metagenomic sequencing. Whatever path you choose, you're going to hear all about the other path from reviewers, lab members, etc. Best advice is to just look at both.
I would personally also try it with and without filtering for properly mapped reads. Depending on the tools you use, this shouldn't even increase the processing time significantly :) And there's no need to treat the results all the same after filtering either: if the insert length without duplicates differs from the insert length with duplicates, fine. You're looking at different views of the same data, not different experiments, so it's OK if their analyses differ slightly. The point isn't to compare the views against each other.
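To make the "look at both" idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `Read` record, the `drop_exact_duplicates` and `summarize` helpers, and the four made-up reads are hypothetical, and a duplicate is defined here simply as an identical sequence, whereas real deduplication tools usually work from mapping positions.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Read:
    # Hypothetical, simplified read record (not a real file format).
    name: str
    sequence: str
    insert_length: int
    properly_mapped: bool


def drop_exact_duplicates(reads):
    """Keep the first read seen for each distinct sequence (assumed duplicate rule)."""
    seen = set()
    kept = []
    for read in reads:
        if read.sequence not in seen:
            seen.add(read.sequence)
            kept.append(read)
    return kept


def summarize(reads):
    """Report read counts and mean insert length over properly mapped reads."""
    proper = [r for r in reads if r.properly_mapped]
    return {
        "n_reads": len(reads),
        "n_properly_mapped": len(proper),
        "mean_insert_length": mean(r.insert_length for r in proper) if proper else None,
    }


# Made-up toy data: r1 and r2 share a sequence, r4 is not properly mapped.
reads = [
    Read("r1", "ACGTACGT", 300, True),
    Read("r2", "ACGTACGT", 300, True),
    Read("r3", "TTGACCAA", 250, True),
    Read("r4", "GGGCCCAA", 400, False),
]

# Two views of the same data, each summarized on its own terms.
with_dups = summarize(reads)
without_dups = summarize(drop_exact_duplicates(reads))

print("with duplicates:   ", with_dups)
print("without duplicates:", without_dups)
```

The two summaries are expected to differ, and that is fine: each one describes its own view of the data rather than serving as a control for the other.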