Should we remove duplicated reads from metagenomic data?

9.2 years ago

liuyifan2014 ▴ 110

Hi everyone, do you usually filter out duplicates when preprocessing your raw metagenomic data? Since some organisms carry multiple copies of the same gene (like 16S), I am afraid we will lose some information by removing the duplicates.

Best

sequencing next-gen gene PRINSEQ duplicates • 2.9k views

As you have correctly identified, there are arguments for and against removing dupes in metagenomic sequencing. Whichever path you choose, you're going to hear all about the other path from reviewers, lab members, etc. Best advice is to just look at both.
I would personally also try it both with and without filtering for properly mapped reads. Depending on the tools you use, this shouldn't even increase the processing time significantly :) And there's no need to treat them all the same after filtering either: if the insert length without duplicates is different from the insert length with duplicates, fine. You're looking at different views of the same data, not different experiments, so it's OK that their analysis differs slightly; the point isn't to compare the views against each other.
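To make "look at both" concrete, here is a minimal Python sketch, not what any particular tool does. It assumes the reads are already loaded as plain sequence strings and only treats exact sequence matches as duplicates; the function names `dedupe_exact` and `summarize` are made up for illustration.

```python
from collections import Counter


def dedupe_exact(reads):
    """Collapse exact duplicate sequences, keeping first-seen order.

    Returns the list of unique sequences and a Counter with how many
    times each sequence occurred in the input.
    """
    counts = Counter(reads)
    unique = list(dict.fromkeys(reads))  # dicts preserve insertion order
    return unique, counts


def summarize(reads):
    """Report read counts for both views (with and without duplicates)."""
    unique, _ = dedupe_exact(reads)
    total = len(reads)
    return {
        "total": total,
        "unique": len(unique),
        "duplicate_fraction": (total - len(unique)) / total if total else 0.0,
    }


if __name__ == "__main__":
    # Toy input: hypothetical reads, not real data.
    reads = ["ACGT", "ACGT", "TTGA", "ACGT", "GGCC"]
    print(summarize(reads))
```

Keeping the per-sequence counts around means the deduplicated set can still be traced back to how often each read appeared, so you can run the downstream analysis on either view without re-reading the raw data.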

9.2 years ago by John 13k
