Question

Removing duplicates

0

11 months ago

quentinperriere • 0

Hi,

For long-read technologies like Oxford Nanopore, do I need to remove duplicates after aligning with minimap2, or should I keep them?

(The BAM files generated by minimap2 are then used by freebayes to detect variants.)

duplicates ONT minimap2 • 1.9k views

ADD COMMENT • link updated 10 months ago by noodle ▴ 640 • written 11 months ago by quentinperriere • 0

0

I guess the enlightening question is: duplicates of what?

ADD REPLY • link 11 months ago by samuel.a.odonnell ▴ 590

0

PCR duplicates and/or read duplicates. Should I use the command below to remove them, or do we not talk about duplicates when dealing with ONT? Sorry, but I'm lost; this is a new notion for me.

samtools markdup -r -@ [number_of_thread] [input_sorted_bam] [output_dedup_bam]

ADD REPLY • link 11 months ago by quentinperriere • 0

0

With ONT you will only have PCR duplicates. 'Read' duplicates (do you mean optical?) are an artifact of cluster-based sequencing (i.e., Illumina), and you won't have them with ONT.

Whether to remove duplicates, and whether to do it at the read level (fastq) or the alignment level (bam), depends on what you're trying to do and how the library was constructed.

ADD REPLY • link 11 months ago by noodle ▴ 640

1

As above, you do not need to worry about duplicates for ONT data.

Also, have you looked at variant callers better suited to ONT data, such as medaka and longshot?

ADD REPLY • link 10 months ago by samuel.a.odonnell ▴ 590

1

I could imagine an ONT scenario where you'd want to remove duplicates (amplicon-seq, etc.), but for variant calling there is likely no need, especially if the library was PCR-free or low-cycle PCR, as is common with ONT.

If freebayes requires a duplicate flag to be present, it might not be necessary to run duplicate marking at all; you could just add the flag manually/synthetically. It depends on the library and what you expect.
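
For reference, the "duplicate flag" is a single bit in the FLAG column of each SAM/BAM record: 0x400 (1024), defined in the SAM specification as "PCR or optical duplicate". A minimal Python sketch of how that bit can be checked, set, or cleared on plain SAM text follows. The function names are made up for illustration and are not part of samtools or freebayes; a real workflow would more likely use samtools or a BAM library.

```python
# SAM FLAG bit meaning "PCR or optical duplicate" (0x400 = 1024 in the SAM spec).
DUP_FLAG = 0x400


def is_duplicate(flag: int) -> bool:
    """Return True if the duplicate bit is set in a SAM FLAG value."""
    return bool(flag & DUP_FLAG)


def set_duplicate(flag: int, duplicate: bool) -> int:
    """Return the FLAG with the duplicate bit forced on or off."""
    return (flag | DUP_FLAG) if duplicate else (flag & ~DUP_FLAG)


def rewrite_sam_line(line: str, duplicate: bool) -> str:
    """Set or clear the duplicate bit on one SAM text line (hypothetical helper).

    Header lines (starting with '@') are returned unchanged.
    The trailing newline, if any, is stripped in both cases.
    """
    line = line.rstrip("\n")
    if line.startswith("@"):
        return line
    fields = line.split("\t")
    fields[1] = str(set_duplicate(int(fields[1]), duplicate))  # column 2 is FLAG
    return "\t".join(fields)
```

For example, a reverse-strand read marked as a duplicate has FLAG 1040 (1024 + 16); clearing the duplicate bit gives 16.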

ADD REPLY • link 10 months ago by noodle ▴ 640

0

Thank you for responding. I was looking for a suitable variant calling tool. I'm working on a fungus and I don't have fast5 files. Could you recommend suitable tools for this situation, please?

ADD REPLY • link 10 months ago by quentinperriere • 0

0

I'd try the ones samuel.a.odonnell recommends.

My guess is that you're new to ONT. fast5 is the 'raw data', i.e. the current signal measured over time. Other than for generating the fastq files, these are not important for most people. You can regenerate fastq files from a bam file.
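
To make the last point concrete, here is a rough Python sketch of what regenerating a FASTQ record from one alignment record involves. In practice a tool such as `samtools fastq` does this for you. The function name is hypothetical, and the sketch works on plain SAM text lines rather than a binary BAM file.

```python
# Translation table for complementing bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

# SAM FLAG bits used below (values from the SAM specification).
REVERSE = 0x10          # read is aligned to the reverse strand
SECONDARY = 0x100       # secondary alignment of a read
SUPPLEMENTARY = 0x800   # supplementary (chimeric/split) alignment


def sam_line_to_fastq(line: str):
    """Convert one SAM text line to a FASTQ record, or return None to skip it.

    Skipped: header lines, secondary/supplementary alignments (so that each
    read is emitted only once), and records with no stored sequence or
    qualities ('*').
    """
    if line.startswith("@"):
        return None
    fields = line.rstrip("\n").split("\t")
    name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
    if flag & (SECONDARY | SUPPLEMENTARY) or seq == "*" or qual == "*":
        return None
    if flag & REVERSE:
        # SAM stores reverse-strand reads reverse-complemented, so undo that
        # to recover the read as it was sequenced.
        seq = seq.translate(_COMPLEMENT)[::-1]
        qual = qual[::-1]
    return f"@{name}\n{seq}\n+\n{qual}\n"
```

The reverse-complement step matters: without it, reads that aligned to the reverse strand would come out in the wrong orientation relative to the original fastq.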

Regarding the technicalities of duplicate removal (not that it's likely needed here): depending on your experiment and library, you might want to remove duplicates at the level of the fastq file or the bam file. For example, if you have amplicon sequencing with a UMI and want to be very strict about unique reads, you would remove duplicates at the level of the fastq file. If you lack a UMI and used PCR when creating the library, it might make more sense to remove duplicates at the level of the bam/alignment. This is because two reads may align identically across a span of the genome but differ slightly due to PCR errors. Because they align to an identical spot they can be assumed to be duplicates, but if you tried to remove them at the level of the fastq they would count as unique reads because of a SNP/indel difference.
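
The difference between these strategies can be sketched with a toy Python example. Everything in it is an assumption made for illustration: the `Read` tuple, the UMI values, the coordinates, and the exact-match keys. Real tools are more careful, and with noisy ONT reads the exact sequences and alignment ends rarely match this cleanly.

```python
from collections import namedtuple

# Toy read record; the field names are illustrative, not a real file format.
Read = namedtuple("Read", "name umi chrom start end strand seq")

reads = [
    Read("r1", "AACCGG", "chr1", 100, 112, "+", "ACGTACGTACGT"),
    # PCR copy of r1 (same UMI, same span) carrying one PCR error.
    Read("r2", "AACCGG", "chr1", 100, 112, "+", "ACGTACGAACGT"),
    Read("r3", "GGTTAA", "chr1", 500, 512, "+", "TTGACCAGTTGA"),
    # Different molecule (different UMI) that happens to cover the same span.
    Read("r4", "CCTTAA", "chr1", 100, 112, "+", "ACGTACGTACGT"),
]


def dedup(reads, key):
    """Keep the first read seen for each distinct key, preserving order."""
    seen = set()
    kept = []
    for read in reads:
        k = key(read)
        if k not in seen:
            seen.add(k)
            kept.append(read)
    return kept


# fastq level, no UMI: only byte-identical sequences collapse.
by_sequence = dedup(reads, key=lambda r: r.seq)

# fastq level, with UMI: reads sharing a UMI collapse, PCR errors or not.
by_umi = dedup(reads, key=lambda r: r.umi)

# bam level: reads aligned to the same span and strand collapse.
by_position = dedup(reads, key=lambda r: (r.chrom, r.start, r.end, r.strand))

for label, result in [("sequence", by_sequence),
                      ("umi", by_umi),
                      ("position", by_position)]:
    print(label, [r.name for r in result])
```

In this toy data, sequence-level deduplication keeps the error-carrying PCR copy r2 because its sequence is unique. Position-level deduplication removes r2 but also drops r4, a genuinely separate molecule. UMI-level deduplication removes r2 and keeps r4.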

ADD REPLY • link 10 months ago by noodle ▴ 640