Question

Many reads aligned to the same position but with different variants

4.1 years ago

yliueagle ▴ 290

(Please open this link if the image is not displayed: https://ibb.co/mHYH8NC)

I have two questions related to the following alignment from a single sequencing sample of a cell line:

(1) Do the reads at the bottom represent PCR duplicates, given that they align to exactly the same position? (2) If they are duplicates, why are there so many different variants among them (e.g., at the position near 60795540)?

Thanks for your answer!

alignment duplicates reads • 855 views

updated 4.1 years ago by GenoMax 147k • written 4.1 years ago by yliueagle ▴ 290

are the reads in the bottom represent PCR duplicates, as they are aligned exactly to the same position

We can't see the full reads, but there are too many differences among them, just in this region, for them to be PCR duplicates. PCR duplicates would normally share the same start and end positions and differ at only a small, bounded number of bases.
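To make that criterion concrete, here is a minimal Python sketch of the idea: group reads by identical coordinates, then keep only those within a small mismatch budget of the first read in the group. The function names, the `max_mismatches=2` threshold, and the toy reads (including the placeholder contig `chrN`) are all hypothetical and chosen for illustration; this is not how clumpify.sh or any other dedicated tool works internally.

```python
from collections import defaultdict


def hamming(a, b):
    """Count mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def duplicate_like_groups(reads, max_mismatches=2):
    """Group reads sharing (chrom, start, end, strand), then keep only the
    reads within max_mismatches of the first read in each group.

    reads: iterable of (name, chrom, start, end, strand, sequence) tuples.
    Returns a dict mapping the coordinate key to a list of read names.
    """
    by_position = defaultdict(list)
    for name, chrom, start, end, strand, seq in reads:
        by_position[(chrom, start, end, strand)].append((name, seq))

    groups = {}
    for key, members in by_position.items():
        ref_name, ref_seq = members[0]
        close = [ref_name]
        for name, seq in members[1:]:
            if len(seq) == len(ref_seq) and hamming(seq, ref_seq) <= max_mismatches:
                close.append(name)
        # A group of one read is not a duplicate set.
        if len(close) > 1:
            groups[key] = close
    return groups


# Toy reads (hypothetical data, not taken from the alignment in the question).
reads = [
    ("r1", "chrN", 100, 110, "+", "ACGTACGTAC"),
    ("r2", "chrN", 100, 110, "+", "ACGTACGTAC"),  # identical to r1
    ("r3", "chrN", 100, 110, "+", "ACGTACGAAC"),  # one mismatch vs r1
    ("r4", "chrN", 100, 110, "+", "TTGTAAGAAC"),  # four mismatches vs r1
    ("r5", "chrN", 105, 115, "+", "ACGTACGTAC"),  # different coordinates
]
print(duplicate_like_groups(reads))
```

Under these assumptions r1, r2 and r3 are grouped together, while r4 is excluded despite sharing the same coordinates, which mirrors the situation in the screenshot: same position, but too many differences to be duplicates.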

Run a tool like clumpify.sh if you really want to identify duplicates; see "Introducing Clumpify: Create 30% Smaller, Faster Gzipped Fastq Files".

4.1 years ago by GenoMax 147k

Thanks for your answer. I have updated the figure. These reads map to exactly the same region but carry different variants, especially at the position near 60795540.

4.1 years ago by yliueagle ▴ 290

These reads mapped exactly to the same region except that they have different variants

Then they don't quite fit the definition of PCR duplicates. Perhaps you are allowing too many errors at the alignment step, which lets these reads map here even if they do not originate from this region. Is there any soft-clipping that we can't see in that image? If you want to identify PCR duplicates, use the clumpify method.
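One way to check for hidden soft-clipping is to inspect the CIGAR field of each alignment. The sketch below parses a CIGAR string and reports leading and trailing soft clips (the `S` operation in the SAM format). The helper names and example values are hypothetical; in practice the CIGAR strings and positions would come from your own BAM/SAM records.

```python
import re

# One CIGAR operation: a length followed by an operation code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clip_lengths(cigar):
    """Return (leading, trailing) soft-clipped base counts for a CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) can sit outside soft clips, so ignore them here.
    core = [(n, op) for n, op in ops if op != "H"]
    leading = core[0][0] if core and core[0][1] == "S" else 0
    trailing = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return leading, trailing


def soft_unclipped_start(pos, cigar):
    """Start the read would have if its leading soft clip were aligned too."""
    leading, _ = soft_clip_lengths(cigar)
    return pos - leading


# Hypothetical examples: the aligned start is 1000 in both cases.
print(soft_clip_lengths("100M"))             # no clipping
print(soft_clip_lengths("20S80M"))           # 20 bases clipped at the start
print(soft_unclipped_start(1000, "20S80M"))  # start once the clip is added back
```

Reads that appear to pile up at one aligned start but have different leading soft clips would have different unclipped starts, which is one reason a genome-browser view alone can be misleading here.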

4.1 years ago by GenoMax 147k