4.1 years ago
yliueagle
(Please open this link if the image is not displayed: https://ibb.co/mHYH8NC)
I have two questions related to the following alignment from a single sequencing sample of a cell line:
(1) Do the reads at the bottom represent PCR duplicates, given that they align to exactly the same position? (2) If they are duplicates, why are there so many different variants among them (e.g., at the position near 60795540)?
Thanks for your answer!
We can't see the full reads, but there are too many differences between them in this region alone for these to be PCR duplicates. PCR duplicates would normally share the same start and end positions and differ at only a small, bounded number of bases.
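As a rough illustration of that definition, here is a minimal Python sketch that groups reads by identical alignment coordinates and then only calls them duplicates if their sequences differ at a few positions. The tuple layout, the function names, and the `max_mismatches=2` threshold are assumptions made for this example; this is not how clumpify or any other specific tool is implemented.

```python
from collections import defaultdict


def hamming(a, b):
    """Count mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def group_duplicates(reads, max_mismatches=2):
    """Group candidate PCR duplicates.

    reads: iterable of (name, chrom, start, end, strand, seq) tuples
           (a hypothetical layout chosen for this sketch).
    Returns a list of clusters (lists of read names) with more than one read.
    """
    # Step 1: reads can only be duplicates if they share start, end and strand.
    by_position = defaultdict(list)
    for name, chrom, start, end, strand, seq in reads:
        by_position[(chrom, start, end, strand)].append((name, seq))

    clusters = []
    for candidates in by_position.values():
        # Step 2: within one position, greedily attach each read to the first
        # group whose representative sequence is within the mismatch budget.
        groups = []  # list of (representative_seq, [read names])
        for name, seq in candidates:
            for rep_seq, names in groups:
                if len(rep_seq) == len(seq) and hamming(rep_seq, seq) <= max_mismatches:
                    names.append(name)
                    break
            else:
                groups.append((seq, [name]))
        clusters.extend(names for _, names in groups if len(names) > 1)
    return clusters
```

Under this sketch, reads that share coordinates but carry many different variants end up in separate groups, which is the situation described above.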
Run a tool like clumpify.sh if you really want to identify duplicates: A: Introducing Clumpify: Create 30% Smaller, Faster Gzipped Fastq Files
Thanks for your answer. Here I updated the figure. These reads mapped exactly to the same region except that they have different variants, especially at the position near 60795540.
Then they don't quite fit the definition of PCR duplicates. Perhaps the aligner is allowing too many mismatches, which lets these reads map here even if they do not originate from this region. Is there any soft-clipping happening that we can't see in that image? If you want to identify PCR duplicates, then use the clumpify method.
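One way to check for soft-clipping that the image may be hiding is to look at the CIGAR strings (column 6 of the SAM records) for the reads in question. The following is a minimal sketch in plain Python; the function name `soft_clipped_bases` is made up for this example, and it assumes you have already extracted the CIGAR strings from your alignment file.

```python
import re

# One CIGAR operation: a length followed by an operation code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_bases(cigar):
    """Return (leading, trailing) soft-clipped base counts for a CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) can sit outside the soft clips, so ignore them here.
    core = [(n, op) for n, op in ops if op != "H"]
    leading = core[0][0] if core and core[0][1] == "S" else 0
    trailing = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return leading, trailing
```

Reads whose clipped lengths differ have different true fragment ends even when their reported alignment start looks identical, which would be another reason not to treat them as PCR duplicates.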