I am starting to work on long-read mtDNA sequencing and I am facing an issue. Some of the reads I aligned with minimap2 show a large deletion of about 1000 bp on IGV. My goal is to filter out reads with D > 1000 on the CIGAR string. However, when using samtools view commands, I cannot see any CIGAR string with D1000 or similar, which is also confirmed by sequences where there are no deletions. How is this possible? Perhaps minimap2 splits large deletions into smaller ones for computational reasons? Or are there alignment issues? Below are the commands. Thank you for the help and explanations you will provide.
samtools view mt8892L_merged_mapped_chrM_sorted.bam | cut -f 6 | grep -P '[D>1000]' | head
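A note on why that pattern cannot work as intended: inside square brackets, `[D>1000]` is a bracket expression that matches any single character out of `D`, `>`, `1` and `0`, so it matches almost every CIGAR string, including ones with no deletion at all. The sketch below uses three made-up CIGAR strings (not taken from the BAM above) to show the difference. It assumes a deletion of 1000 bp or more can be recognised as a `D` preceded by at least four digits, which holds because CIGAR lengths are written without leading zeros.

```shell
# Bracket expression: matches any line containing 'D', '>', '1' or '0',
# so all three example CIGAR strings are printed.
printf '%s\n' '150M' '20M5D30M' '500M1200D400M' | grep -E '[D>1000]'

# A D operation of 1000 bp or more: four or more digits directly before 'D'.
# Only the third example CIGAR string is printed.
printf '%s\n' '150M' '20M5D30M' '500M1200D400M' | grep -E '[0-9]{4,}D'
```

This only inspects the CIGAR column. If the aligner reported the event as a split (supplementary) alignment rather than as a single `D` operation, no pattern on the CIGAR string will find it.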
Thank you very much for your advice. So, according to the samjdk syntax, am I extracting reads that have a total length of D greater than 1000? With that Perl command, I was trying to adapt my request from this:
samtools view -F 0x4 input.sort.bam | cut -f 6 | grep -P '[ID]' | head
Perhaps I am mistaken.
No, you're excluding. See noneMatch.
ok thanks, I just wanted to extract instead, is this command ok?
no, it would be:
thank you so much!
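For anyone who wants to extract such reads without an extra tool, here is a minimal sketch using awk on the SAM text stream. It is not the command discussed above: `filter_large_del` is a hypothetical helper name, the 1000 bp default is taken from the question, and the file names in the usage comment are placeholders. It keeps header lines and any record whose CIGAR contains at least one single `D` operation of the given minimum length.

```shell
# Hypothetical helper: read SAM on stdin, print header lines plus every
# record whose CIGAR (column 6) has a single D operation >= min_len.
filter_large_del() {
  local min_len="${1:-1000}"
  awk -F'\t' -v min="$min_len" '
    /^@/ { print; next }                      # pass SAM header lines through
    {
      cigar = $6
      # Walk the CIGAR one operation at a time, e.g. "500M", "1200D".
      while (match(cigar, /[0-9]+[MIDNSHP=X]/)) {
        op = substr(cigar, RSTART + RLENGTH - 1, 1)
        n  = substr(cigar, RSTART, RLENGTH - 1) + 0
        if (op == "D" && n >= min) { print; break }
        cigar = substr(cigar, RSTART + RLENGTH)
      }
    }'
}

# Usage (placeholder file names):
#   samtools view -h input.bam | filter_large_del 1000 | samtools view -b -o large_del.bam -
```

The helper tests each deletion separately rather than summing all `D` lengths in a read, so a read with many small deletions is not selected. Reads where the deletion is represented as a split alignment would need a separate check, for example on the `SA` tag.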