Hi Biostars, I am reanalyzing some RNA-seq data and would like to identify the reads with the longest deletions. I have extracted the CIGAR strings that contain deletions from column 6 of my SAM files. However, the list is huge, and I don't want to work out the deletion lengths by eyeballing it. I know I need an awk command to filter this, but I haven't managed to put together one complex enough. I would be grateful for any assistance.
samtools view sample.sam | cut -f 6 | grep "D" | head -3
12M1D89M
12M1D89M
13M1D88M
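One possible approach, as a minimal sketch rather than a definitive answer: walk through each CIGAR string in awk, pick out the `D` operations, and report the longest deletion and the total deleted bases per read. The function name `cigar_deletions` is made up for illustration, and it assumes one CIGAR string per line on stdin, as produced by the `cut -f 6` step above.

```shell
# Hypothetical helper (name is an assumption, not part of samtools).
# Reads one CIGAR string per line and prints, tab-separated:
#   longest deletion length, total deleted bases, original CIGAR
# Only "D" operations are counted; "N" (skipped region / intron in RNA-seq)
# is deliberately ignored.
cigar_deletions() {
  awk '{
    rest = $1; max = 0; total = 0
    # Consume the CIGAR one <length><operation> pair at a time.
    while (match(rest, /[0-9]+[MIDNSHPX=]/)) {
      op  = substr(rest, RSTART + RLENGTH - 1, 1)
      len = substr(rest, RSTART, RLENGTH - 1) + 0
      if (op == "D") {
        total += len
        if (len > max) max = len
      }
      rest = substr(rest, RSTART + RLENGTH)
    }
    printf "%d\t%d\t%s\n", max, total, $1
  }'
}

# Example usage (assumes sample.sam exists), longest deletions first:
# samtools view sample.sam | cut -f 6 | grep "D" | cigar_deletions | sort -k1,1nr | head
```

Sorting numerically on the first column in reverse puts the reads with the longest single deletion at the top. To keep read names alongside the CIGAR strings, the same loop could be run on column 6 of the full SAM line instead of on the extracted column.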
Thank you very much. That was really quick and timely.