Question

Detecting heterozygous deletions in NGS sequence data

19 months ago

shpak.max ▴ 50

I am trying to determine the presence or absence of a deletion over a short (< 10 bp) region in 25 Drosophila melanogaster genomes for which I have whole-genome sequence data.

I ran the standard pipeline to map the reads to a reference genome and used GATK to genotype all sites, and I found that very few of my genomes had this (supposedly) common deletion. We suspect that in many cases the deletion is heterozygous, so rather than a genotype of NN..N at the loci of interest, the VCF shows called bases there.
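If GATK did call the deletion, it would not appear as N bases. It would appear as a left-anchored indel record (REF longer than ALT, e.g. REF=ACGTA ALT=A) whose genotype in a heterozygous sample is something like 0/1. Checking for such records is a quick first step. Below is a minimal sketch, assuming an uncompressed VCF read line by line and a region in 1-based coordinates. The function name `deletion_calls` and the example contig and positions are hypothetical.

```python
def deletion_calls(vcf_lines, chrom, start, end):
    """Yield (pos, ref, alt, {sample: GT}) for deletion records whose deleted
    bases overlap the 1-based inclusive region [start, end] on `chrom`.

    A deletion is taken to be any ALT allele shorter than REF (the usual
    left-anchored VCF representation). The spanning-deletion allele '*'
    is ignored.
    """
    samples = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names from the header line
            continue
        if fields[0] != chrom:
            continue
        pos, ref, alts = int(fields[1]), fields[3], fields[4].split(",")
        if not any(len(a) < len(ref) and a != "*" for a in alts):
            continue  # not a deletion record
        # The first REF base is the anchor; deleted bases follow it.
        del_start, del_end = pos + 1, pos + len(ref) - 1
        if del_end < start or del_start > end:
            continue  # deletion does not touch the region
        fmt = fields[8].split(":") if len(fields) > 8 else []
        if "GT" not in fmt:
            continue
        gt_idx = fmt.index("GT")
        gts = {s: v.split(":")[gt_idx] for s, v in zip(samples, fields[9:])}
        yield pos, ref, fields[4], gts
```

Usage: `list(deletion_calls(open("calls.vcf"), "2L", 101, 110))` lists every deletion call in that window along with each sample's genotype, so heterozygous calls (0/1) are easy to spot.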

My question is what would be the best way to identify the deletion either from the vcf or (going further up the pipeline) from the reads in the fastq files.

For the VCF, my initial thought was this: all else being equal, the read depth at sites heterozygous for the deletion should be approximately half that at the sites flanking the deletion. So as a heuristic, I could check the read depth at the loci of interest and compare it to neighboring sites. However, this approach isn't particularly robust because read depth varies a lot even among adjacent sites.
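The depth heuristic can be made somewhat less noisy by comparing the mean depth inside the region to the median depth of fairly wide flanking windows rather than to single neighboring sites. This is only a sketch, assuming per-base depths have already been extracted into a list indexed by 0-based position (for example from `samtools depth -a` output). The function name and the 50 bp default flank are arbitrary choices. A ratio near 0.5 is consistent with a heterozygous deletion, near 0 with a homozygous one, and near 1 with no deletion; it does not remove the underlying variance, only dampens it.

```python
from statistics import median


def depth_ratio(depths, start, end, flank=50):
    """Mean depth inside [start, end) divided by the median flanking depth.

    depths: per-base read depths, indexed by 0-based position
    start, end: 0-based half-open coordinates of the candidate deletion
    flank: number of positions used on each side as the baseline
    """
    inside = depths[start:end]
    flanks = depths[max(0, start - flank):start] + depths[end:end + flank]
    if not inside or not flanks:
        raise ValueError("region or flanks are empty")
    baseline = median(flanks)  # median is less sensitive to depth spikes
    if baseline == 0:
        return float("nan")  # no coverage in the flanks, ratio undefined
    return (sum(inside) / len(inside)) / baseline
```

Across 25 genomes, it may be more informative to look at the distribution of these ratios than at any single value, since heterozygous samples should cluster around 0.5.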

So I'm trying to figure out how to glean this information from the reads themselves, and would appreciate any suggestions.

fastq vcf GATK • 960 views

score 0 · Answer 1 · 2023-04-09

19 months ago

swbarnes2 14k

I don't think GATK should have a problem with small heterozygous deletions. Grep the fastq if you want to be sure.
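Grepping the FASTQ amounts to searching the reads for two probe sequences: the reference allele (left flank + deleted bases + right flank) and the deletion junction (left flank joined directly to right flank), on both strands. Below is a minimal sketch, assuming you know the deleted bases and have taken roughly 15 to 20 bp of unique flanking sequence from the reference. The function names are hypothetical. Only exact matches are counted, so reads with sequencing errors in the probe are missed, and in a tandem repeat the two probes can be ambiguous.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence (A/C/G/T/N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def count_alleles(fastq_lines, left, deleted, right):
    """Count reads containing the reference probe vs the deletion-junction probe.

    fastq_lines: iterable of lines from a 4-line-per-record FASTQ file
    left, right: reference sequence flanking the deletion
    deleted: the bases removed by the deletion
    """
    ref_probe = left + deleted + right
    del_probe = left + right  # junction created by the deletion
    probes = {
        "ref": {ref_probe, revcomp(ref_probe)},
        "del": {del_probe, revcomp(del_probe)},
    }
    counts = {"ref": 0, "del": 0}
    for i, line in enumerate(fastq_lines):
        if i % 4 != 1:
            continue  # only the second line of each record is sequence
        seq = line.strip().upper()
        for allele, seqs in probes.items():
            if any(p in seq for p in seqs):
                counts[allele] += 1
    return counts
```

A sample with reads in both bins is a candidate heterozygote; reads only in the `del` bin suggest a homozygous deletion.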

score 0 · Answer 2 · 2023-04-09

19 months ago

cmdcolin ★ 4.0k

Some additional signals that may augment a purely read-depth-based strategy would be detecting CIGAR deletions that span the 10 bp region of interest, and/or soft clipping at either end of that region. Detecting CIGAR deletions that span the 10 bp region would be somewhat similar to grepping the FASTQ, as the other commenter mentions.
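Both signals can be read from the POS and CIGAR columns of the alignments overlapping the region (for example from `samtools view sample.bam` with a region argument on an indexed BAM). Below is a minimal sketch, assuming 1-based SAM coordinates. The function name, the `slop` tolerance and the evidence labels are hypothetical choices. The tolerance is there because aligners may place a deletion or clip a few bases away from where you expect, especially in repetitive sequence.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def deletion_evidence(pos, cigar, region_start, region_end, slop=3):
    """Return which deletion signals a single alignment shows for a region.

    pos: 1-based leftmost aligned reference position (SAM column 4)
    cigar: CIGAR string (SAM column 6)
    region_start, region_end: 1-based inclusive region of interest
    Returns a set that may contain 'cigar_del' (a D operation overlapping
    the region) and 'soft_clip' (a clip boundary within `slop` bp of
    either end of the region).
    """
    evidence = set()
    ref = pos  # reference position of the next base to be consumed
    aligned_seen = False
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "D" and ref <= region_end and ref + n - 1 >= region_start:
            evidence.add("cigar_del")
        elif op == "S":
            # Clip edge: first aligned base for a leading clip,
            # last aligned base for a trailing clip.
            edge = ref - 1 if aligned_seen else ref
            if min(abs(edge - region_start), abs(edge - region_end)) <= slop:
                evidence.add("soft_clip")
        if op in "MDN=X":  # operations that consume reference bases
            ref += n
            aligned_seen = True
    return evidence
```

Tallying these labels per sample, together with the number of reads that align straight through the region, gives a rough allele balance. If pysam is available, `AlignedSegment.cigartuples` exposes the same information without parsing strings.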

Someone suggested using IGV with the bam files, but I agree that searching the fastq files may be more efficient.

reply · 19 months ago by shpak.max ▴ 50