Detecting heterozygous deletions in NGS sequence data
19 months ago
shpak.max ▴ 50

I am trying to determine the presence or absence of a deletion over a short (< 10 bp) region in 25 Drosophila melanogaster genomes for which I have whole-genome sequence data.

I ran the standard pipeline to map the reads to a reference genome and used GATK to genotype all sites, and I found that very few of my genomes had this (supposedly) common deletion. What we think is occurring is that in many cases the deletion is in the heterozygous state, so rather than being genotyped as NN..N, the loci of interest have identified bases in the VCF.

My question is what would be the best way to identify the deletion either from the vcf or (going further up the pipeline) from the reads in the fastq files.

For the VCF, my initial thought was this: all else being equal, the read depth at sites heterozygous for the deletion should be approximately 1/2 that of the adjacent sites flanking the deletion. So as a heuristic, I could check read depth at the loci of interest and compare it to that of neighboring sites. However, this approach isn't particularly robust due to the high variance in read depth among adjacent sites.
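A minimal sketch of that depth heuristic, assuming per-base depths are already available as a plain list (for example, parsed from the third column of `samtools depth -a` output). The function name `depth_ratio`, the `flank` width of 50 bp, and the use of medians to dampen site-to-site variance are illustrative assumptions, not part of any pipeline.

```python
from statistics import median


def depth_ratio(depths, start, end, flank=50):
    """Ratio of median depth inside [start, end) to median depth in the flanks.

    depths : list of per-base read depths, indexed by 0-based position
    start, end : 0-based, half-open coordinates of the candidate deletion
    flank : number of flanking bases to use on each side (assumed value)
    """
    inside = depths[start:end]
    flanking = depths[max(0, start - flank):start] + depths[end:end + flank]
    if not inside or not flanking:
        raise ValueError("region or flanks fall outside the depth array")
    flank_median = median(flanking)
    if flank_median == 0:
        return float("nan")  # no coverage nearby, so the ratio is undefined
    return median(inside) / flank_median


# Toy data: 40x coverage with an 8 bp stretch at 20x.
toy = [40] * 100 + [20] * 8 + [40] * 100
print(depth_ratio(toy, 100, 108))
```

A ratio near 0.5 would be consistent with a heterozygous deletion, near 0 with a homozygous one, and near 1 with no deletion, but with a region under 10 bp the estimate stays noisy, so it is better treated as supporting evidence than as a genotype call.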

So I'm trying to figure out how to glean this information from the reads themselves, and would appreciate any suggestions.

fastq vcf GATK • 966 views
19 months ago

I don't think GATK should have a problem with small heterozygous deletions. Grep the fastq if you want to be sure.
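As a rough sketch of what "grepping the fastq" could look like: build the reference allele (left flank + deleted bases + right flank) and the deletion allele (left flank + right flank), then count reads containing either one on either strand. The function names and the flank sequences in the example are hypothetical; real flanks would need to be long enough to be unique in the genome, and reads with sequencing errors in the searched window will be missed.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def fastq_sequences(lines):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def count_allele_reads(reads, left_flank, deleted, right_flank):
    """Count reads that contain the reference or the deletion junction."""
    ref = left_flank + deleted + right_flank
    alt = left_flank + right_flank
    ref_rc, alt_rc = revcomp(ref), revcomp(alt)
    counts = {"ref": 0, "del": 0}
    for read in reads:
        read = read.upper()
        # Check the reference allele first so a read is counted only once.
        if ref in read or ref_rc in read:
            counts["ref"] += 1
        elif alt in read or alt_rc in read:
            counts["del"] += 1
    return counts


# Toy example with made-up flanks and a made-up 6 bp deletion.
left, gone, right = "ACGTACGTAC", "GGGTTT", "TTAACCGGTA"
reads = [
    "AA" + left + gone + right + "CC",   # reference allele
    "AA" + left + right + "CC",          # deletion allele
    revcomp("AA" + left + right + "CC"), # deletion allele, reverse strand
]
print(count_allele_reads(reads, left, gone, right))
```

With a real file, the reads could be supplied as `fastq_sequences(open(path))`. Comparable counts for both alleles in one sample would suggest a heterozygote.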

19 months ago
cmdcolin ★ 4.0k

Some additional signals that may augment a purely read-depth-based strategy would be detecting CIGAR deletions that span the 10 bp region of interest, and/or soft-clipping at either end of that region. Detecting CIGAR deletions that span the region would be somewhat similar to grepping the FASTQ, as the other commenter mentions.
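A minimal sketch of both signals, assuming the alignment start (SAM column 4, POS) and CIGAR string (SAM column 6) have already been pulled out of each record. The function name `deletion_evidence` and the one-base tolerance around the region for soft-clip breakpoints are illustrative choices.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = "MDN=X"  # CIGAR operations that advance the reference position


def deletion_evidence(pos, cigar, region_start, region_end):
    """Return (kind, ref_start, ref_end) tuples for one aligned read.

    pos : 1-based leftmost aligned reference position (SAM POS)
    cigar : CIGAR string, e.g. "50M8D42M"
    region_start, region_end : 1-based inclusive coordinates of the region
    """
    evidence = []
    ref = pos
    seen_aligned = False
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op == "D":
            del_start, del_end = ref, ref + n - 1
            if del_start <= region_end and del_end >= region_start:
                evidence.append(("deletion", del_start, del_end))
        elif op == "S":
            # Leading clip: breakpoint is the first aligned base.
            # Trailing clip: breakpoint is the last aligned base.
            breakpoint_pos = ref - 1 if seen_aligned else ref
            if region_start - 1 <= breakpoint_pos <= region_end + 1:
                evidence.append(("softclip", breakpoint_pos, breakpoint_pos))
        if op in REF_CONSUMING:
            ref += n
            seen_aligned = True
    return evidence


# Region of interest at reference positions 101-108.
print(deletion_evidence(51, "50M8D42M", 101, 108))  # read with a CIGAR deletion
print(deletion_evidence(51, "50M50S", 101, 108))    # read clipped at the left edge
```

Counting reads with such evidence against reads that align straight through the region gives a per-sample allele balance that does not depend on depth in the flanks.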


Someone suggested using IGV with the bam files, but I agree that searching the fastq files may be more efficient.
