Hi all,
I'm working with shallow-coverage genomic sequencing data. I'm trying to pull out the variants unique to one of my samples, but from looking over the alignments in IGV, I can see that several indels and SNPs are missing from my .vcf, while other SNPs with very low quality/depth are included.
I can see the deletion in my filtered .bam file (see below), but it's lost once I run bcftools mpileup, even with very low quality thresholds. I think this deletion is real because this sample is also a parent line in an F2 screen, so the deletion is present in tens of other samples.
I'm new to bioinformatics and sequencing analysis. I can see older posts dealing with this from ~5-10 years ago with bcftools version 1.12, but I'm currently using version 1.2.
Am I doing anything obviously wrong? Thanks.
Additional info:
- BWA-MEM2 for alignment
- samtools view -b -q 20 -F 4 -F 256 -F 512, then samtools markdup -r, to generate my .bam file
- (stringent) bcftools mpileup -f "ref/path" -q 30 --min-BQ 20 -b "shortlist.txt" -o "testindel.bcf"
- (loose) bcftools mpileup -f "ref/path" -q 10 --min-BQ 5 -b "shortlist.txt" -o "testindel.bcf"
Snippet of a read spanning the deletion in the .bam file (depth is ~17-20 on either side of the deletion):
- NT_033778.4 18398702 60 18M7D132M = 18398744 MQ:i:60
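As a sanity check, the deletion coordinates implied by this read can be worked out from its start position and CIGAR string. Below is a minimal Python sketch; the function name find_indels is made up for illustration, and it only uses the position and CIGAR fields from the read above, not the actual .bam file.

```python
import re

# CIGAR operations that advance along the reference (per the SAM spec).
REF_CONSUMING = set("MDN=X")


def find_indels(pos, cigar):
    """Return (kind, ref_start, length) for each I/D operation in a CIGAR string.

    pos is the 1-based leftmost mapping position (SAM column 4).
    For a deletion, ref_start is the first deleted reference base.
    For an insertion, ref_start is the reference base just before the inserted bases.
    """
    indels = []
    ref = pos  # reference coordinate of the next base to be consumed
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length)
        if op == "D":
            indels.append(("DEL", ref, length))
        elif op == "I":
            indels.append(("INS", ref - 1, length))
        if op in REF_CONSUMING:
            ref += length
    return indels


# Values copied from the read shown above.
read_pos, read_cigar = 18398702, "18M7D132M"
for kind, start, length in find_indels(read_pos, read_cigar):
    print(kind, start, length)  # prints: DEL 18398720 7
```

So this read supports a 7 bp deletion covering NT_033778.4:18398720-18398726. A VCF record normally anchors a deletion on the base before it, so the call would be expected at or near POS 18398719, though the caller may shift it if the surrounding sequence is repetitive.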
Hi,
You were right! While bcftools was sufficient for QTL, I took the parent lines and the original lines they were derived from and ran them through GATK4, which detected the indel in question as well as others I suspected existed based on the CIGAR strings.
Thank you!