I have used Bowtie 2 to align my reads to a reference genome and Samtools to call variants (SNPs and InDels). Does anyone know how to count how many SNPs and InDels I got? Thanks!
You could write a quick awk one-liner. Note that this does not handle multi-allelic loci: a comma-separated ALT field such as A,T is longer than one character, so it gets counted as an indel.
SNPs:
awk '!/^#/ && length($4) == 1 && length($5) == 1' variants.vcf | wc -l
Indels:
awk '!/^#/ && (length($4) > 1 || length($5) > 1)' variants.vcf | wc -l
but for something more comprehensive, use bcftools stats.
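If multi-allelic sites matter for your counts, one possible sketch is to split the ALT column on commas and classify each ALT allele by comparing its length to REF. The function name `count_vcf_variants` is made up for illustration, and the classification is an assumption of mine: it works on the alleles as written (no normalization), and lumps symbolic alleles, `*`, `.`, and same-length multi-base substitutions into "Other".

```shell
# Hypothetical helper (not part of samtools/bcftools): tally ALT alleles by type.
# Reads VCF from the files given as arguments, or from stdin if none are given.
count_vcf_variants() {
  awk -F'\t' '
    /^#/ { next }                      # skip header lines
    {
      n = split($5, alts, ",")         # one entry per ALT allele
      for (i = 1; i <= n; i++) {
        a = alts[i]
        # missing, overlapping-deletion and symbolic alleles are not classified
        if (a == "." || a == "*" || a ~ /^</) { other++; continue }
        if (length($4) == 1 && length(a) == 1) snp++
        else if (length(a) > length($4)) ins++
        else if (length(a) < length($4)) del++
        else other++                   # same length > 1, e.g. MNPs
      }
    }
    END {
      printf "SNPs\t%d\nInsertions\t%d\nDeletions\t%d\nOther\t%d\n", snp, ins, del, other
    }
  ' "$@"
}
```

Usage would be something like `count_vcf_variants variants.vcf`. Because this counts alleles rather than records, a site with ALT `G,T` contributes two SNPs.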
Via BEDOPS convert2bed:
$ vcf2bed --snvs < foo.vcf | wc -l
$ vcf2bed --insertions < foo.vcf | wc -l
$ vcf2bed --deletions < foo.vcf | wc -l
You can also use the summary from bcftools stats to answer this question.
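As a rough sketch of what that looks like: the summary section of bcftools stats output consists of tab-separated lines starting with SN, including "number of SNPs:" and "number of indels:". The helper name `stats_snp_indel_counts` below is hypothetical, and it assumes the stats output is piped in on stdin.

```shell
# Hypothetical helper: pull the SNP and indel counts out of `bcftools stats` output.
# Assumes summary lines look like: SN <tab> id <tab> key <tab> value
stats_snp_indel_counts() {
  awk -F'\t' '
    $1 == "SN" && ($3 == "number of SNPs:" || $3 == "number of indels:") {
      print $3 "\t" $4               # e.g. "number of SNPs:<tab>1234"
    }
  '
}
```

With bcftools installed, this would be used as `bcftools stats variants.vcf | stats_snp_indel_counts`. Simply running `bcftools stats variants.vcf | grep '^SN'` shows the whole summary, which also reports multi-allelic sites.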
Hi Charles,
I came across your post about the DeepVariant caller and think you could advise me on a bioinformatics pipeline for analyzing genetic data. We generated DNA sequencing data using TrueSight Cancer 94 (Illumina) and are not sure how we should analyze them, from alignment to variant calling, annotation, variant classification, and variant filtering. I would appreciate it if you could help me. Thanks for your time and consideration!
Chinh
Hi - I think you meant this as a comment for my other answer.
Also, I think you might be referring to this blog post:
http://cdwscience.blogspot.com/2019/05/precisionfda-and-custom-scripts-for.html
My sense is that it helps to be able to visually inspect variant calls, and I am not sure I would be comfortable with a result that you could not inspect that way.
I also suspect that you need to take some extra caution with targeted gene panels, which I think is covered in a slightly different blog post (it is still relevant here, since I thought the default set of DeepVariant calls could benefit from some additional filtering, much like unfiltered GATK calls):
http://cdwscience.blogspot.com/2019/06/general-comments-on-low-frequency.html
However, to be honest, I don't have specific advice for the TrueSight Cancer 94 gene panel.
In a BAM or in a VCF?