I'm doing a microbiome analysis in which I'm looking for SNPs in the genomes of a large number of microbial species. I ran my bcftools pipeline on around 15 bacterial and viral species, and the end result was between 0 and 150 variants per VCF file.
From looking around, it seems a minimum depth of 10 is the common benchmark filter used in research, but after I apply this filter to the VCF files I am left with only one species, which has 4 variants.
I ran mpileup with a max-depth of 8000 so as to maximise the likelihood of finding significant variants.
Could there be something wrong with how I'm running mpileup, or is it due to the sequencing depth/coverage in the first place? Why is the read depth per sample so low in the VCF files?
Any insights greatly appreciated! Thanks!
What is the average sequencing depth/coverage? Did you check the BAM file for average coverage in the regions of interest? See if you can fine-tune the variant-calling parameters. If there is a trimming step, check that the trimming is good enough. Check the alignment parameters and, if you are using reference genomes, check that you are using the correct genomes/references. Make sure that all the steps use the same versions/builds of the reference genomes/databases.
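To make the coverage check concrete, here is a rough sketch in Python. It assumes you have already dumped per-position depth from the BAM with something like `samtools depth -a sample.bam > sample.depth.txt` (the file names are placeholders), which gives three tab-separated columns: reference name, position, depth. The function name `summarize_depth` and the default threshold of 10 are my own choices for illustration, not part of any tool.

```python
from collections import defaultdict


def summarize_depth(lines, min_depth=10):
    """Summarize per-reference depth from `samtools depth -a` style rows.

    Each row is expected to be: reference name, position, depth (tab-separated).
    Returns {reference: {"mean_depth": float, "frac_at_min_depth": float}}.
    """
    # Per reference: [positions seen, summed depth, positions with depth >= min_depth]
    totals = defaultdict(lambda: [0, 0, 0])
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        ref, _pos, depth = line.split("\t")[:3]
        depth = int(depth)
        counts = totals[ref]
        counts[0] += 1
        counts[1] += depth
        if depth >= min_depth:
            counts[2] += 1
    return {
        ref: {"mean_depth": total / n, "frac_at_min_depth": passing / n}
        for ref, (n, total, passing) in totals.items()
    }
```

You could call it as `summarize_depth(open("sample.depth.txt"))` and print the result per species. If the mean depth for most references is well below 10, very few sites can pass a depth filter of 10 whatever mpileup settings you use, which would point to coverage rather than the calling step.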