I am facing an interesting case, where bbtools callvariant fail to call SNPs in high density regions. I have some nanopore reads, and Clair3 telling me those regions do exist, as exemplified in the pic hereafter. I would like to tweak the callvariant to be able to detect those SNPs, as I have a lot of Illumina data and not so much in the realm of long reads. I know Brian Bushnell sometimes answers here. But anyone with any suggestion is welcome. As you see the above band using Clair3 identifies the SNPs, while they are rejected by the short read caller. Any idea?
Hmmm... usually reads with lots of variants in a region are mismapped as a result of a structural variations, so CallVariants penalizes them... in this case they also have low AF (unless this is a diploid). You can add the flag "countnearbyvars=f" to disable this. Posting the exact VCF lines generated for these variations, which you can generate with the "clearfilters" flag, can indicate exactly which filters they are failing. If this is a haploid you can also play with the rarity flag.
You can use "maxnearbycount", "penalizenearby", "nearbydist", and "flagnearby" for more fine-grained control of nearby variations. The defaults are:
public int maxNearbyCount=1;//Max nearby allowed before being flagged or failed
public int nearbyDist=20;
public boolean flagNearby=false;
public boolean failNearby=false;
public boolean penalizeNearby=false;
public boolean countNearbyVars=true;
Thanks for your answer, this is a haploid assembly of a diploid organism, not sure if you meant the organism ploidy or the assembly ploidy. So I am gonna try your suggestion and hunt the faulty flags. Thanks a lot. By the way, the speed at which your tool works to call the variants is a-m-a-z-i-n-g.
Brian, I have another question. I have a phase diploid assembly and want to map it against a diploid reference. Then, I want to call SNPs. Do you think callvariants.sh would work here? The thing is, there will be a coverage of 1X (assuming no repeat issues, etc, I am assuming the perfect mapping with a cardinality of 1). Thanks again.
If you are going to align an assembly to a reference, that will work fine, but you need to use the clearfilters flag since you want everything to show up. Personally I never have worked with a diploid assembly, just haploidified assemblies... I'm not sure how well aligning a diploid assembly to a diploid reference would work since you'll probably get 2x coverage in some places and 0x in others, but it's worth trying.
Thanks for your answer, this is a haploid assembly of a diploid organism, not sure if you meant the organism ploidy or the assembly ploidy. So I am gonna try your suggestion and hunt the faulty flags. Thanks a lot. By the way, the speed at which your tool works to call the variants is a-m-a-z-i-n-g.
Brian, I have another question. I have a phase diploid assembly and want to map it against a diploid reference. Then, I want to call SNPs. Do you think callvariants.sh would work here? The thing is, there will be a coverage of 1X (assuming no repeat issues, etc, I am assuming the perfect mapping with a cardinality of 1). Thanks again.
If you are going to align an assembly to a reference, that will work fine, but you need to use the clearfilters flag since you want everything to show up. Personally I never have worked with a diploid assembly, just haploidified assemblies... I'm not sure how well aligning a diploid assembly to a diploid reference would work since you'll probably get 2x coverage in some places and 0x in others, but it's worth trying.