Hi,
I am wondering whether there is a way to extract/print those variants from a VCF file that are no more than 5 bp apart?
Regards,
Waqas.
If I'm not wrong, you can flag the close SNPs with: https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_gatk_tools_walkers_filters_VariantFiltration.php
java -jar /commun/data/packages/gatk/3.7.0/GenomeAnalysisTK.jar -T VariantFiltration -R ref.fasta -V input.vcf --clusterSize 2 --clusterWindowSize 5
This will add 'SnpCluster' to the FILTER column of the variants that fall in such a cluster.
Yes, you can do it in many ways (python, command line, perl). You just have to ask for this condition to be verified:
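for each line, print line if it is on the same chromosome as the previous line and (line_position - previous_line_position) <= 5
This assumes the VCF file is sorted by position. The position field (POS) is the 2nd column of the VCF file :) https://samtools.github.io/hts-specs/VCFv4.2.pdf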
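To make the condition above concrete, here is a minimal Python sketch. It is only one possible reading of the idea, under a few assumptions of mine: the VCF is uncompressed and sorted by chromosome and position, header lines are skipped rather than printed, and both members of a close pair are reported (the one-line rule would only print the second). The function name `close_variants` and the `max_dist` parameter are made up for illustration.

```python
import sys

def close_variants(lines, max_dist=5):
    """Yield VCF records within max_dist bp of a neighbouring record
    on the same chromosome (assumes position-sorted input)."""
    prev = None           # (chrom, pos, line) of the previous record
    prev_emitted = False  # True if the previous record was already yielded
    for line in lines:
        if line.startswith("#"):
            continue      # skip meta-information and header lines
        fields = line.split("\t")
        chrom, pos = fields[0], int(fields[1])  # CHROM is column 1, POS is column 2
        if prev is not None and prev[0] == chrom and pos - prev[1] <= max_dist:
            if not prev_emitted:
                yield prev[2]   # first member of the close pair
            yield line
            prev_emitted = True
        else:
            prev_emitted = False
        prev = (chrom, pos, line)

if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python close_variants.py input.vcf > close.vcf
    with open(sys.argv[1]) as vcf:
        for record in close_variants(vcf):
            sys.stdout.write(record)
```

Note that records sharing the same position (for example, split multi-allelic sites) have a distance of 0 and are therefore reported as close; filter those out first if that is not what you want.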
I guess you can do it with bedtools cluster in two steps. http://bedtools.readthedocs.io/en/latest/content/tools/cluster.html?highlight=cluster
Yes, Pierre, GATK's VariantFiltration worked for me. It was exactly what I wanted!
Big thanks!
Cheers,
Waqas.