Extract SNPs within 5 bp of each other
7.7 years ago
waqasnayab ▴ 250

Hi,

Is there a way to extract / print the variants in a VCF file that are no more than 5 bp apart?

Regards,

Waqas.

SNP R next-gen vcf • 2.8k views
7.7 years ago

If I'm not mistaken, you can flag clustered SNPs with GATK VariantFiltration: https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_gatk_tools_walkers_filters_VariantFiltration.php

java -jar /commun/data/packages/gatk/3.7.0/GenomeAnalysisTK.jar -T VariantFiltration -R ref.fasta -V input.vcf --clusterSize 2 --clusterWindowSize 5

This will add 'SnpCluster' to the FILTER column of the clustered variants.


Yes, Pierre, GATK's VariantFiltration worked for me. This is exactly what I wanted!

Many thanks!

Cheers,

Waqas.

7.7 years ago

Yes, you can do it in many ways (Python, the command line, Perl). You just need to test the following condition on a position-sorted file:

for each line, print line if (line_position - previous_line_position) <= 5

The position (POS) is the second column of a VCF file :) https://samtools.github.io/hts-specs/VCFv4.2.pdf
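As a rough illustration of the condition above, here is a minimal Python sketch. The function name `close_variants` and the `max_dist` parameter are made up for this example. It assumes the VCF is sorted by chromosome and position, and it also emits the first member of each close pair, which the one-line rule would skip.

```python
def close_variants(lines, max_dist=5):
    """Return VCF data lines lying within max_dist bp of a neighbouring record.

    Assumption: records are sorted by chromosome, then by position.
    Header lines (starting with '#') and blank lines are skipped.
    """
    kept = []
    prev = None        # (chrom, pos, line) of the previous data record
    prev_kept = False  # True if the previous record is already in `kept`
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])  # CHROM is column 1, POS is column 2
        if prev is not None and prev[0] == chrom and pos - prev[1] <= max_dist:
            if not prev_kept:
                kept.append(prev[2])  # first member of the pair
            kept.append(line)
            prev_kept = True
        else:
            prev_kept = False
        prev = (chrom, pos, line)
    return kept
```

To run it on a file, something like `with open("input.vcf") as fh: sys.stdout.writelines(close_variants(fh))` should do (after `import sys`). Note that this compares POS values only, so it treats indels as if they were single positions.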


Yes, the chromosomal position (second column) is my target. I searched on Google but found nothing. Is there a way to do the same in awk?

Thanks,

Waqas.


Yes, but it's more complicated with indels and multi-allelic sites, and you also need to check that the chromosome is the same.
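One way to sidestep the indel problem is to drop every record that is not a plain SNP before applying the distance test. The sketch below is only an illustration: `is_snp` is a hypothetical helper, and it assumes a record counts as a SNP when REF and every ALT allele are a single A/C/G/T base, so a multi-allelic site such as A to G,T still passes, while `*`, `.` and symbolic alleles do not.

```python
def is_snp(line):
    """Return True if a VCF data line has a single-base REF and only single-base ALTs.

    Assumption: only A, C, G and T count as bases; N, '*', '.' and
    symbolic alleles such as <DEL> are rejected.
    """
    fields = line.rstrip("\n").split("\t")
    ref = fields[3]              # REF is column 4
    alts = fields[4].split(",")  # ALT is column 5, comma-separated if multi-allelic
    bases = {"A", "C", "G", "T"}
    return ref.upper() in bases and all(alt.upper() in bases for alt in alts)
```

Applied to the data lines of a file, `[l for l in lines if not l.startswith("#") and is_snp(l)]` would leave only the SNP records.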

7.7 years ago
sacha ★ 2.4k

I guess you can do it with bedtools cluster in two steps. http://bedtools.readthedocs.io/en/latest/content/tools/cluster.html?highlight=cluster
