5.8 years ago
RNAseqer
Hello everyone,
I have just started using bcftools to 'prune' some vcf files. While I have found some helpful examples of how to discard SNPs with high LD:
bcftools +prune -l 0.6 -w 1000 frag.vcf -Ov -o output1.vcf
I was hoping to do the opposite: create output in which the SNPs kept are those with r2 values higher than 0.6, and the other SNPs are discarded. Is there a straightforward way to do this?
If the functionality is not directly built into bcftools +prune, then I would, for example, compare the lists of SNPs in the filtered versus unfiltered, and then infer the ones that were removed. bcftools query can output VCF-formatted data in a neat way, and you could then use awk arrays to compare the lists.
I was thinking along the same lines. I think that would work. However, I did find vcftools has a command line option for minimum r2:
This outputs a file containing an r2 value rather than the vcf file data line... but I'm thinking it may be most efficient to just pull out these SNPs using a custom perl script that takes the vcftools output as its input and pulls lines from the original vcf file accordingly. Also, I am just starting to look at the Tagger program in the Broad's Haploview software package, since I am really interested in getting tagging SNPs alone...
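For what it's worth, both ideas in this thread can be sketched with awk arrays alone, without a separate Perl script. This is only a rough sketch under some assumptions: the VCF files are uncompressed and tab-delimited, sites are matched on CHROM, POS, REF and ALT, and the pairwise r2 table has a header line followed by the columns CHR, POS1, POS2, N_INDV, R^2 (check your own file before relying on that layout). The function names removed_sites and high_ld_sites are made up for illustration.

```shell
# Sketch 1: print the header plus every record of the original VCF that is
# missing from the pruned VCF, i.e. the sites that pruning discarded.
# Assumes uncompressed, tab-delimited VCFs; sites are keyed on CHROM:POS:REF:ALT.
removed_sites() {
    pruned="$1"
    original="$2"
    awk -F'\t' '
        # First file (pruned): remember every site that survived pruning.
        FNR == NR { if ($0 !~ /^#/) kept[$1 ":" $2 ":" $4 ":" $5] = 1; next }
        # Second file (original): pass the header through unchanged.
        /^#/ { print; next }
        # Print only the records that are absent from the pruned file.
        !(($1 ":" $2 ":" $4 ":" $5) in kept) { print }
    ' "$pruned" "$original"
}

# Sketch 2: print the header plus every VCF record that appears in at least
# one pair with r2 >= min_r2 in a pairwise LD table.
# Assumed table layout (not verified here): one header line, then
# CHR, POS1, POS2, N_INDV, R^2 separated by tabs.
high_ld_sites() {
    ld_table="$1"
    vcf="$2"
    min_r2="${3:-0.6}"
    awk -F'\t' -v min="$min_r2" '
        # First file (LD table): skip the header and non-numeric r2 such as nan.
        FNR == NR {
            if (FNR > 1 && $5 ~ /^[0-9.eE+-]+$/ && $5 + 0 >= min) {
                keep[$1 ":" $2] = 1   # first site of the pair
                keep[$1 ":" $3] = 1   # second site of the pair
            }
            next
        }
        # Second file (VCF): keep the header and the selected sites.
        /^#/ { print; next }
        (($1 ":" $2) in keep) { print }
    ' "$ld_table" "$vcf"
}
```

With the file names from the question, the first would be called as removed_sites output1.vcf frag.vcf > removed.vcf, and the second as high_ld_sites out.geno.ld frag.vcf 0.6 > high_ld.vcf (removed.vcf, high_ld.vcf and out.geno.ld are placeholder names). One caveat on the first approach: as far as I understand it, pruning keeps one representative of each group of linked SNPs, so the "removed" list will not include those retained representatives even though they are also in high LD with something.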