Variant filtering based on a high-quality reference: removing false positives
3.9 years ago
nobody ▴ 10

Hello wonderful people!

I'm new to the world of bioinformatics and this is my first post, so please forgive any unintended mistakes.

The VCF files I'm working with are missing ##INFO fields such as QD, FS, SOR, MQ, MQRankSum, ReadPosRankSum and InbreedingCoeff. Perhaps one could generate these first and then do variant quality control.

At this stage, though, I just want to do some basic quality control on the variants in my VCF files.

Genome in a Bottle (https://www.nist.gov/programs-projects/genome-bottle) provides high-quality reference variant calls.

Is there a way to select only the variants in my VCF that are also present in the reference VCF file?

So far I have used bcftools for normalization and for filtering samples out of VCF files.

Could anyone please point me to how to achieve this? Any help would be much appreciated.

Thanks, team!

sequence genome vcf quality control variant • 849 views
3.9 years ago
4galaxy77 2.9k

Download the high-quality reference VCF and print the positions of all its SNPs:

bcftools view -v snps reference.vcf | bcftools query -f '%CHROM\t%POS\n' > reference_positions.txt
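
If it helps to see what that pipeline produces, here is a rough awk-only sketch for an uncompressed VCF. It is an approximation, not bcftools' own classification: a record counts as a SNP when REF and every ALT allele are single bases, and the hypothetical INCLUDE_INDELS=1 switch also keeps records where an ALT allele is a base string of different length than REF. Multiallelic sites mixing SNPs and indels may be classified differently than by `bcftools view -v`. The function name `snp_positions` is likewise hypothetical.

```shell
# Hypothetical helper: snp_positions VCF
# Prints CHROM<TAB>POS for SNP records of an uncompressed VCF.
# Approximation: a record is a SNP when REF and every ALT allele are single
# bases; INCLUDE_INDELS=1 (hypothetical switch) also keeps records where an
# ALT allele is a base string whose length differs from REF.
snp_positions() {
  awk -v indels="${INCLUDE_INDELS:-0}" 'BEGIN { FS = OFS = "\t" }
    /^#/ { next }                       # skip header lines
    {
      ref = toupper($4)
      n = split(toupper($5), alt, ",")  # ALT may list several alleles
      snp = (ref ~ /^[ACGT]$/)
      indel = 0
      for (i = 1; i <= n; i++) {
        if (alt[i] !~ /^[ACGT]$/) snp = 0
        if (alt[i] ~ /^[ACGTN]+$/ && length(alt[i]) != length(ref)) indel = 1
      }
      if (snp || (indels == 1 && indel)) print $1, $2
    }' "$1"
}
```

Usage: `snp_positions reference.vcf > reference_positions.txt`, or `INCLUDE_INDELS=1 snp_positions reference.vcf > reference_positions.txt` to keep indels too. Decompress a `.vcf.gz` file first; for real work the bcftools pipeline is the better choice.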

Then keep only the records at these positions in your target VCF:

bcftools view -T reference_positions.txt target.vcf > target_filtered.vcf
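
The `-T` step can likewise be sketched in awk to show what it does: header lines pass through, and a record is kept only if its CHROM and POS appear in the positions file. That is a position-only match; as with a two-column `-T` file, alleles are not compared. The function name `filter_by_reference` and the `MATCH_ALLELES=1` switch are hypothetical additions: with the switch, the first argument is the uncompressed reference VCF and REF and ALT must also be identical, so a target call of C>A at a truth-set C>T site would be dropped. Normalizing both files the same way first (e.g. with bcftools, as already done here) makes that string comparison more meaningful.

```shell
# Hypothetical helper: filter_by_reference REFERENCE TARGET_VCF
#   REFERENCE  two-column CHROM<TAB>POS file (default), or an uncompressed
#              reference VCF when MATCH_ALLELES=1 (hypothetical switch)
#   TARGET_VCF uncompressed VCF to filter; result goes to stdout
filter_by_reference() {
  awk -v alleles="${MATCH_ALLELES:-0}" 'BEGIN { FS = OFS = "\t" }
    FILENAME == ARGV[1] {               # first input: build the lookup set
      if ($0 ~ /^#/) next               # skip reference VCF header lines
      key = $1 FS $2                    # CHROM and POS
      if (alleles == 1) key = key FS $4 FS $5   # plus REF and ALT
      keep[key] = 1
      next
    }
    /^#/ { print; next }                # target header passes through unchanged
    {
      key = $1 FS $2
      if (alleles == 1) key = key FS $4 FS $5
      if (key in keep) print            # keep only records seen in the reference
    }' "$1" "$2"
}
```

Usage: `filter_by_reference reference_positions.txt target.vcf > target_filtered.vcf`, or `MATCH_ALLELES=1 filter_by_reference reference.vcf target.vcf > target_filtered.vcf` for the stricter allele-aware match.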

If you also want to include indels, add them to the -v argument of bcftools view in the first command, e.g. -v snps,indels.

3.9 years ago
nobody ▴ 10

@4galaxy77 oh my wonderful friend, much love and respect to you! May the force be with you, forever and ever! I will try this approach and get back to you if I have any further questions. Many thanks!
