Keeping only common variants in the merged VCF file
5.6 years ago
seta ★ 1.9k

Hi all,

After merging my VCF file, which contains specific variants, with the 1000 Genomes VCF, the ID column of the merged VCF file looks like this:

chr1:39440410:SG

rs6722104

rs60323161;chr1:39244787:SG

Of these, only records like rs60323161;chr1:39244787:SG are common variants, i.e. present in both files. How can I keep only the common variants in the merged VCF file?

I tried bcftools view -T to keep only the common variants, but it did not work well: variants like the ones below still exist in the file, and chr1:39448418:SG should have been removed.

rs3118014;chr1:39448418:SG

chr1:39448418:SG

I also tested grep -Fwvf and grep -vf to remove those variants, but neither worked well. Could you please share your solution?

Thanks

VCF merge bcftools • 2.8k views
5.6 years ago
husensofteng ▴ 410

I am not sure I understand the question correctly, but it sounds like a line-filtering issue to me. So:

awk '$1~"#" || ($3~"rs" && $3~"chr")' inputfile > outputfile

*This keeps only header lines (those whose first field contains #) and lines that have both an rs ID and chr info in the third column of the file.
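A slightly stricter variant of the same idea, as a rough sketch only: it splits the ID column on ";" and keeps a record only when it carries at least one rs-style ID and at least one chr:pos-style ID. The file names merged.vcf and common.vcf and the toy records are made up for illustration, and the ID patterns assume the naming shown in the question (rs numbers and chr1:39244787:SG-style IDs).

```shell
# Build a tiny, made-up merged VCF (tab-separated) to demonstrate the filter.
# merged.vcf and common.vcf are placeholder names, not from the original post.
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr1\t39440410\tchr1:39440410:SG\tA\tG\t.\t.\t.\n'
  printf 'chr2\t1000\trs6722104\tC\tT\t.\t.\t.\n'
  printf 'chr1\t39244787\trs60323161;chr1:39244787:SG\tG\tA\t.\t.\t.\n'
} > merged.vcf

# Keep header lines, plus records whose ID field (column 3) holds at least
# one rs-style ID and at least one chr:pos-style ID after splitting on ";".
awk -F'\t' '
  /^#/ { print; next }
  {
    has_rs = 0; has_pos = 0
    n = split($3, ids, ";")
    for (i = 1; i <= n; i++) {
      if (ids[i] ~ /^rs[0-9]+$/) has_rs = 1
      else if (ids[i] ~ /^chr[^:]+:[0-9]+:/) has_pos = 1
    }
    if (has_rs && has_pos) print
  }
' merged.vcf > common.vcf
```

Anchoring the patterns avoids accidental matches elsewhere in the ID string, and comparing only column 3 means INFO or sample fields containing "rs" or "chr" cannot let a record through.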


Many thanks for your nice solution.

5.6 years ago

Two options:

1) use BEDtools 'intersect' for the two original VCFs.

2) use VCFtools 'vcf-annotate' to add the 1000 Genomes rs numbers, then 'grep' to keep the variants that were annotated as such.
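The idea behind the first option, intersecting the two original VCFs rather than filtering the merged one, can be sketched with plain awk instead of BEDTools. This is only an illustration under several assumptions: the file names mine.vcf, kg.vcf and shared.vcf and the toy records are made up, both files are uncompressed, use the same chromosome naming (chr1 vs 1), and contain one ALT allele per record. Records are matched on CHROM, POS, REF and ALT.

```shell
# Two tiny, made-up VCFs (tab-separated); all names and records are placeholders.
{
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr1\t39440410\tchr1:39440410:SG\tA\tG\t.\t.\t.\n'
  printf 'chr1\t39244787\tchr1:39244787:SG\tG\tA\t.\t.\t.\n'
} > mine.vcf
{
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr1\t39244787\trs60323161\tG\tA\t.\t.\t.\n'
  printf 'chr2\t1000\trs6722104\tC\tT\t.\t.\t.\n'
} > kg.vcf

# Pass 1 (mine.vcf): remember each variant as CHROM/POS/REF/ALT.
# Pass 2 (kg.vcf): print its header, then only records seen in pass 1.
awk -F'\t' '
  NR == FNR {
    if ($0 !~ /^#/) seen[$1 FS $2 FS $4 FS $5] = 1
    next
  }
  /^#/ { print; next }
  {
    key = $1 FS $2 FS $4 FS $5
    if (key in seen) print
  }
' mine.vcf kg.vcf > shared.vcf
```

The output keeps the header and rs IDs of the second file; swapping the two input files would keep the first file's records instead.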
