Output bigger than input. VCFtools
5.1 years ago

Hello, I'm trying to extract a subset of SNPs using vcftools. I have a list of 2474008 SNPs and a 90 GB vcf file. I used this command:

vcftools --vcf GCF_000001405.25.vcf --snps rsLeptin_adj.txt --recode --recode-INFO-all --out match_rsLeptin_adj.txt

But my output file has 2562258 lines (apparently 88250 more SNPs than in my list), so I'm not sure whether the command is not specific enough or whether some error during processing produces the extra lines. I have also tried awk, using an array:

awk '{array[$1]}' rsLeptin_adj.txt

matching against the 3rd column of the VCF file, which contains the SNP IDs:

awk 'FNR==NR {array[$1]; next}; $3 in array' rsLeptin_adj.txt GCF_000001405.25.vcf

Has anyone experienced the same issue? Any comment will help. Thanks in advance

snp software error • 1.0k views

If I understand correctly, you have a file rsLeptin_adj.txt containing IDs that may appear in the ID column of your VCF file, and you would like to keep only the variants with those IDs.

For this use bcftools instead of vcftools. vcftools is deprecated.

bcftools view -i "ID=@rsLeptin_adj.txt" GCF_000001405.25.vcf > out.vcf
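As a rough sanity check on the result, it may help to count data records rather than raw lines, since a plain line count of a VCF also includes the header lines (those starting with #). The sketch below is only an illustration, not a confirmed diagnosis of the extra lines: the function names count_records and count_unique_ids are made up here, and it assumes an uncompressed, tab-separated VCF with the ID in the 3rd column.

```shell
# Hypothetical helpers (names are illustrative, not part of bcftools or vcftools).

# Count data records in a VCF, skipping header lines that start with '#'.
count_records() {
    awk '!/^#/ { n++ } END { print n + 0 }' "$1"
}

# Count distinct values in the ID column (3rd column) among the data records.
count_unique_ids() {
    awk -F'\t' '!/^#/ && !seen[$3]++ { n++ } END { print n + 0 }' "$1"
}
```

Running count_records out.vcf and count_unique_ids out.vcf and comparing the two numbers with the length of rsLeptin_adj.txt should show whether the difference comes from header lines, from IDs that occur on more than one record, or from something else.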

fin swimmer
