Is it possible to extract the list of all SNPs (saved as a .txt) from a VCF file? I do not want to convert it to .bed and check the .bim, because my VCFs are large and I have many of them.
I want the chr, pos, and rsid.
grep -v "^##" input.vcf | cut -f1-3
EDIT 2021: bcftools query -f '%CHROM\t%POS\t%ID\n' input.vcf
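Because there are many VCFs, a loop over the files may help. The following is a minimal sketch that assumes plain-text (uncompressed) files ending in `.vcf` in the current directory. The toy file `example.vcf` and the `.ids.txt` output suffix are made up for illustration. Unlike `grep -v "^##"`, the awk filter also drops the `#CHROM` column-name line.

```shell
#!/bin/sh
# Made-up toy VCF (hypothetical data) so the loop has something to process.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n1\t100\trs111\tA\tG\t50\tPASS\t.\n1\t200\trs222\tT\tTA\t50\tPASS\t.\n' > example.vcf

# For every *.vcf file, write CHROM, POS and ID of each record to a matching
# .ids.txt file; all header lines (## meta lines and #CHROM) are skipped.
for vcf in *.vcf; do
  awk -F'\t' -v OFS='\t' '!/^#/ { print $1, $2, $3 }' "$vcf" > "${vcf%.vcf}.ids.txt"
done
```

For bgzipped files, piping `gzip -dc "$vcf"` into the same awk program should work the same way. Like the `cut` command, this sketch lists every record, not only SNPs.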
The vcf2bed binary uses Unix streams, so it will be about as fast as any extraction gets:
$ vcf2bed < snps.vcf | cut -f4 > ids.txt
There are --insertions, --deletions and --snvs options to get subsets of the input VCF. See vcf2bed --help for more detail.
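Where vcf2bed is not available, the same kind of output can be approximated with awk alone. This is a rough sketch and not vcf2bed's actual logic. It assumes that "SNV" means REF and ALT are both single bases, and that the ID goes in column 4 (matching the `cut -f4` above). Because BED coordinates are 0-based and half-open, a single-base record at POS becomes start = POS - 1 and end = POS. The toy file is hypothetical.

```shell
#!/bin/sh
# Made-up toy VCF (hypothetical data): one SNV and one insertion.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n1\t100\trs111\tA\tG\t50\tPASS\t.\n1\t200\trs222\tT\tTA\t50\tPASS\t.\n' > toy.vcf

# Rough awk-only stand-in for "vcf2bed --snvs | cut -f1-4": keep records whose
# REF and ALT are single bases and print them in BED-style coordinates.
awk -F'\t' -v OFS='\t' '
  /^#/ { next }                          # skip meta and column-name lines
  length($4) == 1 && length($5) == 1 {
    print $1, $2 - 1, $2, $3             # chrom, 0-based start, end, ID
  }' toy.vcf > toy.snvs.bed
```

Selecting columns 1, 3 and 4 of such output yields chr, the original 1-based POS, and the rsid.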
Combining it with the command suggested by ATPoint, I used the command below to extract the list (chr, pos and rsid):
vcf2bed < foo.vcf | awk '{if (length($6) == 1 && length($7) == 1 || $1 ~ /^#/) print $0}' | cut -f1,3,4 > ids.txt
But when I run it on my VCFs, I get the error below. I tried it with a few of my VCFs: the error is always at line 164, but the number before "Segmentation fault" is different each time.
line 164: 46368 Segmentation fault ${cmd} ${options} - 0<&0
In a SNP, one nucleotide is exchanged for another. Therefore, unlike an InDel, where bases are gained or lost, the REF and ALT columns must both have length 1. You can use that with awk to separate SNPs from InDels:
# || $1 ~ /^#/ makes sure that the header and column names are printed:
awk '{if (length($4) == 1 && length($5) == 1 || $1 ~ /^#/) print $0}' foo.vcf
Instead of printing $0, you can print only the fields you need from the current line (e.g. $1, $2, $3) and append (>>) them to a new file.
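The `length == 1` test drops multi-allelic SNP records, whose ALT column holds several comma-separated bases (e.g. `C,T`, length 3). The following sketch also keeps those records and prints only chr, pos and ID. It assumes that bases are uppercase `A`, `C`, `G`, `T` or `N`; lowercase bases and symbolic alleles such as `*` or `<DEL>` are excluded. The toy data are made up.

```shell
#!/bin/sh
# Made-up toy VCF (hypothetical data): one SNP, one insertion,
# and one multi-allelic SNP whose ALT column is "C,T".
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n2\t100\trs111\tA\tG\t50\tPASS\t.\n2\t200\trs222\tT\tTA\t50\tPASS\t.\n2\t300\trs333\tG\tC,T\t50\tPASS\t.\n' > multi.vcf

# Keep a record when REF is one base and every comma-separated ALT allele is
# one base; print only CHROM, POS and ID instead of the whole line ($0).
awk -F'\t' -v OFS='\t' '
  !/^#/ && $4 ~ /^[ACGTN]$/ && $5 ~ /^[ACGTN](,[ACGTN])*$/ { print $1, $2, $3 }
' multi.vcf > multi.snps.txt
```

Here `>` overwrites `multi.snps.txt` on each run. Switch to `>>` to collect results from several VCFs into one file.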
Hello,
What exactly would you like to extract? Just the ID of a variant, if there is one? Just the position and the REF/ALT?
fin swimmer
I want the chr, pos, and rsid.
Output:
Try the vcf2tsv tool from vcflib to convert the VCF to a tab-separated file; you can then extract any information you want.