Using VCFtools to obtain fasta files
8.4 years ago
severalorks ▴ 110

I would like to take a VCF file and a reference genome from the 1000 Genomes Project and produce a FASTA file containing the genome sequence of each individual in the VCF, built by applying that individual's SNPs to the reference. Can VCFtools do this? If not, what tools are available that can?

I have written a python script that goes through the 84 million SNPs in the file and outputs a fasta file. I've tested it by running it on 10000 SNPs and it gives an output after several hours. However, I've tried running it for 84 million SNPs and it has been running for several days now. I'm looking for a more efficient way to obtain a fasta file from .vcf.

I am looking to skip indels.
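For reference, a minimal sketch of the kind of single-pass approach described above: read the reference once, then stream through the VCF and overwrite the SNP positions in a mutable copy per haplotype. The function name `apply_snps` is hypothetical, and the sketch assumes a single chromosome, genotypes in the first FORMAT field, and diploid calls; unphased genotypes (`/`) are assigned to haplotypes in the order written, which is arbitrary.

```python
def apply_snps(reference, vcf_lines):
    """Return {sample: [hap1, hap2]} with SNP alleles substituted into reference.

    reference: sequence of one chromosome as a string (assumption: the VCF
               lines all belong to this chromosome).
    vcf_lines: iterable of VCF text lines, header included.
    """
    samples = []
    haplotypes = {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Sample names start at the 10th column of the header line.
            samples = fields[9:]
            haplotypes = {
                s: [bytearray(reference, "ascii"), bytearray(reference, "ascii")]
                for s in samples
            }
            continue
        pos, ref, alts = int(fields[1]), fields[3], fields[4].split(",")
        # Keep only SNPs: skip indels, symbolic alleles, and '*' or '.' ALTs.
        if len(ref) != 1 or any(len(a) != 1 or a not in "ACGT" for a in alts):
            continue
        alleles = [ref] + alts
        for sample, call in zip(samples, fields[9:]):
            genotype = call.split(":")[0].replace("/", "|").split("|")
            for hap_index, allele_index in enumerate(genotype[:2]):
                # '0' is the reference allele; '.' is a missing call.
                if allele_index.isdigit() and allele_index != "0":
                    haplotypes[sample][hap_index][pos - 1] = ord(
                        alleles[int(allele_index)]
                    )
    return {
        s: [h.decode("ascii") for h in haps] for s, haps in haplotypes.items()
    }
```

Each VCF line is visited once and each substitution is a constant-time write, so run time grows linearly with the number of genotype calls. Memory is the limiting factor: two full-length copies per sample are held at once, so in practice this would have to be run per chromosome and on a subset of samples.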

EDIT: VCFtools' vcf-to-tab converts a .vcf file into a tab-delimited file, and there is a separate script that turns that tab file into a fasta file: https://code.google.com/archive/p/vcf-tab-to-fasta/
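To illustrate what the tab-to-fasta step might look like, here is a rough sketch, not the linked script itself. It assumes the tab file has a header line `#CHROM POS REF` followed by one column per sample, with genotypes written as allele pairs such as `A/G`; the function name `tab_to_fasta` is hypothetical. Heterozygous sites are written as IUPAC ambiguity codes, and any row containing a multi-base allele is skipped so that indels are excluded. Note that the output contains only the variant sites, not the full genome.

```python
# IUPAC codes for the set of bases observed in a diploid genotype.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
}


def tab_to_fasta(tab_lines):
    """Build a FASTA string of variant sites, one record per sample."""
    samples = []
    sequences = {}
    for line in tab_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if line.startswith("#"):
            # Assumed header layout: #CHROM, POS, REF, then sample names.
            samples = fields[3:]
            sequences = {s: [] for s in samples}
            continue
        calls = [f.replace("|", "/").split("/") for f in fields[3:]]
        # Skip indels: any REF or genotype allele longer than one base.
        if len(fields[2]) != 1 or any(len(a) != 1 for c in calls for a in c):
            continue
        for sample, alleles in zip(samples, calls):
            # Missing or unexpected genotypes fall back to 'N'.
            sequences[sample].append(IUPAC.get(frozenset(alleles), "N"))
    return "".join(f">{s}\n{''.join(sequences[s])}\n" for s in samples)
```

Collecting bases in per-sample lists and joining once at the end avoids repeated string concatenation, which is a common cause of slow scripts on inputs with millions of sites.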

genome fasta vcf 1000genomes vcftools • 5.7k views

I believe that's what I'm looking for, I'll look into it


I looked into it, and it works well for obtaining the alternate genome, but I'm looking for the sequence of each individual in the vcf file. For example, the vcf file gives the SNPs for individuals HG00097 and HG00099, and I'd like to get a separate sequence for each of them. Additionally, I'd like to skip indels, if possible. So far I've tried vcf-consensus, but it gave the error 'Broken VCF header', and I'm not entirely sure it would output what I need anyway. Is there a program that can do this?
