How To Obtain At Least 30 Realistic Example VCF Files?
12.5 years ago
user56 ▴ 300

In our project, we would like to adopt the Variant Call Format (VCF, v4.0 or the newer v4.1). Is there a public source where we could download a set of 30 or more example VCF files?

I looked at 1000 Genomes and saw the BAM files, but they are large, and I could not find any VCF files on the 1000 Genomes FTP site. Have they been created already, or are they still being produced?

vcf data • 29k views

Use the 1000 Genomes data slicer. It has a nice graphical user interface and easy to understand examples that will help you create small VCF files.


Is it necessary that all 30 VCF files come from a single species, or could they be from any species?

12.5 years ago

I believe that the VCF files from 1000 genomes are available here from EBI and here from NCBI. 1000 genomes has a fairly detailed description of the VCF file format.

12.5 years ago
user56 ▴ 300

From my own investigation, the FTP folder below seems to hold a larger set, with all populations in one file (it should be 1000 Genomes data):

ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20110521/

However, the chromosome 6 variants alone take up 9 GB.
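Since all samples sit in one file, one way to end up with many small example files is to split a multi-sample VCF into one file per sample. The sketch below does this with plain awk, relying only on the VCF layout (nine fixed columns, then one column per sample on the `#CHROM` line). The function names `list_samples` and `split_by_sample` are hypothetical helpers, and the input is assumed to be an uncompressed VCF that is already small enough to handle locally, not the 9 GB file itself.

```shell
#!/bin/sh
# list_samples: print the sample names, i.e. columns 10 and up of the #CHROM line.
# (Hypothetical helper; reads the given file, or stdin if no file is given.)
list_samples() {
    awk -F '\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }' "$@"
}

# split_by_sample: write one single-sample VCF per sample into OUTDIR.
# Each output keeps the "##" meta lines, the 9 fixed columns, and one sample column.
# Caveat: INFO values (for example allele counts) are copied unchanged, so they
# still describe the full cohort rather than the single sample.
split_by_sample() {
    vcf=$1
    outdir=$2
    mkdir -p "$outdir"
    col=10
    list_samples "$vcf" | while read -r sample; do
        awk -F '\t' -v OFS='\t' -v c="$col" \
            '/^##/ { print; next } { print $1, $2, $3, $4, $5, $6, $7, $8, $9, $c }' \
            "$vcf" > "$outdir/$sample.vcf"
        col=$((col + 1))
    done
}
```

For example, `split_by_sample small.vcf per_sample` would create `per_sample/<sample>.vcf` for every sample column in `small.vcf`, where `small.vcf` is a placeholder name.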

12.3 years ago

You can use tabix to extract subsets of the VCF files from the 1000 Genomes sites. Because tabix uses an index file, it fetches only the portions of the file you request, so you do not have to download the whole file locally.

Example:

tabix -h  ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20101123/interim_phase1_release/ALL.chr12.phase1.projectConsensus.genotypes.vcf.gz 1:10000,200000

Have a look at this discussion for more help.


As of today, this cmd-line doesn't return anything. Here is a quick fix (use "12" instead of "1" and "-" instead of ","):

tabix -h ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20101123/interim_phase1_release/ALL.chr12.phase1.projectConsensus.genotypes.vcf.gz 12:10000-200000
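To get 30 small files rather than one, the same command can be repeated over 30 different regions. Below is a minimal sketch of that idea: `make_regions` and `slice_vcfs` are hypothetical helper names, the window size is arbitrary, and whether the URL from this thread is still served is not something the sketch checks. Note that every slice comes from the same multi-sample file, so the files differ by region, not by sample set.

```shell
#!/bin/sh
# make_regions CHROM START WIDTH COUNT
# Print COUNT consecutive, non-overlapping regions in tabix syntax (chrom:start-end).
# Hypothetical helper, not part of tabix.
make_regions() {
    chrom=$1
    start=$2
    width=$3
    count=$4
    i=0
    while [ "$i" -lt "$count" ]; do
        from=$((start + i * width))
        to=$((from + width - 1))
        printf '%s:%d-%d\n' "$chrom" "$from" "$to"
        i=$((i + 1))
    done
}

# slice_vcfs URL CHROM START WIDTH COUNT
# Run "tabix -h" once per region and save each slice as slice_N.vcf.
# Needs tabix on PATH and network access; it is only defined here, not called.
slice_vcfs() {
    url=$1
    shift
    n=0
    make_regions "$@" | while read -r region; do
        n=$((n + 1))
        # -h keeps the VCF header in each slice; stdin is redirected so that
        # tabix cannot consume the region list feeding the loop.
        tabix -h "$url" "$region" > "slice_${n}.vcf" < /dev/null
    done
}
```

With the file from this thread, a call could look like `slice_vcfs "$URL" 12 10000 190000 30`, where `URL` holds the `ALL.chr12...vcf.gz` address used above and the chromosome name `12` matches the naming in that file.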
12.3 years ago
user56 ▴ 300

I was able to use the Complete Genomics public data set of 69 genomes. Initially they did not provide index files, so I could not use tabix right away, but a few days after a forum post they created them, so I did not have to download the full 70+ GB of data.

12.5 years ago

You could ask Zev Kronenberg for his 200 Danish exomes in VCF.
