Where can I find the reference files for the 1000 Genome project VCF data?
5.9 years ago
caggtaagtat ★ 1.9k

Hello,

I just started working with VCF files and would like to use the data of the 1000 Genomes Project. I found that the most recent version can be downloaded at ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ . For my functional analysis, I need the positions of all exon borders. Therefore I am looking for 1) the FASTA file that was used as the reference during SNP calling and 2) a corresponding GTF file for annotation.

1) For the FASTA file, the VCF files state that it comes from here: ftp://ftp.1000genomes.ebi.ac.uk//vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz

However, other posts on Biostars propose using the following file, which is slightly larger: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz

2) For the GTF files, I'm not sure if there even is one to download.

VCF 1000 Genome Reference • 3.7k views

If you are working with GRCh37 VCF files, the reference is available at: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz

5.9 years ago

Hello caggtaagtat ,

I would take the reference sequence stated in the VCF file. Then you can be sure not to run into problems such as different naming conventions for the chromosomes.
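
To see which reference a VCF file names, you can read its header. Below is a minimal Python sketch, assuming the file carries a `##reference=` meta-information line (the VCF specification allows this line, but not every file has one). The function names are made up for illustration.

```python
import gzip


def reference_from_vcf_header(lines):
    """Return the value of the first '##reference=' header line, or None."""
    for line in lines:
        if not line.startswith("##"):
            # Meta-information lines stop at the '#CHROM' column header.
            break
        if line.startswith("##reference="):
            return line.rstrip("\n").split("=", 1)[1]
    return None


def reference_from_vcf(path):
    """Open a plain or gzip-compressed VCF and report its stated reference."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return reference_from_vcf_header(handle)
```

Calling `reference_from_vcf("my_calls.vcf.gz")` (a hypothetical file name) reads only the header lines, so it should be quick even for a large file.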

The reference genome is GRCh37 (hg19), so you can take any annotation file for this reference genome, e.g. GENCODE. But before annotating, check how the chromosomes are named. It might be necessary to rename them.
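
The naming check could look roughly like the Python sketch below. It assumes the VCF uses names such as `1` and `MT` while the annotation uses `chr1` and `chrM`, so please verify this on your own files first. The helper names are hypothetical, and the `chrM` to `MT` mapping only changes the label; it does not tell you whether the underlying sequences are the same.

```python
def strip_chr_prefix(name):
    """Map UCSC-style names (chr1, chrX, chrM) to 1, X, MT."""
    if name == "chrM":
        return "MT"  # assumption: label change only, sequence not checked
    return name[3:] if name.startswith("chr") else name


def contig_names(lines):
    """Collect the first-column values of tab-separated VCF or GTF lines."""
    return {
        line.split("\t", 1)[0]
        for line in lines
        if line.strip() and not line.startswith("#")
    }


def rename_gtf_line(line):
    """Return a GTF line with its chromosome name converted; keep comments."""
    if line.startswith("#"):
        return line
    chrom, rest = line.split("\t", 1)
    return strip_chr_prefix(chrom) + "\t" + rest
```

Comparing `contig_names(vcf_lines)` with `contig_names(gtf_lines)` before and after renaming shows whether the two files then agree on chromosome names.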

fin swimmer


Ok thank you. So I will download the GRCh37 annotation then.
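
For the exon borders mentioned in the question, a minimal Python sketch could read them from the `exon` records of the GTF file. It assumes the standard nine tab-separated GTF columns, where start and end are 1-based and inclusive. The function name is made up for illustration.

```python
def exon_borders(gtf_lines):
    """Yield (chrom, start, end, strand) for every exon record in a GTF."""
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # keep only well-formed exon features
        # GTF columns 4 and 5 hold the 1-based, inclusive start and end.
        yield fields[0], int(fields[3]), int(fields[4]), fields[6]
```

Since this is a generator, it can be fed an open file handle directly and does not load the whole annotation into memory.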


Hi finswimmer, do you mean that the annotation file for GRCh37 (hg19) can be used with the hs37d5 version of the reference genome?
