Looking for comprehensive VCF from 1000 Genomes
8.7 years ago
Jessie ▴ 10

Hi,

Is there a comprehensive VCF containing all 3500 samples from the 1000 Genomes project?

The data appears to be split across releases, and I am trying to find all of the 1000 Genomes samples for the CEU, ASW, and JPT populations in VCF format. Using release 20130502 I am able to find the majority of the ASW and JPT samples, but not the CEU samples. I have looked at the other releases and can't seem to find a VCF containing the CEU samples.

Additionally, are there any special considerations for analyzing the different populations, given that the sequencing platforms for the project changed over time and may have introduced batch effects that coincide with population subsets?

Any help is much appreciated!

8.7 years ago

It depends on what kind of variants you want. The comprehensive structural variant (SV) VCF is here

There should not be any additional considerations for analyzing different populations beyond the ones you would normally apply in your analysis, such as controlling for population stratification in association testing. Alignments to reference builds before hg38 will not take advantage of some of the diversity in structurally complex regions, but that should not affect most studies.

Generally, you should analyze sequence data according to its release (i.e. don't mix high-coverage and low-coverage samples unless you normalize accordingly). But you can use the different platforms for validation. For example, haplotypes phased from low-coverage data can be cross-validated with long-read sequences, which exist for some 1000G samples.

If you're looking for the low coverage BAM files they are in two places

There are fewer high-coverage BAM files, but they come from different platforms and are scattered throughout the FTP site.

1000 Genomes also has an AWS S3 bucket that should be free to access because it's public. I believe the path is s3://1000genomes/


Thank you! Do you know of a VCF specifically for SNPs / SNVs?


Sorry, I'm a collaborator with only the structural variation group of 1000G.

8.7 years ago
Jessie ▴ 10

Thanks very much for your help. The CEU genomes appear to be absent in the comprehensive SV VCF file (there are only 6 out of 43). Is there an explanation for why the comprehensive SV VCF would be lacking large proportions of some ethnic sets?

For the CEU case, it could be due to the earlier release of the CEU samples compared to ASW/JPT. However, since for this particular case I only need SNPs, I am looking for a multi-sample VCF file with all populations and SNVs. Does this exist somewhere?


The VCF for SNPs exists somewhere on the 1000 Genomes FTP site. You can do a recursive ls using either AWS or FTP and grep for "\.vcf" to find all the VCFs available.
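
If you would rather filter a saved listing in Python than with grep, here is a minimal sketch. It assumes the listing has one entry per line with the path as the last whitespace-separated field (which is how I would expect recursive `aws s3 ls` output to look) and that paths contain no spaces. The function name `find_vcf_paths` is made up for illustration.

```python
def find_vcf_paths(listing_text):
    """Return paths ending in .vcf or .vcf.gz from a recursive listing.

    Assumption: one entry per line, path is the last
    whitespace-separated field, and paths contain no spaces.
    """
    paths = []
    for line in listing_text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        path = fields[-1]
        # Keep only VCFs; this skips index files such as .vcf.gz.tbi
        if path.endswith((".vcf", ".vcf.gz")):
            paths.append(path)
    return paths
```

One way to get the listing, assuming the AWS CLI is installed, would be something like `aws s3 ls --recursive --no-sign-request s3://1000genomes/ > listing.txt`, then pass the contents of `listing.txt` to the function.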

Not all of the CEU genomes are included because a lot of the individuals are related. NA12878 has parents and children, if I recall correctly. Having too many related individuals is not good for a study on global genomic variation.


Thank you! I chose 1 individual randomly from each family (for a total of 43 unrelated CEU individuals) but only 6 of these seem to exist in the comprehensive VCF. I will try choosing a different subset of unrelated individuals to see whether they may be present in the comprehensive VCF, but why so many samples are missing from the comprehensive VCF (at least 37 CEU) is still a mystery to me. Thanks again!
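
For anyone repeating this check, here is a rough sketch of how the comparison could be done: read the sample IDs from the VCF's `#CHROM` header line (sample columns start after the nine fixed columns) and split a wanted list into present and missing. The function names are made up for illustration, and the sample IDs in the usage note are only placeholders.

```python
def vcf_sample_ids(lines):
    """Return the sample IDs from the #CHROM header line of a VCF.

    `lines` is any iterable of text lines, e.g. an open file handle.
    """
    for line in lines:
        if line.startswith("#CHROM"):
            # Columns 1-9 are CHROM..FORMAT; samples follow
            return line.rstrip("\r\n").split("\t")[9:]
    raise ValueError("no #CHROM header line found")


def split_present_missing(wanted, vcf_samples):
    """Split the wanted IDs into those present in the VCF and those missing."""
    available = set(vcf_samples)
    present = [s for s in wanted if s in available]
    missing = [s for s in wanted if s not in available]
    return present, missing
```

For a bgzipped VCF, the file can be opened with `gzip.open(path, "rt")` and the handle passed to `vcf_sample_ids`; the result can then be compared against the chosen unrelated individuals with `split_present_missing(my_ids, ids)`.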


Hi Jessie,

Since this is a comment to QVINTVS' answer, it should more appropriately be added as a reply (as you did with your earlier reply) instead of an answer to your question. That keeps all related discussion together in one place. New answers should only be used for answers to the original question.


Will do so next time, but I'll leave it as is this time so the replies below remain.
