We are hoping to use 1000 Genomes samples as a population control for our study. The 1000 Genomes Project provides FASTQ, BAM, and VCF files on its FTP site. We do not want to use the VCF files because they have been filtered and might not contain variants that occur in our samples (especially false-positive variants in our samples). Using dbSNP is problematic for the same reason.
So a good alternative seems to be the 1000 Genomes BAM files. However, it would save us compute time if we could start from gVCF files instead. Does anyone know whether gVCF files for 1000 Genomes Project samples are publicly available?
Thanks for the info. Our study focuses on rare variation, so sample breadth (more samples) matters more to us than having the properties of variants in our samples exactly match those of our control population. That is why we are analyzing the low-coverage samples. We do not have to analyze all of them, but more is better.
Right on. For me it was sensitivity, not breadth. I've worked with low-coverage samples too. If you're familiar with AWS, 1000 Genomes has a public S3 bucket, so transfers should be free on AWS (I think). I wrote a single-paragraph grant proposal to AWS and they gave me a lot of credits. I extracted features from 2,504 low-coverage BAM files in about one to two weeks on AWS. Good luck!
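If you want to browse the public bucket before setting anything up, a minimal sketch using only the Python standard library might look like the following. It assumes the bucket is named `1000genomes` and allows anonymous listing over HTTPS through the standard S3 ListObjectsV2 REST request; both are assumptions to verify against the AWS Open Data registry. The helper names (`list_url`, `parse_listing`, `list_bucket`) are hypothetical and exist only for this illustration.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: the public bucket is named "1000genomes" and can be listed
# anonymously via its virtual-hosted-style HTTPS endpoint.
BUCKET_URL = "https://1000genomes.s3.amazonaws.com/"
# XML namespace used in S3 ListBucketResult responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def list_url(prefix: str = "", delimiter: str = "/") -> str:
    """Build an anonymous ListObjectsV2 request URL (hypothetical helper)."""
    query = urllib.parse.urlencode(
        {"list-type": "2", "prefix": prefix, "delimiter": delimiter}
    )
    return BUCKET_URL + "?" + query


def parse_listing(xml_body) -> tuple[list[str], list[str]]:
    """Return (common prefixes, object keys) from a ListBucketResult body.

    Accepts the raw response as str or bytes.
    """
    root = ET.fromstring(xml_body)
    prefixes = [p.text for p in root.findall("s3:CommonPrefixes/s3:Prefix", S3_NS)]
    keys = [k.text for k in root.findall("s3:Contents/s3:Key", S3_NS)]
    return prefixes, keys


def list_bucket(prefix: str = "") -> tuple[list[str], list[str]]:
    """Fetch one page of the listing (S3 returns at most 1,000 entries per page).

    No AWS credentials are used; this only works if the bucket is public.
    """
    with urllib.request.urlopen(list_url(prefix), timeout=30) as resp:
        return parse_listing(resp.read())
```

For example, `prefixes, keys = list_bucket()` fetches the top-level listing, and passing one of the returned prefixes back into `list_bucket` descends one level. This sketch ignores pagination (the `continuation-token` parameter), so large directories will be truncated. For bulk transfers of BAM files, the AWS CLI or an SDK running in the same region as the bucket is the more practical route.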