I have the 1000 Genomes VCF, but I am wondering whether VCF files are available for other genomes, such as:
- Korean genomes
- African genomes
- Venter
- Watson
Cheers
Complete Genomics has some publicly available datasets, and I am sure there is a converter to VCF. If you have an FTP server, web space, or some other way to share data, I would be happy to send you the 200 Danish exomes in VCF.
The SNPs for those genomes are available for download from UCSC in tables named "pg*" (for example pgVenter). You could generate VCF-like files from them using awk. Something like:
$ curl -s "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/pgVenter.txt.gz" |\
gunzip -c |\
awk 'BEGIN { printf("#CHROM\tPOS\tID\tREF\tALT\n");} { printf("%s\t%d\t.\t.\t%s\n",$2,1+int($3),$5);}'
#CHROM POS ID REF ALT
chr1 65745 . . G
chr1 65797 . . C
chr1 65872 . . G
chr1 66008 . . G
chr1 66162 . . T
chr1 66258 . . G
chr1 66275 . . T
chr1 66294 . . TA/AT
chr1 66312 . . T
chr1 566139 . . A/C
(...)
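One caveat on the awk output above: it is not a complete VCF, since it lacks the ##fileformat line and the QUAL, FILTER and INFO columns, and it keeps the "/" between alleles. A minimal sketch of a slightly stricter conversion follows, assuming the pg* tables use the columns bin, chrom, chromStart (0-based), chromEnd and name (alleles joined by "/"). The function name pgsnp_to_vcf is hypothetical, and "N" is only a placeholder for REF because the table does not carry the reference base. The allele list may also include the reference allele, and indels would need an anchoring base, so treat the result as a starting point rather than a valid call set.

```shell
# Hypothetical helper: convert UCSC pgSnp rows (read from stdin) to a minimal VCF.
# Assumed input columns: bin, chrom, chromStart (0-based), chromEnd, name (alleles joined by "/").
pgsnp_to_vcf() {
  awk -F '\t' '
    BEGIN {
      # Minimal header: file format line plus the eight fixed VCF columns
      printf("##fileformat=VCFv4.1\n")
      printf("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n")
    }
    {
      alt = $5
      gsub("/", ",", alt)   # VCF separates multiple ALT alleles with commas
      # REF is not in the table: "N" is a placeholder, not the true reference base
      printf("%s\t%d\t.\tN\t%s\t.\t.\t.\n", $2, $3 + 1, alt)
    }'
}
```

It would be used in place of the awk step, e.g. curl -s "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/pgVenter.txt.gz" | gunzip -c | pgsnp_to_vcf > pgVenter.vcf (the output file name is just an example).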
VCF is a very flexible format, and I would be careful converting Complete Genomics data directly into VCF on your own. For example, Complete Genomics handles complex variants very differently from how 1000G handled them in the Pilot phase. Digging into the supplemental information of the Korean genome publication, etc., can help fill in some of those extra fields.
Also, the genomes you've mentioned contain Structural Variation data of various degrees of completeness -- and VCF files do exist for these kinds of variants as well.
By VCF file, do you mean you're interested in the format itself, or a particular kind of variant?
Depending on what you're trying to do, you might find Kaviar useful: http://db.systemsbiology.net/kaviar/
Greetings again. I have loaded the Danish Exomes to a Dropbox. Shoot me an email to get the goods :-).
Dear Zev, I'm also really interested in the 200 Danish exomes in VCF. Is it still possible to share them?
My email is: elisa.cirillo@maastrichtuniversity.nl
Thank you very much!
Hi Zev: I am interested in the dataset too. If you have permission (including IRB and institutional approval), it would be nice if you could upload the data to a public data repository such as the European Nucleotide Archive (http://www.ebi.ac.uk/ena/data/search/) or a similar resource and share the URL here.
Hi Zev -- I'd love to have access to the 200 Danish exomes as well, would be glad to provide more details.
I believe I don't need IRB since I generated the VCF calls from raw reads that were publicly available?
Awesome. If ENA is not appropriate for VCF submissions, you may also try the Dryad data repository: http://datadryad.org/
Hi Zev, Kevin... what about publishing the Danish VCFs in http://gigadb.org/ ? I've been told that the Danish VCFs were processed in collaboration with BGI, so maybe it makes sense to have them there.
Greetings,
I don't want to upload these data to a site where I might be mistaken for someone involved in the project. I have the VCFs all ready to go... any other suggestions?
Our pipeline:
- BWA
- SAMTOOLS
- PICARD (dedup)
- GATK (INDEL realign)
- SAMTOOLS (call variants)
Hi Zev, are the VCFs still available? I am interested in obtaining a copy. My email is ashkot[at]hotmail.com
Hi Zev -- I'd love to have access to the 200 Danish exomes as well. Could you send them to me? eyupsvs@hotmail.com
Can you share the file via OneDrive?