7.1 years ago
Phoe
There are several databases for nucleic acid sequences, protein sequences, and so on, as listed by Scientific Data here: https://www.nature.com/sdata/policies/repositories
Just wondering, is there a repository for Variant Call Format (VCF) files? Or where can I get VCF files, especially for tumor data?
There are plenty, e.g. dbSNP. Provide some details on what you need and want to do with it so that people can help you.
Thanks! I want to get VCF files so that I can mimic real mutation sites when making simulated DNA sequences. So all I need are several VCF files that contain actual disease (tumor) variants. For example, I just saw that TCGA might provide the files I want; however, they are under controlled access, and I do not have authorization to download them.
(1) Is there another database where I can get VCF files without needing permission? Or do these files contain personal information, so that controlled access is necessary?
(2) If I'm interested in a certain disease, such as melanoma, is there a VCF file that collects all the known mutation sites found in melanoma patients?
Hi Phoe,
Yes, VCF data is 'controlled' at the TCGA, meaning that you have to request access. The publicly available data is in MAF format.
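If the public MAF files are enough for your purpose, the SNV rows can be turned into VCF-style records fairly easily. Here is a minimal sketch, assuming a tab-separated MAF with the usual column names (Hugo_Symbol, Chromosome, Start_Position, Reference_Allele, Tumor_Seq_Allele2, Variant_Type). The function name and the GENE= INFO tag are my own choices for illustration, and the toy rows are made up. Indels are skipped because MAF writes them with "-" while VCF needs an anchor base from the reference genome. The output is data lines only, so you would still need to add a VCF header.

```python
import csv

def maf_snvs_to_vcf_lines(maf_text):
    """Convert SNP rows of a tab-separated MAF table into minimal VCF data lines."""
    # Drop blank lines and '#' comment lines that may precede the column header.
    rows = [ln for ln in maf_text.splitlines() if ln and not ln.startswith("#")]
    reader = csv.DictReader(rows, delimiter="\t")
    vcf_lines = []
    for rec in reader:
        if rec["Variant_Type"] != "SNP":
            continue  # indels need a reference anchor base; not handled in this sketch
        fields = [
            rec["Chromosome"],            # CHROM
            rec["Start_Position"],        # POS (1-based in both formats)
            ".",                          # ID
            rec["Reference_Allele"],      # REF
            rec["Tumor_Seq_Allele2"],     # ALT
            ".",                          # QUAL
            ".",                          # FILTER
            "GENE=" + rec["Hugo_Symbol"], # INFO (hypothetical tag, not a standard key)
        ]
        vcf_lines.append("\t".join(fields))
    return vcf_lines

# Toy MAF content: made-up rows for illustration, not real patient data.
header = ["Hugo_Symbol", "Chromosome", "Start_Position",
          "Reference_Allele", "Tumor_Seq_Allele2", "Variant_Type"]
toy_rows = [
    ["GENE_A", "chr1", "1000", "A", "T", "SNP"],
    ["GENE_B", "chr2", "2000", "-", "GG", "INS"],
]
toy_maf = "#toy comment line\n" + "\n".join("\t".join(r) for r in [header] + toy_rows)

for line in maf_snvs_to_vcf_lines(toy_maf):
    print(line)
```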
It's not common for studies to make VCFs available - it is more common that they will make the FASTQ or BAM files available, from which you can reproduce the VCFs by running variant calling yourself. You can search for these for melanoma here: https://www.ncbi.nlm.nih.gov/sra/?term=melanoma+dna-seq
As a general rule of thumb, it is good to get data in its most 'raw' form possible so that you have greater flexibility in how to process it.
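Once you have variant records from any of these sources, the simulation step you describe could look roughly like this. This is only a sketch under simple assumptions: SNVs only, 1-based positions as in VCF, and a reference held in memory as a plain string. The function name apply_snvs and the example sequence are hypothetical.

```python
def apply_snvs(reference, variants):
    """Return a mutated copy of `reference`.

    `variants` is an iterable of (pos, ref, alt) tuples with 1-based positions
    and single-base REF/ALT alleles, as in the POS/REF/ALT columns of a VCF.
    """
    seq = list(reference)
    for pos, ref, alt in variants:
        idx = pos - 1  # VCF positions are 1-based; Python indices are 0-based
        # Check against the original reference to catch coordinate or build mismatches.
        if reference[idx] != ref:
            raise ValueError(
                f"REF mismatch at position {pos}: expected {ref}, found {reference[idx]}"
            )
        seq[idx] = alt
    return "".join(seq)

# Made-up reference and variants, for illustration only.
mutated = apply_snvs("ACGTACGT", [(2, "C", "T"), (8, "T", "A")])
print(mutated)  # ATGTACGA
```

The REF check is worth keeping: it fails quickly if the variants were called against a different genome build than the reference you are mutating.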