I am interested in analyzing TCGA data and have been approved for data access through dbGaP.
My question: I want to use TCGA germline genotype data (many published papers used Affymetrix SNP 6.0 array data), but array-based genotype data does not seem to be provided.
Where can I find the genotype data of TCGA?
I have seen several similar questions, but couldn't find the right answer for me.
If you select data and then the "Files" tab, you can select "Genotyping Array" as the data type. However, in the main data portal it appears that only copy number variation called from the arrays is available.
If instead you select "legacy archive" from the portal front page, you can select "Files">"Genotyping Array">"Simple Nucleotide Variation".
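The same selection can probably be scripted instead of clicked through. The following is only a sketch of how a file query for the GDC API might be assembled; the field names (`cases.project.program.name`, `experimental_strategy`) and the returned field list are my assumptions about the GDC file metadata and should be checked against the API documentation before use. The helper name `build_files_query` is hypothetical, and nothing here contacts the server.

```python
import json


def build_files_query(program="TCGA", strategy="Genotyping Array", size=10):
    """Build query parameters for a GDC files search (no network call).

    The field names below are assumptions, not confirmed by this thread.
    """
    filters = {
        "op": "and",
        "content": [
            {
                "op": "in",
                "content": {
                    "field": "cases.project.program.name",  # assumed field name
                    "value": [program],
                },
            },
            {
                "op": "in",
                "content": {
                    "field": "experimental_strategy",  # assumed field name
                    "value": [strategy],
                },
            },
        ],
    }
    return {
        "filters": json.dumps(filters),
        "fields": "file_id,file_name,data_category,data_type,access",
        "format": "JSON",
        "size": str(size),
    }


params = build_files_query()
print(params["filters"])
```

These parameters would then be sent as a GET request to the files endpoint (I believe `https://api.gdc.cancer.gov/files`), and the returned `data_category` and `data_type` values show what is actually available for the arrays.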
Is there a reason why you want to use the array data? All the samples have been subjected to whole exome sequencing which should be better quality data.
Thanks, so they were moved to archive.
Actually, it would be better to use sequencing data from blood samples in VCF format.
But when I chose "SNV" for Data Category, it returned only "somatic mutations" files, so I thought I should use the array-based genotype files.
But when I downloaded these "Annotated Somatic Mutation" files in VCF format, they contained genotype columns for both normal and tumor. So I guess I could use these files to get germline genotypes.
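To make this concrete, here is a minimal sketch of how the genotype of the normal sample could be pulled out of such a file. It assumes the sample column is literally named `NORMAL` and that the FORMAT column contains a `GT` key; both are assumptions about these particular files, and the records in `example` are made up for illustration. Note that it only reports sites present in the file, so if the file keeps only somatic candidates, this will not give a complete set of germline genotypes.

```python
def normal_genotypes(vcf_lines, sample="NORMAL"):
    """Yield (chrom, pos, ref, alt, genotype) for one sample column of a VCF.

    `sample` is the column name of the normal sample (assumed "NORMAL").
    """
    sample_idx = None
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Header line: locate the requested sample column.
            sample_idx = fields.index(sample)
            continue
        if sample_idx is None:
            raise ValueError("header line not seen before records")
        # Column 9 (index 8) is FORMAT; sample values follow the same key order.
        keys = fields[8].split(":")
        values = fields[sample_idx].split(":")
        genotype = dict(zip(keys, values)).get("GT", ".")
        yield fields[0], int(fields[1]), fields[3], fields[4], genotype


# Made-up records, only to show the shape of the input.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNORMAL\tTUMOR",
    "chr1\t12345\t.\tA\tG\t.\tPASS\t.\tGT:AD\t0/0:30,0\t0/1:20,15",
    "chr2\t67890\t.\tC\tT\t.\tPASS\t.\tGT:AD\t0/1:12,11\t0/1:10,14",
]
calls = list(normal_genotypes(example))
for call in calls:
    print(call)
```

For a real file, the lines would come from an opened (and, if needed, gzip-decompressed) VCF instead of the `example` list.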
I do appreciate your answer again. It really helped!
Dear kelly.wang135,
I am sorry for the late request for clarification: did you indeed find germline mutations in the VCF files from the harmonized portal? As far as I understood, all germline mutations are already filtered out of those VCF files, no?
In case I have overlooked something, please let me know.