Hi everyone,
I want to analyze the mutated genes in the Pan-TCGA data provided in "Mutational landscape and significance across 12 major cancer types". To better understand the mutated genes in a given cancer type, I would like to know where I could get control data to compare with these cancer samples.
Specifically, I have organized the data from the link above into this format:
Cancer_Sample     Gene_1     Gene_2     …   Gene_20000
TCGA-02-001...    NA         Missense   …   Nonsense
TCGA-02-002...    Indel      NA         …   Missense
TCGA-02-003...    NA         Missense   …   NA
…                 …          …          …   …
TCGA-02-584...    Nonsense   NA         …   NA
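A sample × gene layout like this can be produced by pivoting per-mutation records (one row per sample, gene and variant class, as in the paper's excerpt quoted in the PS). Below is a minimal sketch in plain Python, assuming the records have already been read into `(sample, gene, variant_class)` tuples. `build_mutation_matrix` is a hypothetical helper, and joining several hits in the same gene with `;` is my own choice, not a TCGA convention.

```python
from collections import defaultdict


def build_mutation_matrix(records, genes=None):
    """Pivot (sample, gene, variant_class) records into a sample x gene table.

    Cells without a recorded mutation become "NA". Several variants of the
    same gene in one sample are joined with ";" (an assumed convention).
    Samples with no mutations at all never appear in the records, so they
    are absent from the output too.
    """
    calls = defaultdict(lambda: defaultdict(set))
    for sample, gene, variant_class in records:
        calls[sample][gene].add(variant_class)
    if genes is None:
        # Only genes seen in the records; pass a full gene list to get
        # one column per gene (~20,000) instead.
        genes = sorted({gene for _, gene, _ in records})
    header = ["Cancer_Sample"] + list(genes)
    rows = []
    for sample in sorted(calls):
        row = [sample]
        for gene in genes:
            classes = calls[sample].get(gene)
            row.append(";".join(sorted(classes)) if classes else "NA")
        rows.append(row)
    return header, rows


if __name__ == "__main__":
    # A few rows from the GBM excerpt quoted in this post.
    records = [
        ("TCGA-02-0003-01A-01D-1490-08", "HRH2", "Missense"),
        ("TCGA-02-0003-01A-01D-1490-08", "NR1I3", "Silent"),
        ("TCGA-02-0047-01A-01D-1490-08", "GPR132", "Missense"),
        ("TCGA-02-0047-01A-01D-1490-08", "SPACA3", "Frame_Shift_Del"),
    ]
    header, rows = build_mutation_matrix(records)
    for line in [header] + rows:
        print("\t".join(line))
```

The same function applied to control samples would give a table in exactly the same format, which makes the two directly comparable gene by gene.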
For comparison, I need the same kind of data for normal samples (people without any cancer): mutation calls across the whole genome that can be organized in the same format as above. Does anyone know where I could get this kind of data?
Thanks!
PS:
To make sure my idea is clear to everyone, let me illustrate it in more detail.
The source data from the paper is in this format (excerpt):
Tumor_Sample          Gene      Start_Position   Variant_Class     Ref_Allele   Var_Allele   Amino_Change
TCGA-02-0003-01A...   HRH2      175110351        Missense          G            A            p.V39I
TCGA-02-0003-01A...   NR1I3     161206281        Silent            C            T            p.A25
TCGA-02-0003-01A...   ALMS1     73680365         Silent            A            G            p.E2236
…                     …         …                …                 …            …            …
TCGA-02-0047-01A...   GPR132    105518226        Missense          C            T            p.C74Y
TCGA-02-0047-01A...   BUB1B     40512942         Silent            G            A            p.G1059
TCGA-02-0047-01A...   PLEKHG1   151152163        Missense          G            A            p.G639E
TCGA-02-0047-01A...   SPACA3    31322643         Frame_Shift_Del   C            0            p.S16fs
…                     …         …                …                 …            …            …
As the table above shows, samples 'TCGA-02-0003-01A-01D-1490-08' and 'TCGA-02-0047-01A-01D-1490-08' each have more than one mutated gene across the genome. Both samples are from glioblastoma multiforme (GBM), one of the cancer types.
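Counting mutated genes per sample from records like these is a short grouping step. A minimal sketch, assuming `(sample, gene, variant_class)` tuples; `count_mutated_genes` is a hypothetical helper, and the optional `skip_classes` filter (e.g. ignoring `Silent`) is my own addition, not something the paper prescribes. The example uses only the excerpt rows shown above, so the counts are not the samples' real totals.

```python
from collections import defaultdict


def count_mutated_genes(records, skip_classes=()):
    """Count distinct mutated genes per sample.

    records: iterable of (sample, gene, variant_class) tuples.
    skip_classes: variant classes to ignore, e.g. ("Silent",) to count only
    protein-altering changes (an optional, assumed filter).
    A sample whose records are all skipped does not appear in the result.
    """
    genes = defaultdict(set)
    for sample, gene, variant_class in records:
        if variant_class not in skip_classes:
            genes[sample].add(gene)
    return {sample: len(hit) for sample, hit in genes.items()}


if __name__ == "__main__":
    # Only the excerpt rows quoted in this post, not the full GBM file.
    s1 = "TCGA-02-0003-01A-01D-1490-08"
    s2 = "TCGA-02-0047-01A-01D-1490-08"
    records = [
        (s1, "HRH2", "Missense"), (s1, "NR1I3", "Silent"), (s1, "ALMS1", "Silent"),
        (s2, "GPR132", "Missense"), (s2, "BUB1B", "Silent"),
        (s2, "PLEKHG1", "Missense"), (s2, "SPACA3", "Frame_Shift_Del"),
    ]
    print(count_mutated_genes(records))
    print(count_mutated_genes(records, skip_classes=("Silent",)))
```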
So the data I'm looking for is similar to the data above; the only difference is that it should come either from normal people (without any type of cancer) or from normal samples of cancer patients, i.e. samples whose TCGA barcode looks like 'TCGA-02-0047-11A-01D-1490-08', where the sample-type code 11 marks normal tissue from the patient.
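The tumor/normal distinction can be read from the barcode programmatically. In TCGA barcodes the fourth field (e.g. `01A` or `11A`) starts with a two-digit sample-type code: 01 to 09 are tumor types, 10 to 19 normal types (11 is solid tissue normal) and 20 to 29 controls. A minimal sketch; `sample_type` and `participant` are hypothetical helper names, and the code only checks those ranges, not whether the rest of the barcode is valid.

```python
def sample_type(barcode):
    """Classify a TCGA barcode as tumor, normal, control or other.

    The 4th dash-separated field (e.g. "11A") starts with a two-digit
    sample-type code: 01-09 tumor, 10-19 normal, 20-29 control.
    """
    fields = barcode.split("-")
    if len(fields) < 4 or fields[0] != "TCGA":
        raise ValueError(f"not a TCGA sample barcode: {barcode!r}")
    code = int(fields[3][:2])
    if 1 <= code <= 9:
        return "tumor"
    if 10 <= code <= 19:
        return "normal"
    if 20 <= code <= 29:
        return "control"
    return "other"


def participant(barcode):
    """The first three fields identify the patient, e.g. TCGA-02-0047."""
    return "-".join(barcode.split("-")[:3])


if __name__ == "__main__":
    for bc in ("TCGA-02-0047-01A-01D-1490-08", "TCGA-02-0047-11A-01D-1490-08"):
        print(bc, participant(bc), sample_type(bc))
```

Grouping barcodes by `participant(...)` and keeping the `normal` ones pairs each tumor sample with its matched normal, whenever the file you have contains one.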
I think the TCGA consortium probably has the data I want for both normal people and normal samples; at least the normal samples from patients should be publicly accessible. But I don't know where to find them :( ...
I hope I have described my question clearly.
Thank you very much for answering my question in such detail.
I think my work relates to the first one. After reading your answer, I still have two questions about it.