Hi everyone! I want to run an association study between SNP genotypes and phenotypes in cancer patients, similar to this article investigating the association between SNPs in UGT2B and breast cancer.
As far as I know, SNP information is contained in the somatic mutation data, so I downloaded the somatic mutation MAF files. There are 5 separate files, which I list here:
BCGSC__IlluminaHiSeq_DNASeq_automated
BCM__IlluminaGA_DNASeq_automated
BCM__Mixed_DNASeq_curated
BI__IlluminaGA_DNASeq_automated
UCSC__IlluminaGA_DNASeq_automated
These are all somatic mutation files, but they were sequenced by different institutions on different platforms. The files also differ in content. For example, here is part of the BCGSC__IlluminaHiSeq_DNASeq_automated file (columns 11, 12 and 13), where the SNP genotypes given by the Tumor_Seq_Allele columns are all homozygous:
Reference_Allele Tumor_Seq_Allele1 Tumor_Seq_Allele2
G A A
G A A
A G G
A T T
C A A
C T T
A G G
..
But in the BI__IlluminaGA_DNASeq_automated file, the mutations are all heterozygous:
Reference_Allele Tumor_Seq_Allele1 Tumor_Seq_Allele2
G G A
G G A
A A T
C C T
A A G
A A G
G G A
..
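To make the difference between the two files concrete, here is a minimal Python sketch that labels each row by comparing the three columns shown above. The function name classify_zygosity and the label strings are made up for illustration and are not part of any MAF tool. The labels only describe how the alleles are reported in the file, which may not be the true genotype in the tumor.

```python
def classify_zygosity(ref, allele1, allele2):
    """Label one MAF row by comparing the two tumor alleles with the reference."""
    if allele1 == ref and allele2 == ref:
        return "homozygous_ref"
    if allele1 == allele2:
        return "homozygous_alt"
    if ref in (allele1, allele2):
        return "heterozygous"
    # Two different alleles, neither matching the reference
    return "heterozygous_non_ref"


# Rows copied from the excerpts above:
# (Reference_Allele, Tumor_Seq_Allele1, Tumor_Seq_Allele2)
bcgsc_rows = [("G", "A", "A"), ("G", "A", "A"), ("A", "G", "G"), ("A", "T", "T"),
              ("C", "A", "A"), ("C", "T", "T"), ("A", "G", "G")]
bi_rows = [("G", "G", "A"), ("G", "G", "A"), ("A", "A", "T"), ("C", "C", "T"),
           ("A", "A", "G"), ("A", "A", "G"), ("G", "G", "A")]

bcgsc_labels = {classify_zygosity(*row) for row in bcgsc_rows}
bi_labels = {classify_zygosity(*row) for row in bi_rows}
print(bcgsc_labels, bi_labels)
```

Every BCGSC row in the excerpt comes out as homozygous for the alternate allele and every BI row as heterozygous, which is exactly the discrepancy I am asking about.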
Although I have read the tutorial Working with MAF files from the TCGA, I still have no idea which file to choose.
To conclude, two major problems are troubling me.
First, do Tumor_Seq_Allele1 and Tumor_Seq_Allele2 really represent the SNP's genotype? In my next step I will select tag SNPs based on MAF (minor allele frequency, filter criterion > 0.05), but if I take the genotypes from these columns, the frequencies of all the SNPs are below 0.05, which means no tag SNPs at all! I'm not sure whether this is right. This question also mentions Tumor_Seq_Allele, but I don't understand the REF/ALT alleles or how to get a more reliable SNP genotype.
Second, the large disparities between the BCGSC and BI files leave me confused about which file to use for my next step.
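For reference, this is roughly how I compute the minor allele frequency at one site. It is only a sketch: the function name minor_allele_frequency and the toy cohort of 100 samples are invented for illustration, and I am assuming that any sample not listed in the MAF file at a site carries two reference alleles, which may not be a valid assumption.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Return the frequency of the second most common allele at one site.

    genotypes: list of (allele_a, allele_b) tuples, one tuple per sample.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    if len(counts) < 2:
        return 0.0  # only one allele observed, so there is no minor allele
    total = sum(counts.values())
    minor_count = counts.most_common(2)[1][1]
    return minor_count / total


# Toy cohort (hypothetical): 1 sample reported as G/A in the MAF file,
# 99 samples absent from the file and ASSUMED to be G/G.
genotypes = [("G", "A")] + [("G", "G")] * 99

maf = minor_allele_frequency(genotypes)
passes_filter = maf > 0.05  # tag SNP criterion from my next step
print(maf, passes_filter)
```

With one carrier in 100 samples the frequency is 1/200 = 0.005, so the site fails the > 0.05 filter. Because most mutations in these files occur in only a few samples, nearly every site ends up below the threshold.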
I hope you guys can give me some suggestions. Many thanks!
Please see this comment about zygosity and the answer to it - Working with MAF files (Mutation Annotation Format) from the TCGA (The Cancer Genome Atlas)
I have added more detail to my question. Thank you!
Hello, I have recently been confused about the actual meaning of Tumor_Seq_Allele1 and Tumor_Seq_Allele2 as well. Did you solve it?