What does TCGA use as a reference for making SNP annotations?
5.7 years ago

Hi, as my question probably shows, I'm new to dealing with the many sequence databases out there. The situation: I have some .maf files with mutations for a given cancer, let's say breast cancer, and a given gene, TP53. The .maf clearly says where each mutation starts and ends (I'm only interested in point mutations).


The point is that I want to construct a mutated sequence from this mutation data, using the reference sequence as a template, but there are so many different transcripts. So basically I just want to know which one TCGA uses as its reference. Is it the whole gene, or just the exons?

Thanks in advance

TCGA Refseq SNP genome • 1.5k views

If you want to see the effect of a mutation on the protein, you have to choose the exon regions (the transcript), either from direct splicing or from transcripts derived from alternative splicing, and check whether they carry mutations. Branch points and intron-exon donor and acceptor sites are also crucial, since mutations in these regions can affect splicing. Read this: A: How to analysis mutations effects bioinformatically? and this: A: Allele frequency visualization


This does not answer the original question, pltbiotech_tkarthi

5.7 years ago

TCGA has its own, very slightly customized genome reference, which basically consists of the hg38 analysis set plus some decoy and virus sequences.

You can read about it and download it here: GDC Reference Files


Just to add, this file is the reference sequence for the whole genome, which, I believe, is what TCGA uses. Warning: this is a big file.
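For the original goal of building a mutated sequence, here is a minimal Python sketch of applying point mutations to a stretch of reference sequence. It is only an illustration under some assumptions: that the positions are 1-based genomic coordinates on the forward strand (as in the Start_Position column of a MAF), that the reference and alternate bases come from Reference_Allele and Tumor_Seq_Allele2, and that you have already extracted the reference bases for the region from the genome FASTA. The function name, the toy sequence and the coordinates below are made up for the example and are not real TP53 data.

```python
def apply_point_mutations(ref_seq, region_start, mutations):
    """Return a copy of ref_seq with each point mutation applied.

    ref_seq      -- reference bases for the region (forward strand)
    region_start -- 1-based genomic coordinate of ref_seq[0] (assumption)
    mutations    -- iterable of (position, ref_allele, alt_allele),
                    with 1-based genomic positions
    """
    original = ref_seq.upper()
    mutated = list(original)
    for pos, ref, alt in mutations:
        idx = pos - region_start  # convert genomic coordinate to string index
        if not 0 <= idx < len(original):
            raise ValueError(f"position {pos} is outside the region")
        # Sanity check: the allele in the mutation record must match the
        # reference, otherwise the wrong genome build or strand is in use.
        if original[idx] != ref.upper():
            raise ValueError(
                f"reference mismatch at {pos}: "
                f"expected {ref}, found {original[idx]}"
            )
        mutated[idx] = alt.upper()
    return "".join(mutated)


# Toy example (hypothetical sequence and coordinates)
toy_reference = "ACGTACGT"
toy_mutations = [(102, "G", "A"), (107, "T", "C")]
print(apply_point_mutations(toy_reference, 100, toy_mutations))
```

The reference-allele check is the useful part: if it fails for many records, the template sequence probably does not come from the same build as the one used to call the mutations. For a gene on the reverse strand, such as TP53, the mutated genomic sequence would still need to be reverse-complemented and spliced into the chosen transcript before translation.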
