Correct Reference Genome For Cosmic
11.3 years ago
sfcarroll ▴ 80

I am working with the COSMIC data extracts, specifically the UCSC Mutation Export, available at: ftp://ftp.sanger.ac.uk/pub/CGP/cosmic/data_export/Request_based_exports/UCSCMutExp_v66_250713.csv.gz

I am taking the mutation locations from that file and looking them up in the UCSC database. I noticed there is also a FASTA archive hosted on the COSMIC FTP site: ftp://ftp.sanger.ac.uk/pub/CGP/cosmic/fasta.tgz. Should I be using this instead of the hg19.2bit file available from UCSC, or is the data the same?

Edit: Related question: Applying SNP masking in primer creation

md5sum file1.fa file2.fa

If the two files are byte-for-byte identical, the hashes will match.
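
A whole-file hash will also differ when only the formatting differs: line wrapping, header text, or the lowercase soft-masking UCSC uses for repeats. Here is a minimal Python sketch that hashes each sequence after stripping that formatting. It assumes both references are plain, uncompressed FASTA (the 2bit file would need converting first) and that record names match between the two files (e.g. both use `chr1`, not `1` in one of them). The function names `fasta_digests` and `compare_fasta` are made up for illustration.

```python
import hashlib


def fasta_digests(lines):
    """Map each record name to the MD5 of its sequence.

    Line wrapping, letter case and header descriptions are ignored,
    so only the sequence content affects the digest.
    """
    digests = {}
    name, md5 = None, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Store the digest of the record we just finished reading.
            if name is not None:
                digests[name] = md5.hexdigest()
            # Use only the first word of the header as the record name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            md5 = hashlib.md5()
        elif name is not None:
            # Upper-case so soft-masked (lowercase) bases compare equal.
            md5.update(line.upper().encode("ascii"))
    if name is not None:
        digests[name] = md5.hexdigest()
    return digests


def compare_fasta(path_a, path_b):
    """Return {record_name: True/False} for records present in both files."""
    with open(path_a) as fa, open(path_b) as fb:
        a, b = fasta_digests(fa), fasta_digests(fb)
    return {name: a[name] == b[name] for name in sorted(set(a) & set(b))}
```

Calling `compare_fasta("file1.fa", "file2.fa")` would then report, per shared record, whether the sequences agree. Records present in only one file are skipped, so it is worth checking the record names separately.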

You might want to ask the COSMIC folks what they recommend, see http://cancer.sanger.ac.uk/cancergenome/projects/cosmic/contact

11.2 years ago

You could use the raw COSMIC tab-delimited data instead of the request-based exports. It contains a mix of Build36 and Build37 loci. To parse it, standardize everything to Build37, and fix several other problems, here is a script I recently posted to GitHub: https://github.com/ckandoth/parse-cosmic (now deprecated)

And here is a list of standardized and annotated variants from COSMIC v64, with details on the caveats of parsing COSMIC data: https://www.synapse.org/#!Synapse:syn1855816
