I want to produce a list of all known somatic mutations using the 1000 Genomes Project and Ensembl. How do I start this project?
While you could use BioMart or the API to retrieve these data, you might be better off with VCF file dumps:
ftp://ftp.ensembl.org/pub/release-79/variation/vcf/homo_sapiens/
Homo_sapiens_somatic.vcf.gz (all somatic mutations from COSMIC and dbSNP) and Homo_sapiens.vcf.gz (all non-somatic variants from various sources; see http://www.ensembl.org/info/genome/variation/sources_documentation.html) from this directory should be a good place to start.
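To pull out a single chromosome (say chromosome 1) from either file, a small awk filter is usually enough. This is a minimal sketch: the function name chrom_records is made up for illustration, and it assumes Ensembl-style chromosome names in the CHROM column ("1" rather than "chr1"). It keeps the VCF header lines so the output is still a valid VCF.

```shell
# Hypothetical helper: read a VCF on stdin and print the header lines
# plus only the records whose CHROM column (field 1) equals $1.
# Assumes Ensembl-style names such as "1", not "chr1".
chrom_records() {
  awk -v c="$1" -F '\t' '/^#/ || $1 == c'
}
```

For example, something like `curl -s ftp://ftp.ensembl.org/pub/release-79/variation/vcf/homo_sapiens/Homo_sapiens_somatic.vcf.gz | gzip -dc | chrom_records 1 > somatic_chr1.vcf` should stream the somatic file and keep only chromosome 1.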
If you only want variants from the 1000 Genomes Project, you could grep for the 1000 Genomes evidence flag:
curl ftp://ftp.ensembl.org/pub/release-79/variation/vcf/homo_sapiens/Homo_sapiens.vcf.gz | gzip -dc | grep E_1000G | head
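A plain `grep E_1000G` will also match a `##INFO=<ID=E_1000G,...>` header line if the file declares one, and any longer tag that merely contains that string. A slightly stricter sketch, assuming E_1000G appears as a bare flag in the INFO column (column 8), checks each semicolon-separated INFO entry exactly. The function name has_1000g_evidence is hypothetical.

```shell
# Hypothetical helper: read a VCF on stdin and print only data lines whose
# INFO column (field 8) contains E_1000G as a whole ;-separated entry.
# Assumes E_1000G is written as a bare flag (no "=value").
has_1000g_evidence() {
  awk -F '\t' '
    /^#/ { next }                     # drop VCF header lines
    {
      n = split($8, tags, ";")        # INFO entries are ;-separated
      for (i = 1; i <= n; i++)
        if (tags[i] == "E_1000G") { print; break }
    }'
}
```

It slots into the same pipeline as the grep: `curl ftp://ftp.ensembl.org/pub/release-79/variation/vcf/homo_sapiens/Homo_sapiens.vcf.gz | gzip -dc | has_1000g_evidence | head`.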
or download the population-specific files from the above directory, or the VCF files from the original project site (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/).
1000 Genomes only looked at germline variants. There are no somatic mutations in the 1000 Genomes Project.
Hi Emily,
Thank you for your response. I actually want to find the somatic mutations in Ensembl and compare them to the SNPs that appear in the 1000 Genomes Project.
Could you steer me in the right direction? For example, does Ensembl have a tutorial on finding all the somatic mutations on chromosome 1?
Thank you,
Matt
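Since 1000 Genomes only reports germline variants, one way to compare the two lists is to check which somatic records sit at the same site as a known germline variant. Below is a minimal sketch, assuming both inputs are uncompressed VCFs with matching chromosome names (for example, one chromosome extracted from each). It matches on CHROM, POS, REF and any single ALT allele, and ignores differences in variant representation (e.g. how indels are normalized), so treat the overlap as approximate. The function name shared_sites is hypothetical.

```shell
# Hypothetical helper: print records from the somatic VCF ($2) that share
# CHROM, POS, REF and at least one ALT allele with the germline VCF ($1).
# Both files must be plain-text (decompressed) VCF.
shared_sites() {
  awk -F '\t' '
    /^#/ { next }                          # skip header lines in both files
    NR == FNR {                            # first file: build germline key set
      n = split($5, alts, ",")             # ALT may list several alleles
      for (i = 1; i <= n; i++) seen[$1, $2, $4, alts[i]] = 1
      next
    }
    {                                      # second file: look up each ALT allele
      n = split($5, alts, ",")
      for (i = 1; i <= n; i++)
        if (($1, $2, $4, alts[i]) in seen) { print; break }
    }' "$1" "$2"
}
```

Usage would look like `shared_sites germline_chr1.vcf somatic_chr1.vcf > overlap.vcf` (file names are placeholders). Because the germline keys are held in memory, running it one chromosome at a time is likely easier on RAM than using whole-genome files.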
Maybe you mean COSMIC (catalogue of somatic mutations in cancer)?