Question

COSMIC to Ensembl mapping

0

6.9 years ago

Gene_MMP8 ▴ 240

I have downloaded the COSMIC mutation file based on GRCh38. I have the COSMIC mutation IDs for each mutation (e.g., COSM521, COSM520, etc.). If I paste these IDs into the search box on the website, I get all the related information, such as the ENSEMBL contig. Using these ENSEMBL contigs, I visit the ENSEMBL database and extract the sequence associated with this variant. Is there any way to extract all the COSMIC variant sequences from the ENSEMBL database without doing this individually for each one? In other words, how do I map the COSMIC variants to the corresponding ENSEMBL ones?

Assembly next-gen cosmic ensembl • 2.5k views

ADD COMMENT • link updated 6.9 years ago by Emily 24k • written 6.9 years ago by Gene_MMP8 ▴ 240

0

Hello,

It is unclear to me what you mean by:

Using these ENSEMBL contigs, I visit the ENSEMBL database and extract the sequence associated with this variant.

Do you mean how the sequence changes due to the variant, e.g. for COSM521 A>G? Isn't this information in the file you've downloaded?

fin swimmer

ADD REPLY • link 6.9 years ago by finswimmer 16k

0

Sorry for the confusion. I meant the flanking sequence containing the variant position.

ADD REPLY • link 6.9 years ago by Gene_MMP8 ▴ 240

0

What do you mean by "sequence"? The flanking region? Just the base-pair change?

ADD REPLY • link 6.9 years ago by Emily 24k

0

The flanking region containing the variant

ADD REPLY • link 6.9 years ago by Gene_MMP8 ▴ 240

0

If you have the reference sequence (in this case GRCh38), bedtools flank (https://bedtools.readthedocs.io/en/latest/content/tools/flank.html) will give you the flanking ranges for your variants, and bedtools getfasta (https://bedtools.readthedocs.io/en/latest/content/tools/getfasta.html) will extract the flanking sequences for those ranges.
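
To make the two steps concrete, here is a minimal pure-Python sketch of what they compute. It does not call bedtools; the toy reference `chrToy`, the variant position, and the helper names `flank_intervals` and `get_sequence` are made up for illustration. On real data the equivalent would be something like `bedtools flank -i variants.bed -g genome.txt -b 4` followed by `bedtools getfasta -fi GRCh38.fa -bed flanks.bed`, where the file names are placeholders.

```python
# Illustration only: mimics the interval arithmetic of "flank" and the
# sequence lookup of "getfasta" on a toy reference held in memory.
# Coordinates follow BED conventions: 0-based start, end-exclusive.

def flank_intervals(chrom, start, end, flank, chrom_sizes):
    """Return (left, right) flank intervals, clipped to the chromosome bounds."""
    size = chrom_sizes[chrom]
    left = (chrom, max(0, start - flank), start)
    right = (chrom, end, min(size, end + flank))
    return left, right

def get_sequence(reference, interval):
    """Slice the reference sequence for a (chrom, start, end) interval."""
    chrom, start, end = interval
    return reference[chrom][start:end]

# Toy reference standing in for GRCh38 (hypothetical data).
reference = {"chrToy": "ACGTACGTACGTACGTACGT"}
chrom_sizes = {name: len(seq) for name, seq in reference.items()}

# Hypothetical single-base variant at 0-based position 10 (BED interval 10-11).
variant = ("chrToy", 10, 11)
left, right = flank_intervals(*variant, flank=4, chrom_sizes=chrom_sizes)

left_seq = get_sequence(reference, left)
ref_base = get_sequence(reference, variant)
right_seq = get_sequence(reference, right)
print(left_seq, ref_base, right_seq)
```

The flanks in this sketch come straight from the reference sequence; no neighbouring variants are applied to it.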

ADD REPLY • link 6.9 years ago by cpad0112 21k

0

Thanks for your reply. So my question is: do I need to incorporate variant information into the reference sequence? What if the flanking region for a SNP overlaps an indel? If I don't incorporate the indel into the reference sequence, wouldn't I lose information? Or should I just use the reference sequence as it is?

ADD REPLY • link 6.9 years ago by Gene_MMP8 ▴ 240

score 3 · Accepted Answer · 2018-06-01

3

6.9 years ago

Emily 24k

You can get the flanking regions for lists of variants in Ensembl using either BioMart or the Perl API.

BioMart is better suited to short lists of variants. There's a help video on using BioMart here. Use the somatic short variation database, filter by your list of IDs, then select flanking sequence as an attribute; you can specify how large a flank you need.

Alternatively, you can use the Perl API, which has methods in the Variation module to get the 5' and 3' flanking sequence for a variant.
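
For readers who would rather script this in Python, a rough sketch along the same lines is given below. It uses neither BioMart nor the Perl API; it swaps in the Ensembl REST API (`/variation/:species/:id` and `/sequence/region/:species/:region`). The helper names and the default flank size are hypothetical, and whether a given COSMIC identifier resolves through the variation endpoint is an assumption you would need to check for your release.

```python
import json
import urllib.request

REST = "https://rest.ensembl.org"  # public Ensembl REST server

def flank_region(chrom, start, end, flank):
    """Build a REST region string (1-based, inclusive, forward strand)
    covering the variant plus `flank` bases on each side."""
    return f"{chrom}:{max(1, start - flank)}..{end + flank}:1"

def variant_url(variant_id, species="human"):
    """URL for looking up a variant's genomic mappings as JSON."""
    return f"{REST}/variation/{species}/{variant_id}?content-type=application/json"

def sequence_url(region, species="human"):
    """URL for fetching the reference sequence of a region as plain text."""
    return f"{REST}/sequence/region/{species}/{region}?content-type=text/plain"

def fetch_flank(variant_id, flank=25):
    """Network call: take the variant's first mapping, then fetch the
    reference sequence spanning the variant and its flanks.
    Assumes the ID is known to the variation endpoint."""
    with urllib.request.urlopen(variant_url(variant_id)) as resp:
        mapping = json.load(resp)["mappings"][0]
    region = flank_region(
        mapping["seq_region_name"], mapping["start"], mapping["end"], flank
    )
    with urllib.request.urlopen(sequence_url(region)) as resp:
        return resp.read().decode().strip()
```

Calling `fetch_flank` once per ID in a loop would cover a list of variants, although for large lists the public server's rate limits apply and BioMart or the Perl API may be the better fit.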

Let me know if you need any help using either of these.

ADD COMMENT • link 6.9 years ago by Emily 24k

0

Thanks for the suggestion. I will definitely check it out.

ADD REPLY • link 6.9 years ago by Gene_MMP8 ▴ 240