COSMIC to Ensembl mapping
6.5 years ago
Gene_MMP8 ▴ 240

I have downloaded the COSMIC mutation file based on GRCh38, so I have the COSMIC mutation ID for each mutation (e.g. COSM521, COSM520). If I paste one of these IDs into the search box on the website, I get all the related information, such as its Ensembl contig. Using these ENSEMBL contigs, I visit the ENSEMBL database and extract the sequence associated with this variant. Is there any way to extract the sequences for all COSMIC variants from the Ensembl database without doing this individually for each one? In other words, how can I map the COSMIC variants to the Ensembl ones?

Assembly next-gen cosmic ensembl • 2.3k views

Hello,

it is unclear to me what you mean by

Using these ENSEMBL contigs, I visit the ENSEMBL database and extract the sequence associated with this variant.

Do you mean how the sequence changes due to the variant, e.g. for COSM521 A>G? Isn't this information in the file you've downloaded?

fin swimmer

Sorry for the confusion. I meant the flanking sequence containing the variant position.

What do you mean by "sequence"? The flanking region? Just the base-pair change?

The flanking region containing the variant

If you have the reference sequence (in this case GRCh38), bedtools flank (https://bedtools.readthedocs.io/en/latest/content/tools/flank.html) will give you the flanking ranges for each variant, and bedtools getfasta (https://bedtools.readthedocs.io/en/latest/content/tools/getfasta.html) will then extract the flanking sequences for those ranges from the reference.
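
To make the coordinate arithmetic concrete, here is a minimal Python sketch of the same idea: build a window around a variant position, then cut it out of the reference. It is not how bedtools is implemented. The function names, the toy reference "chrT" and the flank size are all made up for illustration, and it assumes the variant position is 1-based (as in most mutation tables) while the window is 0-based and half-open (as in BED). Note that the window here includes the variant base itself, which is closer to what bedtools slop produces than bedtools flank.

```python
def flank_window(pos_1based, flank, chrom_length):
    """Return a 0-based, half-open (start, end) window covering `flank` bases
    on each side of a 1-based variant position, clipped to the chromosome."""
    start = max(0, pos_1based - 1 - flank)
    end = min(chrom_length, pos_1based + flank)
    return start, end


def extract(reference, chrom, start, end):
    """Slice the window out of an in-memory reference (dict: name -> sequence)."""
    return reference[chrom][start:end]


# Hypothetical toy reference standing in for a GRCh38 FASTA file.
reference = {"chrT": "ACGTACGTACGTACGTACGT"}

# Hypothetical variant at 1-based position 10, with 3 bases of flank per side.
start, end = flank_window(10, 3, len(reference["chrT"]))
window = extract(reference, "chrT", start, end)
print(start, end, window)
```

For a real genome you would read the intervals from your COSMIC file and let getfasta (or an indexed FASTA reader) do the extraction rather than holding the sequence in a dict.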

Thanks for your reply. So my question is: do I need to incorporate variant information into the reference sequence? What if the flanking region of a SNP overlaps an indel? If I don't incorporate the indel into the reference sequence, wouldn't I lose information? Or should I just use the reference sequence as it is?

6.5 years ago
Emily 24k

You can get the flanking regions for lists of variants in Ensembl using either BioMart or the Perl API.

BioMart is better suited to short lists of variants. There's a help video on using BioMart here. Use the somatic short variation database, filter by your list of IDs, and select the flanking sequence as an attribute; you can specify how large a flank you need.

Alternatively, you can use the Perl API, which has methods in the Variation module to get the 5' and 3' flanking sequence for a variant.

Let me know if you need any help using either of these.

Thanks for the suggestion. Will surely check it out
