How to create dummy VCF with fusion variants
6.7 years ago
Jackie ▴ 70

I am developing a tool which would take a gene list as input, and the output would be a VCF including fusion variants involving the genes of interest.

I am going to use the Mitelman database, which has a comprehensive curation of fusions from the published literature. However, its breakpoints are given in cytoband format (e.g., 19p13) rather than as genomic coordinates, such as chr1 10099767.

My questions are:

- Are there any other comprehensive databases you would recommend that give precise breakpoints as genomic coordinates? (I tried COSMIC and TCGA, but their fusion files are not as comprehensive; many well-known fusions seem to be missing from them.)
- If Mitelman turns out to be the most comprehensive database, how would you suggest I find the corresponding genomic coordinates for its fusions in an easy way?

Thanks!

fusions mitelman fusion databases • 2.8k views

Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer https://ulib.iupui.edu/databases/mitelman-database-chromosome-aberrations-and-gene-fusions-cancer


Thank you, Pierre, for the answer. I have already downloaded the raw data from the Mitelman website, but the question now is how to convert the breakpoints from cytoband format into genomic coordinates, as that's what is needed to generate a VCF. I think I can download a cytoband annotation file to convert between genomic coordinates and cytobands, but I wonder whether there is an easier way.

Thanks,
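The cytoband-file lookup mentioned above could look roughly like the sketch below. It assumes a tab-separated annotation file in the UCSC `cytoBand.txt` layout (chrom, start, end, band name, stain). The function names and the toy rows are hypothetical, and the coordinates are invented for illustration rather than taken from a real assembly. Note that the result is an interval spanning whole bands, not a precise breakpoint.

```python
import csv
import io
import re

# Toy rows in the UCSC cytoBand.txt layout (chrom, start, end, band, stain).
# The coordinates are invented for illustration, not real hg19/hg38 values.
TOY_CYTOBAND = (
    "chr19\t0\t7000000\tp13.3\tgneg\n"
    "chr19\t7000000\t13000000\tp13.2\tgpos25\n"
    "chr19\t13000000\t20000000\tp13.1\tgneg\n"
    "chr19\t20000000\t26000000\tp12\tgvar\n"
)


def load_bands(handle):
    """Read cytoband rows into a list of (chrom, start, end, band) tuples."""
    bands = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment/header lines
        bands.append((row[0], int(row[1]), int(row[2]), row[3]))
    return bands


def cytoband_to_interval(label, bands):
    """Resolve a label such as '19p13' to (chrom, start, end).

    Every band whose name starts with the requested band is included, so
    '19p13' spans p13.1 through p13.3. Coordinates keep the convention of
    the input file (0-based start, exclusive end).
    """
    match = re.fullmatch(r"(?:chr)?([0-9]{1,2}|[XY])([pq][0-9.]*)", label)
    if match is None:
        raise ValueError(f"unrecognised cytoband label: {label!r}")
    chrom, band = "chr" + match.group(1), match.group(2)
    hits = [(s, e) for c, s, e, name in bands
            if c == chrom and name.startswith(band)]
    if not hits:
        raise KeyError(label)
    return chrom, min(s for s, _ in hits), max(e for _, e in hits)


bands = load_bands(io.StringIO(TOY_CYTOBAND))
print(cytoband_to_interval("19p13", bands))
```

With a real annotation file, `load_bands` would be given an open file handle instead of the in-memory string.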


It wasn't an answer, just a hyperlink, because I had no idea what "...using is Mitelman which has ..." meant.


Sorry about the confusion, and thanks for providing the hyperlink. I have updated the post by inserting the hyperlink for Mitelman.

6.7 years ago
d-cameron ★ 2.9k

You need a higher-resolution database. If you only have the cytoband, that's even lower resolution than the gene name. That's not your only problem, though. Even if you did have one, identical somatic driver gene fusions do not always occur at the same genomic position. E.g., a fusion of geneA exons 1,2 to geneB exons 4,5,6 would result from a breakpoint anywhere in geneA intron 2 joined to a breakpoint anywhere in geneB intron 3. It's even more complicated than that because, for some fusions, a functional fusion transcript can be generated by several possible exon combinations (e.g., at least the first 2 exons of geneA connected to at least the last 2 exons of geneB).

In summary, there are many possible genomic coordinate pairs that will all result in the same fusion transcript.
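To give a feel for the scale, here is a minimal sketch that counts the coordinate pairs for one such fusion. Keeping exons 1-2 of gene A and exons 4-6 of gene B means one breakpoint falls in the intron after exon 2 of gene A and the other in the intron before exon 4 of gene B. Both genes are hypothetical, their exon coordinates are invented, and both are assumed to lie on the forward strand.

```python
def introns(exons):
    """Map intron number to its (start, end) interval.

    `exons` is a list of half-open (start, end) intervals ordered 5' to 3'
    on the forward strand; intron N lies between exon N and exon N + 1.
    """
    return {i + 1: (exons[i][1], exons[i + 1][0])
            for i in range(len(exons) - 1)}


def count_breakpoint_pairs(intron_a, intron_b):
    """Number of (position in A, position in B) pairs within two introns."""
    return (intron_a[1] - intron_a[0]) * (intron_b[1] - intron_b[0])


# Hypothetical genes; all coordinates are made up for illustration.
GENE_A_EXONS = [(1000, 1200), (5000, 5150), (9000, 9300)]
GENE_B_EXONS = [(2000, 2100), (4000, 4200), (7000, 7100),
                (12000, 12250), (15000, 15100), (18000, 18400)]

a_intron2 = introns(GENE_A_EXONS)[2]   # between exons 2 and 3 of gene A
b_intron3 = introns(GENE_B_EXONS)[3]   # between exons 3 and 4 of gene B

print("gene A intron 2:", a_intron2)
print("gene B intron 3:", b_intron3)
print("distinct breakpoint pairs:", count_breakpoint_pairs(a_intron2, b_intron3))
```

Even with these small toy introns the count runs into the millions, so a dummy VCF can at best carry one representative coordinate pair per fusion.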
