ISCN annotation for SV/CN VCF files
15 months ago
a.beggs ▴ 60

Hi

I'm trying to annotate the output of a VCF file that contains SV and CNV calls from long-read sequencing. The conventional tools work well enough, but I can't find anything that will produce ISCN annotation for the whole VCF file, e.g. 46,XX,del(15)(q11.2q13), for the output of a pipeline I'm writing.

Thanks in advance.

Andrew

SV • 8.4k views

I'd say this would be a great project. ISCN and HGVS seem to exist in a parallel world, separate from most bioinfo tools :)


Hi @cmdcolin - I half agree with this post. There are several good packages for naming SNVs and indels according to HGVS nomenclature. Many clinical geneticists and molecular pathologists "check their work" with these tools for clinical purposes; they work well and can take a variety of inputs.

However, I do agree with you when it comes to ISCN. The HGVS software manuscript (linked in the answer below) makes reference to this, and it appears it has not been done in the intervening time, unless I missed something. I think if one wanted to build a tool such as this, partnering with that group would be a good move.

Perhaps the increasing accessibility of 3rd gen / long-read technology will bring more attention to this going forward!

15 months ago
LauferVA 4.5k

Hi a.beggs - interesting question. I found a couple of relevant directions, though no single definitive answer.

First, it is probably worth saying that, at least at the time of publication, the (officially endorsed) HGVS software package does not support ISCN nomenclature (discussion, second to last paragraph). Still, it is worth familiarizing yourself with it.


Second, according to this Biostars post, CNV callers for microarrays do this. I looked at popular CNV/SV calling packages like CNVkit, but could not find the specific ones referred to. The author of that answer may be knowledgeable on this subject.


Next, I tried googling using queries like "python tool for ISCN nomenclature". These searches turn up several tools, e.g. this one. Now of course, what we want is a repository like this one that takes FASTQ, BAM, or VCF files as input and returns an ISCN karyotype - I could not find that functionality. But I am including CNVkit because it DOES, at least, give you info about the file formats currently used in processing ISCN-formatted data, which might ultimately be part of a solution should you need to write this yourself. Moreover, and possibly more importantly, this manuscript introduces the idea of the Mitelman database, which several of the other repos going forward will also draw on.

Granted that going from karyotype to genomic coordinates is problematic (because even a single cytoband may span many Mb), I was not sure the next tool I identified - CytoConverter - would be helpful, but it is relevant. It does what we want in reverse and works with R versions above 3.5.
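To make the resolution problem concrete, here is a minimal sketch of the reverse direction (band name to coordinate interval). It is not CytoConverter's method, just an illustration; the `BANDS` table, its coordinates, and the `band_interval` helper are all hypothetical.

```python
# Hypothetical band table: chromosome -> [(band, start, end)], 0-based half-open.
# The coordinates are invented for illustration, NOT real band boundaries.
BANDS = {
    "15": [
        ("q11.1", 0, 1000),
        ("q11.2", 1000, 2500),
        ("q12", 2500, 3000),
    ],
}


def band_interval(chrom, band):
    """Return (start, end) covering a band and all of its sub-bands, or None."""
    hits = []
    for name, start, end in BANDS.get(chrom, []):
        if "." in band:
            # "q11.2" also covers finer sub-bands such as "q11.21"
            matched = name.startswith(band)
        else:
            # "q11" covers "q11" itself plus "q11.1", "q11.2", ...
            matched = name == band or name.startswith(band + ".")
        if matched:
            hits.append((start, end))
    if not hits:
        return None
    return (min(s for s, _ in hits), max(e for _, e in hits))
```

Asking for a coarse band such as `q11` returns the union of its sub-bands, which is exactly why a karyotype can only pin a breakpoint down to a wide interval.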


Finally, I found a pub or two that seem like they might do what you want, but were created for another reason. For instance, this manuscript discusses "A hybrid approach for automated mutation annotation of the extended human mutation landscape in scientific literature". This manuscript also has a link to a git repo, here.


The last thing I wanted to say is, what would I do if it were me solving this problem?

Overall, it was a bit frustrating because I found both allusions to what you are describing AND very closely related software, but I could not find an exact match for the kind of package I think you want.

Ultimately, thinking about all the issues involved, my conclusion is that if it were me, I'd just write it myself. First, writing a 4-column file with cytoband, GRCh37, GRCh38, and T2T coords would be trivially easy using the UCSC Table Browser.
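As a rough sketch of that first step, the snippet below loads rows in the UCSC cytoBand.txt layout (chrom, chromStart, chromEnd, name, gieStain; tab-separated, 0-based half-open) and looks up the band containing a position. The function names and the sample rows are my own assumptions, and the sample coordinates are made up rather than real band boundaries.

```python
import bisect
import csv
import io
from collections import defaultdict

# Illustrative rows in the cytoBand.txt layout. Coordinates are invented
# for the example and are NOT real band boundaries.
SAMPLE_CYTOBAND = (
    "chr15\t0\t1000\tp11.2\tgneg\n"
    "chr15\t1000\t2000\tq11.2\tgneg\n"
    "chr15\t2000\t3000\tq12\tgpos50\n"
    "chr15\t3000\t4000\tq13.1\tgneg\n"
)


def load_cytobands(handle):
    """Return {chrom: (sorted_starts, [(start, end, band), ...])}."""
    rows = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and header comments
        chrom, start, end, band = row[:4]
        rows[chrom].append((int(start), int(end), band))
    index = {}
    for chrom, bands in rows.items():
        bands.sort()
        index[chrom] = ([b[0] for b in bands], bands)
    return index


def band_at(index, chrom, pos0):
    """Return the band containing 0-based position pos0, or None."""
    if chrom not in index:
        return None
    starts, bands = index[chrom]
    i = bisect.bisect_right(starts, pos0) - 1  # last band starting at or before pos0
    if i < 0:
        return None
    start, end, band = bands[i]
    return band if start <= pos0 < end else None


index = load_cytobands(io.StringIO(SAMPLE_CYTOBAND))
```

One such index per assembly (GRCh37, GRCh38, T2T) would give the multi-build table described above; with a real file you would pass an open file handle instead of the `StringIO` sample.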

From there, it would be a simple matter of linking the SV caller you are already using to that table. There would be some problems and headaches, sure (complex rearrangements and chromothripsis), but honestly, if the SV caller already handles those, then I don't think the second step of making the ISCN converter handle them correctly would be that bad.
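A minimal sketch of that linking step, assuming the caller writes the standard VCF `SVTYPE` and `END` INFO keys for simple intrachromosomal events. The band table, its coordinates, and all function names here are hypothetical, and the output covers only the simplest del/dup/inv case, not the full ISCN rule set (no terminal deletions, translocations, or complex events).

```python
# Toy band table: (chrom, start, end, band), 0-based half-open.
# Coordinates are invented for illustration, NOT real band boundaries.
BANDS = [
    ("chr15", 0, 1000, "p11.2"),
    ("chr15", 1000, 2000, "q11.2"),
    ("chr15", 2000, 3000, "q12"),
    ("chr15", 3000, 4000, "q13.1"),
]

# Only the simple event types are mapped in this sketch.
ISCN_PREFIX = {"DEL": "del", "DUP": "dup", "INV": "inv"}


def band_at(chrom, pos0):
    """Return the band containing 0-based position pos0, or None."""
    for c, start, end, band in BANDS:
        if c == chrom and start <= pos0 < end:
            return band
    return None


def parse_info(info):
    """Split a VCF INFO string into a dict (flags map to an empty string)."""
    out = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        out[key] = value
    return out


def vcf_line_to_iscn(line):
    """Return a simplified ISCN-style string for one VCF record, or None."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, info = fields[0], int(fields[1]), parse_info(fields[7])
    prefix = ISCN_PREFIX.get(info.get("SVTYPE"))
    if prefix is None or "END" not in info:
        return None  # unsupported event type (e.g. BND) or missing END
    first = band_at(chrom, pos - 1)  # VCF POS is 1-based
    last = band_at(chrom, int(info["END"]) - 1)
    if first is None or last is None:
        return None
    label = chrom[3:] if chrom.startswith("chr") else chrom
    return f"{prefix}({label})({first}{last})"


record = "chr15\t1500\t.\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=3500"
print(vcf_line_to_iscn(record))  # del(15)(q11.2q13.1)
```

The leading "46,XX," part of a karyotype (chromosome count and sex complement) is not present in an SV record, so it would have to come from elsewhere in the pipeline.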

You'd just need a little recursion I think, no?
