Hello,
I'm trying to understand how structural variants are reported by calling tools, and then how they're annotated in MAF format (if at all). Can someone point me to papers or tutorials?
Here are a couple of good reviews of SV detection methods, if you haven't already seen them:
For understanding information about reads, the documentation for IGV might be a good place to start. Try loading an example tumor BAM file with a known fusion breakpoint, and see if you can sort out all of the information IGV is displaying in the reads around that site.
The 'standard' format for reporting SVs is the VCF file format, with somatic calls marked by the SOMATIC flag (note that not all callers will actually write that flag). Converting VCF SVs into MAF is problematic since the MAF format appears to support only single-locus indel SV events. I recommend not converting to MAF, as you are likely to lose important SV events (such as driver gene fusion events) in the conversion.
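As a rough illustration of what that looks like in practice, here is a minimal sketch that reads one VCF record and reports its SV type and whether the SOMATIC flag is set. It uses only the Python standard library. `SVTYPE` and `SOMATIC` are INFO keys from the VCF specification; the function names and the example record are made up for illustration and are not output from any particular caller.

```python
def parse_info(info_field):
    """Split a VCF INFO string into a dict; flags (entries without '=') map to True."""
    info = {}
    if info_field == ".":
        return info
    for entry in info_field.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


def summarize_sv(vcf_line):
    """Return the position, ALT allele, SV type and somatic status of one VCF record."""
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, _id, _ref, alt, _qual, _filter, info_field = fields[:8]
    info = parse_info(info_field)
    return {
        "chrom": chrom,
        "pos": int(pos),
        "alt": alt,
        "svtype": info.get("SVTYPE"),
        # Absence of the flag does not prove the call is germline:
        # some callers never write SOMATIC at all.
        "somatic": info.get("SOMATIC", False) is True,
    }


# Hypothetical breakend record: the ALT allele points at a second locus (chr5:2000).
example = "\t".join(
    ["chr1", "1000", "bnd_a", "N", "N[chr5:2000[", ".", "PASS", "SVTYPE=BND;SOMATIC"]
)
print(summarize_sv(example))
```

Note that the partner locus of the breakend lives inside the ALT string, which is the kind of two-locus information that has no obvious home in a single-locus MAF row.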
Did you try Google or Google Scholar?
edit: added How To Ask Good Questions On Technical And Scientific Forums link.
While Google does wonders in most instances, it is not always able to surface the best resource (for one, it can't read your mind, though it seems to be getting closer with each passing day). For a newbie, weeding out usable resources from a search can be a daunting task.
I vote for you : best answer 2017 :)
Thank you for the comments.
Papers are papers. They're geared towards audiences that already know the ins and outs of structural variants. I understand what the VCF/BCF format is, but my annotation has columns named split read support, tumor split read support, and tumor variant allele count, and I may not be understanding these definitions well enough to see the real difference between total split read support and tumor split read support. I thought only the tumors would have viable split reads anyway, so wouldn't all the split reads come from the tumor? I was hoping to find documentation on this annotation to clarify these definitions. The annotation could have been done in-house in my lab (and I'm still trying to find and contact the person who did it). I haven't been able to find any such documentation; even the DELLY paper is a bit vague.
Note that I wouldn't be posting here if I hadn't spent hours searching online and found nothing. If you searched and found something, just let me know. If you don't have any input, I would rather hear nothing. I post here because it's supposed to be a collaborative community and it has helped me in the past.
These are not standard VCF SV fields and are specific to DELLY. The standard VCF fields can be found in the VCF file format specifications.
Somatic SVs are those that appear in the tumour but not the normal. It appears that DELLY is reporting both the overall total (normal and tumour) split read count and the tumour-only split read count. Consult the DELLY documentation/paper/source code for exact details on how DELLY calculates these fields.
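To make the distinction concrete, here is a minimal sketch of how a total and a tumour-only split read count could relate to each other. This is an assumption about what the two columns mean, not DELLY's actual calculation; the function name, sample names and counts are all hypothetical.

```python
def split_read_summary(per_sample_sr, tumour_sample):
    """Summarise hypothetical per-sample split read counts for one SV call.

    per_sample_sr maps sample name -> number of split reads supporting the SV.
    """
    total = sum(per_sample_sr.values())
    tumour = per_sample_sr[tumour_sample]
    return {"total_sr": total, "tumour_sr": tumour, "normal_sr": total - tumour}


# A call supported only in the tumour: total and tumour-only counts agree.
somatic_like = split_read_summary({"normal": 0, "tumour": 9}, "tumour")

# A call also supported in the normal (for example a germline SV):
# the total is larger than the tumour-only count.
germline_like = split_read_summary({"normal": 6, "tumour": 8}, "tumour")

print(somatic_like)
print(germline_like)
```

Under this reading, the two columns only differ when the normal sample also has split reads at the site, which is exactly the case you would want to flag before calling something somatic.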
This detail is what should have been in the original post. With this hopefully you will get an answer soon. You should add this info in the original post.
I appreciate that it should have been detailed, but I was wondering if there was something for structural variants in general outside of DELLY.
The only 'standard' we have is the VCF file format specification. Unfortunately, the standardized fields do not include those required for a breakdown of SV support by evidence type (split read, discordant read pair, one-end anchored read, assembled breakend contig, assembled breakpoint contig, and so on), so each caller has to define its own fields (e.g. my caller, GRIDSS, reports split read counts using an "SR" field). Note that counts can differ between callers due to differences in the algorithm and the filtering steps applied (e.g. some callers do not include split reads with a MAPQ of 0 in their counts).
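Since the field names are caller-specific, one practical approach is to look the key up by name in the per-sample columns. Below is a minimal sketch of that, assuming the value is stored as a FORMAT field; whether a given caller puts it in INFO or FORMAT, and what it is called, has to be checked in the `##INFO` and `##FORMAT` header lines of your own file. The function name, sample names and example record are hypothetical.

```python
def sample_field(vcf_line, sample_names, key):
    """Return {sample name: raw value} for one FORMAT key of a VCF record.

    Returns an empty dict if the record has no such FORMAT key.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")
    if key not in format_keys:
        return {}
    idx = format_keys.index(key)
    values_by_sample = {}
    for name, column in zip(sample_names, fields[9:]):
        values = column.split(":")
        # Trailing FORMAT values may be omitted in a sample column.
        values_by_sample[name] = values[idx] if idx < len(values) else "."
    return values_by_sample


# Hypothetical record with a caller-defined "SR" FORMAT field and two samples.
record = "\t".join(
    ["chr1", "1000", "sv1", "N", "<DEL>", ".", "PASS", "SVTYPE=DEL;END=5000",
     "GT:SR", "0/0:0", "0/1:7"]
)
print(sample_field(record, ["normal", "tumour"], "SR"))
```

The values come back as raw strings on purpose: what counts as a split read, and which reads are filtered out first, is defined by the caller, so the numbers are not directly comparable across tools.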