For a list of structural variants (deletions, duplications, inversions, translocations), in either VCF or BEDPE format, we would like to obtain gene annotations and lists of the following sets of genes:
-- fusions (if both breakpoints are in exons, introns, or UTRs)
-- truncations (if only one breakpoint is in an exon, intron, or UTR, and the other breakpoint is in an intergenic region)
-- the genes in the regions that are deleted, duplicated, or inverted
Although I wrote some Perl scripts based on ANNOVAR, I thought we might be able to get all these annotations from a package that is already available?
written 7.8 years ago by Bogdan • updated 6.5 years ago by LGMgeo
SVs are problematic for many pipelines/software as, unlike SNVs and small indels, each event involves at least two genomic loci.
Be aware that not all callers correctly classify events. Many callers classify events purely on their break-end position and orientation. This results in deletion calls even when there is no copy number change to support the event (most callers), or inversion calls even when only one of the two inversion breakpoints actually exists (e.g. DELLY). For simple germline analysis this is probably OK, and you can just ignore all large or inter-chromosomal events, but for highly rearranged genomes (e.g. cancer), things are much more complicated.
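To make the point about position-and-orientation classification concrete, here is a minimal Python sketch of that kind of naive labelling. It assumes the commonly used BEDPE strand convention for a same-chromosome pair with the lower-coordinate break-end listed first (+- deletion-like, -+ duplication-like, ++ or -- inversion-like). Both function names are hypothetical and are not taken from any caller.

```python
def classify_by_orientation(chrom1, strand1, chrom2, strand2):
    """Label an event from break-end chromosomes and strands only.

    Assumption: the lower-coordinate break-end comes first and strands
    follow the BEDPE convention described in the text above.
    """
    if chrom1 != chrom2:
        return "interchromosomal"
    pair = strand1 + strand2
    if pair == "+-":
        return "deletion-like"
    if pair == "-+":
        return "duplication-like"
    if pair in ("++", "--"):
        return "inversion-like"
    raise ValueError(f"unexpected strand pair: {pair!r}")


def inversion_has_both_junctions(strand_pairs):
    """A complete inversion needs one ++ junction and one -- junction."""
    return {"++", "--"} <= set(strand_pairs)


# A lone ++ junction is still labelled inversion-like by orientation alone,
# even though the second junction of the inversion is missing.
label = classify_by_orientation("chr1", "+", "chr1", "+")
complete = inversion_has_both_junctions(["++"])
print(label, complete)
```

The label is only a hint about the junction. Confirming a deletion still needs copy number support, and confirming an inversion needs both junctions, which this sketch checks only in the simplest possible way.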
thought that we could get all these annotations with a package that is already available
What you're asking is really two separate processes: one for looking at the intervening sequence of simple events, and another for break-end overlap for fusions/interchromosomal/complex events.
If you're familiar with Bioconductor then you can do the first part relatively easily for a BEDPE: just convert to GRanges intervals and calculate overlaps against the Bioconductor annotation package for your organism.
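As a rough illustration of the first part, the sketch below does the same interval-overlap step in plain Python rather than with GRanges. It is a stand-in for the idea, not the Bioconductor API. The gene table, the gene names, and the function names are all made up for the example, and it assumes 0-based half-open coordinates in both the BEDPE and the gene list.

```python
from typing import NamedTuple, Optional


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def bedpe_to_span(line: str) -> Optional[tuple]:
    """Return (chrom, start, end) covered by a same-chromosome BEDPE record.

    Returns None for inter-chromosomal records, which have no
    intervening sequence to annotate.
    """
    f = line.rstrip("\n").split("\t")
    chrom1, start1, end1 = f[0], int(f[1]), int(f[2])
    chrom2, start2, end2 = f[3], int(f[4]), int(f[5])
    if chrom1 != chrom2:
        return None
    return chrom1, min(start1, start2), max(end1, end2)


def genes_in_span(span, genes):
    """Names of genes that overlap the span by at least one base."""
    chrom, start, end = span
    return [g.name for g in genes
            if g.chrom == chrom and g.start < end and start < g.end]


# Hypothetical annotation and a hypothetical deletion call.
GENES = [
    Gene("GENE_A", "chr1", 2000, 3000),
    Gene("GENE_B", "chr1", 6000, 7000),
    Gene("GENE_C", "chr2", 2000, 3000),
]
record = "chr1\t1000\t1010\tchr1\t5000\t5010\tDEL1"
span = bedpe_to_span(record)
print(span, genes_in_span(span, GENES))
```

A linear scan is fine for a toy table. For a real annotation you would want an interval index, which is what the GRanges overlap functions provide.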
For the second part you might be interested in my StructuralVariantAnnotation package. Its key feature is conversion of VCFs generated by a number of popular SV callers into a GRanges object containing break-end coordinates. Once in GRanges format, you can again use the Bioconductor annotation packages to calculate feature overlap.
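The break-end overlap step can be sketched in a few lines of plain Python once each call is reduced to two (chromosome, position) break-ends. This does not use StructuralVariantAnnotation or GRanges. It only applies the definitions from the question (both break-ends in a gene: fusion, exactly one: truncation) to a made-up gene table, and every name in it is hypothetical.

```python
from typing import NamedTuple, Optional


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def gene_at(chrom: str, pos: int, genes) -> Optional[str]:
    """Name of the first gene containing the position, or None if intergenic."""
    for g in genes:
        if g.chrom == chrom and g.start <= pos < g.end:
            return g.name
    return None


def classify_breakpoint_pair(bp1, bp2, genes):
    """Apply the question's rules to one pair of break-ends.

    Returns (label, gene_at_bp1, gene_at_bp2).
    """
    g1 = gene_at(bp1[0], bp1[1], genes)
    g2 = gene_at(bp2[0], bp2[1], genes)
    if g1 and g2:
        label = "fusion"
    elif g1 or g2:
        label = "truncation"
    else:
        label = "intergenic"
    return label, g1, g2


# Hypothetical annotation and break-end pairs.
GENES = [
    Gene("GENE_A", "chr1", 2000, 3000),
    Gene("GENE_D", "chr5", 100, 900),
]
fusion_call = classify_breakpoint_pair(("chr1", 2500), ("chr5", 400), GENES)
truncation_call = classify_breakpoint_pair(("chr1", 2500), ("chr5", 5000), GENES)
print(fusion_call, truncation_call)
```

The sketch ignores strand, reading frame, and the case where both break-ends fall in the same gene, so "fusion" here means a candidate to inspect, not a confirmed in-frame fusion.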
I suggest using AnnotSV for SV annotation (annotation with gene names and locations, OMIM, DGV, 1000g, haploinsufficiency, TAD, ... and also with your own in-house information).
AnnotSV constructs an annotation based on the full-length SV but also an annotation for each gene within the SV. You will thus have access to:
all the overlapped genes information (ID, OMIM...)
the SV location within each overlapped gene (e.g. "exon3-intron11", "txStart-intron19", ...). You could thus determine fusion or truncation events.
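As a rough sketch of how those location strings could be used, the Python below counts how many of the two SV ends fall inside the gene. It rests on an assumption that should be checked against the AnnotSV documentation: that "txStart" or "txEnd" on one side means the SV extends past that end of the transcript, while an exon or intron label means the SV end lies inside the gene. The function name is hypothetical and is not part of AnnotSV.

```python
def sv_ends_inside_gene(location: str) -> int:
    """Count SV ends that fall inside the gene for a 'left-right' location string.

    Assumption: 'txStart' and 'txEnd' mark an SV end lying outside the
    transcript; any other label (exonN, intronN) marks an end inside it.
    """
    left, right = location.split("-", 1)
    outside = {"txStart", "txEnd"}
    return sum(part not in outside for part in (left, right))


# 2: both ends inside the gene; 1: one end inside (the gene is cut once);
# 0: the whole gene lies within the SV.
counts = {loc: sv_ends_inside_gene(loc)
          for loc in ("exon3-intron11", "txStart-intron19", "txStart-txEnd")}
print(counts)
```

Under that assumption, a gene with a count of 1 is one that the SV cuts exactly once, which is the starting point for calling a truncation or, together with a second such gene at the other end, a fusion candidate.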
Dear Daniel, these are very good suggestions, thank you! I'm planning to use StructuralVariantAnnotation and compare the results with those derived from my Perl scripts.
Our work is primarily related to somatic SVs (in pediatric cancers), so may I ask: do you have any recommendations regarding which SV callers to use? I've started with DELLY, LUMPY, and MANTA, and I am now comparing the results.
Also, I've read your paper and work on GRIDSS, it looks great ;) although it seems that the focus has been more on germline calls ;)
The GRIDSS paper focused on germ-line results, but most of our applications have been in cancer genomics and GRIDSS did manage to win the ICGC-TCGA DREAM Somatic Mutation Calling Challenge (SV sub-challenge #5).
See https://github.com/PapenfussLab/gridss/blob/master/example/somatic.sh for very basic tumour/normal somatic variant calling using GRIDSS.
Thanks, Daniel, I will be able to run GRIDSS as soon as our new PBS cluster is completely configured.
Also, may I ask what filtering criteria you would recommend for SVs, particularly AF or the number of SR and PR?
And, if you do not mind me asking, after the Somatic Mutation Calling Challenge, besides DELLY, MANTA and GRIDSS, which other algorithms did reasonably well?
Somatic calling Leaderboard results are publicly available at https://www.synapse.org/#!Synapse:syn312572/wiki/61509
Dear Daniel, thank you for the information on SV calling. Considering your experience with all the SV callers, and the nice ROC curves from your publication, may I ask:
-- regarding filtering, would you have a strong recommendation for the numerical thresholds on allele fraction and on the number of paired reads or split reads?
-- using 2-3 SV callers will probably give fewer false negatives than using only one. If so, besides GRIDSS, which other SV caller(s) would you recommend?
Thanks a lot for sharing your experience with us!
Please do not add answers unless you're answering the top level question. If you're replying to someone, use the Add Comment or Add Reply options. I'm moving your "answer"s to comments now.
thanks, Ram ;) a pretty exciting conversation, I shall say ;)
Sure, but multiple answers that do not answer the question confuse people. Please check out https://www.biostars.org/t/how-to/ for more information. Thank you!
ok ;) thank you, Ram ;)
How did you get on with this? I'm developing a pipeline that works well for me, if you're still looking for help.