GRIDSS is typically used to detect structural variant breakpoints from short-read sequencing data, but it is a modular software suite containing several tools useful for detecting genomic rearrangements, including:
- A structural variant caller. The GRIDSS caller uses break-end assembly, split read, and read pair evidence to call variants.
- A genome-wide break-end assembler. Assembles all break-end contigs without resorting to targeted or windowed assembly.
- A read extractor. Supports extraction of any subset of indel-containing reads, soft/(hard)-clipped reads, split reads, discordant read pairs, read pairs with only one read mapped and unmapped reads in a single step.
- A split read identifier. This tool converts soft-clipped reads to split reads using the standard SA SAM tag. Because it outputs split read alignments using the standard SA tag, it can also be used to remove the dependency on bwa alignment for tools such as LUMPY and Wham.
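For readers unfamiliar with the SA tag mentioned above, here is a minimal Python sketch of how split read alignments are encoded in it. This is not GRIDSS code (GRIDSS is written in Java) and the names `parse_sa_tag`, `soft_clip_lengths` and `SupplementaryAlignment` are hypothetical, made up for illustration. The only thing it relies on is the SAM specification's format for the tag value: a semicolon-delimited list of `rname,pos,strand,CIGAR,mapQ,NM` entries.

```python
import re
from typing import List, NamedTuple, Tuple


class SupplementaryAlignment(NamedTuple):
    """One entry of an SA tag (hypothetical helper type, not part of GRIDSS)."""
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    strand: str  # "+" or "-"
    cigar: str   # CIGAR string of the supplementary alignment
    mapq: int    # mapping quality
    nm: int      # edit distance to the reference


def parse_sa_tag(value: str) -> List[SupplementaryAlignment]:
    """Parse an SA tag value (the part after 'SA:Z:') into its entries."""
    alignments = []
    for entry in value.split(";"):
        if not entry:  # the value ends with ';', so the last piece is empty
            continue
        rname, pos, strand, cigar, mapq, nm = entry.split(",")
        alignments.append(
            SupplementaryAlignment(rname, int(pos), strand, cigar, int(mapq), int(nm))
        )
    return alignments


def soft_clip_lengths(cigar: str) -> Tuple[int, int]:
    """Return (leading, trailing) soft-clip lengths of a CIGAR string."""
    ops = [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]
    # Hard clips can only sit outside soft clips, so drop them before looking at the ends.
    ops = [(n, op) for n, op in ops if op != "H"]
    lead = ops[0][0] if ops and ops[0][1] == "S" else 0
    trail = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return lead, trail


# A read whose first 60 bases align elsewhere: primary is 60S40M, SA holds the other part.
print(soft_clip_lengths("60S40M"))
print(parse_sa_tag("chr2,1000,-,40S60M,60,0;"))
```

A soft-clipped read without an SA tag is the kind of input a split read identifier would realign; once the clipped portion is placed, the result is reported in this format.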
GRIDSS has been extensively tested across a wide range of read depths, read lengths and library fragment sizes. Except for reads shorter than 50bp and coverage under ~10x, GRIDSS outperforms existing SV callers. On 50x 2x100bp human cell line WGS data, GRIDSS achieves a false discovery rate half that of BreakDancer, CREST, DELLY, HYDRA, LUMPY, Manta, Pindel, and Socrates with no loss of sensitivity. Benchmarking results are available at http://shiny.wehi.edu.au/cameron.d/sv_benchmark/.
The GRIDSS preprint is available at http://biorxiv.org/content/early/2017/02/21/110387.
GRIDSS is free and open source software and is available at https://github.com/PapenfussLab/gridss/. Java 1.8 and an aligner (bwa by default) are required.
Has it been tested on long reads from PacBio or Oxford Nanopore?
It hasn't. The current implementation assumes an Illumina-style sequencing error model (i.e. errors are substitutions, not indels) and is therefore not suitable for uncorrected long reads.
Can I use this tool for whole exome sequencing data or clinical exome panel data?
Yes, but as with all breakpoint detection tools, you're not going to find much because most SVs occur in intergenic or intronic sequences. These are not targeted by WES, so read counts (and hence signal strength) in these off-target regions are very low.
It seems that Alexandrov's structural variant and copy number signatures, as estimated by the SigProfiler software ecosystem, aren't compatible with GRIDSS output, at least not in a straightforward way. Also, the treatment of structural variants is very basic and I think they didn't have anyone with much expertise on their team. I am not a structural variant expert either. So, how should I try to convince a biologist collaborator that the signatures aren't particularly meaningful? The complex categories inferred by LINX don't map easily onto Alexandrov's variant categories, either.