My first published bioinformatics pipeline, which finds and quantifies the scars left in RNA-seq reads by a special splicing event (SL trans-splicing):
SL-quant: A fast and flexible pipeline to quantify spliced leader trans-splicing events from RNA-seq data
The spliceosomal transfer of a short spliced leader (SL) RNA to an independent pre-mRNA molecule is called SL trans-splicing and is widespread in the nematode C. elegans. While RNA-seq data contain information on such events, properly documented methods to extract them are lacking.
To address this, we developed SL-quant, a fast and flexible pipeline that adapts to paired-end and single-end RNA-seq data and accurately quantifies SL trans-splicing events. It is designed to work downstream of read mapping and uses the reads left unmapped as primary input. Briefly, the SL-sequences are identified with high specificity and are trimmed from the input reads, which are then re-mapped on the reference genome and quantified at the nucleotide position level (SL trans-splice sites) or at the gene level.
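To make the "identify and trim" step concrete, here is a minimal Python sketch. It is not SL-quant's actual implementation (which is a bash script) but a stand-in that uses plain exact matching. The function names, the `min_overlap` parameter, the example reads, and the example alignment positions are all hypothetical, and the SL1 sequence shown is the commonly cited C. elegans one, which should be checked against your own reference. Re-mapping the trimmed reads requires a real aligner and is not shown.

```python
from collections import Counter
from typing import Optional

# Assumption: commonly cited C. elegans SL1 sequence; verify before real use.
SL1 = "GGTTTAATTACCCAAGTTTGAG"


def trim_sl(read: str, sl: str = SL1, min_overlap: int = 8) -> Optional[str]:
    """Return the read without its 5' SL part, or None if no SL is found.

    A read can start anywhere inside the SL, so we look for the longest
    suffix of the SL that is also a prefix of the read. `min_overlap`
    (hypothetical parameter) guards against short chance matches.
    """
    for k in range(min(len(sl), len(read)), min_overlap - 1, -1):
        if read.startswith(sl[-k:]):
            return read[k:]
    return None


def count_sites(alignments):
    """Count reads per trans-splice site, given the (chrom, pos, strand)
    of the first aligned base of each trimmed, re-mapped read."""
    return Counter(alignments)


# Toy unmapped reads (made up for illustration).
reads = [
    "CCCAAGTTTGAGATGGCTTCGAAC",  # last 12 nt of SL1, then transcript
    SL1 + "ATGAAACCC",           # full SL1, then transcript
    "ATGGCTTCGAACGGTACC",        # no SL: discarded
]
trimmed = [t for t in (trim_sl(r) for r in reads) if t is not None]

# Toy re-mapping results (made up): two reads share one splice site.
sites = count_sites([("I", 1500, "+"), ("I", 1500, "+"), ("II", 42, "-")])
```

Exact matching keeps the sketch short but is stricter than a real pipeline needs to be, since it rejects reads with a single sequencing error in the SL part. An alignment-based search would tolerate mismatches at the cost of more tuning.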
SL-quant completes within 10 minutes on a basic desktop computer for typical C. elegans RNA-seq datasets, and can be applied to other species as well. Validating the method, the SL trans-splice sites identified display the expected consensus sequence, and the results of the gene-level quantification are predictive of the gene position within operons. We also compared SL-quant to a recently published strategy for identifying SL-containing reads, which proved more sensitive but less specific than SL-quant. Both methods are implemented as a bash script available under the MIT licence at https://github.com/cyaguesa/SL-quant. Full instructions for its installation, usage, and adaptation to other organisms are provided.
A) The trans-splicing process. Spliced leader RNA precursors (SL RNAs) are small nuclear RNAs capped with a trimethyl-guanosine (TMG). The 5’-region of the SL RNA, including the TMG cap, is spliced onto the first exon of the pre-mRNAs. SL-quant identifies RNA-seq reads originating from trans-spliced RNAs. B) Consensus sequence at SL trans-splice sites. Sequence logo of the sequence environment surrounding SL1 or SL2 trans-splice sites, determined by SL-quant on the SRR1585277 dataset in single-end mode. C) Prediction of gene position in operons. Number of SL1 and SL2 trans-splicing events per gene, as calculated by SL-quant. Genes annotated as downstream in operons are shown as red dots.
Carlo Yague-Sanz & Damien Hermand (2018) SL-quant: A fast and flexible pipeline to quantify spliced leader trans-splicing events from RNA-seq data. GigaScience.