6.1 years ago
Botond Sipos
Pinfish is a collection of tools helping to make sense of long transcriptomics data (long cDNA reads, direct RNA reads). The toolchain is composed of the following tools:
spliced_bam2gff - a tool for converting sorted BAM files containing spliced alignments (generated by minimap2 or GMAP) into GFF2 format. Each read is represented as a distinct transcript. This tool comes in handy for visualizing spliced reads at particular loci and for providing input to the rest of the toolchain.
cluster_gff - this tool takes a sorted GFF2 file as input, clusters together reads that have a similar exon/intron structure, and creates a rough consensus for each cluster by taking the median of the exon boundaries from all transcripts in the cluster.
polish_clusters - this tool takes the cluster definitions generated by cluster_gff and, for each cluster, creates an error-corrected read by mapping all reads onto the read with the median length (using minimap2) and polishing it using racon. The polished reads can be mapped to the genome using minimap2 or GMAP.
collapse_partials - this tool takes GFFs generated by either cluster_gff or polish_clusters and filters out transcripts that are likely to be based on RNA degradation products from the 5' end. The tool clusters the input transcripts into "loci" by their 3' ends and discards transcripts that have a compatible transcript with more exons in the same locus.
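To illustrate the "median of exon boundaries" idea behind the cluster consensus, here is a minimal Python sketch. It is not the actual cluster_gff implementation: the function name `consensus_exons`, the tuple-based transcript representation, and the requirement that every transcript in a cluster has the same number of exons are all assumptions made for this example.

```python
from statistics import median


def consensus_exons(transcripts):
    """Build a rough consensus transcript from one cluster.

    Each transcript is a list of (start, end) exon tuples sorted by position.
    Assumption for this sketch: all transcripts in the cluster have the same
    number of exons, so exons can be compared position by position.
    """
    if not transcripts:
        raise ValueError("cluster is empty")

    n_exons = len(transcripts[0])
    if any(len(t) != n_exons for t in transcripts):
        raise ValueError("all transcripts in a cluster must have the same exon count")

    consensus = []
    for i in range(n_exons):
        # Collect the i-th exon boundaries across all transcripts in the cluster.
        starts = [t[i][0] for t in transcripts]
        ends = [t[i][1] for t in transcripts]
        # The median is robust to a few reads with shifted boundaries.
        consensus.append((int(median(starts)), int(median(ends))))
    return consensus


# Hypothetical cluster of three reads with slightly different exon boundaries.
cluster = [
    [(100, 200), (300, 400)],
    [(102, 200), (300, 405)],
    [(98, 201), (299, 400)],
]
print(consensus_exons(cluster))  # [(100, 200), (300, 400)]
```

Taking the median rather than the mean means a single read with a badly misplaced boundary does not drag the consensus away from the position supported by most reads.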
The pinfish tools can be run via a Snakemake pipeline which handles the alignment tasks using minimap2. For more information, see the GitHub page of the pipeline.