Hello all,
I was just wondering what percentage of 'novel' splice junctions/splice events is reasonable for human RNA-seq data analysed with junction_annotation.py. I am new to RNA-seq and am running some published human RNA-seq data through my pipeline to familiarize myself with the programs and protocols. The splice junction analysis gave what was, to me, an eyebrow-raising estimate of novel splice junctions/events:
Splicing junctions: complete novel = 62%, partial novel = 5%, annotated = 34%
Splicing events: complete novel = 17%, partial novel = 1%, known = 81%
Should I be worried about that 62% complete novel splice junction estimate?
If you are interested, here is what I've done:
I am using 104 bp paired-end reads from fragments averaging 250 bp (the inner-distance distribution has a standard deviation of 50 bp).
From the GTF file Homo_sapiens.GRCh38.95.gtf.gz, I created a BED file with the following command:
$ awk '{if($3 != "gene") print $0}' homo_sapiens_grch38.95_chameleon_cleaned.gtf | grep -v "^#" | gtfToGenePred /dev/stdin /dev/stdout | genePredToBed stdin Homo_sapiens.GRCh38.95.bed
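One quick sanity check on the converted annotation is to count the distinct introns it defines, since a junction can only be reported as "known" if it is present in the reference BED. This is a minimal sketch that assumes a standard tab-separated BED12 file (blockCount, blockSizes and blockStarts in columns 10 to 12); `count_bed12_introns` is a hypothetical helper, and because it ignores strand the result is only a rough figure to set beside the "Known Splicing Junctions" count.

```bash
#!/usr/bin/env bash
# Count distinct intron intervals defined by a BED12 annotation.
# Assumption: tab-separated BED12; lines with fewer than 12 columns
# (e.g. track/browser lines) are skipped and counted separately.
count_bed12_introns() {
  awk -F'\t' '
    NF < 12 { skipped++; next }
    {
      split($11, sizes, ",")   # blockSizes; a trailing comma leaves an empty last element
      split($12, starts, ",")  # blockStarts, relative to chromStart ($2)
      for (i = 1; i < $10; i++) {
        # The intron runs from the end of block i to the start of block i+1.
        istart = $2 + starts[i] + sizes[i]
        iend   = $2 + starts[i + 1]
        seen[$1 ":" istart "-" iend] = 1   # strand is ignored
      }
    }
    END {
      for (k in seen) uniq++
      printf "distinct_introns=%d skipped_lines=%d\n", uniq, skipped
    }' "$1"
}
```

Usage would be something like `count_bed12_introns Homo_sapiens.GRCh38.95.bed`. Transcripts that share an intron are counted once, which is closer to how junctions (rather than transcripts) are compared.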
The BAM file was generated from HISAT2's SAM output with:
$ samtools view -bS testoutput3.sam | samtools sort -o testoutput3.bam
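Only alignments with an intron gap (an `N` operation in the CIGAR string) can contribute splicing events, so it may help to count them directly. This is a minimal sketch in plain awk over SAM text; the function name `count_spliced` and its optional MAPQ threshold argument are illustrative assumptions, and it does not claim to reproduce RSeQC's exact counting rules (for example, how secondary alignments or very short gaps are treated).

```bash
#!/usr/bin/env bash
# Count spliced alignments (CIGAR containing N) in SAM text read from stdin.
# Optional $1: minimum MAPQ (illustrative; defaults to 0, i.e. no filter).
count_spliced() {
  awk -F'\t' -v minq="${1:-0}" '
    /^@/ { next }            # skip SAM header lines
    $5 < minq { next }       # drop alignments below the MAPQ threshold
    {
      cigar = $6
      k = gsub(/[0-9]+N/, "", cigar)   # number of N (skipped-region) operations
      if (k > 0) { reads++; ops += k }
    }
    END { printf "spliced_alignments=%d intron_operations=%d\n", reads, ops }'
}
```

For example, `samtools view testoutput3.bam | count_spliced 30` would count spliced alignments with MAPQ of at least 30. An alignment spanning two introns adds one to the first number and two to the second.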
I then ran junction_annotation.py with the following command:
$ junction_annotation.py -r Homo_sapiens.GRCh38.95.bed -i testoutput3.bam -o out
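Besides the console summary, junction_annotation.py writes per-junction files using the `-o` prefix (here `out`). Assuming the `out.junction.xls` table is tab-separated with the read count in column 4 and the annotation class in column 5 (worth confirming against the header line of your own file), the hypothetical helper below reports, for each class, how many junctions there are and how many are supported by at least a chosen number of reads. The default of 5 reads is an arbitrary illustration, not a recommended cutoff.

```bash
#!/usr/bin/env bash
# Per-class junction counts from a junction_annotation.py table.
# Assumptions: tab-separated; column 4 = read count, column 5 = annotation class.
# $1: table file; $2: minimum read count (hypothetical default of 5).
novel_support() {
  local min="${2:-5}"
  awk -F'\t' -v min="$min" '
    $4 !~ /^[0-9]+$/ { next }   # skip the header or any non-numeric row
    {
      total[$5]++
      if ($4 + 0 >= min + 0) kept[$5]++
    }
    END {
      # Output: class, all junctions, junctions with >= min reads
      for (c in total) printf "%s\t%d\t%d\n", c, total[c], kept[c]
    }' "$1" | sort
}
```

Running `novel_support out.junction.xls 5` shows whether the novel classes are dominated by junctions seen in only one or two reads, which is one way to judge how much of the 62% reflects low-support noise.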
I got the following output:
Reading reference bed file: Homo_sapiens.GRCh38.95.bed ... Done
Load BAM file ... Done
total = 14081359
===================================================================
Total splicing Events: 14081359
Known Splicing Events: 11341230
Partial Novel Splicing Events: 99514
Novel Splicing Events: 2348855
Total splicing Junctions: 441831
Known Splicing Junctions: 148196
Partial Novel Splicing Junctions: 21482
Novel Splicing Junctions: 272153
===================================================================
null device
1
null device
1
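The percentages quoted at the top can be recomputed from this summary. This sketch assumes the summary lines look exactly like the ones pasted here, and `junction_percentages` is a hypothetical name. It also prints the events left over after subtracting the three categories from the total, because with these numbers known + partial novel + novel events add up to 13,789,599 rather than 14,081,359.

```bash
#!/usr/bin/env bash
# Recompute percentages from the junction_annotation.py console summary (stdin).
# Assumption: summary lines are formatted as "<label>: <count>", as in the log above.
junction_percentages() {
  awk -F': ' '
    /^Total splicing Junctions/         { jt = $2 }
    /^Known Splicing Junctions/         { jk = $2 }
    /^Partial Novel Splicing Junctions/ { jp = $2 }
    /^Novel Splicing Junctions/         { jn = $2 }
    /^Total splicing Events/            { et = $2 }
    /^Known Splicing Events/            { ek = $2 }
    /^Partial Novel Splicing Events/    { ep = $2 }
    /^Novel Splicing Events/            { en = $2 }
    END {
      if (jt > 0)
        printf "junctions: known %.1f%% partial_novel %.1f%% novel %.1f%%\n", 100 * jk / jt, 100 * jp / jt, 100 * jn / jt
      if (et > 0)
        printf "events: known %.1f%% partial_novel %.1f%% novel %.1f%% remainder %d\n", 100 * ek / et, 100 * ep / et, 100 * en / et, et - ek - ep - en
    }'
}
```

With the summary above saved to a file (say, a hypothetical `summary.txt`), `junction_percentages < summary.txt` prints:

    junctions: known 33.5% partial_novel 4.9% novel 61.6%
    events: known 80.5% partial_novel 0.7% novel 16.7% remainder 291760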
Many thanks for any advice/input/help you can give!
Double-check which version of RSeQC you are using, and note the release notes:
[source: http://rseqc.sourceforge.net/]