junction_annotation.py: How many 'novel' splice junctions/splice events are reasonably expected from human RNA?
5.9 years ago
RNAseqer ▴ 280

Hello all,

I was wondering what a reasonable percentage of 'novel' splice junctions/splice events is for human RNAseq data when using junction_annotation.py. I am new to RNAseq and am running some published human RNAseq data through my pipeline to familiarize myself with the programs and protocols. When I ran this splice junction analysis, I got what was, to me, an eyebrow-raising estimate of novel splice junctions/events:

Splicing junctions: complete novel = 62%, partial novel = 5%, annotated = 34%

Splicing events: complete novel = 17%, partial novel = 1%, known = 81%

Should I be worried about that 62% complete novel splice junction estimate?

If you are interested, here is what I've done:

I am using 104 bp paired-end reads from fragments averaging 250 bp (the distribution of inner distances has a standard deviation of 50).

From the GTF file Homo_sapiens.GRCh38.95.gtf.gz, I created a BED file with the following command line:

$ awk '{if($3 != "gene") print $0}' homo_sapiens_grch38.95_chameleon_cleaned.gtf | grep -v "^#" | gtfToGenePred /dev/stdin /dev/stdout | genePredToBed stdin Homo_sapiens.GRCh38.95.bed

My BAM file was generated from the HISAT2 .sam output with the following command line:

samtools view -bS testoutput3.sam | samtools sort -o testoutput3.bam

Using the program junction_annotation.py with the following command line:

$ junction_annotation.py -r Homo_sapiens.GRCh38.95.bed -i testoutput3.bam -o out

I got the following output:

Reading reference bed file:  Homo_sapiens.GRCh38.95.bed  ...  Done
Load BAM file ...  Done
total = 14081359

===================================================================
Total splicing  Events: 14081359
Known Splicing Events:  11341230
Partial Novel Splicing Events:  99514
Novel Splicing Events:  2348855

Total splicing  Junctions:  441831
Known Splicing Junctions:   148196
Partial Novel Splicing Junctions:   21482
Novel Splicing Junctions:   272153

===================================================================
null device 
          1 
null device 
          1
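
For reference, the percentages quoted earlier can be recomputed from the counts in this output, together with the average number of supporting reads per junction in each class. This is only a minimal Python sketch: the dictionary and variable names are illustrative and are not part of RSeQC, and the totals are the ones the tool reported (the three event classes add up to slightly less than the reported event total).

```python
# Counts copied from the junction_annotation.py output in this post.
# All names below are illustrative; nothing here is an RSeQC API.
TOTAL_EVENTS = 14081359
TOTAL_JUNCTIONS = 441831

events = {"known": 11341230, "partial_novel": 99514, "novel": 2348855}
junctions = {"known": 148196, "partial_novel": 21482, "novel": 272153}

# Share of each class, relative to the totals reported by the tool.
event_pct = {cls: 100 * n / TOTAL_EVENTS for cls, n in events.items()}
junction_pct = {cls: 100 * n / TOTAL_JUNCTIONS for cls, n in junctions.items()}

# Average number of spliced reads (events) supporting one junction per class.
reads_per_junction = {cls: events[cls] / junctions[cls] for cls in events}

for cls in events:
    print(
        f"{cls:14s} events {event_pct[cls]:5.1f}%  "
        f"junctions {junction_pct[cls]:5.1f}%  "
        f"reads/junction {reads_per_junction[cls]:6.1f}"
    )
```

With these counts, known junctions average roughly 77 supporting reads each, while completely novel junctions average roughly 9, so the novel class accounts for most distinct junctions but only a small share of the spliced reads.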

Many thanks for any advice/input/help you can give!

RNA-Seq junction_annotation.py human • 2.1k views

Double-check the version that you are using. Note the release notes:

RSeQC v2.6.1
Fix bug in “junction_annotation.py” in that it would report some “novel splice junctions” that don’t exist in the BAM files. This happened when reads were clipped and spliced mapped simultaneously.

[source: http://rseqc.sourceforge.net/]
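
One way to check this from Python is sketched below. It assumes RSeQC was installed with pip under the distribution name "RSeQC" (an assumption; a conda or manual install may register a different name or none at all), and the helper names are hypothetical. It only compares the installed version against 2.6.1, the release whose notes mention the fix.

```python
import re
from importlib import metadata

# Release whose notes mention the junction_annotation.py fix.
FIXED_IN = (2, 6, 1)


def parse_version(text):
    """Turn a string such as '2.6.4' into (2, 6, 4).

    Only the leading digits of each dot-separated piece are used,
    so a suffix such as 'rc1' is ignored.
    """
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match:
            parts.append(int(match.group()))
    return tuple(parts)


def has_fix(version_text):
    """Return True if the given version is 2.6.1 or later."""
    return parse_version(version_text) >= FIXED_IN


def installed_rseqc_version(dist_name="RSeQC"):
    """Return the installed version string, or None if pip does not know it.

    The distribution name is an assumption, not something stated in the thread.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_rseqc_version()
if version is None:
    print("RSeQC not found under that name; check how it was installed.")
else:
    print(f"RSeQC {version}: includes the 2.6.1 fix = {has_fix(version)}")
```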

15 months ago

Hello all,

I have a question similar to the original one.

I am running some statistics on human long-read ONT RNA-seq data and want to understand junction_annotation.py a bit better (RSeQC version 5.0.1).

I understand the splicing events, and the attached figures are quite clear to me: more than 95% of the splice-read events are known. What I don't understand is why more than 70% of the splicing junctions derived from those events are novel.

Could there be a bug in this RSeQC version as well?

Any help to understand why this could happen is very welcome!
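
One thing that may be worth ruling out before suspecting a bug: a splicing event is counted once per spliced read, while all events spanning the same intron are consolidated into a single junction. The two percentages therefore use different denominators and need not agree. The toy numbers below are entirely hypothetical (they are not taken from any dataset in this thread) and only show that 95% known events and 70% novel junctions can coexist when novel junctions are supported by few reads each.

```python
from collections import Counter

# Hypothetical junctions: (label, annotation class, number of supporting reads).
toy_junctions = [
    ("j1", "known", 400),
    ("j2", "known", 350),
    ("j3", "known", 200),
    ("j4", "novel", 10),
    ("j5", "novel", 10),
    ("j6", "novel", 10),
    ("j7", "novel", 5),
    ("j8", "novel", 5),
    ("j9", "novel", 5),
    ("j10", "novel", 5),
]


def class_percentages(junctions):
    """Return (event_pct, junction_pct) dictionaries keyed by annotation class.

    Events are weighted by supporting reads; junctions are counted once each.
    """
    n_junctions = Counter()
    n_events = Counter()
    for _label, cls, reads in junctions:
        n_junctions[cls] += 1
        n_events[cls] += reads
    total_junctions = sum(n_junctions.values())
    total_events = sum(n_events.values())
    event_pct = {c: 100 * n / total_events for c, n in n_events.items()}
    junction_pct = {c: 100 * n / total_junctions for c, n in n_junctions.items()}
    return event_pct, junction_pct


event_pct, junction_pct = class_percentages(toy_junctions)
print("events:   ", event_pct)
print("junctions:", junction_pct)
```

Whether this explains a particular dataset is a separate question; comparing the read support of known and novel junctions in the tool's own output would show it.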

[Two attached figures; images not available]
