Junction Saturation for AS analysis
2 days ago

Hi,

I am using RSeQC to assess the quality of my ONT long-read RNA-seq data, specifically the junction_saturation.py module.

I converted the Ensembl gene annotation GTF file to BED format:

  • Homo_sapiens.GRCh38.112.gtf
  • Homo_sapiens.GRCh38.112.bed

I then ran junction_saturation.py. The first line of output was:

reading reference bed file: /Users/mattmorgan/Documents/RNAseq/Cam_Oct/Homo_sapiens.GRCh38.112.bed  ... Done! Total 404168 known splicing junctions
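For context on where a total like this can come from: in a BED12 annotation each transcript's exons are stored as blocks, and every gap between two adjacent blocks is one intron, i.e. one splice junction. The following is a minimal sketch of counting unique junctions under that reading. It assumes a 12-column BED file and that a junction is identified by chromosome plus intron start and end; the function name and the toy transcripts are hypothetical, and this is an illustration rather than RSeQC's own code.

```python
def junctions_from_bed12(lines):
    """Collect unique (chrom, intron_start, intron_end) tuples from BED12 lines."""
    junctions = set()
    for line in lines:
        # Skip blank lines and common BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not BED12, so no exon block structure to read
        chrom, tx_start = fields[0], int(fields[1])
        block_sizes = [int(x) for x in fields[10].rstrip(",").split(",")]
        block_starts = [int(x) for x in fields[11].rstrip(",").split(",")]
        # Each gap between consecutive exon blocks is one intron.
        for i in range(len(block_sizes) - 1):
            intron_start = tx_start + block_starts[i] + block_sizes[i]
            intron_end = tx_start + block_starts[i + 1]
            junctions.add((chrom, intron_start, intron_end))
    return junctions


# Two made-up transcripts that share their first intron.
example = [
    "\t".join(["chr1", "100", "400", "tx1", "0", "+", "100", "400", "0",
               "3", "50,50,50,", "0,150,250,"]),
    "\t".join(["chr1", "100", "300", "tx2", "0", "+", "100", "300", "0",
               "2", "50,50,", "0,150,"]),
]
found = junctions_from_bed12(example)
print(len(found), sorted(found))
```

Because the set is built over every annotated isoform of every gene, the total reflects the whole annotation, not what is expressed in any one sample.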

The last line for this specific BAM file was:

sampling 100% (5320705) splicing reads. 145545 splicing junctions. 57335 known splicing junctions. 88210 novel splicing junctions.
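To put the two log lines on the same scale, the counts can be pulled out and turned into fractions. This is a small sketch that parses only the two lines quoted in this post; the regular expressions assume exactly this wording, and the function name is made up for illustration.

```python
import re

ref_line = (
    "reading reference bed file: /Users/mattmorgan/Documents/RNAseq/Cam_Oct/"
    "Homo_sapiens.GRCh38.112.bed  ... Done! Total 404168 known splicing junctions"
)
last_line = (
    "sampling 100% (5320705) splicing reads. 145545 splicing junctions. "
    "57335 known splicing junctions. 88210 novel splicing junctions."
)


def parse_counts(ref, sample):
    """Extract junction counts from the two log lines (format assumed as above)."""
    annotated = int(re.search(r"Total (\d+) known splicing junctions", ref).group(1))
    m = re.search(
        r"\((\d+)\) splicing reads\. (\d+) splicing junctions\. "
        r"(\d+) known splicing junctions\. (\d+) novel splicing junctions",
        sample,
    )
    reads, total, known, novel = (int(x) for x in m.groups())
    return {"annotated": annotated, "reads": reads, "total": total,
            "known": known, "novel": novel}


counts = parse_counts(ref_line, last_line)
# Share of the annotation's junctions that were seen in this BAM file.
pct_annotated_seen = 100 * counts["known"] / counts["annotated"]
# Share of the junctions seen in this BAM file that are not in the annotation.
pct_novel = 100 * counts["novel"] / counts["total"]
print(f"annotated junctions recovered: {pct_annotated_seen:.1f}%")
print(f"observed junctions that are novel: {pct_novel:.1f}%")
```

On these numbers, about 14.2% of annotated junctions are recovered, and about 60.6% of the observed junctions are novel.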


The number of known junctions looks like it is starting to plateau at high percentages of total reads, with the maximum number of known junctions in this file trending towards ~60,000.

Given that the RSeQC instructions say:

All (annotated) splice junctions should be rediscovered from saturated RNA-seq data

and the number of known junctions discovered in this file is far lower (57,335 of 404,168, roughly 14%), does this indicate a problem with this sequencing file?

Or is it simply that not all transcripts are expressed, or not expressed at high enough levels to be detected, so that the 404168 known splice junctions are a theoretical maximum far higher than what you would actually see? Or am I missing something else?

Thanks!

ONT splice-junctions • 134 views