Dear all, how can I find the proportion of different genomic features in my RNA-seq data? For example, I want to know what percentage of my data falls in exonic, intronic, intergenic and splice junction regions. I have several analysis files that look like this:
1) A genome annotation file that gives information about mRNAs and CDSs:
chr4 GLEAN mRNA 123284514 123288477 0.999991 - . ID=Cotton_A_18927_BGI-A2_v1.0;Name=Cotton_A_18927;source_id=CottonA_GLEAN_10022228;identical_support_id=CUFF67.1103.1;evid_id=Cot030308.1
chr4 GLEAN CDS 123288376 123288477 . - 0 Parent=Cotton_A_18927_BGI-A2_v1.0
chr4 GLEAN CDS 123287662 123287826 . - 0 Parent=Cotton_A_18927_BGI-A2_v1.0
2) A splice junction file that gives the donor and acceptor splice site coordinates:
chr1 329728 329839 -
chr1 330066 330757 -
chr1 581256 581357
3) A transcriptome assembly that gives information about transcripts and exons:
chr1 StringTie transcript 328635 330943 1000 - . gene_id "STRG.1"; transcript_id "STRG.1.1"; cov "70.023491"; FPKM "28.098141"; TPM "24.855738";
chr1 StringTie exon 328635 329729 1000 - . gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "1"; cov "88.673470";
chr1 StringTie exon 329840 330067 1000 - . gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "2"; cov "22.850203";
chr1 StringTie exon 330758 330943 1000 - . gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "3"; cov "18.054590";
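A first step for both questions is loading the three files into simple Python structures. The sketch below is a minimal, hypothetical example (none of these function names come from an existing tool), assuming 1-based inclusive coordinates in the annotation and StringTie files, tab- or space-separated columns, and standard GTF attributes of the form `key "value";` as StringTie writes them. It also checks a detail visible in the sample rows: each junction start is one less than an exon end of STRG.1.1 (329728 vs 329729) and each junction end is one less than the next exon start (329839 vs 329840), so the junction file appears to use a convention shifted by one base. The `offset=-1` default encodes that observation and should be verified on more rows.

```python
import re
from collections import defaultdict

GTF_ATTR = re.compile(r'(\w+)\s+"([^"]*)"')  # matches: gene_id "STRG.1";

def fields(line):
    """Split a GFF/GTF row on tabs, falling back to whitespace (9 columns max)."""
    cols = line.rstrip("\n").split("\t")
    return cols if len(cols) >= 9 else line.split(None, 8)

def parse_gff3(lines):
    """File 1: {mRNA ID: {chrom, strand, start, end, cds_start, cds_end}}."""
    mrnas, cds = {}, defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, _, ftype, start, end, _, strand, _, attrs = fields(line)[:9]
        kv = dict(p.split("=", 1) for p in attrs.strip().split(";") if "=" in p)
        if ftype == "mRNA":
            mrnas[kv["ID"]] = {"chrom": chrom, "strand": strand,
                               "start": int(start), "end": int(end)}
        elif ftype == "CDS":
            cds[kv["Parent"]].append((int(start), int(end)))
    for mid, rec in mrnas.items():
        # CDS span = outermost CDS coordinates over all CDS rows of this mRNA.
        blocks = cds.get(mid, [])
        rec["cds_start"] = min((s for s, _ in blocks), default=None)
        rec["cds_end"] = max((e for _, e in blocks), default=None)
    return mrnas

def parse_junctions(lines):
    """File 2: (chrom, start, end, strand); rows without a strand get '.'."""
    out = []
    for line in lines:
        cols = line.split()
        if len(cols) >= 3:
            out.append((cols[0], int(cols[1]), int(cols[2]),
                        cols[3] if len(cols) > 3 else "."))
    return out

def exons_by_transcript(lines):
    """File 3: {(chrom, strand, transcript_id): [(start, end), ...]} from exon rows."""
    tx = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = fields(line)
        if len(cols) < 9 or cols[2] != "exon":
            continue
        tid = dict(GTF_ATTR.findall(cols[8]))["transcript_id"]
        tx[(cols[0], cols[6], tid)].append((int(cols[3]), int(cols[4])))
    return tx

def exon_boundaries(exons):
    """(last base of an exon, first base of the next exon) for adjacent exons."""
    ex = sorted(exons)
    return [(ex[i][1], ex[i + 1][0]) for i in range(len(ex) - 1)]

def matched_boundaries(tx, junctions, offset=-1):
    """Count exon-exon boundaries present in the junction file (offset is assumed)."""
    known = {(c, s, e) for c, s, e, _ in junctions}
    hits = total = 0
    for (chrom, _, _), exons in tx.items():
        for end, nxt in exon_boundaries(exons):
            total += 1
            if (chrom, end + offset, nxt + offset) in known:
                hits += 1
    return hits, total
```

If `matched_boundaries` reports most boundaries as hits, the two files agree, and junction coordinates can be converted to flanking exon bases by adding 1 to each end.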
I would appreciate any help.
Thank you so much!
(Moved to an answer because it seems to solve the OP's question.)
Thank you so much for the response. I have read about read_distribution.py from RSeQC. If I give it alignment.sam and gene_annotation.bed, it only reports the read distribution over features such as TSS, TES and exons; it does not tell me what percentage of the data consists of splice junctions.
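One way to get the splice-junction share is to classify each alignment yourself: a read whose CIGAR string contains `N` skips an intron on the reference, i.e. it spans a splice junction. The sketch below is a minimal pure-Python illustration, not RSeQC's method. It assumes SAM text input plus per-chromosome exon intervals and gene-body intervals (for example taken from the StringTie exon and transcript rows), all 1-based and inclusive, and applies an assumed priority: spliced, then fully exonic, then overlapping a gene body (counted as intronic), then intergenic. Unmapped, secondary and supplementary alignments are skipped. All function names are hypothetical.

```python
import re
from bisect import bisect_right
from collections import Counter

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def merge(intervals):
    """Sort and merge overlapping or adjacent (start, end) intervals."""
    merged = []
    for s, e in sorted(intervals):
        if merged and s <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged

def contained(start, end, merged):
    """True if [start, end] lies inside a single interval of a merged list."""
    i = bisect_right(merged, (start, float("inf"))) - 1
    return i >= 0 and merged[i][0] <= start and end <= merged[i][1]

def overlaps(start, end, merged):
    """True if [start, end] overlaps any interval of a merged list."""
    i = bisect_right(merged, (end, float("inf"))) - 1
    return i >= 0 and merged[i][1] >= start

def aligned_blocks(pos, cigar):
    """Reference blocks covered by a read (1-based, inclusive), split at N gaps."""
    blocks, ref, block_start = [], pos, pos
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=XD":      # consumes reference inside the current block
            ref += n
        elif op == "N":       # intron gap: close the block and jump over it
            blocks.append((block_start, ref - 1))
            ref += n
            block_start = ref
    blocks.append((block_start, ref - 1))
    return blocks

def classify_read(chrom, pos, cigar, exons, genes):
    """Assumed priority: spliced > fully exonic > touches a gene body > intergenic."""
    if "N" in cigar:
        return "splice_junction"
    blocks = aligned_blocks(pos, cigar)
    if all(contained(s, e, exons.get(chrom, [])) for s, e in blocks):
        return "exon"
    if any(overlaps(s, e, genes.get(chrom, [])) for s, e in blocks):
        return "intron"
    return "intergenic"

def category_percentages(sam_lines, exons, genes):
    """Percentage of primary mapped reads in each category."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue
        f = line.rstrip("\n").split("\t")
        if int(f[1]) & (0x4 | 0x100 | 0x800):  # unmapped, secondary, supplementary
            continue
        counts[classify_read(f[2], int(f[3]), f[5], exons, genes)] += 1
    total = sum(counts.values())
    return {k: 100 * v / total for k, v in counts.items()} if total else {}
```

Because exon intervals are merged per chromosome, a read counts as exonic if it lies inside any annotated exon, whichever transcript that exon belongs to, and a spliced read is counted once under splice_junction even though its bases are exonic. For a BAM file, the same logic can be fed from `samtools view` output or from pysam.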
Let me rephrase what I need:
1) the percentage of my data in splice junctions, introns, intergenic regions and exons;
2) what percentage of splice junctions lie in the 5'UTR, 5'UTR-CDS, CDS, CDS-3'UTR and 3'UTR.
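For the second question, each junction can be labeled by where its donor and acceptor sides fall relative to the CDS of the transcript it belongs to. A minimal sketch, assuming you have already assigned each junction to one coding mRNA (not shown), that each junction is given as the 1-based coordinates of the last exonic base before the intron and the first exonic base after it, and that each mRNA is a dict with `strand`, `cds_start` and `cds_end` keys (a hypothetical structure). On the minus strand the donor is the higher genomic coordinate, so labels come out in transcript order: a junction leaving the 5'UTR and entering the CDS is reported as 5'UTR-CDS.

```python
from collections import Counter

def region(pos, strand, cds_start, cds_end):
    """Where a 1-based genomic position falls relative to one transcript's CDS."""
    if cds_start <= pos <= cds_end:
        return "CDS"
    # Upstream of the CDS in transcript orientation means 5'UTR.
    upstream = pos < cds_start if strand == "+" else pos > cds_end
    return "5'UTR" if upstream else "3'UTR"

def junction_region(left_exon_end, right_exon_start, mrna):
    """Label a junction 'A' or 'A-B', from its donor side (A) to its acceptor side (B)."""
    strand = mrna["strand"]
    if strand not in ("+", "-"):
        raise ValueError("transcript strand must be '+' or '-'")
    # On the minus strand the donor is the right-hand (higher) genomic coordinate.
    donor, acceptor = ((left_exon_end, right_exon_start) if strand == "+"
                       else (right_exon_start, left_exon_end))
    a = region(donor, strand, mrna["cds_start"], mrna["cds_end"])
    b = region(acceptor, strand, mrna["cds_start"], mrna["cds_end"])
    return a if a == b else f"{a}-{b}"

def junction_percentages(pairs):
    """pairs: iterable of (left_exon_end, right_exon_start, mrna_dict)."""
    counts = Counter(junction_region(left, right, m) for left, right, m in pairs)
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()} if total else {}
```

Junctions that fall outside any coding transcript (intergenic or non-coding) would need their own category before the percentages are computed.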
Kindly guide me in this regard.
Thank you so much!