Highly mapped to introns
3.2 years ago
Gotumbtai ▴ 10

Hi,

I am analyzing RNA-seq data from human blood samples. After mapping with STAR, I checked the read distribution using RSeQC's read_distribution.py. Usually, more than 80% of reads map to exons. This time, however, only a few percent mapped to exons, even though the STAR output showed that more than 90% of reads were uniquely mapped. I am wondering whether this result is correct or whether my RSeQC settings were wrong.

The command I used: read_distribution.py -i my.bam -r hg19_Ensembl_gene.bed

The BAM files were produced by STAR and sorted with samtools. The BED file was downloaded from https://sourceforge.net/projects/rseqc/files/BED/Human_Homo_sapiens/. The reference genome for mapping was Homo_sapiens.GRCh38.dna.primary_assembly.fa and the annotation file was Homo_sapiens.GRCh38.104.gtf.

One of the RSeQC outputs is shown below: [image: RSeQC read_distribution output]

The MultiQC plot is shown below: [image: MultiQC read distribution plot]

Thank you for your help!!

distribution read RSeQC RNA-seq exon intron • 1.3k views

Do you know the library type that was used? E.g. if this is total RNA (not poly-A enriched), then you may simply have lots of immature transcripts and non-coding transcripts. There also seems to be a bit of genomic DNA contamination ("other_intergenic").

3.2 years ago

I think your problem is that your BED file doesn't match the genome and GTF you used. It is built on hg19, an older assembly than the GRCh38 you mapped against, so the coordinates differ. My $gtf is the version 104 one, like yours.

zcat hg19_Ensembl_gene.bed.gz | head
chr1    **66999065**        67210057        **ENST00000237247** 0       +       67000041        67208778        0       27      25,123,64,25,84,57,55,176,12,12,25,52,86,93,75,501,81,128,127,60,112,156,133,203,65,165,1302,   0,863,92464,99687,100697,106394,109427,110161,127130,134147,137612,138561,139898,143621,146295,148486,150724,155765,156807,162051,185911,195881,200365,205952,207275,207889,209690,

grep ENST00000237247 $gtf
1       havana  transcript      **66533383**        66744374        .       +       .       gene_id "ENSG00000118473"; gene_version "23"; transcript_id "**ENST00000237247**"; transcript_version "10"; gene_name "SGIP1"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "SGIP1-201"; transcript_source "havana"; transcript_biotype "protein_coding"; tag "basic"; transcript_support_level "5";

1       havana  exon    **66533383**        66533407        .       +       .       gene_id "ENSG00000118473"; gene_version "23"; transcript_id "ENST00000237247"; transcript_version "10"; exon_number "1"; gene_name "SGIP1"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "SGIP1-201"; transcript_source "havana"; transcript_biotype "protein_coding"; exon_id "ENSE00001454196"; exon_version "1"; tag "basic"; transcript_support_level "5";
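
The manual comparison above can also be scripted. This is only a rough sketch, not part of RSeQC: the helper names (normalize_chrom, bed_span, gtf_span, spans_agree) are made up for illustration, and it assumes a tab-separated GTF line and a BED name column that holds the bare transcript ID without a version suffix. It converts the 0-based BED start to 1-based and strips the "chr" prefix so the two records are compared on equal terms.

```python
import re


def normalize_chrom(name):
    """Strip a leading 'chr' so 'chr1' (UCSC style) and '1' (Ensembl style) compare equal."""
    return name[3:] if name.startswith("chr") else name


def bed_span(bed_line):
    """Return (transcript_id, chrom, start, end) from a BED line, with a 1-based start."""
    fields = bed_line.split()
    # BED starts are 0-based, GTF starts are 1-based, so add 1 here.
    return fields[3], normalize_chrom(fields[0]), int(fields[1]) + 1, int(fields[2])


def gtf_span(gtf_line):
    """Return (transcript_id, chrom, start, end) from a tab-separated GTF line."""
    fields = gtf_line.rstrip("\n").split("\t", 8)
    match = re.search(r'transcript_id "([^"]+)"', fields[8])
    return match.group(1), normalize_chrom(fields[0]), int(fields[3]), int(fields[4])


def spans_agree(bed_line, gtf_line):
    """True only if both records describe the same transcript at the same position."""
    return bed_span(bed_line) == gtf_span(gtf_line)


# Values copied from the two records shown above (trailing columns omitted).
bed_line = "chr1\t66999065\t67210057\tENST00000237247\t0\t+"
gtf_line = "\t".join([
    "1", "havana", "transcript", "66533383", "66744374", ".", "+", ".",
    'gene_id "ENSG00000118473"; transcript_id "ENST00000237247";',
])

print(spans_agree(bed_line, gtf_line))  # False: the coordinates do not match
```

If a check like this fails for transcripts that exist in both files, the BED file is on a different assembly than the alignments, and read_distribution.py will assign most reads to the wrong features. A BED file derived from the same GRCh38 annotation used for mapping should pass.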