9.2 years ago
espop23
I have a file of the intersecting regions of TSS and rRNA. How do I find the percentage of reads contaminated by rRNA?
Hi,
You can build an rRNA-only reference FASTA and map your reads against it.
In more detail:
The fasta file will look like:
head Homo_sapiens.GRCh38.ncrna.fa
>ENST00000629478 proj_ncrna:known chromosome:GRCh38:CHR_HG1832_PATCH:210374154:210374267:-1 gene:ENSG00000281499 gene_biotype:snRNA transcript_biotype:snRNA
ACACTGGTTTCTCTTCAGATCGAATAAATCTTTCGCCTTTTACTAAAGATTTCCGTGGAG
AGAAACAAATCAGTTATAAGCTAATTTTTTGTAAGCCTTGCCCTGGGGAGGCAG
>ENST00000516494 proj_ncrna:known chromosome:GRCh38:CHR_HG2128_PATCH:67546651:67546754:1 gene:ENSG00000252303 gene_biotype:snRNA transcript_biotype:snRNA
GTGCTCACTTTGGCAACATACATACTAAAATTGGACGGATACAGACATAAACATGGCCCC
TGCACAAGGATGACATGCAAATTCATGAAGCATTCCATATTTTT
And the GTF file:
head rRNA_Homo_sapiens.GRCh38.81.gtf
1 ensembl gene 9437669 9437778 . - . gene_id "ENSG00000252956"; gene_version "1"; gene_name "RNA5SP40"; gene_source "ensembl"; gene_biotype "rRNA";
1 ensembl gene 13623184 13623284 . - . gene_id "ENSG00000222952"; gene_version "1"; gene_name "RNA5SP41"; gene_source "ensembl"; gene_biotype "rRNA";
1 ensembl gene 34112949 34113063 . + . gene_id "ENSG00000201148"; gene_version "1"; gene_name "RNA5SP42"; gene_source "ensembl"; gene_biotype "rRNA";
1 ensembl gene 37264677 37264786 . - . gene_id "ENSG00000252368"; gene_version "1"; gene_name "RNA5SP43"; gene_source "ensembl"; gene_biotype "rRNA";
You can parse these files and make an rRNA reference FASTA, which will look like:
>RNA5SP40|ENSG00000252956
GTCTATGGCCATTGCACCCTGAACGTGCCAGATCTTGTCTCATCTTGGAAGCTAAGCAGGGTTGGGCTTGGAGGGGAGGAGGGTGAACCTCAGTTCAGGTTACTTAGCCT
>RNA5SP41|ENSG00000222952
GCCTACGGCCATACCATTCTGGATGCGTCTCAGAAGCTAAGCAGGGTCAGACCTGGCTGGTACTTGGATGGGAGTATATCAGCCACTGGGTGCTGTGGTGC
>RNA5SP42|ENSG00000201148
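The parsing step could be sketched roughly as below. This is only an illustration, not the exact script used for the files above: the function names are made up, it assumes the GTF is tab-separated with `gene` rows carrying `gene_id`, `gene_name` and `gene_biotype` attributes, and it assumes each FASTA header contains a `gene:ENSG...` field as in the ncRNA file shown earlier. If a gene has several transcripts, only the first sequence is kept.

```python
import re

def rrna_genes_from_gtf(gtf_lines):
    """Map gene_id -> gene_name for GTF 'gene' rows whose gene_biotype is rRNA."""
    genes = {}
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # skips comments, blank lines and non-gene features
        attrs = dict(re.findall(r'(\S+) "([^"]*)"', fields[8]))
        if attrs.get("gene_biotype") == "rRNA" and "gene_id" in attrs:
            genes[attrs["gene_id"]] = attrs.get("gene_name", attrs["gene_id"])
    return genes

def read_fasta(lines):
    """Yield (header, sequence) pairs; multi-line sequences are joined."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def build_rrna_fasta(gtf_lines, fasta_lines):
    """Return FASTA text with one '>gene_name|gene_id' record per rRNA gene."""
    genes = rrna_genes_from_gtf(gtf_lines)
    seen, out = set(), []
    for header, seq in read_fasta(fasta_lines):
        match = re.search(r"\bgene:(\w+)", header)
        if not match:
            continue
        gene_id = match.group(1)
        if gene_id in genes and gene_id not in seen:
            seen.add(gene_id)  # keep only the first transcript per gene
            out.append(f">{genes[gene_id]}|{gene_id}\n{seq}\n")
    return "".join(out)
```

With real files you would pass open file handles, e.g. `build_rrna_fasta(open(gtf_path), open(fasta_path))`, and write the returned text to a new `.fa` file.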
Then map your reads to this rRNA reference; the percentage of reads that align to it is your estimate of rRNA contamination.
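Once the reads are mapped, the percentage can be computed from the alignment output. Here is a minimal sketch that works on SAM text lines. It assumes the aligner also writes unmapped reads into the same SAM (not all aligners do this by default), and it counts every primary record as one read, so the two mates of a pair count separately. The function name is hypothetical.

```python
def rrna_percent_from_sam(sam_lines):
    """Percent of primary SAM records that are mapped (unmapped flag 0x4 not set)."""
    total = mapped = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        if flag & 0x900:
            continue  # skip secondary (0x100) and supplementary (0x800) records
        total += 1
        if not flag & 0x4:
            mapped += 1
    return 100.0 * mapped / total if total else 0.0
```

For a BAM file you could stream it through `samtools view -h` and feed the lines to this function.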
Another way is to map your reads to the whole genome, annotate the alignments with the GTF file, and compute the percentage of reads that map to rRNA genes. Both approaches work fine, with only minute differences.
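For the genome-based approach, the counting could look something like the sketch below. It is an illustration only: it assumes you have already extracted each mapped read as a `(chrom, start, end)` tuple with 1-based inclusive coordinates, that the GTF marks rRNA genes with `gene_biotype "rRNA"`, and that chromosome names agree between the alignments and the GTF (UCSC uses `chr1`, Ensembl uses `1`). The resulting percentage is relative to the mapped reads you pass in. The linear scan is fine for a few hundred rRNA genes; dedicated tools are faster on large data.

```python
from collections import defaultdict

def rrna_intervals(gtf_lines):
    """Collect (start, end) of rRNA gene rows per chromosome from a tab-separated GTF."""
    intervals = defaultdict(list)
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # skips comments and non-gene features
        if 'gene_biotype "rRNA"' in fields[8]:
            intervals[fields[0]].append((int(fields[3]), int(fields[4])))
    return intervals

def percent_overlapping(alignments, intervals):
    """Percent of alignments that overlap any rRNA interval by at least one base."""
    total = hits = 0
    for chrom, start, end in alignments:
        total += 1
        if any(start <= i_end and end >= i_start
               for i_start, i_end in intervals.get(chrom, ())):
            hits += 1
    return 100.0 * hits / total if total else 0.0
```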
I hope this helps.
Dear Gjain,
I aligned my reads using TopHat against the human reference from UCSC. Then I downloaded the GTF file from UCSC and annotated the reads with it.
How do I find the percentage of reads mapped to rRNA?
I am searching for the human 28S, 18S, 5.8S and 5S rRNA genes (RNA28S, RNA18S, RNA5-8S and RNA5S) and also the 12S and 16S mitochondrial rRNA genes (MT-RNR1 and MT-RNR2) in the human_genes.gtf file from UCSC.
I am unable to find these genes (RNA28S, RNA18S, RNA5-8S, RNA5S, MT-RNR1 and MT-RNR2) in the human_genes.gtf file. I am looking at gene_id or gene_name in the GTF file.
How do I check for these genes in the GTF file or in any other resource?
Are they stored under a different name?