rRNA in gencode gtf file and rRNA rate in the RNA-seq dataset
3 months ago
ATRX ★ 1.1k

I have a total RNA-seq dataset. I analyzed it using the following steps:

  1. Ran FastQC; trimmed the adapters because adapter contamination was above 10%
  2. Used STAR for the alignment, with the GENCODE GRCh38.p14 GTF and FASTA files
  3. Got the RNA metrics using CollectRnaSeqMetrics from Picard and rnaseqc2

With the approach above, everything looked good. The rRNA rates were identical (and below 0.5%) with both rnaseqc2 and Picard. However, I came across an interesting tool called fastq_screen and, to my surprise, found the rRNA rate to be ~30% when I ran it on the FASTQ files.

I think the only reason for the difference is that some of the rRNA annotations are absent. I compared the GTF files downloaded from NCBI, UCSC and GENCODE, and found that some rRNA annotations are missing from the GENCODE/Ensembl GTF files, including RNA45SN2, RNA28SN1 and RNA45SN3. When I searched further, I found that only the 5S rRNAs are present in GENCODE; there are no annotations (not even aliases) for 18S, 28S or 45S. However, they are present in the NCBI/UCSC annotation GTF files (the GTF versions are identical).

Can someone tell me the reason behind this? I always thought the GENCODE annotations were the most comprehensive. Here are the screenshots of my comparisons:

[Three screenshots of the GTF comparisons; images not available]
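The comparison described above can be roughly sketched in Python. This is a minimal sketch, not the exact procedure used for the screenshots: the function name `rrna_gene_names` is made up for illustration, the records are toy lines with invented coordinates, and it assumes the biotype is stored under `gene_type` (GENCODE-style) or `gene_biotype` (Ensembl/NCBI-style), which should be checked against the actual files.

```python
import re

# Matches quoted GTF attribute pairs such as: gene_name "RNA5S1";
_ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def rrna_gene_names(gtf_lines):
    """Return the names of 'gene' records whose biotype is rRNA."""
    names = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        attrs = dict(_ATTR_RE.findall(fields[8]))
        # Assumption: the biotype key differs between annotation sources.
        biotype = attrs.get("gene_type") or attrs.get("gene_biotype")
        if biotype == "rRNA":
            # Assumption: fall back to gene_id when no gene_name is given.
            names.add(attrs.get("gene_name") or attrs.get("gene_id"))
    return names


# Toy records (coordinates and IDs are invented, for illustration only).
gencode_like = [
    'chr1\tTOY\tgene\t1\t100\t.\t+\t.\tgene_id "G1"; gene_type "rRNA"; gene_name "RNA5S1";\n',
]
ncbi_like = [
    'chr1\tTOY\tgene\t1\t100\t.\t+\t.\tgene_id "RNA5S1"; gene_biotype "rRNA";\n',
    'chr21\tTOY\tgene\t1\t100\t.\t+\t.\tgene_id "RNA45SN2"; gene_biotype "rRNA";\n',
]

# rRNA genes present in the second annotation but not in the first.
only_in_ncbi = rrna_gene_names(ncbi_like) - rrna_gene_names(gencode_like)
print(sorted(only_in_ncbi))
```

For real files, the same function can be fed an open file handle (for example from `gzip.open(path, "rt")`), since it only iterates over lines.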

RNA-seq gencode rRNA • 574 views

The location and copy number of the rDNA repeats in humans is an interesting problem that is not completely solved (see "How can I download human ribosomal reference?").

The latest T2T genome is likely a better source for complete rRNA annotations; there you will find the other *S rRNAs annotated ( https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/914/755/GCF_009914755.1_T2T-CHM13v2.0/GCF_009914755.1_T2T-CHM13v2.0_genomic.gtf.gz ).

What "genome" did you use with fastq_screen? To check whether your data really contains rRNA, you could take the sequence of the rDNA repeat (from the Biostars post linked above) and use bbduk.sh from the BBMap suite in filter mode to see how many reads match.
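To illustrate the idea behind that check, here is a small Python sketch of k-mer based read matching. It is a simplified stand-in for the concept, not BBDuk itself and not its algorithm in detail: the function names, the toy sequences and the choice of `k` are all assumptions made for the example.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string; unknown bases become N."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp.get(base, "N") for base in reversed(seq.upper()))


def kmers(seq, k):
    """Return the set of all k-mers in seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def fraction_matching(reads, reference, k=31):
    """Fraction of reads sharing at least one k-mer with the reference,
    on either strand."""
    ref_kmers = kmers(reference, k) | kmers(reverse_complement(reference), k)
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(1 for read in reads if kmers(read, k) & ref_kmers)
    return hits / len(reads)


# Toy data: a made-up "reference" and four made-up reads.
toy_reference = "ACGTTGCAAGGCTTACGGAT"
toy_reads = [
    "TTGCAAGG",   # forward-strand substring of the reference
    "CCGTAAGCC",  # reverse complement of a reference substring
    "TTTTTTTTTT", # unrelated
    "CCCCCCCC",   # unrelated
]
print(fraction_matching(toy_reads, toy_reference, k=5))
```

With a real rDNA repeat sequence as the reference, a matching fraction close to the ~30% reported by fastq_screen would suggest the rRNA reads are genuinely there and simply not counted by the annotation-based metrics.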


I used the default parameters. Thank you for the suggestions. I have another question: do you know of any tool that converts the NCBI GTF to a GENCODE GTF? Thank you!


"convert the NCBI GTF to a GENCODE GTF"

Not sure what you are asking. Can you clarify?


Because the rRNA information is in the NCBI GTF file but not in the GENCODE one.


You may be able to use the NCBI GTF directly with featureCounts on your STAR BAM files.
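One thing to watch for: the sequence names in the GTF have to match those in the BAM. If the NCBI GTF names sequences by RefSeq accession (e.g. NC_000001.11) while the BAM uses chr1-style names from the GENCODE FASTA, the first column needs renaming first. A minimal Python sketch, assuming you supply the mapping yourself; the function name `rename_gtf_seqnames` and the toy records are hypothetical:

```python
def rename_gtf_seqnames(gtf_lines, name_map):
    """Yield GTF lines with column 1 rewritten via name_map.

    Comment lines are passed through unchanged; records whose sequence
    name is not in name_map are dropped.
    """
    for line in gtf_lines:
        if line.startswith("#"):
            yield line
            continue
        seqname, sep, rest = line.partition("\t")
        new_name = name_map.get(seqname)
        if new_name is None:
            continue  # no counterpart in the target naming scheme
        yield new_name + sep + rest


# Toy input: one comment, one mappable record, one unmappable record.
toy_gtf = [
    "#!toy header\n",
    'NC_000001.11\tTOY\tgene\t1\t10\t.\t+\t.\tgene_id "X";\n',
    'UNPLACED_TOY\tTOY\tgene\t1\t10\t.\t+\t.\tgene_id "Y";\n',
]
# Example mapping; extend it to cover every chromosome you need.
toy_map = {"NC_000001.11": "chr1"}

renamed = list(rename_gtf_seqnames(toy_gtf, toy_map))
print("".join(renamed), end="")
```

Dropping unmapped records is a design choice for the sketch; you may prefer to keep them or log them, depending on whether those sequences exist in your alignment reference.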
