I have a total RNA-seq dataset. I analyzed it using the following steps:
- Ran FastQC; trimmed the adapters because adapter contamination was above 10%
- Used STAR for the alignment with the GENCODE GRCh38.p14 GTF and FASTA files
- Got RNA metrics using CollectRnaSeqMetrics from Picard and RNA-SeQC 2
I followed this approach and everything looked good: the rRNA rates reported by RNA-SeQC 2 and Picard were identical (below 0.5%). However, I came across an interesting tool called fastq_screen and, to my surprise, it reported an rRNA rate of ~30% when I ran it on the FASTQ files.
I think the only reason for this difference is that some rRNA annotations are missing. I compared the GTF files downloaded from NCBI, UCSC and GENCODE and found that several rRNA annotations, including RNA45SN2, RNA28SN1 and RNA45SN3, are missing from the GENCODE/Ensembl GTF files. On further searching, I found that only 5S rRNA genes are present in GENCODE, with no annotations (not even aliases) for 18S, 28S or 45S. They are, however, present in the NCBI/UCSC GTF files (the GTF versions are identical).
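The GTF comparison described above can be reproduced with a short script. This is a sketch under stated assumptions, not the poster's actual method: it treats a gene as rRNA when its `gene_type` (GENCODE) or `gene_biotype` (Ensembl/NCBI) attribute is exactly `rRNA`, so biotypes such as `rRNA_pseudogene` are not counted, and it takes the gene name from `gene_name`, falling back to `gene_id`. The function names `rrna_gene_names` and `load_rrna_names` are hypothetical.

```python
import gzip
import re
import sys
from typing import Iterable, Set

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def rrna_gene_names(gtf_lines: Iterable[str]) -> Set[str]:
    """Return names of 'gene' records whose biotype is exactly 'rRNA'.

    Assumption: GENCODE stores the biotype in gene_type, while Ensembl and
    NCBI GTFs use gene_biotype. The name comes from gene_name when present,
    otherwise from gene_id.
    """
    names = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only look at gene-level records
        attrs = dict(ATTR_RE.findall(fields[8]))
        biotype = attrs.get("gene_type") or attrs.get("gene_biotype")
        if biotype == "rRNA":
            names.add(attrs.get("gene_name") or attrs.get("gene_id", ""))
    return names


def load_rrna_names(path: str) -> Set[str]:
    """Read a plain or gzip-compressed GTF and collect its rRNA gene names."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return rrna_gene_names(fh)


if __name__ == "__main__" and len(sys.argv) == 3:
    first, second = load_rrna_names(sys.argv[1]), load_rrna_names(sys.argv[2])
    print("rRNA genes only in", sys.argv[1], sorted(first - second))
    print("rRNA genes only in", sys.argv[2], sorted(second - first))
```

Run it as `python compare_rrna.py gencode.gtf ncbi.gtf` (file names hypothetical); the two printed lists show rRNA genes present in only one of the annotations.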
Can someone explain the reason behind this? I always thought the GENCODE annotation was the most comprehensive one. Here are the screenshots of my comparisons:
The location and copy number of the rDNA repeat in humans is an interesting problem that is not completely solved (see "How can I download human ribosomal reference?").
The latest T2T genome is likely a better source for complete rRNA annotations; there you will find the other *S rRNAs annotated (https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/914/755/GCF_009914755.1_T2T-CHM13v2.0/GCF_009914755.1_T2T-CHM13v2.0_genomic.gtf.gz).
What "genome" did you use with fastq_screen? To be sure whether your data contains rRNA, you could take the sequence of the rDNA repeat (from the Biostars post linked above) and then run bbduk.sh from the BBMap suite in filter mode to see how many reads match.
I used the default parameters. Thank you for the suggestions. I have another question: do you know of any tools that convert the NCBI GTF to a GENCODE GTF? Thank you!
Not sure what you are asking. Can you clarify?
Because the rRNA information is in the NCBI GTF file but not in the GENCODE one.
You may be able to use the NCBI GTF with featureCounts and the STAR BAM.
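One practical point when trying this: the NCBI GTF in a GCF assembly directory names sequences by RefSeq accession (e.g. NC_000001.11), whereas a BAM aligned against the GENCODE FASTA uses names such as chr1, so featureCounts would find no overlapping features until the names agree. Below is a minimal sketch that rewrites the first GTF column using the matching NCBI `*_assembly_report.txt`. It assumes GENCODE-style target names (UCSC-style names such as chr1 and chrM for chromosomes, GenBank accessions such as GL000220.1 for scaffolds); confirm this against the @SQ lines from `samtools view -H` on your BAM. The function names are hypothetical.

```python
import sys
from typing import Dict, Iterable, Iterator


def seqid_map_from_assembly_report(report_lines: Iterable[str]) -> Dict[str, str]:
    """Map RefSeq accessions to GENCODE-style sequence names.

    Assumption: the report has a header line starting with '# Sequence-Name'
    followed by tab-separated rows. Chromosomes (Sequence-Role
    'assembled-molecule') map to UCSC-style-name; everything else maps to
    its GenBank-Accn.
    """
    columns = None
    mapping = {}
    for line in report_lines:
        line = line.rstrip("\r\n")  # reports may use DOS line endings
        if line.startswith("# Sequence-Name"):
            columns = line[2:].split("\t")
            continue
        if not line or line.startswith("#") or columns is None:
            continue
        row = dict(zip(columns, line.split("\t")))
        refseq = row.get("RefSeq-Accn", "na")
        if row.get("Sequence-Role") == "assembled-molecule":
            target = row.get("UCSC-style-name", "na")
        else:
            target = row.get("GenBank-Accn", "na")
        if refseq != "na" and target != "na":
            mapping[refseq] = target
    return mapping


def rename_seqids(gtf_lines: Iterable[str], mapping: Dict[str, str]) -> Iterator[str]:
    """Rewrite column 1 of each GTF record; records with unmapped names are dropped."""
    for line in gtf_lines:
        if line.startswith("#"):
            yield line  # keep GTF header lines unchanged
            continue
        seqid, sep, rest = line.partition("\t")
        if seqid in mapping:
            yield mapping[seqid] + sep + rest


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python rename_gtf.py assembly_report.txt ncbi.gtf > renamed.gtf
    with open(sys.argv[1]) as fh:
        seq_map = seqid_map_from_assembly_report(fh)
    with open(sys.argv[2]) as fh:
        sys.stdout.writelines(rename_seqids(fh, seq_map))
```

The renamed GTF can then be passed to featureCounts with `-a`. featureCounts also accepts a chromosome-name alias file through its `-A` option, which may be simpler than rewriting the GTF; check the featureCounts help for the expected format.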