8.4 years ago
pixie@bioinfo
Hello,
The reference build on which I am currently working did not provide the FASTA files for rRNA and tRNA. However, they have provided the GTF files. How can I use them to remove the sequences and which software should I use for my RNA-seq data?
Strictly speaking, you don't need to remove those sequences. Just ignore the rRNA and tRNA features when you do your gene counts, or drop them before you do the DE analysis.
Take a look at this for software suggestions: (Modern, mid-2016) RNA-seq software pipeline
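To illustrate the "ignore those features" suggestion, here is a minimal Python sketch that drops rRNA and tRNA records from a GTF before it is handed to a counting tool. It assumes the GTF carries a `gene_biotype` or `gene_type` attribute with the values `rRNA` and `tRNA`; attribute names and values differ between annotation sources, so check your own file first. The function names are made up for this example.

```python
import re

# Assumption: biotypes are stored as gene_biotype "..." or gene_type "..."
# in column 9. Adjust the names and values to match your annotation.
EXCLUDED_BIOTYPES = {"rRNA", "tRNA"}
BIOTYPE_PATTERN = re.compile(r'\b(?:gene_biotype|gene_type)\s+"([^"]+)"')


def keep_gtf_line(line, excluded=EXCLUDED_BIOTYPES):
    """Return True if the GTF line should be kept for counting."""
    if line.startswith("#"):
        return True  # keep header/comment lines
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9:
        return True  # not a standard record; leave it untouched
    match = BIOTYPE_PATTERN.search(fields[8])
    # Records without a biotype attribute are kept, since we cannot classify them.
    return match is None or match.group(1) not in excluded


def filter_gtf(lines, excluded=EXCLUDED_BIOTYPES):
    """Return only the lines that are not annotated as an excluded biotype."""
    return [line for line in lines if keep_gtf_line(line, excluded)]
```

The filtered lines can be written to a new GTF and used for counting in place of the original annotation.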
Thank you, yes, this looks simple.
What are your goals and what species are you using? I've tried prefiltering by mapping against the reference rRNA cassette for mouse/human and a good chunk of rRNA reads remain for things like RiboSeq. I usually deal with that by creating blacklist regions, but if you just need to do normal DE analysis then you can just ignore these regions.
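One possible way to build such blacklist regions, sketched in Python under the assumption that the rRNA loci are annotated in a GTF as `gene` records with a `gene_biotype` or `gene_type` attribute. This is not necessarily how the regions above were produced, and the function names are hypothetical. GTF coordinates are 1-based and inclusive, while BED is 0-based and half-open, so the start is shifted by one.

```python
import re

# Assumption: the biotype lives in gene_biotype "..." or gene_type "...".
BIOTYPE_PATTERN = re.compile(r'\b(?:gene_biotype|gene_type)\s+"([^"]+)"')


def gtf_to_blacklist(lines, biotypes=("rRNA",)):
    """Collect (chrom, start, end) BED intervals for genes of the given biotypes."""
    intervals = []
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        match = BIOTYPE_PATTERN.search(fields[8])
        if match and match.group(1) in biotypes:
            # GTF: 1-based inclusive -> BED: 0-based half-open
            intervals.append((fields[0], int(fields[3]) - 1, int(fields[4])))
    return sorted(intervals)


def to_bed_lines(intervals):
    """Format intervals as tab-separated BED lines."""
    return ["{}\t{}\t{}".format(chrom, start, end) for chrom, start, end in intervals]
```

Reads overlapping the resulting BED intervals can then be excluded with whichever interval tool you already use.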
Hi Devon, can you elaborate on how you came up with your reference rRNA cassette? I too have been grappling with rRNA contamination in my samples; in some of them depletion was not as effective as in the rest.
What I have done in my work is to take the GTF from the RepeatMasker annotations of rRNA, as well as the FASTA sequences from this annotation. I too find that rRNA reads remain with my pipeline (taking the GTF file and using split_bam.py from RSeQC).
You can find the human rDNA repeat sequence in this post: entire human rDNA
Mouse rDNA repeat can be found here.
Thank you genomax, I don't know how I missed this annotation (well, to be honest, I wasn't aware of it). I will add this to my rRNA reference and try again.
Cheers.
Thanks for the suggestions. I am new to this. I am working on rice and would be interested in 1) DE analysis 2) Co-expression Networks. I plan to carry out the following pre-processing steps before I go for DE:
1) Remove low-quality bases from the 5' and 3' ends
2) Remove rRNA and tRNA sequences
3) Remove reads that are shorter than 20 bp
Does this look fine?
For that you can skip (2).
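For reference, steps (1) and (3) amount to something like the following Python sketch. It assumes Phred+33 quality strings, and the quality cutoff of 20 is an illustrative value rather than one from this thread; only the 20 bp length cutoff comes from the question. In practice a dedicated read trimmer does this far more efficiently, so treat this as an explanation of the logic rather than a replacement for one.

```python
def trim_read(seq, qual, min_quality=20, phred_offset=33):
    """Trim low-quality bases from both ends of a read.

    min_quality=20 is an assumed example threshold; phred_offset=33
    assumes Sanger / Illumina 1.8+ encoding.
    """
    scores = [ord(char) - phred_offset for char in qual]
    start, end = 0, len(scores)
    # Walk in from the 5' end while bases are below the threshold.
    while start < end and scores[start] < min_quality:
        start += 1
    # Walk in from the 3' end while bases are below the threshold.
    while end > start and scores[end - 1] < min_quality:
        end -= 1
    return seq[start:end], qual[start:end]


def passes_length(seq, min_length=20):
    """Keep only reads that are at least min_length bases long after trimming."""
    return len(seq) >= min_length
```

A read is first passed through `trim_read` and then kept only if `passes_length` returns True for the trimmed sequence.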