Hello,
I am trying to obtain reference sequences to align my ribo-seq FASTQs against, so that I can filter out unwanted tRNA and rRNA reads. I believe the standard is to use SILVA's database, but for a new user the site offers no helpful information on which files to download or where to find them. Could someone please point me in the right direction?
Thank you.
What organism do you need this information for?
I need it for human (hg38).
You can find the rDNA repeat information in the thread "How can I download human ribosomal reference?".
You should be able to use BioMart to get the tRNA sequences as well.
Thanks for sending this, but it looks like Ensembl does not annotate tRNA genes (see https://support.bioconductor.org/p/66192/). I couldn't find them through the BioMart filters either.
I also feel I need a dedicated tRNA file, analogous to the ribosomal repeat file linked by GenoMax in the answer you pointed me to. The reason is that when I aligned my ribo-seq reads to the Ensembl-annotated rRNA genes, barely anything aligned, but when I used the ribosomal repeat .fa file, a significant fraction aligned. I suspect the same would hold for tRNA.
The tRNA population in RNA-seq data is likely to be much smaller than the rRNA population, which can be upwards of 95% of cellular RNA.
If you must get a tRNA file then you can download the sequences from https://rnacentral.org/search?q=homo%20sapiens%20tRNA and create a non-redundant set.
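To make the "non-redundant set" step concrete, here is a minimal Python sketch. It assumes you have already exported the search results as a plain FASTA file; the function names and the command-line usage are hypothetical, not part of RNAcentral or any tool. It treats two entries as redundant only when their sequences are identical after upper-casing and converting U to T, and keeps the first header seen for each sequence.

```python
import sys
from typing import Dict, Iterable, Iterator, List, TextIO, Tuple


def read_fasta(handle: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def non_redundant(records: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep one record per distinct sequence (first header wins).

    Sequences are upper-cased and U is converted to T so that RNA- and
    DNA-alphabet copies of the same sequence collapse together.
    """
    seen: Dict[str, str] = {}
    for header, seq in records:
        key = seq.upper().replace("U", "T")
        if key and key not in seen:
            seen[key] = header
    return [(header, seq) for seq, header in seen.items()]


def write_fasta(records: Iterable[Tuple[str, str]], handle: TextIO, width: int = 60) -> None:
    """Write records as FASTA, wrapping sequence lines at `width` characters."""
    for header, seq in records:
        handle.write(f">{header}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


# Hypothetical usage: python dedup_fasta.py input.fasta output.fasta
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as fin:
        unique = non_redundant(read_fasta(fin))
    with open(sys.argv[2], "w") as fout:
        write_fasta(unique, fout)
```

The resulting FASTA can then be indexed with your aligner of choice, in the same way as the ribosomal repeat file. Note that this only collapses exact duplicates; near-identical tRNA copies remain as separate entries.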