What is the best resource for other-RNA sequences if the aim is to remove other-RNA contaminants (if any) from RNA-Seq data? This question seems to have been discussed in different threads, but I am looking for a summarized solution covering all of them.
a) Select all other RNAs from NCBI using a higher-taxonomy filter. For example, if the sample is a plant, generate an other-RNA db from NCBI with the query: all[filter] AND "green plants"[porgn] AND (biomol_trna[PROP] OR biomol_snorna[PROP] OR biomol_snrna[PROP] OR biomol_rrna[PROP] OR biomol_scrna[PROP] OR biomol_crna[PROP])
b) Generate an RNA db from specialized databases like SILVA, Greengenes, etc. But I think many of them contain only rRNA sequences.
c) Rfam db
d) riboPicker
Pardon me if the query is not clear or is irrelevant.
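For option (a), the search string can be assembled programmatically so that the same filter is reusable for other taxa. This is only a minimal sketch: the function name `build_entrez_query` and the constant `OTHER_RNA_PROPS` are hypothetical, the field tags are copied from the query above, and the code only builds the string (it does not contact NCBI).

```python
from typing import Sequence

# Molecule-type properties taken from the query in option (a).
OTHER_RNA_PROPS = [
    "biomol_trna",
    "biomol_snorna",
    "biomol_snrna",
    "biomol_rrna",
    "biomol_scrna",
    "biomol_crna",
]


def build_entrez_query(organism: str, biomol_props: Sequence[str]) -> str:
    """Build a query string in the same shape as the one in option (a).

    Hypothetical helper: it only formats text and performs no search.
    """
    if not biomol_props:
        raise ValueError("at least one biomol property is required")
    # Join the molecule types with OR, each tagged with [PROP].
    props = " OR ".join(f"{prop}[PROP]" for prop in biomol_props)
    return f'all[filter] AND "{organism}"[porgn] AND ({props})'


query = build_entrez_query("green plants", OTHER_RNA_PROPS)
print(query)
```

The resulting string can be pasted into the NCBI Nucleotide search box, and the organism name swapped for whatever higher taxon suits the sample.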
Why not focus on what you want, instead of on what you don't want?
Can't you just align your reads to the genome and use only those that align to your 'real' genes? I assume you want the coding genes only?
Good thought there, I appreciate it. I am also concerned about de novo experiments where we lack a reference genome.
Why not first find out what is contaminating your data? Do a de novo transcript assembly, plot the GC content of the transcripts, and see if you get multiple peaks. You can also BLAST your transcripts to find out if there are obvious contaminants.
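The GC-content check could be sketched roughly as follows, assuming the assembled transcripts are available as plain FASTA text. The helper names (`read_fasta`, `gc_percent`, `gc_histogram`) and the 5% bin width are illustrative choices, not part of any particular assembler or toolkit. The sketch only bins the values; plotting is left to whatever tool you prefer.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gc_percent(seq: str) -> float:
    """GC percentage over A/C/G/T/U bases; ambiguous bases such as N are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T") + seq.count("U")
    return 100.0 * gc / total if total else 0.0


def gc_histogram(records: Iterable[Tuple[str, str]], bin_width: int = 5) -> Dict[int, int]:
    """Count transcripts per GC bin; keys are the lower edge of each bin."""
    counts: Counter = Counter()
    for _, seq in records:
        # Clamp so that exactly 100% GC falls into the last bin.
        lower = min(int(gc_percent(seq) // bin_width) * bin_width, 100 - bin_width)
        counts[lower] += 1
    return dict(sorted(counts.items()))
```

To use it, pass an open file handle for your assembly to `read_fasta` and feed the result to `gc_histogram`. A single broad peak suggests one dominant source, while two or more well-separated peaks hint that transcripts from another organism are mixed in and are worth checking with BLAST.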