Dear all great helpers,
I'm very new to miRNA-seq analysis. With my limited knowledge, I understand that I need to remove rRNA, tRNA, snoRNA, mitochondrial RNA, etc. from the adapter-trimmed miRNA-seq FASTQ file before aligning to the miRNA database. So I need a GTF file to perform an alignment that filters out such contaminating sequences (please correct me if I have misunderstood something here).
I'm planning to use the gene GTF file from Ensembl. As far as I can tell, it contains everything except tRNA data. My questions are as follows:
- Is it valid to simply remove the miRNA entries from my gene GTF file (using: grep -wv miRNA) and then append tRNA data converted from BED to GTF format (using: cat tRNA.gtf Gene.gtf > new.gtf)? In this case, 'new.gtf' will contain all known sequences except miRNAs, both known and unknown.
- Has anyone used STAR to filter out contaminating sequences, and how should the parameters be set for such a job?
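To make the first question concrete, here is a rough Python sketch of the same two steps (drop the miRNA records, then put the tRNA records in front of the rest). It is only an illustration, not a tested pipeline. It assumes the miRNA records can be recognised by the gene_biotype attribute in column 9, which is the Ensembl convention (gene_type is also accepted, as used by GENCODE). The function names drop_biotype and build_filter_gtf are made up for this example. Unlike grep -wv miRNA, it only looks at the biotype attribute, so a line that merely mentions the word miRNA elsewhere is kept.

```python
import re
from typing import Iterable, Iterator

# Assumption: the biotype is stored as gene_biotype "..." (Ensembl) or
# gene_type "..." (GENCODE) in the attribute column of the GTF.
BIOTYPE_RE = re.compile(r'\b(?:gene_biotype|gene_type) "([^"]+)"')


def drop_biotype(gtf_lines: Iterable[str], biotype: str = "miRNA") -> Iterator[str]:
    """Yield GTF lines, skipping records whose gene biotype equals `biotype`."""
    for line in gtf_lines:
        if line.startswith("#"):
            # Header/comment lines are passed through unchanged.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        attributes = fields[8] if len(fields) >= 9 else ""
        match = BIOTYPE_RE.search(attributes)
        if match and match.group(1) == biotype:
            continue  # this is a miRNA record, leave it out
        yield line


def build_filter_gtf(gene_gtf: str, trna_gtf: str, out_gtf: str,
                     biotype: str = "miRNA") -> None:
    """Write the tRNA records followed by the gene records minus `biotype`.

    Mirrors: cat tRNA.gtf Gene.gtf > new.gtf, after removing miRNA records.
    """
    with open(out_gtf, "w") as out:
        with open(trna_gtf) as trna:
            for line in trna:
                # Guard against a missing final newline in the tRNA file.
                out.write(line if line.endswith("\n") else line + "\n")
        with open(gene_gtf) as genes:
            for line in drop_biotype(genes, biotype):
                out.write(line)
```

With the file names from my question, the call would be build_filter_gtf("Gene.gtf", "tRNA.gtf", "new.gtf").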
I apologize in advance if I have made any mistakes here.
Best Regards,
Kaj
Thank you very much. I feel much more confident about the idea now.