Hello,
I have miRNA short read data for a few samples. I want to do a differential analysis to find out whether the samples show differences in the expression level of any miRNA. I know there are a few tools available for this, but I still carried out most of the steps manually. This is what I did:
- Selected reads in the 18-32 bp range, aligned them against the reference genome, and kept only the ones that aligned.
- Filtered out reads that aligned to a database of small RNAs other than miRNA.
- Aligned them against miRBase with SHRiMP2 in miRNA mode, using the precursor miRNA database. I am not sure why I did this, but I read somewhere that it is advisable to align short reads against precursor miRNAs rather than mature miRNAs. Please feel free to comment on this step. I may not be right.
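To make the first step concrete, here is a minimal Python sketch of the 18-32 length filter. It is only an illustration, not the exact code I ran: the function names `parse_fastq` and `filter_by_length` are made up for this post, and it assumes well-formed, uncompressed FASTQ with exactly four lines per record.

```python
from typing import Iterable, Iterator, Tuple

FastqRecord = Tuple[str, str, str]  # (header, sequence, quality)


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield (header, sequence, quality) from 4-line FASTQ records.

    Assumes every record has exactly four lines (no wrapped sequences).
    """
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # skip the '+' separator line
        qual = next(it).rstrip("\n")
        yield header, seq, qual


def filter_by_length(
    records: Iterable[FastqRecord], min_len: int = 18, max_len: int = 32
) -> Iterator[FastqRecord]:
    """Keep only reads whose length is within [min_len, max_len], inclusive."""
    for header, seq, qual in records:
        if min_len <= len(seq) <= max_len:
            yield header, seq, qual
```

With a real file this would be called as `filter_by_length(parse_fastq(open("sample.fastq")))`, where `sample.fastq` is a placeholder name.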
Now I have the SAM files, which I can use to quantify the expression of the different miRNAs. But since the samples have different numbers of starting reads, I need to normalize the counts. I can use a modified RPKM value that ignores the length factor and uses only the total aligned reads for a sample. My first question is: should I normalize the counts for a miRNA by a) the total number of reads aligned for a sample, including both miRNAs and non-miRNA small RNAs, or b) the total number of reads aligned to the miRNA database?
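To show what I mean by the modified RPKM, here is a rough Python sketch of the counting and the length-free normalization (reads per million). It is a sketch under assumptions, not a finished pipeline: `count_mirna_reads` and `reads_per_million` are names I made up, the reference name in each SAM record is taken to be the miRBase precursor ID, and unmapped, secondary, and supplementary records are skipped so each read is counted once. How multi-mapping reads should be handled is left open. The two options in my question differ only in the `library_size` that is passed in.

```python
from collections import Counter
from typing import Dict, Iterable


def count_mirna_reads(sam_lines: Iterable[str]) -> Counter:
    """Count primary mapped reads per reference name in SAM text lines."""
    counts: Counter = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        rname = fields[2]
        # 0x4 = unmapped, 0x100 = secondary, 0x800 = supplementary
        if flag & (0x4 | 0x100 | 0x800) or rname == "*":
            continue
        counts[rname] += 1
    return counts


def reads_per_million(counts: Dict[str, int], library_size: int) -> Dict[str, float]:
    """Scale raw counts to reads per million of the chosen library size."""
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    return {name: n * 1_000_000 / library_size for name, n in counts.items()}
```

For option b) the library size is simply `sum(counts.values())`; for option a) it would be the total number of reads aligned for the sample, which has to come from the earlier genome alignment step.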
Also, if you have a better idea for the differential expression analysis, I would appreciate it.
Yup, I read your RefSeq question. Ultimately, I ended up using the total reads that mapped to miRNA.