I am trying to perform differential miRNA expression analysis between samples with and without exogenous expression of miRNAs, using small RNA-seq data. The exogenous miRNAs are among the top 10 expressed miRNAs in each sample, and the sequencing depths differ enough between samples that I want to make sure I'm doing things correctly.
I first used the miRge3 pipeline to quantify all human miRNAs. I then take the trimmed and collapsed fasta files, expand them, remap the reads against human miRNAs using bowtie1, take unmapped reads, then finally map my unmapped reads to my exogenous expression constructs. I take the final .bam file and use featureCounts to count the exogenous miRNAs. (Side note: I've tried making custom references for miRge3 so I didn't have to do these extra steps and have had 0 luck.)
My plan was to merge the output from featureCounts with the output from miRge3 to create a "master" counts file containing my exogenous miRs and host miRs and let DESeq2 do the size normalization from there.
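The merge step described here could look roughly like the sketch below. It assumes both count tables have already been parsed into plain dictionaries of the form `{feature: {sample: count}}`; the function name `merge_counts`, the sample names, and the miRNA IDs are hypothetical and not part of miRge3 or featureCounts output. The only real logic is to refuse to merge if a feature ID appears in both tables, and to fill in zeros for any sample missing from a row.

```python
def merge_counts(host, exo):
    """Merge two {feature: {sample: count}} tables into one master table.

    Raises ValueError if a feature ID appears in both tables, since that
    would mean the same miRNA is being counted twice.
    """
    overlap = set(host) & set(exo)
    if overlap:
        raise ValueError(f"feature IDs present in both tables: {sorted(overlap)}")

    # Union of all sample names seen in either table.
    samples = sorted({s for table in (host, exo) for row in table.values() for s in row})

    master = {}
    for table in (host, exo):
        for feature, row in table.items():
            # A sample missing from a row is treated as zero counts.
            master[feature] = {s: int(row.get(s, 0)) for s in samples}
    return master


# Hypothetical toy data: two host miRNAs and one exogenous construct.
host = {
    "hsa-miR-21-5p": {"ctrl_1": 1200, "exo_1": 900},
    "hsa-let-7a-5p": {"ctrl_1": 800, "exo_1": 650},
}
exo = {"exo-miR-A": {"exo_1": 5000}}  # no reads in the control sample

master = merge_counts(host, exo)
```

The resulting table has one row per miRNA and the same set of sample columns in every row, which is the shape a count matrix for DESeq2 needs once it is written out.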
Is there anything else I should account for? One idea is to use this first pass object to identify some control miRNAs to estimate size factors from, but beyond that I don't have any ideas.
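To make the control-miRNA idea concrete, here is a minimal sketch of median-of-ratios size factors computed from a chosen subset of features. This is a plain-Python illustration of the general technique, not DESeq2's own code; the function name `size_factors`, the feature names, and the counts are made up for the example. In DESeq2 itself, `estimateSizeFactors` has a `controlGenes` argument intended for this kind of restriction.

```python
import math
from statistics import median


def size_factors(counts, samples, controls=None):
    """Median-of-ratios size factors, optionally restricted to control features.

    counts:   {feature: {sample: count}}
    samples:  list of sample names
    controls: feature IDs to use; if None, all features are used
    """
    features = list(controls) if controls is not None else list(counts)
    log_ratios = {s: [] for s in samples}

    for feature in features:
        row = [counts[feature][s] for s in samples]
        if min(row) <= 0:
            continue  # a zero count makes the geometric mean zero; skip the feature
        log_geo_mean = sum(math.log(c) for c in row) / len(row)
        for s, c in zip(samples, row):
            log_ratios[s].append(math.log(c) - log_geo_mean)

    if not all(log_ratios[s] for s in samples):
        raise ValueError("no usable features (all contained a zero count)")

    # Size factor = median ratio of a sample's counts to the per-feature geometric mean.
    return {s: math.exp(median(log_ratios[s])) for s in samples}


# Hypothetical toy data: the control sample was sequenced about twice as deeply.
samples = ["ctrl_1", "exo_1"]
counts = {
    "host-miR-1": {"ctrl_1": 100, "exo_1": 50},
    "host-miR-2": {"ctrl_1": 200, "exo_1": 100},
    "host-miR-3": {"ctrl_1": 400, "exo_1": 200},
    "exo-miR-A": {"ctrl_1": 0, "exo_1": 5000},
}
controls = ["host-miR-1", "host-miR-2", "host-miR-3"]
sf = size_factors(counts, samples, controls=controls)
```

Restricting to controls keeps the highly expressed exogenous miRNAs from influencing the size factors, on the assumption that the chosen controls really are unaffected by the exogenous expression.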
I think concatenating the counts is OK as long as there is no overlap between the two sets of counts. That said, I would use the actual library size (i.e. how many reads were sequenced per sample) for normalization, although it should be close to the sum of all your counts per sample (or at least proportional to it).
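A rough sketch of that suggestion: derive size factors from the sequenced read totals, scaled so their geometric mean is 1, and check what fraction of each library ended up in the merged counts. The function names and all numbers below are hypothetical; the read totals would come from your own sequencing or trimming reports.

```python
import math


def depth_size_factors(library_sizes):
    """Size factors proportional to sequenced reads, with geometric mean 1."""
    log_mean = sum(math.log(n) for n in library_sizes.values()) / len(library_sizes)
    return {s: n / math.exp(log_mean) for s, n in library_sizes.items()}


def counted_fraction(counts, library_sizes):
    """Fraction of each sample's sequenced reads that ended up in the count table."""
    totals = {s: 0 for s in library_sizes}
    for row in counts.values():
        for s in totals:
            totals[s] += row.get(s, 0)
    return {s: totals[s] / library_sizes[s] for s in library_sizes}


# Hypothetical numbers: total reads sequenced per sample.
library_sizes = {"ctrl_1": 4_000_000, "exo_1": 1_000_000}
# Hypothetical merged counts, collapsed to a single row for brevity.
counts = {"all-miRNAs": {"ctrl_1": 3_000_000, "exo_1": 800_000}}

depth_sf = depth_size_factors(library_sizes)
fractions = counted_fraction(counts, library_sizes)
```

If the counted fractions are similar across samples, depth-based factors and count-sum-based factors will be roughly proportional; a large discrepancy in one sample is worth looking into before normalizing. In DESeq2, externally computed factors can be supplied with `sizeFactors(dds) <- ...` instead of calling `estimateSizeFactors`.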