Dear colleagues,
I am currently evaluating miRNA-Seq data and would like to present my pipeline for your review. As my department has no dedicated bioinformatician for this specific use case, I would appreciate feedback on whether this workflow is robust enough to hold up in peer review of the project.
Here is an overview of the key steps in my analysis:
1. I use Cutadapt to remove any adapters and the common sequence that is ligated directly to the 3' end of the read.
2. Because a specific miRNA extraction and library kit was used, I chose to align the data to a miRNA reference obtained from RNAcentral, using the miRBase data for humans. This choice aims to minimize multi-mapping issues by leveraging the specificity of the kits used.
3. After experimenting with various aligners, I found that STAR gave the most satisfactory results, so I decided to stick with it, also because I am familiar with the tool. I fine-tuned the settings to optimize the mapping and achieved 35-50% uniquely mapped reads, with most of the remaining reads classified as multi-mappers.
4. For the subsequent analysis of differentially expressed genes (DEG) with a DESeq2 pipeline, I intend to use only the uniquely mapped reads to mitigate potential bias.
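To make the last step concrete, here is a minimal sketch of how uniquely mapped reads could be counted per miRNA from STAR's SAM output before assembling a count matrix for DESeq2. It assumes STAR's default convention of assigning MAPQ 255 to uniquely mapped reads; the function name `count_unique_reads`, the helper `sam_record`, and the example records are hypothetical and only for illustration.

```python
from collections import Counter
from typing import Iterable

UNIQUE_MAPQ = 255  # STAR's default MAPQ for uniquely mapped reads (assumes the default was not changed)


def count_unique_reads(sam_lines: Iterable[str]) -> Counter:
    """Count uniquely mapped primary alignments per reference sequence (e.g. per miRNA)."""
    counts: Counter = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag, rname, mapq = int(fields[1]), fields[2], int(fields[4])
        if flag & 0x4:
            continue  # read is unmapped
        if flag & 0x900:
            continue  # secondary (0x100) or supplementary (0x800) alignment
        if mapq != UNIQUE_MAPQ:
            continue  # multi-mapper (STAR reports MAPQ below 255 for these)
        counts[rname] += 1
    return counts


def sam_record(name: str, flag: int, rname: str, mapq: int) -> str:
    """Build a toy SAM record (hypothetical helper for the example only)."""
    seq = "TAGCTTATCAGACTGATGTTGA"
    return "\t".join([name, str(flag), rname, "1", str(mapq), "22M", "*", "0", "0", seq, "I" * len(seq)])


# Hypothetical example input: two unique reads, one multi-mapper, one unmapped read
example_sam = [
    "@SQ\tSN:hsa-miR-21-5p\tLN:22",
    sam_record("r1", 0, "hsa-miR-21-5p", 255),
    sam_record("r2", 16, "hsa-let-7a-5p", 255),
    sam_record("r3", 0, "hsa-let-7a-5p", 3),
    sam_record("r4", 4, "*", 0),
]
counts = count_unique_reads(example_sam)
```

The per-sample counters produced this way could then be merged into a miRNA-by-sample table for DESeq2. Note that discarding multi-mappers will undercount miRNAs whose mature sequences are shared between family members, so it may be worth checking how many reads each miRNA loses at this step.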
Feel free to take bits from this pipeline: mirna_alignment.sh. It mimics all of the steps in nf-core/smrnaseq and follows the advice given by Sean Davis in this Biostars post.
Thank you very much! I see you decided to go for Bowtie as the aligner. Do you see any issue with using STAR (technically, a splice-aware aligner would not be necessary)?
I have never used STAR for smRNA-Seq (no particular reason) so I can't comment, sorry.
I looked around and found that the ENCODE project uses STAR to profile miRNAs. Check out the pipeline page (https://www.encodeproject.org/pipelines/ENCPL337CSA/); it will bring you to a DNAnexus repository where you can find the steps they use. I would consider this a 'gold standard' to compare against.
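As a rough illustration of what a STAR call tuned for short, unspliced reads might look like, here is a small sketch that only assembles the command line without running it. The flags are real STAR options, but the values are illustrative assumptions of the kind often used for small-RNA data and have not been checked against the ENCODE pipeline page, so please compare them with the steps listed there. The function name `build_star_command` and the file paths are hypothetical.

```python
import shlex
from typing import List


def build_star_command(genome_dir: str, fastq: str, out_prefix: str, threads: int = 4) -> List[str]:
    """Assemble a STAR command for short reads (illustrative values, not a verified recipe)."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", fastq,
        "--outFileNamePrefix", out_prefix,
        "--alignEndsType", "EndToEnd",         # no soft-clipping of read ends
        "--alignIntronMax", "1",               # effectively disables spliced alignments
        "--outFilterMismatchNmax", "1",        # allow at most one mismatch
        "--outFilterMultimapNmax", "10",       # keep reads mapping to at most 10 loci
        "--outFilterScoreMinOverLread", "0",   # drop the default read-length-relative score threshold
        "--outFilterMatchNminOverLread", "0",  # drop the default read-length-relative match threshold
        "--outFilterMatchNmin", "16",          # require at least 16 matched bases instead
        "--outSAMtype", "BAM", "SortedByCoordinate",
    ]


# Hypothetical paths, for illustration only
cmd = build_star_command("star_index/", "sample_trimmed.fastq", "sample_")
print(shlex.join(cmd))
```

Setting `--alignIntronMax 1` is one way to address the splice-awareness concern raised above, since STAR then behaves like an ungapped aligner for these reads.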