I am looking for recommendations on how to trim miRNA/small RNA sequencing data, because the trimming step can affect the final results (the differences are not enormous, but some miRNAs are more sensitive than others to the trimming parameters).
For trimming I use cutadapt with a minimum read length of 15, 3'-end quality trimming at Phred 20 (applied before adaptor removal), and a maximum error rate of 0.1 for adaptor detection (cutadapt -m 15 -q 20 -e 0.1). More stringent parameters (-q 30, -e 0.01) yield fewer mature miRNA reads (as expected), but larger differences in the downstream differential expression (DE) analysis.
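To make the interaction of -q, -e and -m concrete, here is a minimal pure-Python sketch of the three steps in the order described: 3' quality trimming, then adaptor removal, then the length filter. It is not cutadapt's code. The quality trimming follows the BWA-style algorithm described in the cutadapt documentation, but the adaptor search is a simplification (mismatches only, no indels), the minimum overlap of 3 is an assumption, and the adaptor TGGAATTCTCGGGTGCCAAGG (a common Illumina small RNA 3' adaptor) and the insert sequence are only illustrative.

```python
from typing import Optional, Tuple


def quality_trim_index(qual: str, cutoff: int, base: int = 33) -> int:
    """Index at which to cut the 3' end (BWA-style algorithm).

    Walks from the 3' end adding (cutoff - quality); stops as soon as the
    running sum drops below zero and cuts where the sum was largest.
    """
    s, max_s, cut = 0, 0, len(qual)
    for i in range(len(qual) - 1, -1, -1):
        s += cutoff - (ord(qual[i]) - base)
        if s < 0:
            break
        if s > max_s:
            max_s, cut = s, i
    return cut


def find_adaptor(seq: str, adaptor: str, max_error_rate: float,
                 min_overlap: int = 3) -> int:
    """Leftmost start of a 3' adaptor match, or len(seq) if none is found.

    Simplified (assumption): mismatches only, no indels; a partial adaptor
    at the read end counts if at least min_overlap bases overlap.
    """
    for i in range(len(seq) - min_overlap + 1):
        overlap = min(len(adaptor), len(seq) - i)
        allowed = int(max_error_rate * overlap)  # allowed errors scale with overlap
        mismatches = sum(a != b for a, b in zip(seq[i:i + overlap], adaptor[:overlap]))
        if mismatches <= allowed:
            return i
    return len(seq)


def trim_read(seq: str, qual: str, adaptor: str, q_cutoff: int = 20,
              max_error_rate: float = 0.1,
              min_length: int = 15) -> Optional[Tuple[str, str]]:
    """Quality trim, then remove the adaptor, then apply the length filter."""
    cut = quality_trim_index(qual, q_cutoff)
    seq, qual = seq[:cut], qual[:cut]
    start = find_adaptor(seq, adaptor, max_error_rate)
    seq, qual = seq[:start], qual[:start]
    if len(seq) < min_length:
        return None  # read discarded, analogous to cutadapt -m
    return seq, qual


if __name__ == "__main__":
    adaptor = "TGGAATTCTCGGGTGCCAAGG"  # illustrative small RNA 3' adaptor
    insert = "TAGCTTATCAGACTGATGTTGA"  # illustrative 22-nt insert
    noisy = adaptor[:10] + "A" + adaptor[11:]  # one sequencing error in the adaptor
    read, qual = insert + noisy, "I" * (len(insert) + len(noisy))
    print(trim_read(read, qual, adaptor, max_error_rate=0.1))
    print(trim_read(read, qual, adaptor, max_error_rate=0.01))
```

In this toy case, -e 0.1 tolerates the single mismatch and recovers the 22-nt insert, while -e 0.01 allows zero errors for any overlap shorter than 100 nt, so the adaptor is not recognised and stays in the read. Under these simplifying assumptions, that is one plausible way a stricter -e could reduce the number of reads assigned to mature miRNAs.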
It is often said that miRNA data need less stringent parameters because they contain more sequencing noise than other RNA and DNA data, so I would be grateful if you could share your experience.
I work with small RNA data. I think your -m value is too short, especially if you are only looking at miRNAs. We typically use a minimum length cutoff of 18 and a maximum of 34. Otherwise, the -q and -e values you used are reasonable. Could you share some references on why miRNA data processing needs less stringent parameters?
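In cutadapt, a 18 to 34 nt window corresponds to -m 18 -M 34 (-M is the maximum-length filter). As a rough sketch of how to compare the two length settings, the snippet below counts how many trimmed reads each one keeps; the read lengths are made up, and in practice they would come from your own trimmed FASTQ files.

```python
from typing import Dict, Iterable, Optional, Tuple


def passes(length: int, min_len: int, max_len: Optional[int] = None) -> bool:
    """True if a trimmed read of this length survives -m min_len (and -M max_len)."""
    if length < min_len:
        return False
    return max_len is None or length <= max_len


def compare_cutoffs(lengths: Iterable[int],
                    settings: Dict[str, Tuple[int, Optional[int]]]) -> Dict[str, int]:
    """Count how many trimmed reads each (min, max) length setting keeps."""
    lengths = list(lengths)  # materialise so a generator can be reused per setting
    return {name: sum(passes(n, lo, hi) for n in lengths)
            for name, (lo, hi) in settings.items()}


if __name__ == "__main__":
    toy_lengths = [15, 16, 17, 18, 22, 22, 23, 34, 35, 40]  # made-up example
    print(compare_cutoffs(toy_lengths, {
        "-m 15": (15, None),
        "-m 18 -M 34": (18, 34),
    }))
```

The reads that differ between the two settings are exactly the 15 to 17 nt and longer-than-34 nt classes, so tallying those classes on real data shows how much the choice of cutoffs can matter.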
Thank you for your reply. Reported and recommended miRNA lengths vary, and some miRNA analysis tools sort aligned reads into mature miRNAs, isomiRs and miRNA hairpins, so the shorter minimum length should not be a problem for downstream analysis. See prof. Hackenberg's presentation: http://bioinfo2.ugr.es/presentaciones/biocomputacion/microRNA_NGS.pdf
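As an illustration of why short reads need not inflate mature counts when a tool separates mature, isomiR and hairpin reads, here is a hypothetical classification rule, not the rule of any specific tool: a read whose ends exactly match an annotated mature arm is "mature", one whose ends both lie within a hypothetical max_shift of 3 nt is an "isomiR", and anything else aligned to the hairpin is "hairpin". The coordinates are 0-based, half-open and made up.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end) on the hairpin


def classify_read(read: Interval, matures: List[Interval], max_shift: int = 3) -> str:
    """Label a hairpin-aligned read as 'mature', 'isomiR' or 'hairpin'.

    Hypothetical rule: exact ends -> mature; both ends within max_shift
    of an annotated mature arm -> isomiR; anything else -> hairpin.
    """
    start, end = read
    # Check exact matches first so a read that exactly matches one arm
    # is never reported as an isomiR of the other arm.
    for m_start, m_end in matures:
        if (start, end) == (m_start, m_end):
            return "mature"
    for m_start, m_end in matures:
        if abs(start - m_start) <= max_shift and abs(end - m_end) <= max_shift:
            return "isomiR"
    return "hairpin"


if __name__ == "__main__":
    arms = [(5, 27), (50, 71)]  # made-up 5p and 3p mature coordinates
    for r in [(5, 27), (6, 27), (50, 73), (5, 20), (30, 45)]:
        print(r, classify_read(r, arms))
```

Under this rule, a 15-nt read starting at the 5p mature start, (5, 20), lands in "hairpin" rather than in the mature count, which is the kind of separation the reply relies on.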
As you have probably noticed, the FastQC reports for your data differ from those of experiments with longer reads. This is the result of the different lengths of the small RNA library products, remaining adaptor dimers, and the low diversity of the library caused by highly expressed miRNAs in your samples.
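The three effects mentioned above can be summarised with a short sketch that reports the insert-length distribution, the fraction of adaptor dimers, and the fraction of reads taken up by the most abundant sequences. It assumes the input is a list of inserts after adaptor removal, so an empty string is treated as an adaptor dimer (an assumption, not a FastQC definition); the sequences and top_n value are illustrative.

```python
from collections import Counter
from typing import Dict, List


def library_summary(inserts: List[str], top_n: int = 10) -> Dict[str, object]:
    """Length distribution, dimer fraction and top-N dominance of a library.

    Assumption: inputs are inserts after adaptor removal, so an empty
    string represents an adaptor dimer (adaptor ligated with no insert).
    """
    total = len(inserts)
    lengths = Counter(len(s) for s in inserts)
    dimers = lengths.get(0, 0)
    non_empty = [s for s in inserts if s]
    # Low diversity: a few highly expressed miRNAs dominate the library.
    top_reads = sum(c for _, c in Counter(non_empty).most_common(top_n))
    return {
        "length_histogram": dict(sorted(lengths.items())),
        "dimer_fraction": dimers / total if total else 0.0,
        "top_n_fraction": top_reads / len(non_empty) if non_empty else 0.0,
    }


if __name__ == "__main__":
    mir = "TAGCTTATCAGACTGATGTTGA"  # illustrative abundant insert
    print(library_summary([mir] * 8 + ["C" * 21, ""], top_n=1))
```

A large top_n_fraction with only a handful of sequences is the kind of pattern that makes duplication and overrepresented-sequence checks look alarming for small RNA libraries, even when the data are fine.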
Here are some publications that apply Q20 trimming before further processing: https://www.ncbi.nlm.nih.gov/pubmed/28934507 https://www.ncbi.nlm.nih.gov/pubmed/26027894