Question

miRNA-seq: QC reports and workflow

2

4 months ago

omicon ▴ 40

I’m new to miRNA-seq data processing, and despite doing a lot of research and consulting with colleagues and professors, I still have many doubts about whether my workflow is correct. This uncertainty in the preprocessing step is making it difficult for me to move forward with confidence.

Currently, I’m working on a miRNA-seq experiment sequenced on Ion Torrent in single-end (SE) mode. In theory, the experiment includes 6 samples, but when I downloaded the data from SRA, I found 12 SRR runs instead of 6, meaning that each SAMN sample has two SRR runs. I understand that, according to quality control (QC) guidelines, two runs from the same library can be merged before analysis, but I don’t fully understand the reason behind this.
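As a rough illustration of what merging means here: if both runs of a sample really do come from the same library, they are two samplings of the same pool of molecules, so their reads can simply be pooled. The sketch below groups runs by sample and concatenates their FASTQ files. The function names, the run-to-sample mapping, and the assumption of uncompressed FASTQ files that each end with a newline are illustrative only and are not taken from the dataset.

```python
import shutil
import tempfile
from collections import defaultdict
from pathlib import Path


def group_runs_by_sample(run_to_sample):
    """Map each sample accession to the sorted list of its run accessions."""
    groups = defaultdict(list)
    for run, sample in run_to_sample.items():
        groups[sample].append(run)
    return {sample: sorted(runs) for sample, runs in groups.items()}


def merge_fastq(run_files, merged_path):
    """Concatenate uncompressed FASTQ files, in the given order, into merged_path.

    Assumes every input file ends with a newline, so records stay intact.
    """
    with open(merged_path, "wb") as out:
        for path in run_files:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
```

With 12 runs and 6 samples, `group_runs_by_sample` would return 6 entries of two runs each, and `merge_fastq` would be called once per sample. Merging is only appropriate when the two runs are confirmed to be the same library; separate libraries should stay separate as replicates.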

Additionally, I noticed that in the pre-trimming QC report, the sequence lengths range from 1 to 152 bp, which seems too broad. Currently, I am using Trimmomatic with the following parameters:

SLIDINGWINDOW:4:20 LEADING:20 TRAILING:20 CROP:35 MINLEN:18
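To see what the length-related steps do to a 1 to 152 bp distribution, the following sketch tallies read lengths from 4-line FASTQ records and applies only the CROP and MINLEN logic (cut to at most 35 bases, then drop anything shorter than 18). It ignores the quality-based steps, and the helper names are hypothetical, so treat it as a way to reason about the parameters and not as a replacement for Trimmomatic.

```python
from collections import Counter


def read_lengths(fastq_lines):
    """Yield the sequence length of each record in 4-line FASTQ text."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # the second line of each record holds the bases
            yield len(line.strip())


def crop_then_minlen(length, crop=35, minlen=18):
    """Return the read length after CROP, or None if MINLEN would drop the read."""
    cropped = min(length, crop)
    return cropped if cropped >= minlen else None


def summarize(fastq_lines, crop=35, minlen=18):
    """Count reads kept and dropped by the length-only part of the settings."""
    outcome = Counter()
    for length in read_lengths(fastq_lines):
        kept = crop_then_minlen(length, crop, minlen)
        outcome["dropped" if kept is None else "kept"] += 1
    return outcome
```

One thing this makes visible: CROP:35 shortens every read longer than 35 bases without checking whether the bases it removes are adapter or insert, so a 152 bp read is kept as its first 35 bases.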

In the FastQC report on overrepresented sequences, I found hits for the ABI Solid3 Adapter B, but with different sequence lengths:

ABI Solid3 Adapter B (100% over 14bp)
ABI Solid3 Adapter B (95% over 21bp)
ABI Solid3 Adapter B (100% over 23bp)

Even though all these sequences match the same adapter, their base composition and length vary. This makes me uncertain about how to properly identify and remove adapters during preprocessing.
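One possible reason for hits of different lengths against the same adapter is that short inserts leave room for different amounts of adapter at the 3' end of the read. The sketch below shows the basic idea behind overlap-based adapter trimming under that assumption. It uses exact matching only, and the adapter string in the usage note is a made-up placeholder, not the real Ion Torrent or SOLiD adapter sequence.

```python
def trim_3prime_adapter(read, adapter, min_overlap=5):
    """Remove a 3' adapter that may be only partly present at the read end.

    Exact matching only; real trimmers also tolerate mismatches and indels.
    """
    # Case 1: the full adapter occurs inside the read, so cut where it starts.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends with a prefix of the adapter; try longest first.
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    # No adapter found with at least min_overlap bases.
    return read
```

For example, with the placeholder adapter `AGCTTGACCGTA`, a read made of a 22-nt insert followed by `AGCTTGAC` is trimmed back to the 22-nt insert, while a read with no adapter bases at its end is returned unchanged.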

Is it correct to use CROP in this case, or should I first remove adapters with Cutadapt and then trim low-quality sequences with Trimmomatic?
Is it better to use SLIDINGWINDOW instead of AVGQUAL for quality filtering?
How can I correctly determine which adapters to remove if each sample has a different overrepresented sequence?
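On the SLIDINGWINDOW versus AVGQUAL question, the two behave differently: SLIDINGWINDOW cuts a read where a window's mean quality drops below the threshold, while AVGQUAL keeps or discards the whole read based on its overall mean. The sketch below is a simplified model of both (Trimmomatic's own implementation handles edge cases differently), with hypothetical function names and Phred scores given as plain integers.

```python
def sliding_window_trim(quals, window=4, threshold=20):
    """Keep bases up to the first window whose mean quality is below threshold.

    Simplified model: reads shorter than the window are returned unchanged.
    """
    for start in range(0, len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < threshold:
            return quals[:start]
    return quals


def passes_avgqual(quals, threshold=20):
    """Return True if the mean quality of the whole read meets the threshold."""
    return bool(quals) and sum(quals) / len(quals) >= threshold


# A read with 20 good bases followed by 6 poor ones.
example = [30] * 20 + [5] * 6
trimmed = sliding_window_trim(example)   # low-quality tail is cut off
kept_whole = passes_avgqual(example)     # read passes, tail included
```

In this example the sliding window keeps the first 18 bases, whereas the average-quality filter keeps all 26 because the good bases outweigh the poor tail. For short inserts such as miRNAs, that difference matters mostly at the 3' end.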

I think I'm doing a lot of things wrong :( but I try not to get overwhelmed haha. I would really appreciate any guidance on these points, as I want to ensure proper preprocessing before moving on to further analysis.

miRNA-seq ION-Torrent RNA-seq FASTQC • 590 views

score 0 · Answer 1 · 2025-03-17

0

4 months ago

GenoMax 152k

Since you have publications associated with this data (based on your last thread and a question before that), don't try to do this analysis by following standard NGS data analysis steps.

miRNA libraries generally use a kit-specific adapter that may not be identifiable using programs like FastQC. The paper you included in the first thread (ref: miRNAs - Adapters, adapters, adapters i´m so confused) has a clear section on the kit used and how the data was analyzed: https://pmc.ncbi.nlm.nih.gov/articles/PMC7655837/ (BTW, the sequencing there was done using Illumina, not Ion Torrent). The second paper (https://pmc.ncbi.nlm.nih.gov/articles/PMC7034510/) contains the Ion Torrent data.

So, depending on which dataset you are using, you will need to process things differently.

0

GenoMax, thank you for your answer. You are correct; the Ion Torrent data comes from the second paper you mentioned, and the miRNA-seq part of it was not sequenced with Illumina.

miRNA-seq = Ion Proton (Ion Total RNA-Seq Kit v2.0).
mRNA-seq = Illumina Xten (VAHT Total RNA-seq Kit).

I have searched the Thermo Fisher website and the user guide for the kit, but I could not find the adapter sequences; the documentation only names the adapters (P1 and A). The only clear information I have found is that the Ion Torrent suite is very effective at removing adapters during sequencing, which makes me think that, in theory, I should not be seeing adapters. However, the article does not mention whether they used this software.

ADD REPLY • link 4 months ago by omicon ▴ 40