6 months ago
karlaarz
Hi!
I am analysing small RNA-seq data. After trimming, the QC of each sample looks worse than before trimming. One thing that caught my attention is the length distribution: the reads in the raw fastq files are all 101 nt long, which doesn't look right to me, as that is far longer than a miRNA (approximately 15-25 nt).
I'm using fastp to remove the adapters, but it doesn't detect any adapter in the raw fastq files.
Has anyone analysed small RNA-seq data in which the raw reads all have a length of 101 nt?
What are your thoughts?
Thanks!
How many cycles did you run? Did you measure the RNA length (the actual molecules, not the reads)?
They ran 50 cycles on the NovaSeq 6000 platform. I have asked about the RNA length, as I don't have that information yet.
So you should have a length of 50 for R1 and 50 for R2. How come you have 101? Can you paste a few lines from your fastq files?
The data is single-end, so I only have one file per sample. Here are some lines of one fastq:
With small RNA-seq it is always better to find out which adapter was used to make the libraries, based on the kit. Then use that adapter to trim the reads, removing the adapter itself and all sequence 3' of it (assuming the library prep used adapter ligation). If all has gone well with the libraries, this should leave cleaned reads of the correct size, 20-26 bp. The first read alone should be enough to do this.
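To make the trimming rule above concrete, here is a minimal Python sketch of it: find the 3' adapter in a read and drop the adapter plus everything after it. The adapter below is the Illumina TruSeq Small RNA 3' adapter, used only as a placeholder; the kit in this thread is not known, so substitute the sequence from your own kit. The function name and the `min_overlap` parameter are illustrative, and the matching is exact (no mismatches allowed), which dedicated trimmers handle better.

```python
# Placeholder adapter (Illumina TruSeq Small RNA 3' adapter).
# ASSUMPTION: replace with the adapter documented for your library kit.
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"


def trim_3prime_adapter(seq, adapter=ADAPTER, min_overlap=8):
    """Return (trimmed_sequence, adapter_found).

    Removes the adapter and all sequence 3' of it. Exact matching only;
    this is a simplification compared with real trimming tools.
    """
    # Case 1: the full adapter is present somewhere in the read.
    idx = seq.find(adapter)
    if idx != -1:
        return seq[:idx], True

    # Case 2: the read ends part-way through the adapter, so only an
    # adapter prefix of at least min_overlap bases sits at the 3' end.
    for k in range(len(adapter) - 1, min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:len(seq) - k], True

    # No adapter detected: return the read unchanged and flag it.
    return seq, False
```

The returned flag matters as much as the trimmed sequence: a read flagged `False` never showed the adapter, so its length says nothing about the size of the original RNA molecule.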
Yes, I did that, but even after removing the adapter the samples still have sequence lengths between 17 and 100 nt, and the Sequence Length Distribution has several peaks: at 17-23, 64-77 and > 100 nt.
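A quick way to inspect peaks like these outside a QC report is to tabulate read lengths directly from the FASTQ. The sketch below assumes standard 4-line FASTQ records (sequences not wrapped over several lines); the function name is illustrative and the file name in the usage comment is hypothetical.

```python
from collections import Counter


def read_length_counts(lines):
    """Count read lengths from an iterable of FASTQ lines.

    ASSUMPTION: each record is exactly 4 lines (header, sequence, '+',
    qualities), so the sequence is every line with index % 4 == 1.
    """
    counts = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 1:
            counts[len(line.strip())] += 1
    return counts


# Usage sketch (hypothetical file name):
#   import gzip
#   with gzip.open("sample_trimmed.fastq.gz", "rt") as fh:
#       for length, n in sorted(read_length_counts(fh).items()):
#           print(length, n)
```

Running this on both the raw and the trimmed file should show whether the long reads changed at all. Reads that keep the full raw length after trimming are typically ones in which no adapter was found.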
Ideally any reads that did not contain the smallRNA adapter should be discarded.
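As an illustration of that filter, the following Python sketch keeps only reads in which the adapter was found, trims them, and drops everything else. The adapter is again the Illumina TruSeq Small RNA 3' adapter as a placeholder (the kit here is unknown), the 15-30 nt window is an assumed example range, and the function name is hypothetical. Matching is exact and requires the full adapter, so this is stricter than a real trimmer. If cutadapt is an option, its `--discard-untrimmed` flag applies the same idea.

```python
# Placeholder adapter (Illumina TruSeq Small RNA 3' adapter).
# ASSUMPTION: replace with the adapter documented for your library kit.
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"


def keep_adapter_reads(records, adapter=ADAPTER, min_len=15, max_len=30):
    """Yield trimmed (name, seq, qual) only for reads containing the adapter.

    records: iterable of (name, sequence, quality) tuples.
    min_len / max_len: assumed example bounds on the insert length.
    """
    for name, seq, qual in records:
        idx = seq.find(adapter)
        if idx == -1:
            # Adapter never seen: discard the read entirely.
            continue
        if min_len <= idx <= max_len:
            # idx is the insert length; trim sequence and qualities alike.
            yield name, seq[:idx], qual[:idx]
```

With 101 nt reads and 20-26 nt inserts, the full adapter should be present in nearly every genuine small RNA read, so requiring it loses little real signal while removing the long untrimmed reads.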