Question

small RNA-seq length distribution of 100nt?

6 months ago

karlaarz ▴ 110

Hi!

I am analysing small RNA-seq data. After trimming, the QC of each sample looks worse than before trimming. One thing that caught my attention was the length distribution: the reads in the raw FASTQ files are all 101 nt long, which I don't think is correct, as that doesn't correspond to miRNA length (approximately 15-25 nt).

I'm using fastp to remove the adapters, but it doesn't detect any adapter in the raw fastq files.

Has anyone analysed small RNA-seq data in which the raw reads have a length of 101 nt?

What are your thoughts?

Thanks!

small-rnaseq fastp mirnas • 625 views

ADD COMMENT • link updated 5 months ago by GenoMax 147k • written 6 months ago by karlaarz ▴ 110

How many cycles did you run? Did you measure the RNA length (the actual molecules, not the reads)?

ADD REPLY • link 6 months ago by Asaf 10k

They performed 50 cycles and used the NovaSeq 6000 platform. I've asked about the RNA length, as I don't have that information yet.

ADD REPLY • link 6 months ago by karlaarz ▴ 110

So you should have a length of 50 for R1 and 50 for R2. How come you have 101? Can you paste a few lines from your fastq files?

ADD REPLY • link 6 months ago by Asaf 10k

The data is single-end, so I only have one file per sample. Here are some lines of one fastq:

@A00560:364:HW3VLDRX3:2:2101:2410:1000 1:N:0:CAGCGT
GTAAATGATGAGATTCCATTGGTCCGTGTTTCTGAACTACATGATTTTCCTTGGCTATTCTGATATGGAATTCTCGGGTGCCAAGGAACTCCAGTCACCAG
+
FF,FFFFF,FFFFFFF,FFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFF,F:FFFF:,FFFFFFFFFFFF:F:FF,FFF:F:,F:FFF::F
@A00560:364:HW3VLDRX3:2:2101:4779:1000 1:N:0:CAGCGT
TGCTGTGATGAGATGACTAAGTAGGAAGTGCCGTCAGAGTCGATAACAGACGATAACAGCTCCTGGCTGACTTGGAATTATCGGGTGCCAAGGAACTCCAG
+
FFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFF,FF:,,F,,FFFFF,:F,FF:,FF,:FF,F,,FF,F,F,,FFFF,F:FFF:FF,FF

ADD REPLY • link 5 months ago by karlaarz ▴ 110

With small RNA-seq it is always better to find out which adapter was used to make the libraries, based on the kit. Then use that to trim the reads, removing all sequence 3' of (and including) the adapter (assuming the library prep used adapter ligation). This should leave cleaned reads of the correct size, 20-26 bp, if all has gone well with the libraries. The first read alone should be enough to do this.
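To make the trimming step concrete, here is a minimal Python sketch of "remove everything 3' of (and including) the adapter". It is only an illustration, not how fastp or any other trimmer works internally. The adapter string is an assumption: it is the sequence that shows up near the 3' end of the first read pasted above, and it should be confirmed against the kit. The function names and the `max_mismatches` parameter are made up for this example.

```python
from typing import Optional

# Assumption: copied from the 3' end of the first read pasted in this thread.
# Confirm the real adapter against the library prep kit documentation.
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"


def find_adapter(read: str, adapter: str = ADAPTER, max_mismatches: int = 1) -> int:
    """Return the leftmost start of a full-length adapter match, or -1.

    A match is any window with at most `max_mismatches` substitutions.
    Partial adapters at the very end of the read are not detected here.
    """
    for start in range(len(read) - len(adapter) + 1):
        window = read[start:start + len(adapter)]
        mismatches = sum(a != b for a, b in zip(window, adapter))
        if mismatches <= max_mismatches:
            return start
    return -1


def trim_3prime(read: str, adapter: str = ADAPTER, max_mismatches: int = 1) -> Optional[str]:
    """Drop the adapter and everything 3' of it; None if no adapter is found."""
    start = find_adapter(read, adapter, max_mismatches)
    return None if start < 0 else read[:start]


if __name__ == "__main__":
    # Made-up 22 nt insert followed by the adapter and some read-through.
    example = "TGAGGTAGTAGGTTGTATAGTT" + ADAPTER + "AACTCCAGTCAC"
    print(trim_3prime(example))
```

Allowing one mismatch matters in practice: the second read pasted above carries the same adapter with a single substitution, so an exact string search would miss it. Real trimmers also handle partial adapters at the read end and indels, which this sketch does not.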

ADD REPLY • link 6 months ago by GenoMax 147k

Yes, I did that, but even after removing the adapter the samples still have sequence lengths between 17 and 100 nt, and the Sequence Length Distribution plot has several peaks: at 17-23, 64-77 and >100 nt.
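For checking such a length distribution outside the QC report, a small Python sketch like the following could tabulate trimmed read lengths and report how many fall in a given window. The function names are hypothetical, and the 17-23 window is simply the first peak mentioned above, not a recommended cutoff. The sequences in the demo are invented.

```python
from collections import Counter
from typing import Iterable


def length_distribution(seqs: Iterable[str]) -> Counter:
    """Count how many sequences there are of each length."""
    return Counter(len(s) for s in seqs)


def fraction_in_range(dist: Counter, lo: int, hi: int) -> float:
    """Fraction of sequences whose length is within [lo, hi], inclusive."""
    total = sum(dist.values())
    if total == 0:
        return 0.0
    in_range = sum(n for length, n in dist.items() if lo <= length <= hi)
    return in_range / total


if __name__ == "__main__":
    # Invented trimmed reads, for illustration only.
    trimmed = ["A" * 20, "C" * 22, "G" * 22, "T" * 70]
    dist = length_distribution(trimmed)
    for length in sorted(dist):
        print(length, dist[length])
    print(fraction_in_range(dist, 17, 23))
```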

ADD REPLY • link 5 months ago by karlaarz ▴ 110

Ideally any reads that did not contain the smallRNA adapter should be discarded.
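As a rough sketch of that filtering step in Python: the code below walks through FASTQ records, keeps only reads in which the adapter is found, and trims the sequence and quality string at the adapter. The adapter string is an assumption taken from the reads pasted earlier in the thread, `read_fastq`, `keep_adapter_reads` and `min_len` are names made up for this example, and exact matching is used only to keep it short. A dedicated trimmer with an option to discard untrimmed reads is the more robust route.

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)

# Assumption: adapter as seen in the reads pasted in this thread; check the kit.
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def keep_adapter_reads(records: Iterable[Record], adapter: str = ADAPTER,
                       min_len: int = 1) -> Iterator[Record]:
    """Discard reads without the adapter; trim the rest at the adapter.

    `min_len` is a hypothetical knob that also drops empty inserts
    (adapter at position 0). Exact matching is a simplification.
    """
    for header, seq, qual in records:
        pos = seq.find(adapter)
        if pos < 0 or pos < min_len:
            continue
        yield header, seq[:pos], qual[:pos]


if __name__ == "__main__":
    # Two invented records: the first contains the adapter, the second does not.
    with_adapter = "TGAGGTAGTAGGTTGTATAGTT" + ADAPTER + "AAC"
    text = (
        f"@r1\n{with_adapter}\n+\n{'F' * len(with_adapter)}\n"
        f"@r2\n{'ACGT' * 10}\n+\n{'F' * 40}\n"
    )
    for record in keep_adapter_reads(read_fastq(io.StringIO(text))):
        print(record)
```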

ADD REPLY • link 5 months ago by GenoMax 147k