Question

Small RNA sequencing using Illumina 2-channel SBS: how to deal with Gs?

18 months ago

antoinefelden

I'm working on a small RNA sequencing experiment (2 × 150 bp paired-end on a NovaSeq 6000). When the fragment is shorter than 150 bp, many reads look like the one below, with a run of Gs filling the rest of the 150-nt read:

@A00312:445:H3K2MDSX7:4:2322:15519:9455 1:N:0:GCCAAT
CCTGGGGATAAACTGTAGGCACCATCAATACCCAACGTTCAAGATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCGCGTATGCCGTCTTCTGCTTGAAAAGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFF:F,FFFFFFFFFFFFFF,FFFFFF:F,,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

Although this is a known feature of small RNA sequencing with this chemistry (see attached picture), I surprisingly can't find any option in the classic tools (Trimmomatic, cutadapt, etc.) to trim these Gs automatically. Does anyone have a lead, or alternatively some creative code to do it? I'm working with FASTQ files, so a suboptimal but simple-ish method would be to convert them to plain FASTA and trim all the repeated Gs, but I feel like there must already be a program that does this on FASTQ files. I'm also not entirely sure how to code this accurately, e.g. how many consecutive Gs should be required before trimming.
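For the "creative code" route, here is a minimal Python sketch that works directly on FASTQ, so no FASTA conversion is needed and the quality string stays in sync with the trimmed sequence. It assumes unwrapped 4-line FASTQ records and removes a 3' run of Gs only when the run is at least min_len bases long. The default of 10 mirrors fastp's "--poly_g_min_len" default, but treat that threshold as an assumption to tune. Unlike real trimmers it requires an exact run of Gs and tolerates no interspersed non-G calls. The function names are hypothetical. (cutadapt's "--nextseq-trim" option is also aimed at these dark-cycle Gs, so it may be worth checking the documentation for your cutadapt version.)

```python
import io
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, str, str, str]  # header, sequence, "+" line, qualities


def trim_poly_g(seq: str, qual: str, min_len: int = 10) -> Tuple[str, str]:
    """Drop a 3' run of Gs if it is at least min_len long (exact run, no mismatches)."""
    kept = len(seq.rstrip("G"))
    if len(seq) - kept >= min_len:
        return seq[:kept], qual[:kept]
    return seq, qual


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records; assumes sequences are not line-wrapped."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def trim_fastq(src: TextIO, dst: TextIO, min_len: int = 10) -> None:
    """Copy every record from src to dst, trimming poly-G tails; no record is dropped."""
    for header, seq, plus, qual in read_fastq(src):
        seq, qual = trim_poly_g(seq, qual, min_len)
        dst.write(f"{header}\n{seq}\n{plus}\n{qual}\n")


if __name__ == "__main__":
    demo = io.StringIO("@read1\nACGTAC" + "G" * 12 + "\n+\n" + "F" * 18 + "\n")
    out = io.StringIO()
    trim_fastq(demo, out)
    print(out.getvalue(), end="")
```

On a real file you could call it as: with gzip.open("R1.fq.gz", "rt") as src, open("R1.trimmed.fq", "w") as dst: trim_fastq(src, dst). Because no records are removed, running it separately on R1 and R2 keeps the pairs in order. A read that was all Gs becomes an empty record, so a length filter afterwards is sensible.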

Thanks Hive mind!

SBS • small RNA

score 1 · Answer 1 · 2023-10-24

BBDuk has a "trimpolyg" flag, or better yet a "trimpolygright" flag, which is probably what you want. However, after adapter trimming the poly-G tail should disappear anyway, since it comes after the adapter sequence. So I'd do something like:

bbduk.sh in=r1.fq in2=r2.fq out1=trimmed1.fq out2=trimmed2.fq ref=adapters ktrim=r k=21 hdist=1 mink=9 tbo tpe

...and you can add "trimpolygright=6" if you want. But then reads in which the adapter was not recognized would only have their poly-G tails removed, leaving a small RNA plus an adapter containing sequencing errors, which is not usually very helpful. Still, the option is there.
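The point above, that cutting at the adapter removes the poly-G tail as a side effect because the tail lies 3' of the adapter, can be sketched in Python. This is a simplified stand-in, not BBDuk's k-mer algorithm: it scans for the first window matching one adapter prefix with at most max_mismatches substitutions, loosely analogous to "hdist=1". The adapter prefix AGATCGGAAGAGC is an assumption taken from the read posted in the question, and the function names are hypothetical.

```python
from typing import Tuple

ADAPTER = "AGATCGGAAGAGC"  # assumed 3' adapter prefix, as seen in the posted read


def find_adapter(seq: str, adapter: str = ADAPTER, max_mismatches: int = 1) -> int:
    """Return the first position where adapter matches with <= max_mismatches substitutions, else -1."""
    n = len(adapter)
    for i in range(len(seq) - n + 1):
        mismatches = sum(a != b for a, b in zip(seq[i:i + n], adapter))
        if mismatches <= max_mismatches:
            return i
    return -1


def trim_at_adapter(seq: str, qual: str, max_mismatches: int = 1) -> Tuple[str, str]:
    """Cut read and qualities at the adapter; everything 3' of it, poly-G included, is dropped."""
    pos = find_adapter(seq, max_mismatches=max_mismatches)
    if pos == -1:
        return seq, qual  # adapter not recognized: read left untouched
    return seq[:pos], qual[:pos]


if __name__ == "__main__":
    small_rna = "CCTGGGGATAAACTGTAGGCACCATCAATACCCAACGTTCA"  # first 41 nt of the posted read
    read = small_rna + "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC" + "G" * 20
    trimmed, _ = trim_at_adapter(read, "F" * len(read))
    print(trimmed)
```

This sketch only finds a complete adapter prefix. Reads that end partway into the adapter need partial matching at the read tip, which is what BBDuk's "mink" setting is for.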

If the included adapter sequences (referenced by setting "ref=adapters") don't match, which is unlikely but possible, you can run:

bbmerge.sh in=r1.fq in2=r2.fq outa=found_adapters.fa

...to get the actual adapter sequences in your library.
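The idea behind recovering adapters from read pairs can be sketched as follows. When the insert is shorter than the reads, R1 and the reverse complement of R2 overlap completely, so the insert length is the prefix length L at which R1[:L] equals the reverse complement of R2[:L], and whatever follows position L in each read is adapter (plus any poly-G). This Python sketch assumes error-free reads and is only an illustration of the principle, not BBMerge's actual algorithm, which also tolerates mismatches and uses qualities. The function names are hypothetical, and the R2 adapter in the demo is an assumed TruSeq-style sequence.

```python
from typing import Optional, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def infer_adapters(r1: str, r2: str, min_overlap: int = 12) -> Optional[Tuple[int, str, str]]:
    """Return (insert_length, r1_tail, r2_tail) for an error-free pair with a short insert."""
    # Try the longest possible insert first and shrink until R1 and revcomp(R2) agree.
    for length in range(min(len(r1), len(r2)), min_overlap - 1, -1):
        if r1[:length] == revcomp(r2[:length]):
            return length, r1[length:], r2[length:]
    return None  # no full overlap: the insert is probably longer than the reads


if __name__ == "__main__":
    insert = "CCTGGGGATAAACTGTAGGCACCATCAATACCCAACGTTCA"
    r1 = insert + "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC" + "G" * 10
    r2 = revcomp(insert) + "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT" + "G" * 11
    print(infer_adapters(r1, r2))
```

In practice one would collect these tails over many pairs and build a consensus, since individual reads carry sequencing errors.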

score 1 · Answer 2 · 2023-10-24

18 months ago

size_t

Try fastp with the --trim_poly_g option. From the fastp help text:

-g, --trim_poly_g: force polyG tail trimming; by default, trimming is automatically enabled for Illumina NextSeq/NovaSeq data
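The reason this is tied to NextSeq/NovaSeq data is their two-channel chemistry: C is labelled in one channel, T in the other, A in both, and G in neither. A cycle with no signal, for instance once the read has run off the end of a short template, is therefore called as G. A toy Python model of that decoding follows; the threshold and intensities are made-up illustrative values, not real instrument output, and the function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def call_base(ch1: float, ch2: float, threshold: float = 0.5) -> str:
    """Toy two-channel call: C = channel 1 only, T = channel 2 only, A = both, G = neither."""
    on1 = ch1 >= threshold
    on2 = ch2 >= threshold
    if on1 and on2:
        return "A"
    if on1:
        return "C"
    if on2:
        return "T"
    return "G"  # no signal in either channel: the "dark" base


def call_read(cycles: Iterable[Tuple[float, float]], threshold: float = 0.5) -> str:
    """Call one base per (channel 1, channel 2) intensity pair."""
    return "".join(call_base(c1, c2, threshold) for c1, c2 in cycles)


if __name__ == "__main__":
    # Four informative cycles, then the template ends and both channels go dark.
    cycles: List[Tuple[float, float]] = [(0.9, 0.1), (0.1, 0.8), (0.9, 0.9), (0.1, 0.1)]
    cycles += [(0.02, 0.03)] * 6
    print(call_read(cycles))
```

A genuine G and a dark cycle are indistinguishable in this model, which is why tools can only remove poly-G tails heuristically (by run length) rather than by looking at the base calls alone.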
