I'm working on a small RNA sequencing experiment (150 bp paired-end on a NovaSeq 6000). When the fragment is shorter than 150 bp, many reads are filled with Gs up to 150 bases, like this:
@A00312:445:H3K2MDSX7:4:2322:15519:9455 1:N:0:GCCAAT
CCTGGGGATAAACTGTAGGCACCATCAATACCCAACGTTCAAGATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCGCGTATGCCGTCTTCTGCTTGAAAAGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFF:F,FFFFFFFFFFFFFF,FFFFFF:F,,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
Although it is a feature of small RNA sequencing using this method (see attached picture), I surprisingly can't find any option in the classic tools (Trimmomatic, cutadapt, etc.) to trim these Gs automatically. Does anyone have a lead, or alternatively some creative code to do that? I'm working with FASTQ files, so a suboptimal, simple-ish method could be to convert them to plain FASTA and trim all the repeated Gs, but I feel there must be a program out there that already does this on FASTQ files. I'm also not entirely sure how to code it accurately, e.g. how many consecutive Gs there should be before trimming is appropriate.
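To make the question concrete, here is a minimal Python sketch of the kind of thing I have in mind. It removes a trailing run of Gs from each read, together with the matching quality characters, but only when the run is at least `MIN_G_RUN` bases long. The threshold of 10 is an arbitrary guess on my part, and the names `trim_poly_g`, `read_fastq` and `trim_fastq` are my own placeholders, not from any existing tool. It also assumes plain 4-line FASTQ records and treats every read independently, so it does nothing to keep R1 and R2 in sync.

```python
from typing import Iterator, TextIO, Tuple

# Assumption: minimum length of a trailing G run before it gets trimmed.
# 10 is an arbitrary placeholder, not a recommended value.
MIN_G_RUN = 10


def trim_poly_g(seq: str, qual: str, min_run: int = MIN_G_RUN) -> Tuple[str, str]:
    """Strip a trailing run of Gs (and its qualities) if it is >= min_run long."""
    stripped = seq.rstrip("G")
    run_length = len(seq) - len(stripped)
    if run_length >= min_run:
        return stripped, qual[: len(stripped)]
    return seq, qual


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (header, sequence, separator, quality) from 4-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def trim_fastq(in_handle: TextIO, out_handle: TextIO, min_run: int = MIN_G_RUN) -> None:
    """Write every record from in_handle to out_handle with poly-G tails removed."""
    for header, seq, plus, qual in read_fastq(in_handle):
        seq, qual = trim_poly_g(seq, qual, min_run)
        # Note: a read made only of Gs ends up with an empty sequence line.
        out_handle.write(f"{header}\n{seq}\n{plus}\n{qual}\n")
```

I would call `trim_fastq` with an open input file and an open output file. Keeping the FASTQ quality line in step with the sequence is the reason I would rather not go through FASTA, but I don't know whether a fixed run length like this is a sensible criterion.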
Thanks Hive mind!