I would like to know whether 2-4 percent N content near both ends of the reads (RNA-seq, read length: 150) would be a problem and whether the reads need trimming?
It depends on the aligner you use. If you're using something like hisat2 that does end-to-end alignment, then you'll want to run the reads through a trimmer pretty much regardless of how good or bad they look. If you're using an aligner like STAR, which does local alignment, then you don't need to bother trimming (if there's a LOT of N content or adapter contamination you still might want to, or you could just tweak the STAR settings). My general suggestion would be to use STAR and not usually have to worry about trimming.
You may also want to find out why those N's are there. It may indicate an overclustered library. Unless all of your samples have the same characteristic there may be some odd batch effect to consider.
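To compare samples on this, one option is to tabulate the N fraction at each read position per FASTQ file and see whether all samples show the same pattern. The following is only a minimal sketch: the helper names `read_fastq_sequences` and `n_fraction_by_position` are made up for illustration, and it assumes plain uncompressed 4-line FASTQ records. A QC tool such as FastQC reports the same kind of per-base N content.

```python
from collections import Counter


def read_fastq_sequences(path):
    """Yield the sequence line of each 4-line FASTQ record.

    Assumes an uncompressed FASTQ file with exactly 4 lines per record.
    """
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # second line of each record is the sequence
                yield line.strip()


def n_fraction_by_position(sequences):
    """Return a list whose entry i is the fraction of reads with an N at position i.

    Reads may differ in length; each position is divided by the number of
    reads that are long enough to cover it.
    """
    n_counts = Counter()
    totals = Counter()
    for seq in sequences:
        for i, base in enumerate(seq):
            totals[i] += 1
            if base in "Nn":
                n_counts[i] += 1
    return [n_counts[i] / totals[i] for i in range(len(totals))]
```

Running `n_fraction_by_position(read_fastq_sequences("sample.fastq"))` on each sample (the file name is a placeholder) gives one profile per sample; profiles that differ strongly between samples would be the kind of batch effect worth looking into.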
I would suggest trimming before alignment. fastp can help with that: use its per-read cutting by quality score feature, and specify the -3 option to enable it at the 3' end.
It would be better to trim adapters and low-quality tails before alignment or assembly. atria is a comprehensive, cutting-edge Illumina trimmer. In your case, you can use
atria --no-adapter-trim -r READ1.fastq [-R READ2.fastq]
to remove N tails and low-quality tails.
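For intuition about what removing N tails does to a read, here is a minimal sketch. It is not how Atria or fastp implement it; `trim_n_tails` is a hypothetical helper, and stripping N from both ends (rather than only the 3' end) is a choice made here because the question mentions N content near both ends.

```python
def trim_n_tails(seq, qual):
    """Strip runs of N from both ends of a read.

    The quality string is cut at the same positions so that sequence and
    qualities stay the same length. Returns (trimmed_seq, trimmed_qual).
    """
    start = 0
    end = len(seq)
    # Skip leading N bases.
    while start < end and seq[start] in "Nn":
        start += 1
    # Skip trailing N bases.
    while end > start and seq[end - 1] in "Nn":
        end -= 1
    return seq[start:end], qual[start:end]
```

Internal N bases are left alone, and a read made up entirely of N comes back empty, so in practice it would be dropped by a minimum-length filter.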