Hi, I have a question. My reads have good overall per-base and per-sequence quality, but the QC report flags three potential problems at the same time: per base sequence content (the first 10 bp are unbalanced), a 50 bp overrepresented sequence, and bad Kmer content.
So do I need to remove the first 10 bp first and then trim the 50 bp overrepresented sequence?
I tried removing the first 10 unbalanced bases, and the QC report no longer showed any overrepresented sequence. However, my Kmer content report now looks even messier...
So I switched to trimming the 50 bp overrepresented sequence instead. However, after running cutadapt my reads vary in length (from 0 to 88 bp). What should I do next? Go on to trim the first 10 bp? Or ...?
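One way to deal with the variable lengths after trimming is to look at the length distribution and drop reads that ended up too short to be useful (a 0 bp read carries no information). Below is a minimal sketch of that idea in plain Python. The function names and the 20 bp threshold are assumptions for illustration only, not something recommended in this thread. cutadapt can do the same filtering itself with its `-m` / `--minimum-length` option.

```python
from collections import Counter


def length_histogram(reads):
    """Count how many reads there are of each length."""
    return Counter(len(read) for read in reads)


def drop_short_reads(reads, min_length=20):
    """Keep only reads with at least min_length bases.

    min_length=20 is an illustrative assumption; choose a value that
    suits your downstream tool.
    """
    return [read for read in reads if len(read) >= min_length]


if __name__ == "__main__":
    # Toy reads standing in for trimmed output (lengths 0, 4, 25 and 88).
    trimmed = ["", "ACGT", "A" * 25, "C" * 88]
    print(length_histogram(trimmed))
    print([len(r) for r in drop_short_reads(trimmed)])
```

For paired-end data the two mates have to be filtered together so the files stay in sync, which this sketch does not handle.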
Hi Devon, I remember you saying that in RNA-Seq we only need to do 'gentle' trimming (e.g. only remove the adapters). But what about de novo assembly RNA-Seq? After running FastQC, the 'overrepresented sequences' module shows 0.2% TruSeq Adapter, Index 27, plus some other sequences (all of them < 0.5%). Should I (1) trim all of these overrepresented sequences, (2) just remove the adapters, or (3) leave them as they are? Thank you!
You would want to trim all extraneous sequence (as far as you can recognize it) for any type of NGS analysis, more so for any "de novo" work.
For reference-genome-based RNA-Seq, I have read papers saying that we only need 'gentle' trimming (i.e. only remove the adapters). But I am not sure whether the same applies to de novo assembly.
I can't find the exact tweet at the moment, but Titus Brown happened to tweet about this recently and his recommendation for de novo assembly is to trim adapters but nothing else. I tend to go with him on assembly related questions, since this is really not my forte.
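To make "trim adapters but nothing else" concrete, here is a minimal sketch of 3' adapter removal in plain Python. It is an illustration under assumptions, not how cutadapt is implemented: it only finds exact matches (real trimmers tolerate mismatches), the function name and `min_overlap` default are made up for the example, and the adapter string is the commonly quoted start of the Illumina TruSeq adapter, so substitute whatever sequence FastQC reports for your library.

```python
def trim_3prime_adapter(read, adapter, min_overlap=3):
    """Cut the adapter, and everything after it, off the 3' end of a read.

    Exact matching only; min_overlap (an illustrative default) is the
    shortest adapter prefix accepted at the very end of the read.
    """
    # Case 1: the whole adapter occurs somewhere in the read.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends partway into the adapter, so only a prefix
    # of the adapter is present. Try the longest prefix first.
    longest = min(len(adapter), len(read)) - 1
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    # No adapter found: leave the read untouched (no quality trimming).
    return read


if __name__ == "__main__":
    ADAPTER = "AGATCGGAAGAGC"  # assumed adapter prefix, replace as needed
    print(trim_3prime_adapter("ACGTACGT" + ADAPTER + "TTT", ADAPTER))
    print(trim_3prime_adapter("ACGTACGT" + ADAPTER[:5], ADAPTER))
```

Reads with no adapter match come back unchanged, which is the point of the "gentle" approach: nothing is removed on the basis of quality or base composition.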
Hi Devon, do you have a source for "There's no need to trim these bases off (e.g. the ~10bp at the 5' end), they won't actually bias mapping"?
https://sequencing.qcfail.com/articles/positional-sequence-bias-in-random-primed-libraries/
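If you want to look at the 5' composition of your own reads outside of FastQC, a per-position base tally is enough. The sketch below is a rough stand-in for what the "per base sequence content" plot summarises, not FastQC's actual code. The function name and the choice of 10 positions are assumptions for illustration.

```python
from collections import Counter


def base_composition(reads, n_positions=10):
    """Return, for each of the first n_positions, the fraction of A/C/G/T.

    Bases other than A, C, G, T (e.g. N) count toward the total but are
    not reported, so fractions at a position may sum to less than 1.
    """
    counts = [Counter() for _ in range(n_positions)]
    for read in reads:
        for i, base in enumerate(read[:n_positions]):
            counts[i][base] += 1
    fractions = []
    for position_counts in counts:
        total = sum(position_counts.values())
        fractions.append(
            {b: (position_counts[b] / total if total else 0.0) for b in "ACGT"}
        )
    return fractions


if __name__ == "__main__":
    # Toy reads; real input would be the sequence lines of a FASTQ file.
    reads = ["ACGTAC", "AGGTTC", "ATGTAC"]
    for i, frac in enumerate(base_composition(reads, n_positions=3), start=1):
        print(i, frac)
```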