4.0 years ago
Pac314
I am performing quality control on some reads that have good FastQC metrics apart from duplication levels and per base sequence content. Is it necessary to trim these reads before alignment and downstream analysis (DE analysis, variant calling, etc.)? I have read conflicting advice about trimming before RNA-seq analysis and am not sure what to do.
Most aligners will soft-clip the reads, removing sequence that does not match, so trimming is not strictly necessary. If you are going to do de novo assembly, then you should scan/trim the data to remove these extraneous sequences.
Ok, as I am not performing a de novo assembly, I will perform an alignment using the untrimmed reads first.
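To see how much sequence the aligner actually soft-clipped, you can inspect the CIGAR strings in the resulting SAM/BAM records. The following is only a minimal sketch: the function name `soft_clipped_bases` is made up for illustration, and it assumes you have already extracted the CIGAR string (column 6 of a SAM record) yourself.

```python
import re

# One CIGAR operation: a length followed by an operation code (per the SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_bases(cigar):
    """Return (left, right) soft-clipped base counts for one CIGAR string.

    Hypothetical helper for illustration; not part of any aligner or library.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) can sit outside the soft clips, so ignore them here.
    ops = [(n, op) for n, op in ops if op != "H"]
    if not ops:
        return 0, 0
    left = ops[0][0] if ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return left, right


print(soft_clipped_bases("5S90M5S"))   # clipped at both ends
print(soft_clipped_bases("100M"))      # no clipping
```

If a large share of reads shows long soft clips at the 3' end, that would be a hint that adapter or low-quality sequence is present and that trimming might be worth testing.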
I don't perform trimming at all, but I do filter out genes with really low read counts. Validating the results also helps a lot! You can also run some tests and compare the results from trimmed and untrimmed data; this will give you a really good indication of what to do!
I have just checked, and the per tile sequence quality has a warning in a couple of my FastQC files, along with sequence duplication and per base sequence content. Only a few tiles are yellow; is it safe to ignore these warnings? When I run Trimmomatic on these reads, the per tile sequence quality passes, but then I get warnings for sequence length distribution and sequence duplication levels.
Please check these informative blog posts by the authors of FastQC:
https://sequencing.qcfail.com/articles/libraries-can-contain-technical-duplication/
https://sequencing.qcfail.com/articles/positional-sequence-bias-in-random-primed-libraries/
Thank you for sharing these articles, they are really helpful!
How do you filter to remove low read counts before DE analysis?
See: https://support.bioconductor.org/p/65256/
https://support.bioconductor.org/p/110833/
I like filterByExpr from edgeR for most analyses.
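For anyone who wants to see the general idea behind low-count filtering, here is a rough Python sketch. It is not a port of filterByExpr, which derives its thresholds from library sizes and the experimental design; this version just keeps genes whose counts-per-million (CPM) reach a cutoff in a minimum number of samples. The function names and the `min_cpm` and `min_samples` defaults are assumptions chosen for illustration, and every sample is assumed to have a non-zero library size.

```python
def counts_per_million(counts):
    """counts: dict mapping gene -> list of raw counts, one per sample."""
    n_samples = len(next(iter(counts.values())))
    # Library size = total counts in each sample (assumed non-zero).
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [c / lib_sizes[j] * 1e6 for j, c in enumerate(row)]
        for gene, row in counts.items()
    }


def filter_low_counts(counts, min_cpm=1.0, min_samples=2):
    """Keep genes with CPM >= min_cpm in at least min_samples samples."""
    cpm = counts_per_million(counts)
    return {
        gene: row
        for gene, row in counts.items()
        if sum(v >= min_cpm for v in cpm[gene]) >= min_samples
    }


# Toy data: 3 genes x 4 samples, each sample totalling 1,000,000 counts.
toy_counts = {
    "geneA": [500, 600, 550, 700],
    "geneB": [0, 1, 0, 0],
    "geneC": [999500, 999399, 999450, 999300],
}
kept = filter_low_counts(toy_counts)
print(sorted(kept))  # geneB is dropped: it is expressed in at most one sample
```

A common choice is to set `min_samples` to the size of the smallest experimental group, so that a gene expressed in only one condition is not thrown away.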
I personally use salmon for alignment/quantification, and it was generally beneficial to trim adapters to maximize alignment rate, but only if you do in fact have adapter contamination according to fastqc. If there is none, or it is low (like 1% or so), then no, it is probably not necessary or beneficial.
Is adapter contamination inferred from the presence of over-represented sequences? I don't have any over-represented sequences.
Not sure about the exact procedure internally but there is a slot "adapter content" that lights up if you have contamination above a certain percentage.
FastQC has 2 files containing adapter and contamination sequences: adapter_list.txt and contaminant_list.txt. You can go to ~/[YOUR_WORKSPACE]/FastQC/Configuration to check them.
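If you want a quick sanity check outside of FastQC, you could count how many reads contain a known adapter sequence. The sketch below is not how FastQC computes its "adapter content" plot; it is only a crude exact-match scan. The function name `adapter_fraction` is hypothetical, the adapter shown is the 13-base Illumina universal adapter prefix, and the input is assumed to be an unwrapped FASTQ with exactly four lines per record.

```python
import io

# Start of the Illumina universal adapter; swap in whichever sequence applies.
ILLUMINA_UNIVERSAL = "AGATCGGAAGAGC"


def adapter_fraction(fastq_handle, adapter=ILLUMINA_UNIVERSAL):
    """Fraction of reads whose sequence contains an exact adapter match.

    Assumes 4-line FASTQ records (header, sequence, '+', qualities).
    """
    total = hits = 0
    for i, line in enumerate(fastq_handle):
        if i % 4 == 1:  # second line of each record is the sequence
            total += 1
            if adapter in line:
                hits += 1
    return hits / total if total else 0.0


# Tiny in-memory example: the first read carries the adapter, the second does not.
example = io.StringIO(
    "@read1\nACGTACGTAGATCGGAAGAGCAC\n+\nIIIIIIIIIIIIIIIIIIIIIII\n"
    "@read2\nACGTACGTACGTACGTACGTACG\n+\nIIIIIIIIIIIIIIIIIIIIIII\n"
)
print(adapter_fraction(example))
```

For a real file you would pass an open file handle instead of the `io.StringIO` object (for gzipped files, one opened in text mode with `gzip.open(path, "rt")`). Because only exact matches are counted, partial adapters at the very end of a read are missed, so treat the number as a lower bound.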