Hello,
The software I am using for demultiplexing allows adapters to be specified in the sample sheet so that they can be trimmed. However, when I ran MultiQC on my FASTQ files, I noticed that some adapter sequence remains. I am including some examples below. In this example the adapter content is pretty low, but I have had other samples with 3-7% adapter content in the MultiQC report.
When I specify the adapter exactly as given in the Illumina manual, the adapter content can reach up to 0.74%. The software also allows automatic detection of the adapters. When I use that, the adapter sequence it picks up is the Illumina adapter sequence plus 5 more bases. When I trim with that sequence, MultiQC says: No samples found with any adapter contamination > 0.1%
I wanted to know which would be the more appropriate way to trim the adapters. Would the extra 5 bases cause an issue?
Thank you
Thank you. I am planning to align the data with kallisto. Does that make a difference?
No, it should not make a difference.
Thank you. I just want to confirm: by "not make a difference", do you mean that trimming the extra 5 bases should not make a difference?
You could find out by running a sample with and without trimming. Passing the data through a trimming program is not going to add a great computational burden. Trim the data once and be sure that there is no extraneous sequence.
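One rough way to check for leftover sequence after trimming is to count the reads that still contain the adapter. The sketch below is only an illustration: the function name adapter_fraction is made up, the adapter string is a commonly used prefix of the Illumina TruSeq adapter standing in for whatever your sample sheet specifies, and it assumes plain four-line FASTQ records. It is not the same metric that FastQC/MultiQC report, so treat the number as a sanity check only.

```python
import io

# Assumption: a commonly used prefix of the Illumina TruSeq adapter.
# Replace it with the sequence from your own sample sheet.
ADAPTER = "AGATCGGAAGAGC"


def adapter_fraction(fastq_handle, adapter):
    """Return the fraction of reads whose sequence contains `adapter`.

    Assumes four-line FASTQ records (header, sequence, '+', qualities).
    """
    total = 0
    hits = 0
    for i, line in enumerate(fastq_handle):
        if i % 4 == 1:  # second line of each record is the sequence
            total += 1
            if adapter in line.strip().upper():
                hits += 1
    return hits / total if total else 0.0


# Tiny made-up FASTQ: read1 carries the adapter, read2 does not.
example_fastq = io.StringIO(
    "@read1\n"
    "ACGTACGTAGATCGGAAGAGCACAC\n"
    "+\n"
    "IIIIIIIIIIIIIIIIIIIIIIIII\n"
    "@read2\n"
    "ACGTACGTACGTACGTACGTACGTA\n"
    "+\n"
    "IIIIIIIIIIIIIIIIIIIIIIIII\n"
)

fraction = adapter_fraction(example_fastq, ADAPTER)
print(f"Reads containing adapter: {fraction:.1%}")
```

For a real gzipped file you could pass gzip.open(path, "rt") as the handle, and run the check once with the manual's sequence and once with the auto-detected one.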
I compared trimming with the Illumina sequence, which did not seem to work, against trimming with the Illumina sequence plus the 5 bases, which seems to have trimmed everything according to FastQC. I made scatterplots of the gene counts and of the TPM values from the two runs, and the Pearson correlation was 1. Do you think this is an appropriate way to compare them?
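For what it is worth, a Pearson correlation on raw counts or TPM tends to be dominated by the few most highly expressed genes, so it can come out very close to 1 even when low-expressed genes differ. A minimal sketch of a slightly stricter comparison is below; the function names and the toy TPM values are invented for illustration and are not from my data. It reports the correlation on raw values, on log2(x + 1) values, and the largest absolute difference.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def compare_quantifications(a, b):
    """Compare two per-gene abundance vectors given in the same gene order."""
    log_a = [math.log2(v + 1) for v in a]
    log_b = [math.log2(v + 1) for v in b]
    return {
        "pearson_raw": pearson(a, b),
        "pearson_log": pearson(log_a, log_b),
        "max_abs_diff": max(abs(p - q) for p, q in zip(a, b)),
    }


# Made-up TPM values: one very abundant gene, two low genes that disagree.
tpm_manual_adapter = [0.0, 1.0, 3.0, 10000.0]
tpm_detected_adapter = [1.0, 0.0, 3.0, 10000.0]

result = compare_quantifications(tpm_manual_adapter, tpm_detected_adapter)
for name, value in result.items():
    print(name, value)
```

In this toy case the raw correlation is essentially 1 while the log-scale correlation is noticeably lower, which is why looking at the log scale and at per-gene differences may be more informative than the raw Pearson value alone.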
See: Quality filtering prior to pseudoalignment
This addresses trimming as well.