Hi friends and colleagues
I have run FastQC on 78 WGS metagenomic samples (sequences submitted to SRA by other studies) and aggregated the results with MultiQC. I got results like the ones below. I am unsure whether to trim bases with a Phred quality score below 20, because a lot of reads would then become much shorter, which would negatively affect mapping with the bowtie aligner. Is my understanding correct? What should I do in this case?
Thanks and regards, DC7
Filter out the low-quality reads, i.e. if the average quality of the whole read is below a certain threshold, discard it. I would only trim bases with Phred < 3, i.e. everything the sequencer labels as garbage, and then do length filtering. The more bases you trim off, the more likely it becomes that a read maps ambiguously, especially with repetitive or closely related reference sequences.
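To make the three steps concrete, here is a rough Python sketch of that logic for a single read. It assumes Phred+33 encoded quality strings. The function names, the mean-quality cutoff of 20 and the minimum length of 70 are placeholders I picked for illustration, not recommendations. In practice you would do this with a dedicated trimming tool.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(ch) - offset for ch in quality_string]


def trim_ends(seq, quals, min_q=3):
    """Remove bases with quality below min_q from both ends of the read."""
    start, end = 0, len(quals)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


def keep_read(seq, quality_string, min_mean_q=20, min_q=3, min_len=70):
    """Return the trimmed (seq, quals) if the read passes all filters, else None.

    min_mean_q and min_len are illustrative defaults (assumptions).
    """
    quals = phred_scores(quality_string)
    # Step 1: discard reads whose average quality is too low.
    if not quals or sum(quals) / len(quals) < min_mean_q:
        return None
    # Step 2: trim only the "garbage" bases (Phred < min_q) from the ends.
    seq, quals = trim_ends(seq, quals, min_q)
    # Step 3: length filtering.
    if len(seq) < min_len:
        return None
    return seq, quals
```

With these settings, an 80 bp read with 75 good bases followed by five `#` (Phred 2) bases is kept at 75 bp, while a read that is `#` throughout is dropped by the average-quality filter.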
DC7: You should clarify in the original question what the intended downstream use for this data is. Based on your past threads, it appears that you are doing MetaPhlAn analysis with these data? As indicated by the answers here, many aligners will soft-clip parts of reads that don't align, but if you are doing any assembly work then you would need to take care of trimming bad-quality data yourself.
Thanks, genomax, for your response. Yes, I will be analysing the sequences with MetaPhlAn to profile them, followed by alpha- and beta-diversity analysis, etc. There is no assembly work as such. MetaPhlAn uses the bowtie2 aligner, which discards reads shorter than 70 bp (default) and also discards reads that map with MAPQ < 30. Considering this information, do you think I should add a separate clipping step?
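One way to judge this before deciding is to estimate how many reads would still clear the 70 bp cutoff at different trimming thresholds. Below is a minimal Python sketch, assuming Phred+33 quality strings and simple 3'-end trimming (not the sliding-window approach that trimming tools typically use). The function names and the toy quality strings are made up for illustration.

```python
def trimmed_length(quality_string, min_q, offset=33):
    """Length left after removing trailing bases with Phred score below min_q."""
    end = len(quality_string)
    while end > 0 and ord(quality_string[end - 1]) - offset < min_q:
        end -= 1
    return end


def surviving_fraction(quality_strings, min_q, min_len=70):
    """Fraction of reads still at least min_len bases long after trimming."""
    if not quality_strings:
        return 0.0
    kept = sum(1 for q in quality_strings if trimmed_length(q, min_q) >= min_len)
    return kept / len(quality_strings)


# Toy quality strings (hypothetical): 'I' = Phred 40, '+' = Phred 10, '#' = Phred 2.
reads = ["I" * 100, "I" * 60 + "+" * 40, "I" * 80 + "#" * 20]
for threshold in (3, 20):
    print(threshold, surviving_fraction(reads, threshold))
```

Running the same comparison on the quality lines of a subsample of the real FASTQ files would show how much a Q20 trim costs relative to a Q3 trim under the 70 bp length filter.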