Question

When to use Trimmomatic?

19 months ago

Nelo ▴ 20

Hi

Looking at this image, do I need to run Trimmomatic on this data based on the MultiQC report? And if so, what values should I use for parameters like MINLEN?

[Image: MultiQC report]

Tags: trimmomatic, multiqc, sra

Written 19 months ago by Nelo ▴ 20; updated 19 months ago by Meisam ▴ 250

@Nelo If you have a good reference to align to, you can generally let the aligner soft-clip the parts of the reads that do not align, so trimming is not strictly needed.
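To make the soft-clipping point concrete: aligners such as BWA-MEM (or Bowtie2 in `--local` mode) record clipped read ends as `S` operations in the SAM CIGAR string. The sketch below is a minimal illustration, assuming you already have CIGAR strings in hand (e.g. from `samtools view`); `soft_clipped_bases` is a hypothetical helper, not part of any library.

```python
import re
from typing import Tuple

# Each CIGAR operation is a length followed by one SAM op code (MIDNSHP=X).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_bases(cigar: str) -> Tuple[int, int]:
    """Return (leading, trailing) soft-clipped base counts, in reference orientation.

    Soft clips (S) may only sit at the alignment ends, optionally wrapped by
    hard clips (H), so we inspect the first and last non-H operations.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    core = [(n, op) for n, op in ops if op != "H"]
    if not core:
        return (0, 0)
    leading = core[0][0] if core[0][1] == "S" else 0
    # Guard against counting a single S operation twice.
    trailing = core[-1][0] if core[-1][1] == "S" and len(core) > 1 else 0
    return (leading, trailing)
```

For example, `soft_clipped_bases("5S90M5S")` gives `(5, 5)`: the aligner kept 90 aligned bases and set aside 5 at each end, which is the work a pre-alignment trim would otherwise have done.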

FastQC results need to be interpreted in the context of the type of data. The default thresholds FastQC uses for its pass/warn/fail calls are set for normal genomic data. Other types of data will invariably lead to "failures" in one or more FastQC categories. You don't need a pass in every FastQC category before proceeding with the analysis.
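One way to act on this is to list which modules did not pass and then judge each in context. The sketch below assumes the tab-separated layout of FastQC's `summary.txt` (status, module name, file name per line); WARN is the status MultiQC shows as yellow in its status grid. The `TRIMMING_RELEVANT` set is a hypothetical choice reflecting the modules discussed in this thread, not a FastQC convention.

```python
from collections import defaultdict
from typing import Dict, List

# Modules this thread points to when deciding whether to trim (illustrative choice).
TRIMMING_RELEVANT = {"Per base sequence quality", "Adapter Content"}


def non_passing_modules(summary_text: str) -> Dict[str, List[str]]:
    """Group FastQC modules by status (WARN/FAIL), skipping those that passed.

    Assumes summary.txt lines of the form: STATUS<TAB>module name<TAB>file name
    """
    by_status: Dict[str, List[str]] = defaultdict(list)
    for line in summary_text.splitlines():
        if not line.strip():
            continue
        status, module, _filename = line.split("\t", 2)
        if status != "PASS":
            by_status[status].append(module)
    return dict(by_status)


def trimming_flags(summary_text: str) -> List[str]:
    """Non-passing modules that bear most directly on a trimming decision."""
    flagged = non_passing_modules(summary_text)
    return sorted(m for mods in flagged.values() for m in mods if m in TRIMMING_RELEVANT)
```

A WARN or FAIL on something like "Per base sequence content" or "Sequence Duplication Levels" can be entirely expected for some library types, which is why the sketch separates "did not pass" from "suggests trimming".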

Written 19 months ago by GenoMax 147k

score 2 · Answer 1 · 2023-04-19

19 months ago

Meisam ▴ 250

Your overall quality seems OK; look at the per-base quality scores to decide whether you want to trim the ends of your reads. [Trimmomatic also ships with built-in adapter FASTA files that you can use to remove known Illumina and other widely used adapters, which might make your reads a bit cleaner. But since your adapter content is quite low, I'd assume adapter trimming has already been done.] For the MINLEN parameter you can go as low as any length that you are confident would still align uniquely to your reference genome. For human this is normally ~25 bp, but that is the minimum; personally I set MINLEN to 30-50 depending on my original read length.
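To show what end-trimming plus MINLEN does to a single read, here is a simplified re-implementation in the spirit of Trimmomatic's `LEADING`, `TRAILING` and `MINLEN` steps. It is not Trimmomatic itself: it assumes Phred+33 quality encoding, handles one read at a time, and `trim_read` is a hypothetical helper. In the real tool the steps run in the order you list them on the command line.

```python
from typing import List, Optional, Tuple

PHRED_OFFSET = 33  # assumes Sanger / Illumina 1.8+ (Phred+33) encoding


def phred_scores(qual: str) -> List[int]:
    """Decode an ASCII quality string into integer Phred scores."""
    return [ord(c) - PHRED_OFFSET for c in qual]


def trim_read(seq: str, qual: str, leading: int, trailing: int,
              minlen: int) -> Optional[Tuple[str, str]]:
    """Trim low-quality ends, then drop the read if it is shorter than minlen.

    Returns the trimmed (seq, qual) pair, or None if the read would be dropped.
    """
    scores = phred_scores(qual)
    start, end = 0, len(scores)
    # LEADING-style: remove bases from the 5' end while quality < leading.
    while start < end and scores[start] < leading:
        start += 1
    # TRAILING-style: remove bases from the 3' end while quality < trailing.
    while end > start and scores[end - 1] < trailing:
        end -= 1
    # MINLEN-style: discard reads that end up too short to align reliably.
    if end - start < minlen:
        return None
    return seq[start:end], qual[start:end]
```

With `qual = "##IIIIII##"` (`#` is Phred 2, `I` is Phred 40) and thresholds of 3, the read keeps its middle six bases; with `minlen=7` the same read is dropped. This is why MINLEN should reflect the shortest length that still aligns uniquely: it decides which trimmed reads survive at all.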

Written 19 months ago by Meisam ▴ 250

Hey! Thank you for responding so quickly.

But my basic question is: of all the FastQC modules, which ones should I focus on when deciding whether to run Trimmomatic?

And for some other BioProjects (i.e. other people's data), some of the SRA runs show a yellow box in the status check. Is that something I should also take into consideration?

Written 19 months ago by Nelo ▴ 20

As I mentioned, the most important panel is "Per base sequence quality". With many Illumina runs you will see quality drop toward the end of the reads, with the box plots extending into the yellow and red zones of the plot; in that case you might want to trim those low-quality bases from the ends of your reads. Another important panel is "Per base sequence content". Unless you are sequencing an amplicon with a defined backbone, you would normally expect a random distribution of bases at every position, i.e. roughly 25% each of A, T, C and G at every position along the read. If you see an irregularity in the distribution of the bases and a significant deviation from the equal proportion especially toward the ends, you might want to trim those ends as well.
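Both panels can be approximated in a few lines, which may help when reasoning about what they measure. This is a rough, hypothetical re-creation, not FastQC's code: it assumes Phred+33 qualities and reads already loaded as strings, and the 10% A/T or G/C difference used as the flag threshold follows our reading of FastQC's documented warning rule for "Per base sequence content".

```python
from collections import Counter
from statistics import median
from typing import Dict, List


def base_fractions(seqs: List[str]) -> List[Dict[str, float]]:
    """Fraction of A/C/G/T at each read position (N and other symbols ignored)."""
    length = max(len(s) for s in seqs)
    fractions = []
    for pos in range(length):
        counts = Counter(s[pos] for s in seqs if pos < len(s) and s[pos] in "ACGT")
        total = sum(counts.values()) or 1  # avoid division by zero at all-N positions
        fractions.append({b: counts[b] / total for b in "ACGT"})
    return fractions


def biased_positions(seqs: List[str], warn: float = 0.10) -> List[int]:
    """Positions where |A - T| or |G - C| exceeds `warn` (FastQC-style rule)."""
    flagged = []
    for pos, f in enumerate(base_fractions(seqs)):
        if abs(f["A"] - f["T"]) > warn or abs(f["G"] - f["C"]) > warn:
            flagged.append(pos)
    return flagged


def median_quality_per_position(quals: List[str], offset: int = 33) -> List[float]:
    """Median Phred score at each position, like the centre line of the quality box plots."""
    length = max(len(q) for q in quals)
    return [median(ord(q[pos]) - offset for q in quals if pos < len(q))
            for pos in range(length)]
```

A falling tail in `median_quality_per_position` is the classic case for trimming read ends. A flagged position from `biased_positions`, especially in the first bases of random-primed libraries, is a prompt to look closer rather than automatically a reason to trim.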

Written 19 months ago by Meisam ▴ 250

If you see an irregularity in the distribution of the bases and a significant deviation from the equal proportion especially toward the ends, you might want to trim those ends as well.

Not necessarily. https://sequencing.qcfail.com/articles/positional-sequence-bias-in-random-primed-libraries/

Written 19 months ago by GenoMax 147k

Thank you very much, GenoMax. I didn't know the source of that irregularity; I thought it was a kind of regular sequencing error at the start of reads. Now that I understand the source, it's quite clear why trimming is not a fix. Thanks!

Written 19 months ago by Meisam ▴ 250