when to use trimmomatic
19 months ago
Nelo ▴ 20

Hi

Looking at this image, do I need to run Trimmomatic on this MultiQC data or not? And if yes, what value should I use for parameters like MINLEN?

[Images: multiqc, m2]

trimmomatic multiqc sra • 2.1k views

@Nelo if you are aligning to a good reference, you can generally let the aligner soft-clip the parts of the reads that do not align. So trimming is not needed in the strict sense.

FastQC results need to be taken in the context of the type of data. The default limits FastQC uses to come up with these plots are for normal genomic data. Other types of data will invariably lead to "failures" in one or more FastQC categories. You don't need to get a pass on all FastQC categories before proceeding with analysis.
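As a rough illustration of the soft-clipping point: after alignment you can check how many bases the aligner clipped by reading the CIGAR field of each SAM record. This is only a minimal sketch; the function name `soft_clipped_bases` is made up for this example, and it assumes you already have the CIGAR string as plain text.

```python
import re

# CIGAR operation letters as defined in the SAM specification
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def soft_clipped_bases(cigar):
    """Return (leading, trailing) soft-clipped base counts for one CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) can sit outside soft clips, so ignore them at both ends
    core = [o for o in ops if o[1] != "H"]
    leading = core[0][0] if core and core[0][1] == "S" else 0
    trailing = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return leading, trailing

print(soft_clipped_bases("5S90M5S"))
```

If most reads show only a few clipped bases, that supports skipping a separate trimming step.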

19 months ago
Meisam ▴ 250

Your overall quality seems OK. Look at the per base quality scores to decide whether to trim the ends of your reads. [Trimmomatic also ships with built-in adapter FASTA files that you can use to remove known Illumina and other widely used adapters, which might make your reads a bit cleaner, but since your adapter content is quite low I'd assume that has already been done.] For the "MINLEN" parameter, you can go as low as any length that you are confident would still align uniquely to your reference genome. For human that is normally ~25 bp, but of course that's a minimum; personally I set MINLEN to 30-50 depending on my original read length.
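If you do decide to trim, here is a small sketch of how the command could be put together. It only builds the argument list and does not run anything. The jar name, the adapter file name, and the function name are assumptions for illustration, the thresholds are just the example values from the Trimmomatic manual rather than tuned settings, and it assumes single-end reads.

```python
def trimmomatic_se_command(fastq_in, fastq_out, minlen=36,
                           jar="trimmomatic-0.39.jar",   # assumed jar location
                           adapters="TruSeq3-SE.fa"):    # assumed adapter file
    """Build (but do not run) a single-end Trimmomatic command as an argument list."""
    return [
        "java", "-jar", jar, "SE", "-phred33",
        fastq_in, fastq_out,
        f"ILLUMINACLIP:{adapters}:2:30:10",  # adapter clipping
        "TRAILING:3",                        # drop low-quality bases at the 3' end
        "SLIDINGWINDOW:4:15",                # cut when a 4-base window averages below Q15
        f"MINLEN:{minlen}",                  # steps run in order, so the length filter goes last
    ]

cmd = trimmomatic_se_command("sample.fastq.gz", "sample.trimmed.fastq.gz", minlen=40)
print(" ".join(cmd))
```

You could pass the resulting list to `subprocess.run(cmd, check=True)` once the paths match your setup.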


Hey! Thank you for responding so quickly.

But my basic question is: out of all the FastQC modules, which ones should I focus on when preparing the data for Trimmomatic?

And for some other BioProjects (I'm talking about other people's data), some of the SRA data show a yellow box in the STATUS CHECK. Is that something I should also take into consideration or not?


As I mentioned, the most important panel is the "per base sequence quality". With many Illumina sequences you'd see lower quality toward the end of the reads, with the box plots spanning into the yellow and red segments of the plot, so you might want to trim those lower quality bases from the end of your reads. Another important panel is the "per base sequence content". Unless you are sequencing an amplicon with a defined backbone, you'd normally expect to see a random distribution of bases at all positions, meaning roughly 25% each of A, T, C and G across all positions of the reads. If you see an irregularity in the distribution of the bases and a significant deviation from the equal proportion especially toward the ends, you might want to trim those ends as well.
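To make the "per base sequence content" idea concrete, here is a minimal sketch that computes the fraction of each base at every read position. The function name and the toy reads are made up for illustration; FastQC does this for you on real data, so this is only meant to show what the panel is measuring.

```python
from collections import Counter

def per_position_base_fractions(reads):
    """Return one {base: fraction} dict per read position."""
    longest = max((len(r) for r in reads), default=0)
    fractions = []
    for pos in range(longest):
        # Only reads long enough to cover this position contribute
        counts = Counter(r[pos].upper() for r in reads if len(r) > pos)
        total = sum(counts.values())
        fractions.append({b: counts.get(b, 0) / total for b in "ACGTN"})
    return fractions

reads = ["ACGT", "AGGT", "ACTT", "TCGA"]  # toy data, not real reads
for pos, frac in enumerate(per_position_base_fractions(reads), start=1):
    print(pos, frac)
```

On a real library you would look for positions where the fractions drift far from roughly 0.25 each.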


If you see an irregularity in the distribution of the bases and a significant deviation from the equal proportion especially toward the ends, you might want to trim those ends as well.

Not necessarily. https://sequencing.qcfail.com/articles/positional-sequence-bias-in-random-primed-libraries/


Thank you very much, GenoMax. I never knew the source of the irregularity; I was thinking it was a sort of regular sequencing error at the start of reads. Now that I understand the source, it's quite clear why trimming is not a fix. Thanks!
