Hi
Looking at this image, do I need to run Trimmomatic on this MultiQC data or not? And if so, what value should I use for parameters like MINLEN?
Your overall quality seems OK. Look at the per base quality scores to decide whether you want to trim the ends of your reads. [Trimmomatic also ships with built-in adapter FASTA files (adapters.fa), which you can use to remove known Illumina and other widely used adapters. That might make your reads a bit cleaner, but since your adapter content is quite low I'd assume you have already used it.] For the "MINLEN" parameter, you can go as low as any length that you are confident would still align uniquely to your reference genome. For human that's normally ~25 bp, but of course that's the minimum; personally I set MINLEN to 30-50 depending on my original read length.
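To make the parameters above concrete, here is a minimal sketch of how a single-end Trimmomatic call could be assembled. The jar path and FASTQ file names are hypothetical placeholders, and the ILLUMINACLIP (2:30:10) and SLIDINGWINDOW (4:15) settings are just the example values commonly shown in the Trimmomatic manual, not a recommendation for this dataset. Trimmomatic applies steps in the order given, so MINLEN goes last.

```python
def build_trimmomatic_se_command(jar_path, fastq_in, fastq_out,
                                 adapters_fa=None, minlen=36):
    """Return the argument list for a single-end Trimmomatic run.

    All paths are placeholders supplied by the caller (assumption:
    Trimmomatic is run through its jar with `java -jar`).
    """
    cmd = ["java", "-jar", jar_path, "SE", "-phred33", fastq_in, fastq_out]
    if adapters_fa is not None:
        # ILLUMINACLIP:<fasta>:<seed mismatches>:<palindrome clip>:<simple clip>
        cmd.append(f"ILLUMINACLIP:{adapters_fa}:2:30:10")
    # Trim once the average quality in a 4-base window drops below 15.
    cmd.append("SLIDINGWINDOW:4:15")
    # Steps run in order, so the length filter is applied after all trimming.
    cmd.append(f"MINLEN:{minlen}")
    return cmd


# Hypothetical file names, for illustration only.
command = build_trimmomatic_se_command(
    "trimmomatic.jar", "sample.fastq.gz", "sample.trimmed.fastq.gz",
    adapters_fa="TruSeq3-SE.fa", minlen=40,
)
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once the paths point at real files.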
Hey! Thank you for responding so quickly.
But my basic question is: out of all the FastQC modules, which ones should I focus on when preparing the reads for Trimmomatic?
And for some other BioProjects (other people's data), some of the SRA data show a yellow box in the STATUS CHECK. Is that something I should also take into consideration or not?
As I mentioned, the most important panel is the "per base sequence quality". With many Illumina datasets you'd see lower quality toward the end of the reads, with the box plots extending into the yellow and red segments of the plot, so you might want to trim those lower-quality bases from the ends of your reads. Another important panel is the "per base sequence content". Unless you are sequencing an amplicon with a defined backbone, you'd normally expect a random distribution of bases at all positions, meaning roughly 25% each of A, T, C and G across all positions of the reads. If you see an irregularity in the distribution of the bases and a significant deviation from the equal proportion especially toward the ends, you might want to trim those ends as well.
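As a rough illustration of what those two panels summarise, the sketch below computes the per-position base fractions and the mean Phred quality from a handful of reads. It is not how FastQC itself is implemented: FastQC draws box plots rather than means, the 25% expectation assumes a balanced base composition, and the `tolerance` cutoff is an arbitrary value chosen for the example. Both functions assume a non-empty input and Phred+33 quality encoding.

```python
from collections import Counter


def per_base_content(reads):
    """Fraction of A, C, G and T at each read position."""
    max_len = max(len(r) for r in reads)
    counts = [Counter() for _ in range(max_len)]
    for read in reads:
        for pos, base in enumerate(read.upper()):
            counts[pos][base] += 1
    fractions = []
    for c in counts:
        total = sum(c[b] for b in "ACGT")  # N and other symbols are ignored
        fractions.append({b: (c[b] / total if total else 0.0) for b in "ACGT"})
    return fractions


def biased_positions(fractions, expected=0.25, tolerance=0.10):
    """0-based positions where any base deviates from `expected` by more than `tolerance`."""
    return [i for i, f in enumerate(fractions)
            if any(abs(v - expected) > tolerance for v in f.values())]


def per_base_mean_quality(quality_strings, offset=33):
    """Mean Phred score at each position (Phred+33 encoding assumed)."""
    max_len = max(len(q) for q in quality_strings)
    sums = [0] * max_len
    counts = [0] * max_len
    for q in quality_strings:
        for pos, ch in enumerate(q):
            sums[pos] += ord(ch) - offset
            counts[pos] += 1
    return [s / n for s, n in zip(sums, counts)]


# Toy data: every read starts with A, the other positions are balanced.
reads = ["AACG", "ACGT", "AGTA", "ATAC"]
print(biased_positions(per_base_content(reads)))
print(per_base_mean_quality(["II#", "II#"]))
```

A position flagged here is only a prompt to look closer; whether it should be trimmed depends on the library type.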
If you see an irregularity in the distribution of the bases and a significant deviation from the equal proportion especially toward the ends, you might want to trim those ends as well.
Not necessarily. https://sequencing.qcfail.com/articles/positional-sequence-bias-in-random-primed-libraries/
@Nelo if you have a good reference to align to, you can generally let the aligner soft-clip the parts of the reads that do not align. So trimming is not needed in the strict sense.
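For anyone who wants to check how much the aligner actually soft-clipped, here is a small sketch that reads the soft-clip lengths off a SAM CIGAR string. The function name is made up for this example; it relies only on the standard CIGAR operators (S for soft clip, H for hard clip) and ignores hard clips, which sit outside any soft clip.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_bases(cigar):
    """Return (left, right) soft-clipped lengths for a SAM CIGAR string."""
    # Drop hard clips so a soft clip, if present, is the outermost operation.
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    if not ops:
        return (0, 0)  # e.g. "*" for an unmapped read
    left = ops[0][0] if ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return (left, right)


for cigar in ["5S95M", "90M10S", "100M", "2H3S90M5S"]:
    print(cigar, soft_clipped_bases(cigar))
```

If a large share of reads show long clips at the same end, that is a hint that adapter or low-quality sequence is still present and explicit trimming may be worth it.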
FastQC results need to be taken in the context of the type of data. The default limits FastQC uses to come up with these plots are for normal genomic data. Other types of data will invariably lead to "failures" in one or more FastQC categories. You don't need to get a pass on all FastQC categories before proceeding with analysis.