Hi. I ran MultiQC on 100 sets of paired-end reads after filtering them with Cutadapt, but the report shows only the second read of each pair for all 100. Do the results include both forward and reverse reads?
Even after filtering with Cutadapt, the MultiQC results show many short reads, as shown in the image below. What should I do? Should I rerun Cutadapt with a different minimum length and quality cutoff, or will that affect the results?
Your question is ill-formed. MultiQC summarizes other reports: it does not select files to run analyses on and it does not generate original reports; it simply summarizes existing ones.
Have you run the processing on each file? Can you see the individual reports for each file?
But the result shows only for the 2nd end read for all the 100. So does the results include both forward and reverse reads?
Off the top of my head, I have a feeling that if you run Cutadapt in paired-end mode, MultiQC can end up naming your samples after just one of the input FastQs, like this. So I suspect that it's fine, but please manually check at least one of the Cutadapt log files as a sanity check to make sure that it looks ok.
Even after filtering using Cutadapt, the MultiQC results show many short reads as shown in the image below.
Does the image show that? What I see is most of the samples with mostly light-blue bars, which is the category "passed filters". These are the reads that were not filtered out and were written to the resulting FastQ files for downstream analysis.
You have one sample with a significant fraction of reads categorised as "Pairs that were too short", which are reads that were lost. This sample likely suffered from significant adapter contamination.
Changing the cutoff will affect the results, but whether it's something you should do or not depends on your biological question, input sample setup and downstream analysis. I doubt that it's something that you need to or want to do.
Thanks for the detailed response, Mr. Ewels. I just want to clarify: for that one sample with a significant fraction of reads categorized as "Pairs that were too short", those reads have already been removed by Cutadapt, and the FASTQ file generated at the end of the Cutadapt run contains only the reads that fall under "passed filters". Right?
Probably, but it depends on exactly how you ran Cutadapt. I'd recommend that you read over https://cutadapt.readthedocs.io/en/stable/guide.html#filtering and, most importantly, check the raw log files from cutadapt if you're unsure. MultiQC is only ever for summary statistics and the "ground truth" is always the source log files.
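If it helps, here is a rough Python sketch of that log check. It assumes the summary lines in your Cutadapt log have the form "label: count (percent%)", which is what I'd expect but can vary between versions, so compare it against your own log first. The function name `parse_summary` and the example numbers are made up for illustration; they are not taken from your data.

```python
import re

# Matches summary lines such as "Pairs that were too short:   1,234 (5.6%)".
# ASSUMPTION: this layout is typical of Cutadapt reports; adjust it if your log differs.
SUMMARY_LINE = re.compile(
    r"^\s*(?P<label>[^:]+):\s+(?P<count>\d[\d,]*)(?:\s+\([\d.]+%\))?\s*$"
)


def parse_summary(log_text):
    """Return {label: count} for every 'label: N' or 'label: N (P%)' line."""
    counts = {}
    for line in log_text.splitlines():
        match = SUMMARY_LINE.match(line)
        if match:
            label = match.group("label").strip()
            # Strip thousands separators before converting to an integer.
            counts[label] = int(match.group("count").replace(",", ""))
    return counts


# Made-up numbers for illustration only; replace with the text of a real log,
# e.g. open("sample.cutadapt.log").read() (hypothetical file name).
EXAMPLE_LOG = """\
Total read pairs processed:          1,000
Pairs that were too short:             250 (25.0%)
Pairs written (passing filters):       750 (75.0%)
"""

summary = parse_summary(EXAMPLE_LOG)
for label, count in summary.items():
    print(f"{label}: {count}")
```

If the count of pairs written in the log matches the number of reads in each output FastQ file, then the short pairs really were dropped and only the reads that passed the filters remain.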