Hi,
I'm working with RAD genomic data (produced through genotyping-by-sequencing, GBS). Two panels in my MultiQC report seem to contradict one another: the "Trimmed Sequence Lengths (3')" panel and the "Sequence Length Distribution" panel. I expect a lot of duplication because of the GBS protocol.
In the Trimmed Sequence Length panel (multicolored), there are peaks around 70, 90, and 110 bp. When I ran Trim Galore (a Cutadapt wrapper), I set the quality score (q) to 0, so there should have been no quality trimming, only adapter removal. Is this plot indicating that some reads were trimmed by >50 bp? Trimming both Illumina adapters should have removed <40 bp.
In the Sequence Length Distribution panel (orange), I think the peaks look more normal: 30, 50, and 70 bp.
Can anyone provide any ideas about what happened along the way?
Thanks so much!
If the Sequence Length Distribution data is post-trimming (cutadapt is pre-trimming) and you expect ~40 bp to be removed, then the peaks seem to have shifted correctly in the trimmed data?
Thanks for your reply! I was under the impression that the Trimmed Sequence Length plot is showing the number of nucleotides that are trimmed per read, which would imply that it's trimming, in some cases, >100 bp, which doesn't make sense if I'm only asking it to trim ~40 bp total, per read. What do you mean by "cutadapt is pre-trimming"?
I wonder whether the x-axis label "Length trimmed (bp)" means the base-pair position at which the read was trimmed, which I think is similar to your idea?
I don't use cutadapt so I am not so familiar with its metrics. I thought that the plot you were showing us in figure 1 was data prior to trimming. Is that not the case?
This plot must show the lengths of the reads that remain after trimming, not the number of bases that were trimmed.
Frankly, I'm not certain what the "Trimmed Sequence Length" plot is showing. I can't figure out another explanation besides that, as you suggested, this plot shows read lengths before trimming, and the other plot shows lengths after trimming. However, there's a huge peak around 140 in the second plot that I can't explain...
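One way to settle which interpretation is right might be to tabulate the lengths directly from the FASTQ files rather than from the report. The sketch below is only an illustration: the function names are made up for this example, it assumes standard 4-line FASTQ records, and it assumes read IDs are unchanged by trimming. It builds a read-length histogram for a file, and a histogram of how many bases were removed from each read that survives trimming.

```python
import gzip
from collections import Counter
from itertools import islice


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "r")


def read_lengths(handle):
    """Yield (read_id, sequence_length) for each 4-line FASTQ record."""
    handle = iter(handle)  # accept file objects or any iterable of lines
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            break
        read_id = record[0][1:].split()[0]  # drop '@' and any description
        yield read_id, len(record[1].strip())


def length_histogram(handle):
    """Count reads at each sequence length."""
    return Counter(length for _, length in read_lengths(handle))


def trimmed_histogram(raw_handle, trimmed_handle):
    """Count reads by number of bases removed (raw length minus trimmed length).

    Assumption: read IDs match between the two files. Reads discarded by the
    trimmer are simply absent from the result. All raw lengths are held in
    memory, so this is meant for a subsample, not a full lane.
    """
    raw = dict(read_lengths(raw_handle))
    return Counter(
        raw[read_id] - length
        for read_id, length in read_lengths(trimmed_handle)
        if read_id in raw
    )
```

With placeholder file names, `length_histogram(open_fastq("sample_trimmed.fq.gz"))` should line up with whichever panel shows post-trimming read lengths, and `trimmed_histogram(open_fastq("sample.fq.gz"), open_fastq("sample_trimmed.fq.gz"))` with whichever shows bases removed. If neither matches, the panels are probably built from different files than expected.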