2.2 years ago
I used Trim Galore! on my GBS data, and the FastQC report shows ~30 overrepresented sequences. Each sequence starts with the restriction enzyme's cut site (PstI; TGCA). FastQC identified no possible source for any of them, and the percentages range from 10% to 77% for the R1 and R2 outputs. Is this expected, or something I should be concerned about? Thanks.
UPDATE: I just realized that these percentages are out of 100, not 1, so 0.77 means less than 1% of reads. That makes these results seem much more normal.
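For anyone else confused by the same thing: the value in FastQC's "Percentage" column is already a percent, i.e. count / total reads × 100. Below is a rough sketch of that arithmetic, assuming the simplified rule "flag any exact sequence above 0.1% of reads" (my understanding is that 0.1% is FastQC's default warning threshold, but check your FastQC version). This is not FastQC's actual implementation, which samples reads and truncates long ones. The function name `overrepresented` is just for illustration.

```python
from collections import Counter


def overrepresented(seqs, threshold_pct=0.1):
    """Return (sequence, count, percent) for sequences above threshold_pct.

    Simplified illustration only: counts exact matches over all reads,
    unlike FastQC, which samples reads and truncates long sequences.
    """
    total = len(seqs)
    if total == 0:
        return []
    hits = []
    for seq, n in Counter(seqs).most_common():
        pct = 100.0 * n / total  # a percent (out of 100), not a fraction of 1
        if pct > threshold_pct:
            hits.append((seq, n, pct))
    return hits
```

With 77 identical reads among 10,000, this reports 0.77, which is 0.77% of reads (0.0077 as a fraction), not 77%.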
Are these unrecognized primers?
Since your library prep used a restriction enzyme, it is not unusual for reads to start with the recognition site: every GBS read begins at a cut site, so reads from the same locus share an identical start. This looks like what you would expect from a restriction digest.
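If you want to confirm this on your own data, one quick check is the fraction of reads that begin with the PstI remnant. Below is a minimal sketch assuming standard 4-line FASTQ records (optionally gzipped) and the `TGCA` prefix from the question. The function names are hypothetical, and the code does not handle multi-line or malformed FASTQ.

```python
import gzip
from itertools import islice


def prefix_fraction(lines, prefix="TGCA"):
    """Fraction of 4-line FASTQ records whose sequence starts with prefix."""
    it = iter(lines)
    total = starts = 0
    while True:
        record = list(islice(it, 4))  # header, sequence, '+', qualities
        if len(record) < 4:
            break
        total += 1
        if record[1].strip().startswith(prefix):
            starts += 1
    return starts / total if total else 0.0


def prefix_fraction_from_file(path, prefix="TGCA"):
    """Open a plain or .gz FASTQ file in text mode and apply prefix_fraction."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return prefix_fraction(fh, prefix)
```

For a GBS library you would expect this fraction to be close to 1 for R1, because the reads start at the cut site. A low value would be more worrying than the overrepresented-sequence warning itself.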