Hi folks,
I just got HiSeq results for some fungal genomes back from our sequencing facility. After adapter and quality trimming, I ran FastQC and found double peaks, as shown in the pictures. Previous threads mentioned contamination from other species, but in my case I sequenced a pure culture. Could I have your opinion on whether I should just ignore the peaks and continue with assembly? Or is there an extra step I should try to make sure that nothing is wrong with the data?
Thanks in advance and have a great weekend!
Best, Vin
Thanks so much, Brian. :) I was thinking of doing a de novo assembly first and then comparing it with the reference genome, since I feel the reference needs a lot of improvement. In this case, it's very likely that my sample is contaminated. Do you think it would be better to map the reads to the reference genome, set aside the unmapped ones, and just work with what's left?
I was also wondering whether the culture is not as pure as I thought, and whether the contamination could come from different isolates of the same species. But if that were the case, the GC content should be very similar or even identical, right?
Best, Vin
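To make the GC comparison concrete, here is a minimal sketch of how per-read GC content could be tallied into a coarse histogram. The function names `gc_fraction` and `gc_histogram` and the 5% bin width are assumptions made for illustration; this is not how FastQC itself computes its plot, only a simple stand-in for eyeballing whether two sets of reads have distinguishable GC distributions.

```python
def gc_fraction(seq):
    """Return the GC fraction of a read, ignoring N and other ambiguous bases."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return None  # no unambiguous bases to count
    return (seq.count("G") + seq.count("C")) / called


def gc_histogram(reads, bin_width=5):
    """Count reads per GC-percent bin; keys are the lower edge of each bin.

    bin_width is a hypothetical default and should divide 100 evenly.
    """
    counts = {}
    for read in reads:
        fraction = gc_fraction(read)
        if fraction is None:
            continue
        percent = fraction * 100
        # Reads at exactly 100% GC are folded into the top bin.
        bin_start = min(int(percent // bin_width) * bin_width, 100 - bin_width)
        counts[bin_start] = counts.get(bin_start, 0) + 1
    return dict(sorted(counts.items()))


# Toy reads, made up purely for illustration.
toy_reads = ["ATGC", "ATGC", "GGCC", "NNNN"]
toy_histogram = gc_histogram(toy_reads)
```

Two isolates of the same species would be expected to fill essentially the same bins, which is why a histogram like this cannot separate them.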
Contamination from a different strain would not be obvious on a GC graph, because the two distributions would completely overlap; from GC content alone it is virtually impossible to detect. You can assemble, map the reads back to the assembly, call variants, and view them in IGV to see whether you have two strains in the library.
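As a rough complement to inspecting variants by eye, the allele fractions at called sites can be summarised with a short script. The sketch below assumes a single-sample VCF whose FORMAT column carries per-sample `AD` (allelic depth) values, and a haploid genome; the function name `alt_allele_fractions` and the `min_depth` threshold are hypothetical choices for illustration, not part of any particular variant caller.

```python
def alt_allele_fractions(vcf_text, min_depth=10):
    """Return (chrom, pos, alt_fraction) for biallelic sites that have AD values.

    Assumes a single-sample VCF with an AD entry in the FORMAT column.
    min_depth is a hypothetical cutoff to skip poorly covered sites.
    """
    results = []
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        if len(fields) < 10:
            continue  # no sample column present
        chrom, pos, alt = fields[0], int(fields[1]), fields[4]
        if "," in alt:
            continue  # keep the sketch simple: biallelic sites only
        keys = fields[8].split(":")
        values = fields[9].split(":")
        if "AD" not in keys:
            continue
        index = keys.index("AD")
        if index >= len(values):
            continue  # trailing sample fields may be omitted
        try:
            ref_depth, alt_depth = (int(x) for x in values[index].split(","))
        except ValueError:
            continue  # missing or malformed AD value
        total = ref_depth + alt_depth
        if total < min_depth:
            continue
        results.append((chrom, pos, alt_depth / total))
    return results


# Made-up records purely for illustration.
example_vcf = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1",
    "contig1\t100\t.\tA\tG\t50\tPASS\t.\tGT:AD\t1:12,28",
    "contig1\t250\t.\tC\tT\t50\tPASS\t.\tGT:AD\t1:2,3",
])
fractions = alt_allele_fractions(example_vcf)
```

When reads from a pure haploid culture are mapped back to their own assembly, very few sites should carry a substantial alternate allele. Many sites with intermediate fractions spread across the genome would be consistent with a second strain in the library, although collapsed repeats and paralogous regions can produce the same pattern locally, so this is a hint rather than proof.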
Please use ADD COMMENT to reply to an earlier answer, so that this thread remains logically structured and easy to follow. I have now moved your reaction, but as you can see it's not optimal. If an answer was helpful you should upvote it; if the answer resolved your question you should mark it as accepted.