I was able to remove adapters from my PE 150bp x 2 reads using bbduk.sh, but I kept seeing a long string of Gs in my R2 sequences (~0.1% of my R2 reads).
I reran the original fastq files with bbduk to remove the string of Gs, and this worked for most of my PE files except for one pair.
When I ran the bbduk-trimmed reads from that pair through FastQC, I got this error:
Failed to process file EA_Pool-POW_1-1a_S28_L001_R1_CLEANEST.fastq
uk.ac.babraham.FastQC.Sequence.SequenceFormatException: Ran out of data in the middle of a fastq entry. Your file is probably truncated
at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:179)
at uk.ac.babraham.FastQC.Sequence.FastQFile.next(FastQFile.java:125)
at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:77)
at java.lang.Thread.run(Thread.java:722)
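The FastQC message means the file ended partway through a four-line FASTQ record. Below is a minimal Python sketch that reports how many complete records a file contains and where it first goes wrong. The function name and messages are hypothetical, and it assumes unwrapped four-line records, as is typical for Illumina data. A truncated output often points to a write that did not finish (an interrupted job or a full disk, for example), but that is an assumption; the error itself does not say why.

```python
from typing import Iterable, Optional, Tuple


def check_fastq(lines: Iterable[str]) -> Tuple[int, Optional[str]]:
    """Hypothetical checker: return (complete records, first problem or None).

    Assumes plain four-line records: @header, sequence, +separator, quality.
    """
    record = []
    n_records = 0
    for line_no, raw in enumerate(lines, start=1):
        record.append(raw.rstrip("\r\n"))
        if len(record) == 4:
            header, seq, plus, qual = record
            if not header.startswith("@"):
                return n_records, f"line {line_no - 3}: header does not start with '@'"
            if not plus.startswith("+"):
                return n_records, f"line {line_no - 1}: separator does not start with '+'"
            if len(seq) != len(qual):
                return n_records, (
                    f"line {line_no}: quality length {len(qual)} "
                    f"!= sequence length {len(seq)}"
                )
            n_records += 1
            record = []
    # Leftover lines mean the file stopped in the middle of a record,
    # which is the situation FastQC reports as "probably truncated".
    if record:
        return n_records, (
            f"file ends mid-record after {n_records} complete records "
            f"({len(record)} trailing lines)"
        )
    return n_records, None
```

For example, `with open("EA_Pool-POW_1-1a_S28_L001_R1_CLEANEST.fastq") as fh: print(check_fastq(fh))`. Comparing the record count with that of the matching R2 file is also a quick way to see whether the pair has fallen out of sync.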
Has anyone encountered this issue before with output fastqs from bbduk?
Should I just not worry about the ~0.1% of GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG kmers in my R2 reads?
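To keep an eye on that ~0.1% figure, for example before and after trimming, you could count how many R2 reads contain a long G run. This is a minimal Python sketch, not part of bbduk or FastQC. The 20-G threshold and the function names are hypothetical choices, and it assumes standard four-line FASTQ records.

```python
import gzip
import re
from typing import Iterable, Iterator, TextIO

# Hypothetical threshold: treat 20 or more consecutive Gs as a "long string of Gs".
POLY_G = re.compile(r"G{20,}")


def open_fastq(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def iter_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line (the 2nd of every 4 lines) of each record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def poly_g_fraction(sequences: Iterable[str]) -> float:
    """Fraction of sequences containing at least one poly-G run."""
    total = 0
    hits = 0
    for seq in sequences:
        total += 1
        if POLY_G.search(seq):
            hits += 1
    return hits / total if total else 0.0
```

For example, with a hypothetical file name: `with open_fastq("sample_R2.fastq.gz") as fh: print(f"{poly_g_fraction(iter_sequences(fh)):.3%}")`.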
Thank you!
That was my thought, and it's such a low % of the R2 reads. Thanks for the input!