Hi all, after removing rRNA from the fastq files with sortmeRNA, one of the paired read files was corrupted, and fastqc failed on it with this error:
uk.ac.babraham.FastQC.Sequence.SequenceFormatException: ID line didn't start with '@'
at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:158)
at uk.ac.babraham.FastQC.Sequence.FastQFile.next(FastQFile.java:125)
at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:76)
at java.lang.Thread.run(Thread.java:748)
I checked the line counts of the two paired read files after sortmeRNA and found that one of them had two more lines than the other.
`wc -l S3-sortmerna_1.fq S3-sortmerna_2.fq`
**210133674 S3-sortmerna_1.fq**
**210133672 S3-sortmerna_2.fq**
Can someone explain why this happened and give me some advice on how to repair the fastq file?
Below are the command lines I used to do the sortmeRNA and fastqc
sortmerna --ref $REF --reads ./S3-interleaved.fq --sam --num_alignments 1 --fastx --aligned ./S3_rRNA --other ./S3_non_rRNA --log -v --paired_in
unmerge-paired-reads.sh ./S3_non_rRNA.fq ./S3-sortmerna_1.fq ./S3-sortmerna_2.fq
fastqc ./S3-sortmerna_1.fq ./S3-sortmerna_2.fq
*Started analysis of S3-sortmerna_1.fq*
*Approx 5% complete for S3-sortmerna_1.fq*
.
.
.
*Approx 95% complete for S3-sortmerna_1.fq*
*Analysis complete for S3-sortmerna_1.fq*
*Started analysis of S3-sortmerna_2.fq*
*Approx 5% complete for S3-sortmerna_2.fq*
*Approx 10% complete for S3-sortmerna_2.fq*
*Approx 15% complete for S3-sortmerna_2.fq*
*Approx 20% complete for S3-sortmerna_2.fq*
*Approx 25% complete for S3-sortmerna_2.fq*
*Approx 30% complete for S3-sortmerna_2.fq*
*Failed to process file S3-sortmerna_2.fq*
*uk.ac.babraham.FastQC.Sequence.SequenceFormatException: ID line didn't start with '@'*
*at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:158)*
*at uk.ac.babraham.FastQC.Sequence.FastQFile.next(FastQFile.java:125)*
*at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:76)*
*at java.lang.Thread.run(Thread.java:748)*
Have you checked that the original files themselves were not corrupt before you ran sortmeRNA?
You can use `repair.sh` from BBMap Suite to re-pair the files (check this link: C: Calculating number of reads for paired end reads?).

Can you explain how your tool will repair the corrupted fastq files? The original files were not corrupted. Something went wrong when I ran sortmerna and unmerge-paired-reads.sh to get the paired files; they have different numbers of lines (210133674 for S3-sortmerna_1.fq, 210133672 for S3-sortmerna_2.fq).
`repair.sh` compares the records in the two files; it should keep those that have a match in both and move any singletons out to separate files. That said, if your file has corrupt fastq records (i.e. records that don't have 4 lines each, which may be the case here), then `repair.sh` may not work. You may get an error, or it may remove more than 2 reads.

If you are sure the original files are fine, then perhaps try re-running sortmeRNA.
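
To locate a record that breaks the 4-lines-per-record layout mentioned above, a small script along these lines may help. This is only a sketch, not part of BBMap, sortmeRNA or FastQC: the function name `first_bad_record` is made up for illustration, and it assumes plain, uncompressed FASTQ in which every record is exactly four lines (no wrapped sequences, no blank lines). It checks lines by their position within each record rather than searching for `@`, because a quality line can legitimately begin with `@`.

```python
import itertools


def first_bad_record(lines):
    """Return (record_number, reason) for the first malformed record, or None.

    `lines` is any iterable of text lines, e.g. an open file handle.
    Assumes strict 4-line FASTQ records (hypothetical helper, for illustration).
    """
    it = iter(lines)
    record = 0
    while True:
        # Take the next four lines as one candidate record.
        chunk = [ln.rstrip("\r\n") for ln in itertools.islice(it, 4)]
        if not chunk:
            return None  # reached end of input without finding a problem
        record += 1
        if len(chunk) < 4:
            return record, "truncated record ({} of 4 lines)".format(len(chunk))
        header, seq, plus, qual = chunk
        if not header.startswith("@"):
            return record, "header does not start with '@'"
        if not plus.startswith("+"):
            return record, "separator does not start with '+'"
        if len(seq) != len(qual):
            return record, "sequence and quality lengths differ"
```

It could be used as `with open("S3-sortmerna_2.fq") as fh: print(first_bad_record(fh))`; multiplying the reported record number by 4 gives the approximate line to inspect. Under the same 4-line assumption, a total line count that is not a multiple of 4 (210133674 leaves a remainder of 2) already suggests that at least one record in that file is malformed.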