Hi all,
I ran tophat on my fastq file and I got the following error:
[2015-09-02 11:23:58] Beginning TopHat run (v2.1.0)
-----------------------------------------------
[2015-09-02 11:23:58] Checking for Bowtie
Bowtie version: 2.2.5.0
[2015-09-02 11:24:00] Checking for Bowtie index files (genome)..
[2015-09-02 11:24:00] Checking for reference FASTA file
[2015-09-02 11:24:00] Generating SAM header
[2015-09-02 11:24:06] Reading known junctions from GTF file
[2015-09-02 11:24:50] Preparing reads
[FAILED]
Error running 'prep_reads'
Error: beginning of quality values record not found! (@HWI-ST387:212:D1AA6ACXX:4:1102:2633:2167 1:N:0:)
Any suggestions for this? Thanks so much!
Run
grep -A4 @HWI-ST387:212:D1AA6ACXX:4:1102:2633:2167 1:N:0: Input.fastq
and copy what you get here.
Thanks for your help. I did the grep as you suggested and got the following:
All the reads in this FASTQ file are of length 75. I created this file for running rMATS. To extract all reads of length 75, I followed the awk command from the post "Filtering Fastq Sequences Based On Lengths".
I also ran TopHat on the original FASTQ file (before extraction), and that ran fine.
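For comparison, here is a minimal Python sketch of length filtering that reads the file four lines at a time and keeps each record together. It is not the awk command from the linked post; the function name `filter_by_length` and the sample reads are made up for illustration. Like any four-lines-per-record approach, it assumes the input is already well-formed, so a record with missing lines upstream would shift the framing of everything after it.

```python
import io
from itertools import islice


def filter_by_length(lines, length=75):
    """Yield (header, seq, plus, qual) for records whose sequence has `length` bases.

    Assumes strict 4-line FASTQ records; raises if the input ends mid-record.
    """
    it = iter(lines)
    while True:
        record = list(islice(it, 4))  # take the next four lines as one record
        if not record:
            break  # clean end of input
        if len(record) < 4:
            raise ValueError("truncated FASTQ record at end of input")
        header, seq, plus, qual = (s.rstrip("\r\n") for s in record)
        if len(seq) == length:
            yield header, seq, plus, qual


# Tiny made-up example: one read of length 4, one of length 3.
example = io.StringIO("@a\nACGT\n+\nIIII\n@b\nACG\n+\nIII\n")
for rec in filter_by_length(example, length=4):
    print("\n".join(rec))
```

With a real file, one would pass an open file handle instead of the `io.StringIO` object.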
Can you paste a cleaner version of the output? I doubt you would see something like
grep: 1:N:0:: No such file or directory
when you perform grep. Also, why are we seeing a P_R3R4_filtered75.fastq: or P_R3R4_filtered75.fastq- tag in front of every line? You know what the FASTQ format looks like, right? The awk command solution that you used assumes that a FASTQ record is distributed over four lines. That may be the problem, but this is just my guess. I can't speculate much more until I see a cleaner output. Try: and paste the output again.
Sorry, I forgot to put the search item in quotes. I ran the following command:
I got the following:
The problem is that
@HWI-ST387:212:D1AA6ACXX:4:1102:2633:2167 1:N:0:
read has no quality information. A FASTQ entry takes 4 lines: the first line contains the header, the second line contains the sequence, the third line starts with a + sign, and the fourth line contains the quality string. The entry above, which is throwing the error, is missing the third and fourth lines. If you don't know why it happened, you can probably just delete this entry, and make sure you delete all other such entries.
Thanks for the suggestion. I am wondering if there is a way to globally check whether the FASTQ entries of all the reads are fine and to remove the corrupt entries. I have several FASTQ files, so that would be very useful.
I have posted my response as an answer. Please accept it if it has addressed your problem.
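One possible way to check a whole file and drop broken entries is sketched below in Python. This is only an illustration under stated assumptions, not the answer referred to above: the names `is_valid_record` and `clean_fastq` are hypothetical, and the validity test is deliberately simple (header starts with @, third line starts with +, sequence is non-empty and as long as the quality string). It slides over the input one line at a time, so after a broken entry it can resynchronise on the next good one instead of mis-framing the rest of the file.

```python
import io


def is_valid_record(header, seq, plus, qual):
    """Basic structural check for one 4-line FASTQ record."""
    return (
        header.startswith("@")
        and plus.startswith("+")
        and len(seq) > 0
        and len(seq) == len(qual)
    )


def clean_fastq(src, dst):
    """Copy well-formed records from src to dst.

    Returns (records_kept, lines_dropped).
    """
    window = []  # up to four consecutive candidate lines
    kept = dropped = 0
    for line in src:
        window.append(line.rstrip("\r\n"))
        if len(window) < 4:
            continue
        if is_valid_record(*window):
            dst.write("\n".join(window) + "\n")
            kept += 1
            window.clear()
        else:
            # Discard the oldest line and retry from the next one.
            window.pop(0)
            dropped += 1
    dropped += len(window)  # leftover lines that never formed a record
    return kept, dropped


# Made-up example: the second entry lacks its + and quality lines.
raw = io.StringIO("@r1\nACGT\n+\nIIII\n@bad\nACGT\n@r2\nGGCC\n+\nJJJJ\n")
out = io.StringIO()
print(clean_fastq(raw, out))
print(out.getvalue(), end="")
```

For several files, the same function could be called in a loop with real input and output file handles. A non-zero dropped count is worth investigating, since it usually points to a problem in whatever step produced the file.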