I have been trying to download SRA data from NCBI and convert it to FASTQ format using fastq-dump. A colleague and I are trying to figure out why the resulting FASTQ files cause errors when passed to prinseq-lite.
My collaborator has been using this fastq-dump command:
fastq-dump --defline-seq '@$sn[_$rn]/$ri' --split-files SRR5040251
This is the resulting fastq file for read /1 (we also have the corresponding read /2 file):
@FCC4LTMACXX:1:1101:2339:1998/1
NGATAATTAGAACTATAACCCCCTTCCTGCTCTATAGATAAGATTTGATAATTCTGACCATATACCAGAACCCCCCATTCCGTATTATTAG
+SRR5040251.1 FCC4LTMACXX:1:1101:2339:1998 length=91
#1=DDDDDEDDDDIIIBE?CF@A)CBE>CBCD*:C@@?9**??*?B*?DD99D?B44*?DC@C###########################@
@FCC4LTMACXX:1:1101:3060:1995/1
NTGCTTCTCAAGGTGGCCATCAAATTGTTAAGTTGTTCCTTGTAAGAGGAAGATACGGTGGCGAAGCCACCACCCTTCTTTCCACGGCCAT
+SRR5040251.2 FCC4LTMACXX:1:1101:3060:1995 length=91
#1=DFFFFHHHHHEHIJJJJJJJJJJJJJJJJIJJGJJJJJJIGGJIJJIJJJHIFDEFHIGIGJHGGFFFFDDCACDDDDDDEDBDDDDC
@FCC4LTMACXX:1:1101:3278:1996/1
NTTATTTGTTCAAACTACTTCTGATTGGAGATTCTGGAGTAGGGAAATCGTGCTTATTGTTGAGATTTGCGGATGATGCTTATTCTGAAAG
+SRR5040251.3 FCC4LTMACXX:1:1101:3278:1996 length=91
#4BDFFFFHHHHHJJJIJJJJJJJJJJJJJIJJJJJJHJFHIJJHJIJJJHIJJIJJJJHIJIIHJJJJJJHHFFEEEEEEEEEEFEDDDC
@FCC4LTMACXX:1:1101:4171:1998/1
NGTCCCCAAACCCCAGATCAAATAGTACCGGACCGTTAAAACACTCTGTAATCATTTTTTGGTATAACTGTGTTTTATTTTGAAGACATGG
+SRR5040251.4 FCC4LTMACXX:1:1101:4171:1998 length=91
#1=DDFFFHHHHHJJJHJJJJJJJIIIIJJJJJIJGIIJJJJJJJJIJJJJIJGIJHHHFFDAEEDDDDDCDACDCDDDEEDDDDDDDDDC
@FCC4LTMACXX:1:1101:5115:1991/1
NGACCACAGACGCTTAGCTCTCCAGAGCCCGGTGAAGTTGAAGAGTCATTGGATGCGCCTTTCGCCATGAGCCAAACAGAATCACCAGCTC
+SRR5040251.5 FCC4LTMACXX:1:1101:5115:1991 length=91
#4=DFFFFHHHHHJJJJJJJJJJJJJJJJJJJDHHIJHIGIIJJICHIJJIJJJIJJHHHFFFFDDDDDDDDDDDCBDDDDDDDDDDDDDB
We then run prinseq-lite on both files:
prinseq-lite.pl -fastq SRR5040251_1.fastq -fastq2 SRR5040251_2.fastq -derep 12345
This produces the following error:
ERROR: input file for -fastq is in UNKNOWN format not in FASTQ format.
We have been searching all day and cannot find a solution to this.
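It should work once you fix the line starting with the symbol "+". With a custom --defline-seq, fastq-dump still writes its default defline on the "+" line (for example, +SRR5040251.1 FCC4LTMACXX:1:1101:2339:1998 length=91), so that line no longer matches the "@" header above it, and prinseq-lite appears to reject such records as not being FASTQ. Setting the quality defline explicitly, for example by adding --defline-qual '+' to the fastq-dump command, should leave a bare "+" on that line.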
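If the files are already downloaded, the "+" lines can also be cleaned up after the fact instead of re-running fastq-dump. Below is a minimal sketch, assuming every record is exactly four lines (no wrapped sequence or quality lines), as in the output shown in the question. The function name strip_plus_deflines and the output file name are made up for illustration.

```shell
# Hypothetical helper: rewrite the third line of every 4-line FASTQ record
# as a bare "+", leaving header, sequence and quality lines untouched.
# Assumes strict 4-line records; wrapped FASTQ would be corrupted by this.
strip_plus_deflines() {
    awk 'NR % 4 == 3 { print "+"; next } { print }' "$1"
}
```

It could then be used as strip_plus_deflines SRR5040251_1.fastq > SRR5040251_1.fixed.fastq (and likewise for the _2 file) before passing the fixed files to prinseq-lite. Selecting by line position rather than by a leading "+" matters here, because quality strings can themselves begin with "+".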