Hi, I have issues with several of my fastq files, and I did not notice the problem until I tried to upload the files to NCBI SRA. The problem is that several of my files contain corrupted reads. Sometimes the length of the sequence in a read does not match the length of its quality string, and other times part of a read gets merged with the header of the next read, and so on. See below for examples of both kinds.
$> zcat RIL_2_UN_Rep5.fq.gz | grep -C 4 'HS3:229:C12DBACXX:1:2306:16681:128211'
@HS3:229:C12DBACXX:1:2306:15588:128229 1:N:0:
ACTTAGATGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCT
+
FFHHHHHJIJJJJJJIJJFIJJJJJJJJIJJJJJIJJFGGGFHH
@HS3:229:C12DBACXX:1:2306:16681:128211 1:N:0:
ACTTAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATC
+
FFHHHHHJGGIIIGIJIJJFGIJJFJIIJJ@HS3:229:C12DBACXX:2:1101:2723:2202 1:N:0:
ACTTAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATC
$> zcat RIL_7_UN_Rep5.fq.gz | awk 'length($0) > 45'
@HS3:229:C12DBACXX:1:2308:3094:111639 1:N@HS3:229:C12DBACXX:2:1101:2675:2212 1:N:0:
@HS3:229:C12DBACXX:4:2308:2028:396@HS3:229:C12DBACXX:5:1101:2144:2218 1:N:0:
DDFADBFGGIIII@HS3:229:C12DBACXX:6:1101:2492:2152 1:N:0:
Is there a way to remove or trim these reads? How do I deal with these problematic files?
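The only approach I have come up with so far is to drop every record that fails basic sanity checks and resync at the next header that looks complete. Below is a rough sketch in Python of what I mean; the file names are placeholders, the @HS3: header pattern is just what my own files use, and it reads the whole file into memory, so a streaming version would be needed for very large files:

import gzip
import re
import sys

# placeholder file names, just for illustration
infile = "RIL_2_UN_Rep5.fq.gz"
outfile = "RIL_2_UN_Rep5.clean.fq.gz"

# header pattern based on the reads shown above; adjust the @HS3: prefix for other runs
header_re = re.compile(r'^@HS3:\d+:\S+:\d+:\d+:\d+:\d+ \d:[NY]:\d+:')
seq_re = re.compile(r'^[ACGTN]+$')

kept = dropped = 0
with gzip.open(infile, "rt") as fin, gzip.open(outfile, "wt") as fout:
    # loads everything into memory; fine as a sketch, not for huge files
    lines = fin.read().splitlines()
    i = 0
    while i < len(lines):
        # only start a record at a line that looks like a complete header
        if not header_re.match(lines[i]):
            i += 1
            continue
        rec = lines[i:i + 4]
        # keep the record only if it has 4 lines: header, pure-base sequence,
        # a '+' separator, and a quality string the same length as the sequence
        if (len(rec) == 4
                and seq_re.match(rec[1])
                and rec[2].startswith('+')
                and len(rec[3]) == len(rec[1])):
            fout.write("\n".join(rec) + "\n")
            kept += 1
            i += 4
        else:
            # looks corrupted: skip this header and resync at the next valid one
            dropped += 1
            i += 1

sys.stderr.write("kept %d reads, dropped %d suspect records\n" % (kept, dropped))

This would also throw away the read whose header got fused into the previous quality line, but since those records are unrecoverable anyway, maybe that is acceptable?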
Thanks in advance for your help.
Upendra
Thanks for the suggestion. It's a lot of work, but I guess I have to do it to check what went wrong...