Hi everyone,
This is my first post here, so apologies if this question is out of place; I am very new to bioinformatics and scripting. I am working on de novo genome assemblies of some marine invertebrates, and as part of my pipeline I have error-corrected some FASTQ files using Rcorrector. This software added the following information to the FASTQ headers:
In the header line for each read, Rcorrector will append some information:
"cor": some bases of the sequence are corrected
"unfixable_error": the errors could not be corrected
"l:INT m:INT h:INT": the lowest, median and highest kmer count of the kmers from the read
So, I have some FASTQ files with the following headers:
@HWI-ST169:272:C0RCGACXX:1:1306:13471:25027 1:N:0: l:185516 m:185516 h:185516 unfixable_error
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
?@@FF<D@6DFDDIGFHFB@B?@BBB6BBBBDBDDD
@HWI-ST169:272:C0RCGACXX:1:2107:18438:124552 1:N:0: l:185516 m:185516 h:185516 unfixable_error
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
=@;::)<AFD)0?AFFFDB6;?B637:BBBB6BBBB
@HWI-ST169:272:C0RCGACXX:1:1204:15681:165032 1:N:0: l:185516 m:185516 h:185516 unfixable_error
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
This is just a snapshot of the headers; other reads carry the other annotations mentioned above. I am wondering how I can remove the additional information added by Rcorrector, for example: l:185516 m:185516 h:185516 unfixable_error
I want to use Meraculous for de novo genome assembly, but I am getting an error saying that the FASTQ header is not valid. I guess this error is caused by the additional information in the FASTQ headers.
I hope someone here can help me.
Thanks in advance,
Felipe
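For readers with the same problem, here is a minimal Python sketch of one way to strip these fields. It is an illustration only, not the command from any reply in this thread. It assumes the annotations always sit at the end of the header line, as in the example above, and that each record is exactly four lines. The names `RCORRECTOR_SUFFIX`, `strip_rcorrector` and `clean_fastq` are made up for this example.

```python
import re

# Trailing run of Rcorrector tokens: "l:INT", "m:INT", "h:INT", "cor",
# "unfixable_error", each preceded by whitespace, up to the end of the line.
# Assumption: these tokens only ever appear at the end of the header.
RCORRECTOR_SUFFIX = re.compile(r"(?:\s+(?:cor|unfixable_error|[lmh]:\d+))+\s*$")


def strip_rcorrector(header):
    """Return a FASTQ header line without the Rcorrector annotations."""
    return RCORRECTOR_SUFFIX.sub("", header.rstrip("\n"))


def clean_fastq(lines):
    """Yield FASTQ lines, rewriting only the header (first line of each 4-line record)."""
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        yield strip_rcorrector(line) if i % 4 == 0 else line


# Small demonstration using the first record from the question.
record = [
    "@HWI-ST169:272:C0RCGACXX:1:1306:13471:25027 1:N:0: l:185516 m:185516 h:185516 unfixable_error\n",
    "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA\n",
    "+\n",
    "?@@FF<D@6DFDDIGFHFB@B?@BBB6BBBBDBDDD\n",
]
for cleaned in clean_fastq(record):
    print(cleaned)
```

To process a real file, pass an open file handle to `clean_fastq` and write the yielded lines to a new file. The sketch picks out headers by line position rather than by a leading "@", because a quality line can also begin with "@". Compressed input is not handled here.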
Thanks Pierre, your command works so smoothly. Now I can see if Meraculous accepts the new FASTQ headers of my files. Great!!!
You can now validate my answer (green mark on the left) to close this question.