Hi, I'm new to this area, so thanks in advance for any help.
I have some FASTQ files in which some quality-score lines have an extra double quote (") added at the beginning and at the end, and I want to remove those quotes.
For example:
@NGSNJ-086:647:GW2112051649th:1:1101:6506:1016 1:N:0:CTGAAGCT+ATAGCCTT
AAACTAAGTCAATTCTAATACGACTCACTATAGGAGCTCAGCCTTCACTGCTTCTTAAAGATGCGCACACAACACTCTTTACGTATGTACCGGCACCACGGTCGGATCCTAGATCGGAAGAGCACACGTCTGAACTCCAGTCACCTGAAG
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@NGSNJ-086:647:GW2112051649th:1:1101:7428:1078 1:N:0:CTGAAGCT+ATAGCCTT
AAACTAAGTCAATTCTAATACGACTCACTATAGGAGCTCAGCCTTCACTGCGACAAAATTGGCCATCTTTCCGACAAACAACATGCCCCACGGCACCACGGTCGGATCCTAGATCGGAAGAGCACACGTCTGAACTCCAGTCACCTGAAG
+
"FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF"
So I want to remove the quotes in the last line. Is there an efficient way to do this? Thanks a lot.
How did this come to pass? These should not even be in there.
I did some preprocessing in R, and the writeFastq command added them because R prints strings with quotes. I would like to avoid redoing the calculations if the quotes can be removed easily.
Which package is that function from? No sane bioinformatician would write a FASTQ with quotes.
It was the writeFastq function from the microseq package. Yes, I didn't expect that either. Also strange that it only occurred in some lines.
Sounds like either a crap package or something else being wrong. I would start over from scratch. The double quote is a valid quality character (ASCII 34, i.e. Q1 in Phred+33), so simply gsubbing it away may well break the FASTQs too.
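To gauge how many records are affected before touching anything, a quick check is to compare each quality line's length against its sequence line. A minimal sketch, assuming strict 4-line records (no wrapped sequences); the function name and demo data are made up for illustration:

```shell
#!/bin/sh
# Count FASTQ records whose quality string length differs from the sequence length.
# Assumption: every record is exactly 4 lines (header, sequence, '+', quality).
count_length_mismatches() {
    awk 'NR % 4 == 2 { seqlen = length($0) }                 # remember sequence length
         NR % 4 == 0 && length($0) != seqlen { bad++ }       # quality line disagrees
         END { print bad + 0 }' "$1"
}

# Hypothetical demo data: record r1 is fine, record r2 has stray wrapping quotes.
demo=$(mktemp)
printf '%s\n' '@r1' 'ACGT' '+' 'FFFF' '@r2' 'ACGT' '+' '"FF,F"' > "$demo"
count_length_mismatches "$demo"
```

A count of zero on the real files would suggest the quotes are legitimate quality characters and nothing should be stripped.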
The package author fixed this in Apr 2021: https://github.com/larssnip/microseq/commit/b5f1c824605290c6b60df402b1e5de31e242811e
It looks like OP is using an older version of the package. The fact that this issue even existed speaks to how untested the package was when submitted to CRAN.
Thanks, I must check why such an old version is installed / was not updated
Thanks for the suggestion, I guess I will try
sed -i 's/^"//'
and if that does not work, I could do it in R by checking the length (provided it does not happen again when writing). Doing it from scratch would take some days.
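The length check does not have to be done in R. A sketch with awk, assuming strict 4-line records: a quality line is trimmed only when it starts and ends with a double quote and is exactly two characters longer than its sequence, so a legitimate " quality character is left alone. The function name and demo data are hypothetical:

```shell
#!/bin/sh
# Strip wrapping double quotes from quality lines, but only when the length proves
# they are extra: quality length must equal sequence length + 2.
# Assumption: every record is exactly 4 lines (header, sequence, '+', quality).
strip_wrapping_quotes() {
    awk 'NR % 4 == 2 { seqlen = length($0) }
         NR % 4 == 0 && length($0) == seqlen + 2 && /^".*"$/ {
             $0 = substr($0, 2, length($0) - 2)    # drop first and last character
         }
         { print }' "$1"
}

# Hypothetical demo data: r1 has a genuine leading " quality value (correct length),
# r2 has stray wrapping quotes.
demo=$(mktemp)
printf '%s\n' '@r1' 'ACGT' '+' '"FFF' '@r2' 'ACGT' '+' '"FF,F"' > "$demo"
strip_wrapping_quotes "$demo"
```

This writes to stdout; redirecting to a new file and keeping the original until the result has been checked seems safer than editing in place.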
Do you know a safe writeFastq function?
Use cpad's 0~4 line-step sed instead of your all-line sed.
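cpad's exact command is not quoted in this thread, so the following is only a guess at its shape. In GNU sed, the address 0~4 selects every fourth line (4, 8, 12, ...), which in a strict 4-line FASTQ are the quality lines, so headers and sequences are never touched. The function name and demo data are hypothetical:

```shell
#!/bin/sh
# Remove a wrapping pair of double quotes, but only on every 4th line.
# Assumptions: GNU sed (the first~step address is a GNU extension) and
# strict 4-line FASTQ records.
strip_quotes_every_4th() {
    sed '0~4 s/^"\(.*\)"$/\1/' "$1"
}

# Hypothetical demo data: r1 is clean, r2 has stray wrapping quotes.
demo=$(mktemp)
printf '%s\n' '@r1' 'ACGT' '+' 'FFFF' '@r2' 'ACGT' '+' '"FF,F"' > "$demo"
strip_quotes_every_4th "$demo"
```

Without a length check, a correct-length quality string that happens to both start and end with " would also be shortened, so it is worth comparing lengths against the sequence lines afterwards. BSD/macOS sed does not support the 0~4 address.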