gmap error: Length XXXX of quality score differs from length XXX of nucleotides in sequence
8.3 years ago
Hans ▴ 140

Hello, I am running gmap with paired-end fq.gz files. After a while I get this error message: "Length 4783 of quality score differs from length 539 of nucleotides in sequence", followed by a run of unprintable binary characters where the sequence should be.

Is there any way to tell gmap to skip this sequence or should I try and find the offending line? Thank you

gmap • 2.4k views
8.3 years ago
Medhat 9.8k

Do you know which line has this issue?

If so, you can extract that part of the FASTQ file to see what it looks like.

Assuming the problem is in the first record (lines 1 to 4 of the file), print it with:

sed -n '1,4p' yourFile.fastq

Then you can remove it with this command (without -n, because combining -n with d would make sed print nothing at all):

sed '1,4d' yourFile.fastq > newFile.fastq

BUT

if you have paired-end reads, you need to remove the record from both files so that the mates stay in sync.
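A minimal sketch of dropping the same record from both mate files. It assumes strict 4-line FASTQ records and that mates appear in the same order in both files; the function name `drop_record` and the file names are made up for illustration.

```shell
# Hypothetical helper: print a FASTQ file without its Nth record.
# Assumes every record is exactly 4 lines (no wrapped sequence lines).
drop_record() {
  n=$1
  first=$(( (n - 1) * 4 + 1 ))   # first line of record n
  last=$(( n * 4 ))              # last line of record n
  sed "${first},${last}d" "$2"
}

# Demo on two tiny mate files (illustrative names and contents).
tmp=$(mktemp -d)
printf '@r1/1\nACGT\n+\nIIII\n@r2/1\nTTGA\n+\nIIII\n' > "$tmp/reads_1.fastq"
printf '@r1/2\nGGCA\n+\nIIII\n@r2/2\nCATG\n+\nIIII\n' > "$tmp/reads_2.fastq"

# Remove record 1 from BOTH files so the pairing is preserved.
drop_record 1 "$tmp/reads_1.fastq" > "$tmp/clean_1.fastq"
drop_record 1 "$tmp/reads_2.fastq" > "$tmp/clean_2.fastq"

# Both cleaned files should now have the same number of lines.
wc -l "$tmp/clean_1.fastq" "$tmp/clean_2.fastq"
rm -rf "$tmp"
```

Using a record number instead of raw line numbers makes it harder to cut a record in half by accident.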

If you do not know the line number, you can use the awk code in this post to find it: Fastq Quality Read And Score Length Check
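This is not the code from the linked post, just a rough sketch of the same idea: walk the file four lines at a time and report every record whose quality string is not the same length as its sequence. It assumes strict 4-line records, and `check_fastq_lengths` is a made-up name.

```shell
# Hypothetical helper: report FASTQ records whose sequence and quality
# lengths differ. Assumes every record is exactly 4 lines.
check_fastq_lengths() {
  awk 'NR % 4 == 2 { seq_len = length($0) }          # line 2 of a record: sequence
       NR % 4 == 0 && length($0) != seq_len {        # line 4 of a record: quality
         printf "record %d (lines %d-%d): seq=%d qual=%d\n",
                NR / 4, NR - 3, NR, seq_len, length($0)
       }' "$1"
}

# Demo: the second record has a quality string that is too short.
demo=$(mktemp)
printf '@r1\nACGT\n+\nIIII\n@r2\nACGTA\n+\nIII\n' > "$demo"
check_fastq_lengths "$demo"
rm -f "$demo"
```

The reported line range can be passed straight to sed to inspect or remove the record. The function reads plain text, so a compressed file would have to be decompressed first.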


Thank you, Medhat, for your reply. I do not have the line numbers, so I will have to look for them as you suggested. However, it seems strange that the reported lengths are so long. I have looked at one of the fastq files, and the lines there are, as expected, no longer than 120 characters. The message I get says: "Length 6925 of quality score differs from length 3471 of nucleotides".


Also, I think your file may be truncated or have some other problem, because 3471 is not a normal read length.


I had mistakenly tried to work with the compressed (.gz) files. Decompressing them solved my problem. My bad.
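For anyone who hits the same message: implausible lengths and binary junk in place of the sequence fit a compressed file being read as if it were plain FASTQ. Here is a rough sketch of checking for that before starting a run. The helper name `is_gzipped` and the file names are made up; the check relies only on the gzip magic bytes 1f 8b at the start of the file.

```shell
# Hypothetical helper: succeed if the file starts with the gzip magic bytes 1f 8b.
is_gzipped() {
  [ "$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')" = "1f8b" ]
}

# Demo: one plain FASTQ file and a gzip-compressed copy of it.
tmp=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n' > "$tmp/reads_1.fq"
gzip -c "$tmp/reads_1.fq" > "$tmp/reads_1.fq.gz"

for f in "$tmp/reads_1.fq" "$tmp/reads_1.fq.gz"; do
  if is_gzipped "$f"; then
    echo "$f: gzip-compressed, decompress before aligning"
  else
    echo "$f: not gzip-compressed"
  fi
done

# Decompress to a new file, leaving the .gz untouched.
gunzip -c "$tmp/reads_1.fq.gz" > "$tmp/reads_1.unpacked.fq"
rm -rf "$tmp"
```

Checking the magic bytes rather than the file extension also catches compressed files that were renamed to .fq or .fastq.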
