Possible Bug In Seqtk Trimfq
12.5 years ago
Abhi ★ 1.6k

Hi Heng

I think there may be a bug in the seqtk trimfq command. Here is a contrived example: if I run this read through it, I get the same read back, untrimmed. I am using the latest version from GitHub.

@2260:7:1101:14363:3089/1
CCTGCATTCTCACTCGTGTGGGCTCCACGACTGGGTCACCCCGCCGCTTCACTGCCCACACGACGCTCCCCTACCCATCCACACACCCGGCACGAATGCAGGTTCTTGTGTGAATGCCACGGCTTCGGTGGGGGGTTTGAGCCCCGCTCC
+
bbbeeeeegggffihhhhiihihihiiiihiifihhiiihihhhghgggeeeeeddcdcccccc^accaccc^a[^bcc`[bbcccWX]accccacccc`ba[bbccb]`^^bcdccc_b]]TX[bBBBBBBBBBBBBBBBBBBBBBBBB

I think the spacing in the read is getting messed up, but you should get the idea.

Expected result: the read below is just a made-up answer, but I would expect the trimming algorithm to remove low-quality bases from both the beginning and the end of the read. In this case the last ~24 bases of the read are low quality (ASCII(B) - 64 = 66 - 64 = 2) and should be trimmed.

@2260:7:1101:14363:3089/1
CCTGCATTCTCACTCGTGTGGGCTCCACGACTGGGTCACCCCGCCGCTTCACTGCCCACACGACGCTCCCCTACCCATCCACACACCCGGCACGAATGCAGGTTCTTGTGTGAATGCCACGGCTTC
+
bbbeeeeegggffihhhhiihihihiiiihiifihhiiihihhhghgggeeeeeddcdcccccc^accaccc^a[^bcc`[bbcccWX]accccacccc`ba[bbccb]`^^bcdccc_b]]TX[b
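
The expectation above can be sketched as a naive 3' trim in Python. This is only an illustration and not seqtk's actual trimming algorithm; the function name `trim_3prime` and the threshold of Q20 are assumptions made for the example, and only the last 28 bases of the read are used.

```python
def trim_3prime(seq, qual, offset=64, min_q=20):
    """Drop trailing bases whose Phred score (ord(char) - offset) is below min_q.

    Hypothetical helper for illustration; not how seqtk trimfq works internally.
    """
    end = len(qual)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


# Last 28 bases of the read above and their quality characters.
seq_tail = "CTTC" + "GGTGGGGGGTTTGAGCCCCGCTCC"
qual_tail = "TX[b" + "B" * 24

# Reading the qualities as Phred+64, every 'B' scores 2 and is trimmed.
trimmed_seq, trimmed_qual = trim_3prime(seq_tail, qual_tail, offset=64)
print(trimmed_seq, trimmed_qual)
```

The `offset` argument decides how each quality character is interpreted, so the same read can be trimmed or left intact depending on which encoding the tool assumes.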

-A

You should also state what you expect to see and why. It is not obvious from your example.

Edited the question to include the expected answer. I assumed it was obvious; my bad.

12.5 years ago

I think this is an encoding issue. In the Sanger encoding the ASCII offset is 33, so B stands for a quality of 66 - 33 = 33, which is pretty good.

Earlier Illumina encodings used a different ASCII offset (64), in which case B would stand for a quality of 66 - 64 = 2, which is bad.
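
To make the arithmetic concrete, here is a small Python sketch that decodes a quality string under either offset and makes a rough guess at which encoding a read uses. The function names and the cutoffs (characters below ';' cannot occur in the +64 encodings; characters above 'J' are unusual for typical Illumina Phred+33 data) are a common rule of thumb assumed for this example, not something seqtk does, and a single read is often not enough to decide.

```python
def phred_scores(qual, offset):
    """Decode a FASTQ quality string into integer scores for a given ASCII offset."""
    return [ord(c) - offset for c in qual]


def guess_offset(qual):
    """Heuristic guess of the ASCII offset; returns 33, 64, or None if ambiguous."""
    codes = [ord(c) for c in qual]
    if min(codes) < 59:   # below ';': impossible for the +64 encodings
        return 33
    if max(codes) > 74:   # above 'J': beyond Q41 if read as Phred+33
        return 64
    return None


print(phred_scores("B", 33))  # [33]
print(phred_scores("B", 64))  # [2]
print(guess_offset("bbbeeeeegggffihhhhBBBB"))  # 64
```

In practice the guess is more reliable when made over many reads, since one high-quality Phred+33 read can fall entirely inside the ambiguous range.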

So trimfq only works for newer Illumina data, i.e. Sanger-like Phred+33 scores?

Well, that is the current standard, and over the long term it is better that way.

Is there a way to have seqtk convert the previous encoding (Phred+64) to the current Sanger encoding? I tried seqtk seq -Q64 infile.fq > outfile.fq, but this returned an unaltered file.

Search for some tips on converting FASTQ, for example: Convert Illumina reads to Sanger score

Thanks for the suggestion. I am aware that various tools can do this, but I like to avoid using multiple tools when one will do. The seqtk documentation has an example that converts ILLUMINA 1.3+ FASTQ to FASTA, which made me expect that it could do the conversion for me, but apparently it cannot after all.
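
For reference, the Phred+64 to Phred+33 conversion itself is just a fixed shift of 31 in every quality character. Below is a minimal Python sketch of that shift; it is not a seqtk feature, the function names are made up for the example, and it assumes strict four-line FASTQ records with no wrapped sequence or quality lines.

```python
def illumina64_to_sanger(qual):
    """Re-encode one quality string from Phred+64 to Phred+33."""
    out = []
    for c in qual:
        q = ord(c) - 64
        if q < 0:
            # A character below '@' means the input was not Phred+64 after all.
            raise ValueError(f"character {c!r} is not valid Phred+64")
        out.append(chr(q + 33))
    return "".join(out)


def convert_fastq(lines):
    """Yield FASTQ lines with every fourth line (the qualities) re-encoded."""
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        yield illumina64_to_sanger(line) if i % 4 == 3 else line


record = ["@read1\n", "ACGT\n", "+\n", "hhBB\n"]
print("\n".join(convert_fastq(record)))
```

The error on characters below '@' is a deliberate guard: running the shift on data that is already Phred+33 would otherwise silently produce invalid quality characters.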
