Question

Possible Bug In Seqtk Trimfq

12.5 years ago

Abhi ★ 1.6k

Hi Heng,

I think there may be a bug in seqtk trimfq. Here is a contrived example: if I run this read through it, I get the same read back unchanged. I am using the latest version from GitHub.

@2260:7:1101:14363:3089/1
CCTGCATTCTCACTCGTGTGGGCTCCACGACTGGGTCACCCCGCCGCTTCACTGCCCACACGACGCTCCCCTACCCATCCACACACCCGGCACGAATGCAGGTTCTTGTGTGAATGCCACGGCTTCGGTGGGGGGTTTGAGCCCCGCTCC
+
bbbeeeeegggffihhhhiihihihiiiihiifihhiiihihhhghgggeeeeeddcdcccccc^accaccc^a[^bcc`[bbcccWX]accccacccc`ba[bbccb]`^^bcdccc_b]]TX[bBBBBBBBBBBBBBBBBBBBBBBBB

I think the spacing in the read is getting messed up, but you can get the idea.

Expected result: the record below is just a made-up answer, but I would expect the trimming algorithm to remove low-quality bases from both the beginning and the end of the read. In this case the last ~24 bases of the read are low quality (ASCII(B) - 64 = 66 - 64 = 2) and should be trimmed.

@2260:7:1101:14363:3089/1
CCTGCATTCTCACTCGTGTGGGCTCCACGACTGGGTCACCCCGCCGCTTCACTGCCCACACGACGCTCCCCTACCCATCCACACACCCGGCACGAATGCAGGTTCTTGTGTGAATGCCACGGCTTC
+
bbbeeeeegggffihhhhiihihihiiiihiifihhiiihihhhghgggeeeeeddcdcccccc^accaccc^a[^bcc`[bbcccWX]accccacccc`ba[bbccb]`^^bcdccc_b]]TX[b

-A

updated 12.5 years ago by Istvan Albert 101k • written 12.5 years ago by Abhi ★ 1.6k

You should also state what you expect to see and why. It is not obvious from your example.

12.5 years ago by Istvan Albert 101k

Edited the question to include the expected answer. I assumed it was obvious, my bad.

12.5 years ago by Abhi ★ 1.6k

score 2 · Answer 1 · 2012-05-03

12.5 years ago

Istvan Albert 101k

I think this is an encoding issue. In the Sanger encoding the ASCII offset is 33, so B (ASCII 66) stands for a quality of 66 - 33 = 33, which is pretty good.

Earlier Illumina encodings used a different ASCII offset (64), in which case B would stand for a quality of 66 - 64 = 2, which is bad.
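To make the arithmetic concrete, here is a minimal Python sketch that decodes a quality character under both offsets. The function names `phred_score` and `guess_offset` are hypothetical, made up for this illustration, and are not part of seqtk. The offset guess is only a rough heuristic under stated assumptions: characters below ASCII 59 cannot occur with a 64-based offset, and characters above `J` (ASCII 74) would imply scores above 41 with offset 33, which is unusual for Illumina data of this kind.

```python
from typing import Optional

SANGER_OFFSET = 33        # Sanger / current Illumina encoding
OLD_ILLUMINA_OFFSET = 64  # earlier Illumina encoding


def phred_score(char: str, offset: int) -> int:
    """Decode one FASTQ quality character into a quality score."""
    return ord(char) - offset


def guess_offset(quality: str) -> Optional[int]:
    """Heuristically guess the ASCII offset of a quality string.

    Assumption: this is a rough rule of thumb, not a reliable detector.
    Returns None when the characters fit both encodings.
    """
    codes = [ord(c) for c in quality]
    if min(codes) < 59:   # too low for any 64-based encoding
        return SANGER_OFFSET
    if max(codes) > 74:   # above 'J': score > 41 under offset 33
        return OLD_ILLUMINA_OFFSET
    return None


print(phred_score("B", SANGER_OFFSET))        # 66 - 33 = 33
print(phred_score("B", OLD_ILLUMINA_OFFSET))  # 66 - 64 = 2
```

The quality string in the question contains characters such as `i` (ASCII 105), which would decode to 72 under offset 33, so under this heuristic the read looks like it uses the older 64-based encoding.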


So trimfq only works for newer Illumina data, i.e. Sanger-like Phred+33 scores?

12.5 years ago by Abhi ★ 1.6k

Well, that is the current standard; over the long term it is better that way.

12.5 years ago by Istvan Albert 101k

Is there a way to have seqtk convert the previous encoding (phred+64) to the current Sanger encoding? I tried seqtk seq -Q64 infile.fq > outfile.fq but this returned an unaltered file.

11.6 years ago by RvV ▴ 30

Search for some tips on converting FASTQ, for example: Convert Illumina reads to Sanger score

11.6 years ago by Istvan Albert 101k
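For reference, the conversion itself is just a shift of each quality character by 64 - 33 = 31. The following is a minimal Python sketch, not anything taken from seqtk. The function names `illumina64_to_sanger` and `convert_fastq` are hypothetical, and the sketch assumes plain four-line FASTQ records (no wrapped sequence or quality lines) encoded as Phred+64.

```python
from typing import Iterable, Iterator


def illumina64_to_sanger(quality: str) -> str:
    """Re-encode a Phred+64 quality string as Phred+33 (Sanger)."""
    converted = []
    for char in quality:
        score = ord(char) - 64
        if score < 0:
            # Assumption: input is Phred+64, so negative scores mean
            # the string is probably not in the older encoding at all.
            raise ValueError(f"{char!r} is not a valid Phred+64 character")
        converted.append(chr(score + 33))
    return "".join(converted)


def convert_fastq(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTQ lines with each quality line re-encoded.

    Assumes strict four-line records: header, sequence, '+', quality.
    """
    for index, line in enumerate(lines):
        text = line.rstrip("\n")
        if index % 4 == 3:  # fourth line of each record holds the qualities
            yield illumina64_to_sanger(text)
        else:
            yield text


record = ["@read1", "ACGT", "+", "bbbB"]
print("\n".join(convert_fastq(record)))
```

After this shift, a trailing `B` (quality 2 under offset 64) becomes `#`, which also decodes to 2 under offset 33, so a trimmer that assumes the Sanger encoding should then see those bases as low quality.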

Thanks for the suggestion. I am aware that various tools can do this, but I like to avoid using multiple tools if things can be done with one. The seqtk documentation has an example that converts ILLUMINA 1.3+ FASTQ to FASTA, which made me expect that it could do the conversion for me, but apparently it cannot.

11.6 years ago by RvV ▴ 30