Question

Is There A Fastq Alternative To Fastx_Collapser (Outputs Fasta)?

5

14.2 years ago

Yannick Wurm ★ 2.5k

fastx_collapser seems to convert my fastq files to fasta. That's not cool.

cat a
@HWI-ST132_0395:8:1:1177:1888#ATCTNC/1
ATACATATATCAGCATAAAGGTGTTCACAGGTCATCATGAGGGATCAGTTTGTAGCAATTACGGAGGTCACGAGATCGGACGAGCGGTTGCGCA
+HWI-ST132_0395:8:1:1177:1888#ATCTNC/1
d^`d`dddeccce\eedddac^JW\`X````Z`L``L]\\TYHNVZQ`__L\P_^a_^\^BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@HWI-ST132_0395:8:1:1048:1897#ATNTNN/1
GTGGATTCCGGGGGAATGGGGAGCGGGACGATGTGAAAGGAGCGGGAAGGGGGCGGAAGCGCGGCACAGTCGGCAGGCAGAGTTGCTAGAACAG
+HWI-ST132_0395:8:1:1048:1897#ATNTNN/1
c`cacTccbcccYbU^YM^\L^\\Z^\P]]YLUJ]VOaQ_U]^aBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB


fastx_collapser -i a
>1-1
GTGGATTCCGGGGGAATGGGGAGCGGGACGATGTGAAAGGAGCGGGAAGGGGGCGGAAGCGCGGCACAGTCGGCAGGCAGAGTTGCTAGAACAG
>2-1
ATACATATATCAGCATAAAGGTGTTCACAGGTCATCATGAGGGATCAGTTTGTAGCAATTACGGAGGTCACGAGATCGGACGAGCGGTTGCGCA

Is there an alternative collapser?

next-gen sequencing short • 8.5k views

updated 14.2 years ago by brentp 24k • written 14.2 years ago by Yannick Wurm ★ 2.5k

1

What quality score would you want to see in cases with multiple identical sequences? That's probably the hard problem Assaf was trying to avoid by outputting as FASTA.

14.2 years ago by Brad Chapman 9.7k

0

yes, that's a pain. i just write a quick throw-away script to do it. you could try emailing the author of fastx toolkit and asking. but i'd like to see a solution with awk/sed. :-)

14.2 years ago by brentp 24k

0

I'd be happy with anything :) Be it a random choice, or the quality scores of the sequence with the highest overall quality...

14.2 years ago by Yannick Wurm ★ 2.5k

0

Why do you need the output to be fastq? I'd be wary of using random (or at least not entirely correct) quality scores in downstream processing... If you're planning on aligning next, I think most aligners will take fasta input (I know bowtie and novoalign do).

13.5 years ago by Weronika ▴ 300

Ram · Answer 1 · 2010-10-07

6

14.2 years ago

brentp 24k

Since this doesn't have an answer yet, check reads-utils, which when run as:

./fastq filter --adjust 64 --unique /path/to/your.fastq > unique.fastq

will keep the records with the highest average quality.
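
For anyone who wants the same idea without compiling anything, here is a minimal Python sketch of collapsing identical sequences while keeping the record with the highest average quality. It is not the reads-utils code; the function names (read_fastq, collapse_fastq, format_fastq) are made up for illustration, and it assumes plain four-line FASTQ records. It compares the mean of the raw ASCII quality characters, so the Phred offset cancels out and no adjustment is needed for the comparison.

```python
def read_fastq(lines):
    """Yield (name, sequence, quality) tuples from four-line FASTQ records.

    `lines` can be an open file or any iterable of strings.
    """
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between or after records
        seq = next(it, "").rstrip("\n")
        next(it, "")  # '+' separator line, not needed
        qual = next(it, "").rstrip("\n")
        yield header[1:], seq, qual  # drop the leading '@'


def mean_quality(qual):
    """Mean raw ASCII code of the quality string (offset not subtracted)."""
    return sum(map(ord, qual)) / len(qual) if qual else 0.0


def collapse_fastq(records):
    """Keep one record per distinct sequence: the one with the best mean quality.

    Ties keep the first record seen. Output follows first-seen sequence order.
    """
    best = {}
    for name, seq, qual in records:
        score = mean_quality(qual)
        kept = best.get(seq)
        if kept is None or score > kept[0]:
            best[seq] = (score, name, qual)
    return [(name, seq, qual) for seq, (_, name, qual) in best.items()]


def format_fastq(name, seq, qual):
    """Render one record back to FASTQ text."""
    return f"@{name}\n{seq}\n+\n{qual}\n"
```

To use it on a file, pass the open file handle to read_fastq, feed the result to collapse_fastq, and write each returned tuple with format_fastq. Everything is held in memory, so this is only a sketch for moderately sized files.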

updated 5.3 years ago by Ram 44k • written 14.2 years ago by brentp 24k

1

This tool is great and real quick. I've looked at the code but couldn't find a good way to also get the name of the read with the highest average quality printed in the output (my C knowledge is fairly rusty). What I'm trying to do is to unique paired-end reads, so I need to know where one read ends and another starts to be able to separate them and use them for downstream analyses. Any ideas? Thanks.
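
One possible approach outside the C tool, as a rough sketch only: treat each pair as a unit, key on both mate sequences together, and keep the names with the winning pair. The function name collapse_pairs is hypothetical, and this assumes the two mate files list reads in the same order and that each record is already parsed into a (name, sequence, quality) tuple.

```python
def collapse_pairs(mates1, mates2):
    """Collapse duplicate read pairs, keeping read names.

    mates1 and mates2 are iterables of (name, sequence, quality) tuples in
    matching order. Two pairs count as duplicates only when BOTH mate
    sequences are identical. The pair with the highest mean quality across
    both mates is kept; ties keep the first pair seen.
    """
    best = {}
    for (n1, s1, q1), (n2, s2, q2) in zip(mates1, mates2):
        combined = q1 + q2
        # Mean raw ASCII code; the Phred offset cancels when comparing pairs.
        score = sum(map(ord, combined)) / max(len(combined), 1)
        key = (s1, s2)
        if key not in best or score > best[key][0]:
            best[key] = (score, (n1, s1, q1), (n2, s2, q2))
    return [(r1, r2) for _, r1, r2 in best.values()]
```

Because each mate stays a separate record with its own name, the two reads can be written back to two files and never need to be split apart afterwards.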

12.4 years ago by Liz Fernandez ▴ 70

0

Win - it's very fast too! I wish more things were written in C! Thanks

14.2 years ago by Yannick Wurm ★ 2.5k

0

(although for "unique", --adjust should be unnecessary and has no effect) :)

14.2 years ago by Yannick Wurm ★ 2.5k

0

I guess it cannot collapse paired-end reads, can it?

11.2 years ago by Biomonika (Noolean) 3.2k