fastx_collapser seems to convert my FASTQ files to FASTA. That's not cool.
$ cat a
@HWI-ST132_0395:8:1:1177:1888#ATCTNC/1
ATACATATATCAGCATAAAGGTGTTCACAGGTCATCATGAGGGATCAGTTTGTAGCAATTACGGAGGTCACGAGATCGGACGAGCGGTTGCGCA
+HWI-ST132_0395:8:1:1177:1888#ATCTNC/1
d^`d`dddeccce\eedddac^JW\`X````Z`L``L]\\TYHNVZQ`__L\P_^a_^\^BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@HWI-ST132_0395:8:1:1048:1897#ATNTNN/1
GTGGATTCCGGGGGAATGGGGAGCGGGACGATGTGAAAGGAGCGGGAAGGGGGCGGAAGCGCGGCACAGTCGGCAGGCAGAGTTGCTAGAACAG
+HWI-ST132_0395:8:1:1048:1897#ATNTNN/1
c`cacTccbcccYbU^YM^\L^\\Z^\P]]YLUJ]VOaQ_U]^aBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
$ fastx_collapser -i a
>1-1
GTGGATTCCGGGGGAATGGGGAGCGGGACGATGTGAAAGGAGCGGGAAGGGGGCGGAAGCGCGGCACAGTCGGCAGGCAGAGTTGCTAGAACAG
>2-1
ATACATATATCAGCATAAAGGTGTTCACAGGTCATCATGAGGGATCAGTTTGTAGCAATTACGGAGGTCACGAGATCGGACGAGCGGTTGCGCA
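Here is what the collapser seems to be doing. It keeps one copy of each distinct sequence and stores the copy number in the FASTA header, which leaves no place for a quality string. Below is a minimal Python sketch of that behaviour. It assumes the header is `>rank-count` (rank by descending abundance), as the output suggests, and it assumes strict 4-line FASTQ records. The function names are my own, not part of FASTX-Toolkit.

```python
from collections import Counter
from typing import Iterator, List, Tuple


def read_fastq(lines: List[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from strict 4-line FASTQ records.

    Assumes no wrapped sequence/quality lines.
    """
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = (line.rstrip("\n") for line in lines[i:i + 4])
        yield header, seq, qual


def collapse_to_fasta(lines: List[str]) -> List[str]:
    """Collapse identical sequences into '>rank-count' FASTA records."""
    counts = Counter(seq for _h, seq, _q in read_fastq(lines))
    out: List[str] = []
    # most_common() sorts by count; equal counts keep first-seen order
    for rank, (seq, n) in enumerate(counts.most_common(), start=1):
        out.append(f">{rank}-{n}")
        out.append(seq)
    return out
```

In the two-read file above, every count is 1. This sketch would therefore list the reads in input order, while fastx_collapser printed them the other way round. I wouldn't rely on either tool's tie order.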
Is there an alternative collapser that keeps the output in FASTQ format?
What quality scores would you want to see when several reads share an identical sequence? That's probably the hard problem Assaf was trying to avoid by writing FASTA.
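To choose between identical reads by quality, you first have to decode each quality character into a score. The example quality strings look like Phred+64 (Illumina 1.3 to 1.7 style, where a run of `B` marks the unreliable read tail). That is my assumption; fastx_collapser doesn't report it. A small sketch with hypothetical helper names:

```python
from typing import List


def phred_scores(qual: str, offset: int = 64) -> List[int]:
    """Decode a FASTQ quality string into Phred scores.

    offset=64 assumes the old Illumina encoding; use 33 for Sanger/Illumina 1.8+.
    """
    return [ord(c) - offset for c in qual]


def mean_quality(qual: str, offset: int = 64) -> float:
    """Average Phred score of a read (0.0 for an empty string)."""
    scores = phred_scores(qual, offset)
    return sum(scores) / len(scores) if scores else 0.0
```

For example, `d` decodes to 36 and `B` to 2, so `mean_quality("dB")` is 19.0. A plain mean of Phred values is only a crude summary, but it is enough to rank reads against each other.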
Yes, that's a pain. I just write a quick throw-away script to do it. You could try emailing the author of the FASTX-Toolkit and asking, but I'd like to see a solution with awk/sed. :-)
I'd be happy with anything :) Either a random choice, or the quality scores of the read with the highest overall quality...
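A quick throw-away script along those lines might look like the following. This is a minimal Python sketch, not a FASTX-Toolkit feature, and it makes several assumptions:

- records are strict 4-line FASTQ;
- qualities are Phred+64;
- the `count=<n>` suffix in the output header is an invented convention.

With `policy="best"` it keeps the read with the highest mean quality. With `policy="random"` it keeps a uniformly random read via reservoir sampling.

```python
import random
from typing import Dict, Iterator, List, Optional, Tuple

Record = Tuple[str, str, str]  # (read name, sequence, quality)


def read_fastq(lines: List[str]) -> Iterator[Record]:
    """Yield records from strict 4-line FASTQ (no wrapped lines)."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = (line.rstrip("\n") for line in lines[i:i + 4])
        yield header[1:], seq, qual  # drop the leading '@'


def mean_quality(qual: str, offset: int = 64) -> float:
    """Mean Phred score; offset=64 assumes the old Illumina encoding."""
    return sum(ord(c) - offset for c in qual) / len(qual) if qual else 0.0


def collapse_fastq(lines: List[str], policy: str = "best",
                   rng: Optional[random.Random] = None) -> List[str]:
    """Collapse identical sequences, keeping one representative read's qualities."""
    if policy not in ("best", "random"):
        raise ValueError(f"unknown policy: {policy}")
    rng = rng or random.Random()
    kept: Dict[str, Record] = {}
    counts: Dict[str, int] = {}
    for rec in read_fastq(lines):
        seq = rec[1]
        counts[seq] = counts.get(seq, 0) + 1
        if seq not in kept:
            kept[seq] = rec
        elif policy == "best":
            if mean_quality(rec[2]) > mean_quality(kept[seq][2]):
                kept[seq] = rec
        elif rng.randrange(counts[seq]) == 0:
            # reservoir sampling: the n-th copy replaces the kept one with
            # probability 1/n, so every copy is equally likely to survive
            kept[seq] = rec
    out: List[str] = []
    for seq, (name, _s, qual) in kept.items():  # first-appearance order
        out += [f"@{name} count={counts[seq]}", seq, "+", qual]
    return out
```

Writing `"\n".join(collapse_fastq(open("a").readlines())) + "\n"` to a file gives a FASTQ. Keep in mind that each kept quality string describes only one of the collapsed reads, not the group.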
Why do you need the output to be FASTQ? I'd be wary of using random (or at least not entirely correct) quality scores in downstream processing. If you're planning on aligning next, I think most aligners will take FASTA input (I know Bowtie and Novoalign do).