I have found a set of reads that I want to align to a genome using Bowtie. They are available here: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM113418
The problem is that the data come as a table of unique sequences, each with the number of reads that had that sequence:

    ID_REF = SEQUENCE
    VALUE = NUMBER OF READS
    ID_REF VALUE
    AGGCAGTGTAGTTAGCTGATTGC 197
    TCCCTGGTCTAGTGGTTAGGATTCGGC 177
    TCACAACAACTGTGTGGAGGTATAGGTGT 149
    TATTTATTGAGGGCCTACTATGTGCCGGG 125

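A minimal sketch of reading this table in Python, assuming each data row is a sequence and an integer count separated by whitespace, and that the header rows can simply be skipped. `parse_geo_table` is a hypothetical helper name, not part of any GEO tool:

```python
def parse_geo_table(lines):
    """Yield (sequence, read_count) pairs from ID_REF/VALUE rows.

    Assumes each data row is 'SEQUENCE<whitespace>COUNT'; header rows such as
    'ID_REF VALUE' or 'ID_REF = SEQUENCE' are skipped.
    """
    for line in lines:
        fields = line.split()
        # Skip blank lines and rows that are not exactly sequence + count
        if len(fields) != 2:
            continue
        seq, count = fields
        # 'ID_REF VALUE' has two fields but fails the integer check
        if not count.isdigit():
            continue
        yield seq.upper(), int(count)


if __name__ == "__main__":
    sample = [
        "ID_REF VALUE",
        "AGGCAGTGTAGTTAGCTGATTGC 197",
        "TCCCTGGTCTAGTGGTTAGGATTCGGC 177",
    ]
    for seq, count in parse_geo_table(sample):
        print(seq, count)
```

Each row stands for COUNT reads with an identical sequence, so summing the second field gives the total number of reads the table represents.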
Bowtie, however, expects reads in FASTQ format, like this:

    @r0/2
    GAATACTGGCGGATTACCGGGGAAGCTGGAGC
    +
    EDCCCBAAAA@@@@?>===<;;9:99987776
    @r1/2
    AATGTGAAAACGCCATCGATGGAACAGGCAAT
    +
    EDCCCBAAAA@@@@?>===<;;9:99987776
    @r2/2
    AACGCGCGTTATCGTGCCGGTCCATTACGCGG
    +
    EDCCCBAAAA@@@@?>===<;;9:99987776

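Each FASTQ record is four lines: `@` plus a read name, the sequence, a `+` separator line, and one quality character per base. The GEO table carries no quality values, so any sketch that builds FASTQ records has to assume a placeholder. Here `I` (Phred 40 in the Phred+33 encoding) is an arbitrary choice, and `fastq_record` is a hypothetical helper name:

```python
def fastq_record(name, seq, qual_char="I"):
    """Return one FASTQ record as four lines: @name, sequence, '+', qualities.

    The placeholder quality is an assumption: the source table has no
    per-base qualities, so every base gets the same character.
    """
    if len(qual_char) != 1:
        raise ValueError("qual_char must be a single character")
    # FASTQ requires exactly one quality character per base
    qual = qual_char * len(seq)
    return f"@{name}\n{seq}\n+\n{qual}\n"


if __name__ == "__main__":
    print(fastq_record("r0", "AGGCAGTGTAGTTAGCTGATTGC"), end="")
```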
Is there a standard way to convert the first format into the second, or is this kind of data meant to be processed some other way? Thanks.
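One possible approach, sketched in Python under stated assumptions: since the table has no base qualities, write FASTA instead of FASTQ and tell Bowtie the reads are FASTA with its `-f` option. Each unique sequence becomes one record, and the hypothetical `seq<N>_x<COUNT>` naming keeps the read count in the read name so it survives alignment. The script and file names in the usage comment are placeholders:

```python
import sys


def geo_to_fasta(lines):
    """Yield one FASTA record per unique sequence, keeping its count in the name.

    The 'seq<N>_x<COUNT>' naming scheme is a hypothetical choice: it lets the
    counts be re-applied after alignment instead of aligning duplicates.
    """
    n = 0
    for line in lines:
        fields = line.split()
        # Keep only 'SEQUENCE COUNT' rows; skip header and blank lines
        if len(fields) != 2 or not fields[1].isdigit():
            continue
        seq, count = fields[0].upper(), int(fields[1])
        yield f">seq{n}_x{count}\n{seq}\n"
        n += 1


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (placeholder file names): python geo_to_fasta.py table.txt > reads.fa
    with open(sys.argv[1]) as table:
        sys.stdout.writelines(geo_to_fasta(table))
```

With Bowtie 1 the alignment would then be something like `bowtie -f genome_index reads.fa hits.txt`, where `genome_index` stands for the basename of your own Bowtie index. Aligning each unique sequence once is cheaper than expanding it into COUNT identical reads, and the counts can be recovered from the read names afterwards.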
Edited for readability. Note that your data was not displayed correctly in the original post; lines must be indented with 4 spaces to render as code.