I have a non-redundant FASTA file in the following format:
TCACCCATCGTACCCACTTG 1
TTTTTGATCCTTCGATGTCGGC 64
TCTTGAAGTAGAAAAGTTGTGGTT 2
CGTAAGAATGTCCACAGCCAAGC 1
......
The 2nd column is the abundance of the corresponding read. I would like to have a new FASTA file containing all the redundant sequences, i.e. the 1st read appears once, the 2nd appears 64 times, the 3rd appears twice, and so on.
Could anyone help?
http://whathaveyoutried.com
Here's a hint using Pierre's comment as input:
$ perl -e 'print "http://whathaveyoutried.com\n\n" x 2'
http://whathaveyoutried.com
http://whathaveyoutried.com
I love the people here.
So do I. In case it's not obvious, you would replace the URL in my code above with the DNA string (in the first column of the example input) and the '2' with the appropriate number (in the second column). Perl can autosplit each input line into fields with the -a switch (used together with -n), giving you the values in those columns in @F, so you would only need to make a slight modification to add a header.
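For anyone who wants the hint spelled out, here is a minimal sketch of one way to put the pieces together. It assumes the input is whitespace-separated "SEQUENCE COUNT" lines as in the question. The function name expand_reads and the header format >read<line>_<copy> are made up for illustration, since the original post does not say what the headers should look like.

```shell
# Hypothetical helper: expand "SEQUENCE COUNT" lines into redundant FASTA records.
# -n loops over the input lines; -a autosplits each line on whitespace into @F.
# The ">read<line>_<copy>" header format is an assumption, not from the thread.
expand_reads() {
    perl -ane '
        # Skip lines that are not "sequence count" pairs (blank lines, "......").
        next unless @F == 2 && $F[1] =~ /^\d+$/;
        # Emit one FASTA record per copy; $. is the current input line number.
        printf ">read%d_%d\n%s\n", $., $_, $F[0] for 1 .. $F[1];
    ' "$@"
}
```

Something like `expand_reads reads.txt > redundant.fasta` would then write the expanded file (both file names are placeholders). Giving every copy its own header keeps the IDs unique, which some downstream tools expect; if you would rather have identical headers, drop the second %d.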